How to Block Webz.io & Omgili: The AI Data Broker Behind Three Crawlers
Webz.io operates under three different identities — omgili, omgilibot, and webzio-extended — selling web content to AI companies. One Disallow rule isn't enough. Here's how to stop all three.
Quick Block (60 seconds)
Add all three user agents to your robots.txt. Blocking only one leaves two active crawlers still harvesting your site.
```
# Block Webz.io AI training crawler (current)
User-agent: webzio-extended
Disallow: /

# Block Webzio general crawler (optional — see note below)
User-agent: webzio
Disallow: /

# Block legacy Omgili user agents
User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /
```
Note on webzio vs webzio-extended: If you only want to block AI training, add webzio-extended only. The standard webzio bot feeds search indexes used by third-party tools — blocking it may reduce referral traffic from platforms that license Webz.io's search data. When in doubt, block both.
Who Is Webz.io (and What Is Omgili)?
Webz.io is an Israeli content intelligence company that crawls the open web and sells structured data to businesses — including AI companies building training datasets. Omgili was an earlier brand and crawler they operated; Webz.io acquired and absorbed it, and the legacy omgili and omgilibot user agents remain active in server logs across the web.
In 2024, Webz.io introduced the “Webzio Duo” — a two-crawler system designed to give content owners more granular control:
| Crawler | Purpose | Block to stop AI training? |
|---|---|---|
| webzio | General web index; powers search tools | Optional |
| webzio-extended | AI training data collection | Yes — block this one |
| omgilibot | Legacy crawler (pre-2024) | Yes — still appears in logs |
| omgili | Older legacy variant | Yes — belt-and-suspenders |
Unlike first-party AI crawlers like GPTBot (OpenAI) or ClaudeBot (Anthropic) — which crawl for their own models — Webz.io is a third-party data broker. It sells structured web content to any paying customer, which may include multiple AI companies simultaneously. One block stops a pipeline that could feed many models at once.
What Does Webz.io Collect?
Webz.io markets itself as a “web content intelligence” platform. Its products include:
- **News and article text:** Full text, titles, publish dates, and author metadata from news sites and blogs.
- **Forum and review content:** Discussion threads, product reviews, and social commentary, structured for sentiment analysis.
- **Dark web and open web intelligence:** Webz.io sells a "dark web monitor" product alongside its open web index.
- **AI training datasets:** Structured text data labeled and sold specifically for LLM and ML training pipelines.
Verify the Block
After updating your robots.txt, verify:
1. Validate your robots.txt rules
Google retired the Search Console robots.txt Tester in 2023, and it only ever simulated Google's own crawlers, so it could not test omgilibot anyway. Use a standalone robots.txt validator or parser to confirm your rules parse correctly and that each Webz.io user agent group carries a Disallow: / rule.
2. Check server logs
Grep your access logs for existing Webz.io traffic:
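The log check can be a one-liner. A sketch assuming the common "combined" log format, where the user agent is the sixth quote-delimited field; the sample log line and the log path in the comment are illustrative, not Webz.io's exact user agent strings:

```shell
# Count requests from Webz.io-family user agents.
# In practice, point grep at your real log, e.g.:
#   grep -iE 'omgili|webzio' /var/log/nginx/access.log | ...
# Demonstrated here on a sample "combined"-format log line:
sample='203.0.113.7 - - [01/Jan/2025:12:00:00 +0000] "GET /post HTTP/1.1" 200 5120 "-" "omgilibot/0.3"'
printf '%s\n' "$sample" \
  | grep -iE 'omgili|webzio' \
  | awk -F'"' '{print $6}' \
  | sort | uniq -c | sort -rn
```

A drop to zero hits in the days after you deploy the block is a good sign the Disallow rules are being honored.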
3. Fetch your robots.txt
Confirm Disallow: / appears under each user agent block.
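This check can also be scripted. A sketch: the inline printf stands in for your live robots.txt, and yourdomain.com in the comment is a placeholder:

```shell
# Show each Webz.io-family User-agent group plus the line after it.
# Against your live site, run:
#   curl -s https://yourdomain.com/robots.txt | grep -i -E -A 1 'omgili|webzio'
# Demonstrated here on an inline copy of the rules:
printf 'User-agent: webzio-extended\nDisallow: /\n\nUser-agent: omgilibot\nDisallow: /\n' \
  | grep -i -E -A 1 'omgili|webzio'
```

Each matched User-agent line should be followed by its Disallow: / rule.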
Server-Level Blocking
For stricter enforcement — especially on high-value or paywalled content — add server-level rules that return 403 before the request is processed.
nginx
```nginx
# In your server {} block. The match is case-insensitive (~*);
# "omgili" also matches "omgilibot", and "webzio" also matches
# "webzio-extended".
if ($http_user_agent ~* "(omgili|omgilibot|webzio)") {
    return 403;
}
```
Apache (.htaccess)
```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (omgili|omgilibot|webzio) [NC]
RewriteRule .* - [F,L]
```
Cloudflare WAF Rule
Dashboard → Security → WAF → Custom Rules → Create rule:
Field: User Agent
Operator: contains (apply for each)
Values: omgili · omgilibot · webzio-extended
Action: Block
Or, if your plan includes the regex "matches" operator (Business and Enterprise), use a single regex rule: (omgili|omgilibot|webzio-extended)
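Equivalently, a single custom rule can be written in Cloudflare's expression editor. A sketch in the rules language (note that contains is case-sensitive, hence the lower() wrapper, and that "omgili" as a substring already matches "omgilibot"):

```
(lower(http.user_agent) contains "omgili") or (lower(http.user_agent) contains "webzio-extended")
```

Set the rule's action to Block.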
Next.js Middleware
// middleware.ts
import { NextRequest, NextResponse } from 'next/server';
const BLOCKED = /omgili|omgilibot|webzio/i;
export function middleware(req: NextRequest) {
const ua = req.headers.get('user-agent') ?? '';
if (BLOCKED.test(ua)) return new NextResponse(null, { status: 403 });
return NextResponse.next();
}
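Whichever server-level method you deploy, you can confirm it from the command line by spoofing the crawler's user agent; yourdomain.com and the omgilibot/0.3 string are placeholders:

```shell
# Send a request identifying as omgilibot; a 403 means the block is live.
curl -s -o /dev/null -w '%{http_code}\n' -A 'omgilibot/0.3' https://yourdomain.com/
```

Compare against a request without the -A flag, which should still return your normal status code.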
Frequently Asked Questions
Does Webz.io respect robots.txt?
Yes — Webz.io officially states that both Webzio and Webzio-Extended respect robots.txt Disallow directives. The legacy Omgilibot also documented robots.txt compliance. That said, Webz.io is a commercial data broker with paying clients, so for high-value or paywalled content, combining robots.txt with server-level blocking is advisable.
What's the difference between webzio and webzio-extended?
Webz.io explicitly designed 'webzio-extended' as its AI training data crawler, separate from its general search index crawler ('webzio'). If you only want to prevent AI training use, block webzio-extended. Blocking 'webzio' may additionally reduce referral traffic from search tools that license Webz.io's data. For full protection, block both.
Should I block omgili, omgilibot, and webzio-extended — or just one?
Block all three. Omgilibot is the legacy user agent (pre-2024) still appearing in crawl logs. Omgili is an even older variant. Webzio-extended is the current AI-specific crawler. Because Webz.io rotated user agent identities over time, you need all three Disallow directives for comprehensive coverage.
Is Webz.io the same as Omgili?
Yes. Webz.io is the company behind Omgili. Omgili was an independent web content intelligence service that Webz.io acquired. The Omgilibot user agent is now legacy — replaced by the 'Webzio Duo' in 2024 — but legacy infrastructure still uses the old user agents.
Does blocking Webz.io affect Google or my SEO?
No. Webz.io operates entirely separately from Google, Bing, and other search engines. Blocking omgili, omgilibot, and webzio-extended in robots.txt has zero effect on Googlebot or your search rankings.
Related Guides
See Which AI Bots Are Hitting Your Site
Open Shadow scans your site and shows you exactly which AI crawlers are active — so you can block the right ones, not just guess.