
AI Content Protection Tools Compared for 2026

There are dozens of ways to protect your content from AI training — from a one-line robots.txt edit to enterprise bot management platforms. Here's an honest breakdown of what works, what doesn't, and what you actually need based on your site.

The AI content protection stack

Think of AI content protection as layers. Each layer catches threats the layer below misses. Most sites only need the first 2–3 layers.

Layer 1: robots.txt (Free)
Coverage: ~70% of AI crawlers
Blocks all major training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, etc.) that respect the robots exclusion protocol.
Limitation: Advisory only — some bots ignore it. No effect on AI agents using headless browsers.
Layer 2: noai/noimageai meta tags (Free)
Coverage: Per-page granularity
HTML meta tags and X-Robots-Tag headers that signal "do not use for AI training" on specific pages.
Limitation: Only as strong as the crawler's willingness to respect them. No legal enforcement mechanism (unlike TDMRep).
Layer 3: CDN bot management (Free to $$$)
Coverage: AI agents + non-compliant crawlers
Cloudflare AI Labyrinth (free), Bot Management (paid), or AWS WAF Bot Control. Catches headless browsers and AI agents that bypass robots.txt.
Limitation: Free tiers have limited detection. Full bot management requires Business/Enterprise plans.
Layer 4: TDMRep, W3C standard (Free)
Coverage: Legal enforcement (EU)
Formally reserves your Text and Data Mining rights under the EU AI Act and CDSM Directive. Machine-readable rights declaration.
Limitation: Legal force limited to EU jurisdiction. Technical compliance from AI companies is still emerging.
Layer 5: Server-level blocks (Free, DIY)
Coverage: IP-level, UA-level blocking
nginx/Apache rules that block known AI crawler IPs and user agents at the server level — before your application even sees the request.
Limitation: Requires server access. IP lists change frequently. Overkill for most sites.
Layer 6: Application-level detection (Free, DIY)
Coverage: AI agents, headless browsers
Honeypot links, headless browser fingerprinting, behavioural analysis. Catches sophisticated agents that spoof browser identities.
Limitation: Arms race with agent frameworks. Requires ongoing maintenance. Risk of false positives.
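As a sketch of Layer 5, an nginx rule that rejects known AI-crawler user agents before the request reaches your application. The user-agent tokens shown are the published crawler names; treat the list as illustrative and keep it in sync with each vendor's current documentation (note that Google-Extended is a robots.txt token only and never appears in request user agents, so it is omitted here).

```nginx
# In the http {} context: flag known AI training crawler user agents
map $http_user_agent $ai_training_bot {
    default        0;
    ~*GPTBot       1;
    ~*ClaudeBot    1;
    ~*CCBot        1;
    ~*Bytespider   1;
}

server {
    # ... existing listen / server_name / root directives ...

    # Refuse flagged bots before the application sees the request
    if ($ai_training_bot) {
        return 403;
    }
}
```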

Free tools

You can get surprisingly far with free tools. Here's what's available without spending anything.

👁️ Open Shadow Scanner (Free)

Scans your site's robots.txt, meta tags, HTTP headers, and llms.txt to show exactly which AI bots can access your content and which are blocked.

Instant results — no signup required
Checks 50+ AI bot tokens
AI Readiness Score with actionable recommendations
Detects llms.txt, TDMRep, and meta tag configuration
Point-in-time check — not continuous monitoring (Pro plan adds that)
Try it free →
🤖 robots.txt, manual or generator (Free)

The foundation of AI content protection. A properly configured robots.txt blocks the majority of training crawlers.

Blocks GPTBot, ClaudeBot, Google-Extended, CCBot, and 20+ AI crawlers
Universally supported — every web server serves robots.txt
5-minute setup
Advisory — not all bots respect it (Bytespider documented ignoring it)
All-or-nothing per bot — can't allow a bot on some pages and block on others (use meta tags for that)
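As a minimal sketch, a robots.txt that blocks the major training crawlers while leaving AI search bots untouched might look like this. The user-agent tokens are the ones the vendors publish; verify them against each vendor's current documentation before deploying.

```txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# AI search bots (OAI-SearchBot, PerplexityBot, DuckAssistBot) are
# deliberately not listed, so they keep driving traffic back to you.
```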
🏷️ noai / noimageai meta tags (Free)

HTML meta tags that declare content as off-limits for AI training. Works at the page level, giving you granular control.

Per-page granularity (protect premium content, allow blog posts)
One HTML tag — works with any CMS or framework
X-Robots-Tag header option for server-level deployment
Respect varies by AI company — not universally honoured
Implementation guide →
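For illustration, here is the page-level tag and the equivalent server-side header. The nginx syntax is shown as one option; adjust for your server, and note that support for the noai/noimageai values varies by AI company.

```html
<!-- In the <head> of pages you want excluded from AI training -->
<meta name="robots" content="noai, noimageai">
```

The same signal can be sent for non-HTML assets (images, PDFs) via the X-Robots-Tag header:

```nginx
# e.g. in an nginx server or location block
add_header X-Robots-Tag "noai, noimageai";
```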
☁️ Cloudflare Free Tier + AI Labyrinth (Free)

Cloudflare's free plan includes AI Labyrinth — a feature that feeds AI agents fake content instead of your real pages. The only free tool that catches headless browser agents.

Catches AI agents that bypass robots.txt (Firecrawl, browser-use, etc.)
Wastes agent compute tokens with realistic fake content
Basic bot analytics (automated vs human traffic split)
Limited bot classification on free tier (no per-bot breakdown)
Requires proxying your DNS through Cloudflare
⚖️ TDMRep, W3C standard (Free)

Machine-readable rights reservation backed by EU law. Declares that you reserve text and data mining rights — giving you legal standing to challenge unauthorized AI training.

Legal enforcement under EU AI Act and CDSM Directive
Three implementation methods: JSON file, HTTP headers, or HTML meta
10-minute implementation
Legal force limited to EU jurisdiction
Technical adoption by AI companies is still early
TDMRep guide →
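As a sketch of the JSON-file method, a site-wide rights reservation served from the well-known location defined by the TDMRep spec. The key names and file path below follow the W3C community group report; double-check them against the current spec before relying on them.

```json
[
  {
    "location": "/*",
    "tdm-reservation": 1
  }
]
```

Saved as /.well-known/tdmrep.json, this declares that text and data mining rights are reserved for every path on the site; an optional "tdm-policy" URL can point to licensing terms.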

Paid tools & services

When free tools aren't enough — high-value content, paywalled publishers, or enterprise compliance requirements.

☁️ Cloudflare Bot Management (Business+, $200+/mo)

ML-based bot detection with JA3/JA4 TLS fingerprinting, per-bot analytics, and granular blocking rules.

Best-in-class bot detection (ML scoring + fingerprinting)
Catches headless browsers, residential proxies, and spoofed UAs
Detailed per-bot analytics and traffic breakdown
Challenge/block/allow rules per bot category
Expensive — Business plan starts at ~$200/mo
Enterprise features (like full bot score access) require Enterprise plan

Best for: Publishers, SaaS companies, e-commerce sites with high-value content

🛡️ AWS WAF Bot Control (Pay-per-use, from ~$10/mo)

AWS-native bot management with managed rule groups for common and targeted bot detection.

Integrates with CloudFront, ALB, and API Gateway
Pay-per-use pricing (cheaper for low-traffic sites)
Targeted bot control for headless browser detection
Requires AWS infrastructure
Less AI-specific than Cloudflare (no AI Labyrinth equivalent)

Best for: Sites already on AWS infrastructure
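As an illustration, attaching the Bot Control managed rule group to a WAF web ACL rule looks roughly like this. The rule group name and InspectionLevel setting come from AWS's managed rules documentation; treat the exact JSON shape as an assumption to verify against the current WAF API reference.

```json
{
  "Name": "ai-bot-control",
  "Priority": 1,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet",
      "ManagedRuleGroupConfigs": [
        { "AWSManagedRulesBotControlRuleSet": { "InspectionLevel": "TARGETED" } }
      ]
    }
  },
  "OverrideAction": { "None": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "ai-bot-control"
  }
}
```

The TARGETED inspection level is the tier aimed at headless-browser detection; COMMON is the cheaper baseline.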

🔒 Akamai Bot Manager (Enterprise)

Enterprise-grade bot detection with device fingerprinting, behavioural analysis, and dedicated AI/ML models.

Industry-leading detection rates
Handles the most sophisticated agents and proxy networks
Enterprise-only pricing (no self-serve)
Complex setup and ongoing management

Best for: Major publishers, financial services, large e-commerce

What you actually need

Most sites are overthinking this. Match your protection to your actual risk.

Personal blog or portfolio (Low risk)
robots.txt blocking all training crawlers + Open Shadow scan to verify. Done in 10 minutes.
Tools: robots.txt + Open Shadow (free)
Business website or SaaS docs (Medium risk)
robots.txt + noai meta tags on key pages + Cloudflare free tier with AI Labyrinth. Monitor with server logs or Cloudflare analytics.
Tools: robots.txt + meta tags + Cloudflare free + Open Shadow (free)
Content site or blog with monetised content (Medium-high risk)
Everything above + TDMRep for EU legal coverage + dedicated AI bot log monitoring + consider Cloudflare Pro for bot analytics.
Tools: All free tools + TDMRep + Cloudflare Pro ($20/mo)
Paywalled publisher or premium content (High risk)
Full stack: robots.txt + meta tags + TDMRep + Cloudflare Business (bot management) + application-level honeypots + server-level IP blocking for known bad actors.
Tools: All free tools + Cloudflare Business ($200+/mo)
News organisation or research database (Critical risk)
Enterprise bot management (Cloudflare Enterprise or Akamai) + legal TDMRep + dedicated AI bot monitoring infrastructure + content licensing strategy.
Tools: Enterprise solution (custom pricing)

Common mistakes

Blocking everything including AI search bots
OAI-SearchBot, PerplexityBot, and DuckAssistBot drive traffic back to your site. Blocking them means disappearing from AI-powered search results — which is an increasingly large traffic source.
Thinking robots.txt blocks everything
robots.txt only blocks crawlers that identify themselves and choose to respect it. AI agents using headless browsers (Firecrawl, browser-use) never check robots.txt. You need additional layers for comprehensive protection.
Not monitoring after blocking
You need to verify your blocks are working. Some bots ignore robots.txt. New bots appear regularly. Without monitoring, you have no feedback loop.
Paying for enterprise solutions on a blog
A $200/mo Cloudflare Business plan is overkill for a personal blog. robots.txt + Cloudflare free tier + an Open Shadow scan covers 95% of the threat surface for free.
Ignoring the problem entirely
AI bot traffic is growing 50%+ year-over-year. Content that's unprotected today will be in training datasets for models deployed over the next 2-5 years. The longer you wait, the more content is already extracted.

Frequently asked questions

What is the best free tool to block AI bots?
The most effective free tool is a properly configured robots.txt file — it blocks the majority of AI training crawlers (GPTBot, ClaudeBot, Google-Extended, etc.) with zero cost and no technical complexity. Combine it with noai/noimageai meta tags for per-page control. Cloudflare's free tier adds AI Labyrinth, which actively misdirects AI agents. Open Shadow's free scanner tells you which bots can currently access your content so you know what to block.
Does Cloudflare AI Labyrinth actually work?
Yes — Cloudflare AI Labyrinth is effective against AI agents that browse your site using headless browsers (Firecrawl, browser-use, etc.). It serves them realistic but fake AI-generated content, wasting their compute tokens while protecting your real pages. It does NOT block traditional crawlers like GPTBot or ClaudeBot — those are handled by robots.txt. Think of AI Labyrinth as a complement to robots.txt, not a replacement. It's available on all Cloudflare plans including free.
Is robots.txt enough to protect my content from AI?
robots.txt is the minimum baseline and handles the majority of threat surface. Most major AI companies (OpenAI, Anthropic, Google, Meta) respect robots.txt Disallow rules for their training crawlers. However, robots.txt has three gaps: (1) AI agents using headless browsers don't check it, (2) some crawlers like Bytespider have been documented ignoring it, and (3) content already crawled before you added the block may still be in training datasets. For comprehensive protection, layer robots.txt with meta tags, Cloudflare bot management, and server-level blocks.
How do I know if AI bots are scraping my site right now?
Most site owners don't know because AI bots don't show up in Google Analytics (they don't execute JavaScript). To check: (1) Run a free Open Shadow scan to see which AI bots your current config allows, (2) Check your server access logs and grep for AI bot user agents like GPTBot, ClaudeBot, Bytespider, etc., (3) If you're on Cloudflare, check Security → Bots in your dashboard for automated traffic stats. Our monitoring guide covers 5 detailed methods for tracking AI bot traffic.
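As a quick illustration of method (2), counting hits per AI bot token in an access log with standard Unix tools. The sample log lines and /tmp path below are stand-ins; run the same pipeline against your real log (for example /var/log/nginx/access.log), and note the log format is an assumption about a typical combined-format log.

```shell
# Write two sample access-log lines to a temp file (stand-ins for real entries)
cat <<'EOF' > /tmp/sample_access.log
203.0.113.5 - - [12/Jan/2026:10:01:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
198.51.100.7 - - [12/Jan/2026:10:02:03 +0000] "GET /post HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
EOF

# Extract each AI bot token, then count occurrences per bot
grep -ioE 'gptbot|claudebot|bytespider|ccbot' /tmp/sample_access.log \
  | sort | uniq -c | sort -rn
```

The same one-liner pointed at a week of real logs gives you a per-bot hit count, which is usually enough to see whether your blocks are holding.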
What is TDMRep and do I need it?
TDMRep (Text and Data Mining Reservation Protocol) is a W3C standard that lets you formally declare rights reservations over your content. Unlike robots.txt (which is a gentleman's agreement), TDMRep has legal backing under the EU AI Act and the Copyright in the Digital Single Market (CDSM) Directive. You need it if: you publish content in or targeting the EU market, you want legal standing to challenge unauthorized AI training, or you want to formally reserve TDM rights alongside your robots.txt blocks. Implementation takes 10 minutes — add a tdmrep.json file or HTTP headers.
Should I block all AI bots or just training crawlers?
Block training crawlers, think carefully about search bots. Training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider) extract your content to train AI models — you get zero traffic back. AI search bots (OAI-SearchBot, PerplexityBot, DuckAssistBot) index your site for AI-powered search results — blocking them removes you from those results, which is lost traffic. The right strategy: block all training crawlers, allow search bots that attribute and link back, and monitor traffic to see which bots actually drive value.


Start with a free scan

See which AI bots can access your content right now. Open Shadow checks your robots.txt, meta tags, headers, and more — instantly, for free.

Scan My Site — Free →