robots.txt for AI Bots: The Complete 2026 Guide
How to control GPTBot, ClaudeBot, PerplexityBot, Bytespider, and 46+ other AI crawlers using your robots.txt file. Includes ready-to-use configurations, per-bot examples, and the most common mistakes to avoid.
What is robots.txt?
robots.txt is a plain text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that tells web crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol (REP), first established in 1994.
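What a compliant crawler does with this file can be simulated using Python's standard-library `urllib.robotparser` (the rules and URLs here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: block GPTBot, allow everyone else.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A compliant bot checks these rules before fetching any page.
print(parser.can_fetch("GPTBot", "https://yoursite.com/article"))    # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/article")) # True
```

This is the same check every well-behaved crawler performs before requesting a page.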
In 2024–2026, robots.txt became the primary mechanism for controlling AI crawler access. Every major AI company — OpenAI, Anthropic, Google, Perplexity, Meta — has published official documentation on how their bots respect robots.txt rules.
Two Types of AI Bots — and Why It Matters
Before writing a single robots.txt rule, understand the difference. Blocking the wrong bots has real consequences.
- AI training bots (GPTBot, ClaudeBot, CCBot, and similar) collect your content to train large language models. Your text, code, and writing may appear in future AI model outputs.
- AI search bots (PerplexityBot, OAI-SearchBot, and similar) index your content so users can find it through AI-powered search engines. Blocking them removes you from those results.
A blanket `User-agent: *` / `Disallow: /` rule kills your Google ranking along with the AI crawlers. Always specify individual bot names, or use one of the templates below.
Ready-to-Use Configurations
Copy the configuration that matches your needs. Place the file at https://yoursite.com/robots.txt.
```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

# Allow search engines (including AI-enhanced)
User-agent: *
Allow: /
```
```
# Block all AI bots (training, assistants, and AI search)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Gemini
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

User-agent: Ai2Bot
Disallow: /

User-agent: Kangaroo Bot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: *
Allow: /
```
```
# Allow AI search engines
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: YouBot
Allow: /

User-agent: DuckAssistBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: *
Allow: /
```
```
# Block AI crawlers from private areas
User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/

User-agent: ClaudeBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/

User-agent: PerplexityBot
Disallow: /private/
Disallow: /members/

# All other bots: standard rules
User-agent: *
Disallow: /private/
Disallow: /drafts/
```
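Before deploying a preset, you can check which bots it actually blocks with a short script (the file content and bot list are illustrative; swap in your own configuration):

```python
from urllib.robotparser import RobotFileParser

# Example configuration under test: blocks two training bots only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bots that cannot fetch the site root are fully blocked.
blocked = [bot for bot in AI_BOTS
           if not parser.can_fetch(bot, "https://yoursite.com/")]
print(blocked)  # ['GPTBot', 'ClaudeBot']
```

Bots without a matching stanza fall through to the `User-agent: *` rules, which is exactly the behaviour the presets above rely on.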
Per-Bot Quick Reference
The exact User-agent string to use for each major AI bot. User-agent matching in robots.txt is case-insensitive but must match the bot's declared name exactly (no wildcards within the name).
| Bot | Operator | User-agent string | Type | Respects |
|---|---|---|---|---|
| GPTBot | OpenAI | GPTBot | Training | ✓ Yes |
| ChatGPT-User | OpenAI | ChatGPT-User | Assistant | ✓ Yes |
| OAI-SearchBot | OpenAI | OAI-SearchBot | AI Search | ✓ Yes |
| ClaudeBot | Anthropic | ClaudeBot | Training | ✓ Yes |
| PerplexityBot | Perplexity | PerplexityBot | AI Search | ✓ Yes |
| Google-Extended | Google | Google-Extended | Training/Search | ✓ Yes |
| Gemini | Google | Gemini | AI Search | ✓ Yes |
| Bingbot | Microsoft | bingbot | Search | ✓ Yes |
| CCBot | Common Crawl | CCBot | Training | ✓ Yes |
| Bytespider | ByteDance | Bytespider | Training | ✗ No |
| cohere-ai | Cohere | cohere-ai | Training | ✓ Yes |
| xAI-Bot | xAI | xAI-Bot | Training | ✓ Yes |
| MistralBot | Mistral AI | MistralBot | Training | ✓ Yes |
| HuggingFaceBot | Hugging Face | HuggingFaceBot | Training | ✓ Yes |
| YouBot | You.com | YouBot | AI Search | ✓ Yes |
| DuckAssistBot | DuckDuckGo | DuckAssistBot | AI Search | ✓ Yes |
See the full AI Bot Directory for all 49 bots.
Bots That Ignore robots.txt
Bytespider is operated by ByteDance, the parent company of TikTok. Multiple independent researchers have documented it ignoring Disallow rules. It has also been observed using disguised user-agent strings to bypass detection. robots.txt alone may not be sufficient — consider IP-level blocking via your server firewall or Cloudflare WAF rules.
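As one example of server-level enforcement, a rule of this shape (placed inside an nginx `server` block) refuses requests that identify as Bytespider. This is a sketch, not an official recommendation; a forged User-Agent string will evade it, which is why IP-level blocking is also mentioned above:

```nginx
# Return 403 to any request whose User-Agent contains "bytespider"
# (case-insensitive match). UA strings can be forged, so combine
# this with firewall or Cloudflare WAF rules for real enforcement.
if ($http_user_agent ~* "bytespider") {
    return 403;
}
```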
All other major AI companies (OpenAI, Anthropic, Google, Perplexity, Cohere, Mistral, xAI, Hugging Face) have published official compliance statements. Their bots check robots.txt before crawling and honour Disallow rules.
robots.txt vs. Meta Tags vs. HTTP Headers
You have three complementary tools. robots.txt operates at the crawl level. Meta tags and HTTP headers give per-page control even if the crawler has already retrieved the page.
`robots.txt` (site-wide or per-directory): checked before the bot fetches any page.
```
User-agent: GPTBot
Disallow: /
```
Best for: Blanket rules for whole site or large sections
`<meta name="robots">` (per page, HTML only): found inside the `<head>` of an HTML page.
```html
<meta name="robots" content="noai, noimageai">
```
Best for: Page-level overrides, dynamic CMS pages
`X-Robots-Tag` (per page or file, HTTP header): returned in the HTTP response header.
```
X-Robots-Tag: noai, noimageai
```
Best for: PDFs, images, API responses — non-HTML resources
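For non-HTML resources served by application code, the header can be attached programmatically. A minimal framework-free WSGI sketch (the handler and response body are illustrative):

```python
def app(environ, start_response):
    """Serve a PDF while asking AI crawlers to skip it."""
    headers = [
        ("Content-Type", "application/pdf"),
        # Same directives as the meta tag, but usable on any file type.
        ("X-Robots-Tag", "noai, noimageai"),
    ]
    start_response("200 OK", headers)
    return [b"%PDF-1.4 ..."]  # placeholder bytes, not a real PDF
```

Any WSGI-compatible server (for example `wsgiref.simple_server` from the standard library) can host this app; for static files, set the header in the web-server configuration instead.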
Check your current meta tags → AI Meta Tags Checker
5 Common robots.txt Mistakes
**1. Blocking everything with a wildcard**
Never put Googlebot (or bingbot) in a Disallow rule when targeting AI bots. A wildcard rule kills your entire search presence.
```
# WRONG: blocks every crawler, including Googlebot
User-agent: *
Disallow: /
```
**2. Misspelling the bot name**
User-agent matching is case-insensitive, but the name must otherwise match exactly: "gptbot" and "GPTBot" both work; "GPT-Bot" (with a hyphen) and "GPT Bot" (with a space) do not.
```
User-agent: GPT-Bot   # wrong: should be GPTBot
Disallow: /
```
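Python's standard-library parser illustrates the point. Note that `urllib.robotparser` matches User-agent names by case-insensitive substring, which is slightly looser than the exact-name matching most crawlers document, but its case handling is representative:

```python
from urllib.robotparser import RobotFileParser

def blocked_by(robots_lines, bot):
    """Return True if `bot` is disallowed from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(bot, "https://yoursite.com/")

# Case differences still match...
print(blocked_by(["User-agent: gptbot", "Disallow: /"], "GPTBot"))   # True
# ...but a misspelled name matches nothing, so GPTBot stays allowed.
print(blocked_by(["User-agent: GPT-Bot", "Disallow: /"], "GPTBot"))  # False
```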
**3. Blocking GPTBot but forgetting OpenAI's other crawlers**
OpenAI has three crawlers: GPTBot (training), ChatGPT-User (browsing), and OAI-SearchBot (search). Block all three if that is your intent.
```
User-agent: GPTBot    # ChatGPT-User and OAI-SearchBot still active
Disallow: /
```
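The complete block for all three OpenAI crawlers named above:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```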
**4. Treating robots.txt as a security barrier**
robots.txt only controls crawlers that choose to honour it. Malicious scrapers and some commercial crawlers ignore it entirely. For sensitive content, use server-level authentication.
**5. Deploying without validating**
Always validate your updated robots.txt with the Analyzer tool before deploying. A single syntax error can accidentally block all crawlers.
Frequently Asked Questions
**Do AI bots actually respect robots.txt?**
Yes: all major AI companies (OpenAI, Anthropic, Google, Perplexity, Cohere, Mistral) officially honour robots.txt. The notable exception is Bytespider (ByteDance), which has been documented bypassing Disallow rules. Remember that robots.txt is a protocol, not a technical barrier.
**Does blocking AI bots hurt my SEO?**
Blocking AI training bots (GPTBot, ClaudeBot, CCBot) has zero impact on traditional SEO rankings. Blocking AI search bots (PerplexityBot, OAI-SearchBot, Google-Extended) will remove your site from those AI search results, much as blocking Googlebot removes you from traditional search.
**How quickly do changes take effect?**
Most AI crawlers re-fetch robots.txt roughly every 24 hours. Some may take up to a week to fully stop crawling newly disallowed content. If your content has already been collected, robots.txt prevents future access; it doesn't retroactively remove what was previously gathered.
**Can I remove content that was already used for training?**
OpenAI and Google offer forms to request removal of content already used for training, but results vary. robots.txt prevents future collection; retroactive removal requires contacting each company individually.
**What configuration should most sites use?**
The most balanced approach: block AI training bots (GPTBot, ClaudeBot, CCBot, Bytespider) to keep your content out of training datasets, while allowing AI search bots (PerplexityBot, OAI-SearchBot, Google-Extended) so you stay visible in AI-powered search results.
Ready to configure your robots.txt?
Use the free generator to build a configuration with per-bot toggles, one-click presets, and instant copy-paste output.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.
Scan My Site Free →