robots.txt for AI Bots: The Complete 2026 Guide
How to control GPTBot, ClaudeBot, PerplexityBot, Bytespider, and 46+ other AI crawlers using your robots.txt file. Includes ready-to-use configurations, per-bot examples, and the most common mistakes to avoid.
What is robots.txt?
robots.txt is a plain text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that tells web crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol (REP), first established in 1994.
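What a compliant crawler does with this file can be simulated using Python's standard-library `urllib.robotparser` (the rules and URLs here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: block GPTBot, allow everyone else.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A compliant bot checks these rules before fetching any page.
print(parser.can_fetch("GPTBot", "https://yoursite.com/article"))    # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/article")) # True
```

This is the same check every well-behaved crawler performs before requesting a page.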
In 2024–2026, robots.txt became the primary mechanism for controlling AI crawler access. Every major AI company — OpenAI, Anthropic, Google, Perplexity, Meta — has published official documentation on how their bots respect robots.txt rules.
Two Types of AI Bots — and Why It Matters
Before writing a single robots.txt rule, understand the difference. Blocking the wrong bots has real consequences.
- AI training bots (GPTBot, ClaudeBot, CCBot, and similar) collect your content to train large language models. Your text, code, and writing may appear in future AI model outputs.
- AI search bots (PerplexityBot, OAI-SearchBot, and similar) index your content so users can find it through AI-powered search engines. Blocking them removes you from those results.
A blanket `User-agent: *` / `Disallow: /` rule kills your Google ranking along with the AI crawlers. Always specify individual bot names, or use one of the templates below.
Ready-to-Use Configurations
Copy the configuration that matches your needs. Place the file at https://yoursite.com/robots.txt.
```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

# Allow search engines (including AI-enhanced)
User-agent: *
Allow: /
```
```
# Block all AI bots (training, assistants, and AI search)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Gemini
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

User-agent: Ai2Bot
Disallow: /

User-agent: Kangaroo Bot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: *
Allow: /
```
```
# Allow AI search engines
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: YouBot
Allow: /

User-agent: DuckAssistBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: *
Allow: /
```
```
# Block AI crawlers from private areas
User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/

User-agent: ClaudeBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/

User-agent: PerplexityBot
Disallow: /private/
Disallow: /members/

# All other bots: standard rules
User-agent: *
Disallow: /private/
Disallow: /drafts/
```
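Before deploying a preset, you can check which bots it actually blocks with a short script (the file content and bot list are illustrative; swap in your own configuration):

```python
from urllib.robotparser import RobotFileParser

# Example configuration under test: blocks two training bots only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bots that cannot fetch the site root are fully blocked.
blocked = [bot for bot in AI_BOTS
           if not parser.can_fetch(bot, "https://yoursite.com/")]
print(blocked)  # ['GPTBot', 'ClaudeBot']
```

Bots without a matching stanza fall through to the `User-agent: *` rules, which is exactly the behaviour the presets above rely on.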
Per-Bot Quick Reference
The exact User-agent string to use for each major AI bot. User-agent matching in robots.txt is case-insensitive but must match the bot's declared name exactly (no wildcards within the name).
| Bot | Operator | User-agent string | Type | Respects |
|---|---|---|---|---|
| GPTBot | OpenAI | GPTBot | Training | ✓ Yes |
| ChatGPT-User | OpenAI | ChatGPT-User | Assistant | ✓ Yes |
| OAI-SearchBot | OpenAI | OAI-SearchBot | AI Search | ✓ Yes |
| ClaudeBot | Anthropic | ClaudeBot | Training | ✓ Yes |
| PerplexityBot | Perplexity | PerplexityBot | AI Search | ✓ Yes |
| Google-Extended | Google | Google-Extended | Training/Search | ✓ Yes |
| Gemini | Google | Gemini | AI Search | ✓ Yes |
| Bingbot | Microsoft | bingbot | Search | ✓ Yes |
| CCBot | Common Crawl | CCBot | Training | ✓ Yes |
| Bytespider | ByteDance | Bytespider | Training | ✗ No |
| cohere-ai | Cohere | cohere-ai | Training | ✓ Yes |
| xAI-Bot | xAI | xAI-Bot | Training | ✓ Yes |
| MistralBot | Mistral AI | MistralBot | Training | ✓ Yes |
| HuggingFaceBot | Hugging Face | HuggingFaceBot | Training | ✓ Yes |
| YouBot | You.com | YouBot | AI Search | ✓ Yes |
| DuckAssistBot | DuckDuckGo | DuckAssistBot | AI Search | ✓ Yes |
See the full AI Bot Directory for all 49 bots.
Bots That Ignore robots.txt
Bytespider is operated by ByteDance, the parent company of TikTok. Multiple independent researchers have documented it ignoring Disallow rules. It has also been observed using disguised user-agent strings to bypass detection. robots.txt alone may not be sufficient — consider IP-level blocking via your server firewall or Cloudflare WAF rules.
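As one example of server-level enforcement, a rule of this shape (placed inside an nginx `server` block) refuses requests that identify as Bytespider. This is a sketch, not an official recommendation; a forged User-Agent string will evade it, which is why IP-level blocking is also mentioned above:

```nginx
# Return 403 to any request whose User-Agent contains "bytespider"
# (case-insensitive match). UA strings can be forged, so combine
# this with firewall or Cloudflare WAF rules for real enforcement.
if ($http_user_agent ~* "bytespider") {
    return 403;
}
```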
All other major AI companies (OpenAI, Anthropic, Google, Perplexity, Cohere, Mistral, xAI, Hugging Face) have published official compliance statements. Their bots check robots.txt before crawling and honour Disallow rules.
robots.txt vs. Meta Tags vs. HTTP Headers
You have three complementary tools. robots.txt operates at the crawl level. Meta tags and HTTP headers give per-page control even if the crawler has already retrieved the page.
`robots.txt` (site-wide or per-directory): checked before the bot fetches any page.
```
User-agent: GPTBot
Disallow: /
```
Best for: Blanket rules for whole site or large sections
`<meta name="robots">` (per page, HTML only): found inside the `<head>` of an HTML page.
```html
<meta name="robots" content="noai, noimageai">
```
Best for: Page-level overrides, dynamic CMS pages
`X-Robots-Tag` (per page or file, HTTP header): returned in the HTTP response header.
```
X-Robots-Tag: noai, noimageai
```
Best for: PDFs, images, API responses — non-HTML resources
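For non-HTML resources served by application code, the header can be attached programmatically. A minimal framework-free WSGI sketch (the handler and response body are illustrative):

```python
def app(environ, start_response):
    """Serve a PDF while asking AI crawlers to skip it."""
    headers = [
        ("Content-Type", "application/pdf"),
        # Same directives as the meta tag, but usable on any file type.
        ("X-Robots-Tag", "noai, noimageai"),
    ]
    start_response("200 OK", headers)
    return [b"%PDF-1.4 ..."]  # placeholder bytes, not a real PDF
```

Any WSGI-compatible server (for example `wsgiref.simple_server` from the standard library) can host this app; for static files, set the header in the web-server configuration instead.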
Check your current meta tags → AI Meta Tags Checker
5 Common robots.txt Mistakes
**1. Blocking everything with a wildcard**
Never put Googlebot (or bingbot) in a Disallow rule when targeting AI bots. A wildcard rule kills your entire search presence.
```
# WRONG: blocks every crawler, including Googlebot
User-agent: *
Disallow: /
```
**2. Misspelling the bot name**
User-agent matching is case-insensitive, but the name must otherwise match exactly: "gptbot" and "GPTBot" both work; "GPT-Bot" (with a hyphen) and "GPT Bot" (with a space) do not.
```
User-agent: GPT-Bot   # wrong: should be GPTBot
Disallow: /
```
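Python's standard-library parser illustrates the point. Note that `urllib.robotparser` matches User-agent names by case-insensitive substring, which is slightly looser than the exact-name matching most crawlers document, but its case handling is representative:

```python
from urllib.robotparser import RobotFileParser

def blocked_by(robots_lines, bot):
    """Return True if `bot` is disallowed from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(bot, "https://yoursite.com/")

# Case differences still match...
print(blocked_by(["User-agent: gptbot", "Disallow: /"], "GPTBot"))   # True
# ...but a misspelled name matches nothing, so GPTBot stays allowed.
print(blocked_by(["User-agent: GPT-Bot", "Disallow: /"], "GPTBot"))  # False
```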
**3. Blocking GPTBot but forgetting OpenAI's other crawlers**
OpenAI has three crawlers: GPTBot (training), ChatGPT-User (browsing), and OAI-SearchBot (search). Block all three if that is your intent.
```
User-agent: GPTBot    # ChatGPT-User and OAI-SearchBot still active
Disallow: /
```
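The complete block for all three OpenAI crawlers named above:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```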
**4. Treating robots.txt as a security barrier**
robots.txt only controls crawlers that choose to honour it. Malicious scrapers and some commercial crawlers ignore it entirely. For sensitive content, use server-level authentication.
**5. Deploying without validating**
Always validate your updated robots.txt with the Analyzer tool before deploying. A single syntax error can accidentally block all crawlers.
Frequently Asked Questions
**Do AI bots actually respect robots.txt?**
Yes: all major AI companies (OpenAI, Anthropic, Google, Perplexity, Cohere, Mistral) officially honour robots.txt. The notable exception is Bytespider (ByteDance), which has been documented bypassing Disallow rules. Remember that robots.txt is a protocol, not a technical barrier.
**Does blocking AI bots hurt my SEO?**
Blocking AI training bots (GPTBot, ClaudeBot, CCBot) has zero impact on traditional SEO rankings. Blocking AI search bots (PerplexityBot, OAI-SearchBot, Google-Extended) will remove your site from those AI search results, much as blocking Googlebot removes you from traditional search.
**How quickly do changes take effect?**
Most AI crawlers re-fetch robots.txt roughly every 24 hours. Some may take up to a week to fully stop crawling newly disallowed content. If your content has already been collected, robots.txt prevents future access; it doesn't retroactively remove what was previously gathered.
**Can I remove content that was already used for training?**
OpenAI and Google offer forms to request removal of content already used for training, but results vary. robots.txt prevents future collection; retroactive removal requires contacting each company individually.
**What configuration should most sites use?**
The most balanced approach: block AI training bots (GPTBot, ClaudeBot, CCBot, Bytespider) to keep your content out of training datasets, while allowing AI search bots (PerplexityBot, OAI-SearchBot, Google-Extended) so you stay visible in AI-powered search results.
Ready to configure your robots.txt?
Use the free generator to build a configuration with per-bot toggles, one-click presets, and instant copy-paste output.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.
Scan My Site Free →