How to Opt Out of AI Training: Every Method, Ranked
Six ways to stop AI companies from training on your content — ranked by effectiveness, coverage, and effort. What actually works, what's performative, and what to do first.
Updated April 2026
Before you start: what's actually possible
You cannot retroactively remove your content from deployed AI models. If your content was crawled before today and used in a training run, it exists in the model's weights. There is no technical mechanism to extract specific training examples from a deployed neural network.
You can stop future training. Blocking crawlers today means your content won't enter the next batch of training data. As models get retrained and new versions are released, your block takes effect for all future model versions.
Not all AI companies respect opt-outs equally. Most major labs (OpenAI, Anthropic, Google, Meta) reliably respect robots.txt. Some do not. This guide is honest about the gap.
All 6 Opt-Out Methods, Ranked
robots.txt
The primary, industry-standard opt-out mechanism. Works for OpenAI, Anthropic, Google, Meta, Mistral, Common Crawl, and most responsible AI companies.
noai & noimageai meta tags
Moderate effectiveness. HTML meta tags that signal an opt-out preference. Limited adoption — not as widely respected as robots.txt, but they add a signal for crawlers that check page-level permissions.
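As a sketch, the tags go in each page's `<head>`. Note that `noai` and `noimageai` are the commonly used directive names, but they are a community convention rather than a formal standard, so honouring them is at each crawler's discretion:

```html
<!-- Signal that this page should not be used for AI text or image
     training. Only crawlers that check page-level signals act on it. -->
<meta name="robots" content="noai, noimageai">
```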
llms.txt file
Moderate effectiveness. A structured file telling AI assistants and LLM tools how to interact with your site. Not for blocking training crawlers, but it controls AI agent behaviour and surfaces preferred content.
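A minimal llms.txt, served at the site root, is plain Markdown: an H1 title, a short blockquote summary, and links to the pages you want AI tools to prefer. The URLs below are placeholders:

```markdown
# Example Site

> A one-sentence summary of what this site covers, for AI assistants.

## Docs

- [Getting started](https://example.com/docs/start): setup guide
- [API reference](https://example.com/docs/api): endpoint details
```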
TDMRep (W3C standard)
Moderate effectiveness. The W3C Text and Data Mining Reservation Protocol. Declares your rights machine-readably. Has legal weight in the EU under the DSM Directive. Growing adoption among European AI platforms.
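One way to declare a TDMRep reservation is a JSON file at `/.well-known/tdmrep.json`; the sketch below reserves text-and-data-mining rights for the whole site (`tdm-reservation: 1` means mining is not permitted without a licence). The protocol also defines an HTTP-header variant; check the W3C specification for the full set of fields, such as a `tdm-policy` URL pointing at licensing terms:

```json
[
  {
    "location": "/",
    "tdm-reservation": 1
  }
]
```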
Company-specific opt-out forms
Retroactive removal only. Anthropic, OpenAI, and some others offer web forms to request content removal from training data. The only option that addresses already-crawled content — but effectiveness is limited, and retroactive removal from deployed models is technically impossible.
Server-level blocking
Block AI crawlers at the HTTP layer — 403 before the request hits your content. Best for crawlers with documented robots.txt non-compliance (e.g., Bytespider). More reliable than robots.txt, but requires maintenance as user agents change.
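A minimal nginx sketch of this approach, assuming a standard setup (the `map` directive belongs in the `http` context, and `example.com` is a placeholder):

```nginx
# Return 403 for known non-compliant AI training crawlers.
# User-agent strings change over time — review this list periodically.
map $http_user_agent $block_ai_bot {
    default          0;
    ~*Bytespider     1;
}

server {
    listen 80;
    server_name example.com;   # placeholder domain

    if ($block_ai_bot) {
        return 403;
    }

    # ... rest of your site configuration
}
```

The same effect is achievable with a Cloudflare WAF rule matching the user-agent string, with no server config changes.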
The 5-Minute Complete Block
If you want to block all major AI training crawlers right now, add this to your robots.txt:
```
# Common Crawl — feeds 50+ open-source AI models
User-agent: CCBot
Disallow: /

# OpenAI — trains GPT models
User-agent: GPTBot
Disallow: /

# Anthropic — trains Claude models
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Google — trains Gemini (separate from Googlebot search crawler)
User-agent: Google-Extended
Disallow: /

# Meta — trains Llama models
User-agent: meta-externalagent
Disallow: /

# Mistral AI
User-agent: MistralBot
Disallow: /

# ByteDance — ignores robots.txt, but worth adding
User-agent: Bytespider
Disallow: /

# Allen Institute AI
User-agent: AI2Bot
Disallow: /
```
⚠ This does NOT block search engine crawlers
Googlebot (search), Bingbot (search), and other SEO crawlers are completely separate user agents. Adding these rules has zero effect on your Google or Bing search rankings.
The 8 AI Training Crawlers You Need to Know
These are the highest-impact crawlers to block, based on how many AI models use their data:
| User Agent | Company |
|---|---|
| CCBot | Common Crawl |
| GPTBot | OpenAI |
| ClaudeBot | Anthropic |
| Google-Extended | Google |
| meta-externalagent | Meta |
| MistralBot | Mistral AI |
| Bytespider | ByteDance |
| AI2Bot | Allen Institute |
Company-Specific Opt-Out Forms
For content that was already crawled and used in training, a small number of companies offer removal request forms. These are limited and retroactive removal from deployed model weights is technically not possible — but they can affect future training runs:
Most AI companies (Google, Meta, Mistral, ByteDance) do not offer public opt-out forms for training data removal. Blocking their crawlers via robots.txt is the primary recourse.
Verify Your Opt-Out Is Working
After making changes, verify your opt-out configuration is correctly formed:
Run a free scan at openshadow.io/check — verifies your robots.txt, meta tags, and overall AI readiness score in one shot.
Google Search Console's robots.txt report tests that specific user agents (e.g., GPTBot) are correctly disallowed. Access it via Search Console → Settings → robots.txt.
Monitor for AI bot user agents in your access logs. After blocking, requests from those user agents should stop (or return 403 if server-level blocked).
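You can also check your rules locally before deploying. The sketch below uses only the Python standard library's `urllib.robotparser` to report which AI training user agents a given robots.txt disallows (the bot list mirrors the table above):

```python
# Check which AI training user agents a robots.txt disallows,
# using only the Python standard library.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["CCBot", "GPTBot", "ClaudeBot", "anthropic-ai",
           "Google-Extended", "meta-externalagent", "MistralBot",
           "Bytespider", "AI2Bot"]

def blocked_bots(robots_txt: str) -> list[str]:
    """Return the AI bots that are disallowed from fetching '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, "/")]

if __name__ == "__main__":
    # In practice, fetch https://yoursite.com/robots.txt and pass its
    # text here; this inline sample covers two of the bots.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""
    print(blocked_bots(sample))  # → ['CCBot', 'GPTBot']
```

Remember that this only confirms your rules are well-formed — it says nothing about whether a given crawler will honour them.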
Frequently Asked Questions
Does blocking AI crawlers affect my Google rankings?
No. AI training crawlers (GPTBot, ClaudeBot, CCBot, etc.) are completely separate from search engine crawlers (Googlebot, Bingbot). Blocking AI training bots has zero effect on your SEO. One important distinction: Google-Extended is Google's AI training agent — blocking it does NOT affect Googlebot or your Google search rankings.
What happens if an AI company ignores my robots.txt?
Most major companies respect robots.txt. Bytespider (ByteDance) has documented cases of ignoring it — for this crawler, server-level blocking via nginx or Cloudflare WAF provides stronger enforcement. You can return 403 for requests matching the Bytespider user agent string.
I blocked everything — why is ChatGPT still discussing my content?
ChatGPT's knowledge comes from training data already collected before your block. Deployed models don't update dynamically — they use a fixed snapshot. Your block prevents your content from appearing in future training runs, not current deployed models. As OpenAI trains GPT-5 and beyond, your blocked pages won't be included.
Should I block AI search crawlers like OAI-SearchBot and PerplexityBot?
That depends on your goals. AI search crawlers (OAI-SearchBot, PerplexityBot) are used for AI search products, not AI model training. Blocking them means your content won't appear in ChatGPT Search or Perplexity results — reducing potential referral traffic. For training data protection specifically, focus on GPTBot, ClaudeBot, CCBot, Google-Extended, and meta-externalagent.
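If you want training protection while keeping AI-search referral traffic, one approach is a robots.txt that names only the training crawlers — agents with no matching rules are allowed by default, so search bots need no entries at all:

```
# Block AI training crawlers only.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: meta-externalagent
Disallow: /

# OAI-SearchBot, PerplexityBot, Googlebot, Bingbot:
# no rules listed, so they remain allowed by default.
```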
Individual Bot Blocking Guides
Detailed guides for each major AI training crawler:
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.