ClaudeBot is Anthropic's training crawler for Claude AI models. Here's how to opt out — plus what Anthropic actually collects, and how to request content removal.
ClaudeBot crawls publicly available web pages to build training datasets for Anthropic's Claude models. It focuses on text content — articles, documentation, blog posts, and other written material that improves Claude's factual knowledge, writing quality, and reasoning ability.
Anthropic began more aggressive web crawling in late 2023 as it scaled training for Claude 2, Claude 3, and subsequent model families. Unlike some AI companies that rely primarily on licensed datasets, Anthropic uses web crawl data as a significant component of its training pipeline.
Anthropic uses two user agent tokens that publishers should be aware of: ClaudeBot (the primary one) and anthropic-ai (used in some contexts). A complete block requires both.
robots.txt (Recommended)

```
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
```
Block both tokens — Anthropic has used anthropic-ai as an alternate identifier.
```
# Block ClaudeBot from original/paid content
User-agent: ClaudeBot
Disallow: /articles/
Disallow: /premium/
Disallow: /research/

User-agent: anthropic-ai
Disallow: /articles/
Disallow: /premium/
Disallow: /research/
```
```
# Block all AI training crawlers
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Normal search indexing — unaffected
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```
```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

Since Anthropic reliably respects robots.txt, server-level blocking is generally not needed. Use it only if you want hard 403 enforcement regardless of robots.txt.
```nginx
if ($http_user_agent ~* "(ClaudeBot|anthropic-ai)") {
    return 403;
}
```

Cloudflare WAF rule: `(http.user_agent contains "ClaudeBot") or (http.user_agent contains "anthropic-ai")` → Action: Block
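If you serve with Apache rather than nginx, a comparable block can be added to your server config or `.htaccess`. This is a sketch using mod_rewrite (assumes the module is enabled); adjust to your setup:

```apache
# Return 403 to Anthropic's crawler user agents (case-insensitive match)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (ClaudeBot|anthropic-ai) [NC]
RewriteRule .* - [F,L]
```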
```typescript
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const ANTHROPIC_BOTS = ['ClaudeBot', 'anthropic-ai'];

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';
  if (ANTHROPIC_BOTS.some(bot => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};
```

If your content has already been crawled, you can request that Anthropic exclude it from future training runs via their privacy portal. This is forward-looking — it cannot remove content from models already trained.
Visit `https://yoursite.com/robots.txt` and confirm both ClaudeBot and anthropic-ai appear with `Disallow: /`.
```shell
curl -A "ClaudeBot" -I https://yoursite.com/robots.txt
# Expect 200, then no further requests from ClaudeBot
```

```shell
# Note: -E enables extended regex so the | alternation works
grep -Ei "claudebot|anthropic" /var/log/nginx/access.log | tail -20
# After the block: only /robots.txt requests, nothing else
```
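To sanity-check your rules before deploying, you can evaluate them locally. The snippet below is a simplified sketch, not a full RFC 9309 parser (it ignores `Allow:` precedence and wildcard groups), and `isDisallowed` is a hypothetical helper, not part of any library:

```typescript
// Simplified robots.txt check: is `path` disallowed for `userAgent`?
// Note: ignores Allow: rules and `*` groups — a sketch, not a spec-complete parser.
function isDisallowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const lines = robotsTxt.split(/\r?\n/).map(l => l.split('#')[0].trim());
  let applies = false;   // are we inside a group matching this user agent?
  let disallowed = false;
  for (const line of lines) {
    const [rawKey, ...rest] = line.split(':');
    if (rest.length === 0) continue; // blank or malformed line
    const key = rawKey.trim().toLowerCase();
    const value = rest.join(':').trim();
    if (key === 'user-agent') {
      applies = value.toLowerCase() === userAgent.toLowerCase();
    } else if (key === 'disallow' && applies) {
      // Empty Disallow means "allow everything" for this group
      if (value !== '' && path.startsWith(value)) disallowed = true;
    }
  }
  return disallowed;
}

const robots = `User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /`;

console.log(isDisallowed(robots, 'ClaudeBot', '/articles/post')); // true
console.log(isDisallowed(robots, 'anthropic-ai', '/'));           // true
console.log(isDisallowed(robots, 'Googlebot', '/'));              // false
```

For anything production-critical, verify with a spec-compliant checker rather than an ad-hoc parser.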
| | ClaudeBot | Claude.ai browsing |
|---|---|---|
| Triggered by | Anthropic's automated training pipeline | A user asking Claude to visit a URL |
| Purpose | Building training datasets | Real-time information retrieval |
| User agent | ClaudeBot / anthropic-ai | Varies (often headless browser UA) |
| Blocked by robots.txt? | Yes ✓ | Partially (behavior varies) |
| Frequency | Systematic, periodic sweeps | On-demand, triggered by users |
Free AI visibility check — see which training bots have access to your content and generate a custom robots.txt to block them.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.
Scan My Site Free →