ClaudeBot is Anthropic's training crawler for Claude AI models. Here's how to opt out — plus what Anthropic actually collects, and how to request content removal.
ClaudeBot crawls publicly available web pages to build training datasets for Anthropic's Claude models. It focuses on text content — articles, documentation, blog posts, and other written material that improves Claude's factual knowledge, writing quality, and reasoning ability.
Anthropic began more aggressive web crawling in late 2023 as it scaled training for Claude 2, Claude 3, and subsequent model families. Unlike some AI companies that rely primarily on licensed datasets, Anthropic uses web crawl data as a significant component of its training pipeline.
Anthropic uses two user agent tokens that publishers should be aware of: ClaudeBot (the primary one) and anthropic-ai (used in some contexts). A complete block requires both.
robots.txt (Recommended)

```
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
```
Block both tokens — Anthropic has used anthropic-ai as an alternate identifier.
```
# Block ClaudeBot from original/paid content
User-agent: ClaudeBot
Disallow: /articles/
Disallow: /premium/
Disallow: /research/

User-agent: anthropic-ai
Disallow: /articles/
Disallow: /premium/
Disallow: /research/
```
```
# Block all AI training crawlers
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Normal search indexing — unaffected
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```
```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

Since Anthropic reliably respects robots.txt, server-level blocking is generally not needed. Use it only if you want hard 403 enforcement regardless of robots.txt.
```nginx
if ($http_user_agent ~* "(ClaudeBot|anthropic-ai)") {
    return 403;
}
```

Cloudflare WAF rule: `(http.user_agent contains "ClaudeBot") or (http.user_agent contains "anthropic-ai")` → Action: Block
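If you serve with Apache rather than nginx, a comparable block can be added to your server config or `.htaccess`. This is a sketch using mod_rewrite (assumes the module is enabled); adjust to your setup:

```apache
# Return 403 to Anthropic's crawler user agents (case-insensitive match)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (ClaudeBot|anthropic-ai) [NC]
RewriteRule .* - [F,L]
```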
```typescript
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const ANTHROPIC_BOTS = ['ClaudeBot', 'anthropic-ai'];

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';
  if (ANTHROPIC_BOTS.some(bot => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};
```

If your content has already been crawled, you can request that Anthropic exclude it from future training runs via their privacy portal. This is forward-looking — it cannot remove content from models already trained.
Visit `https://yoursite.com/robots.txt` and confirm both ClaudeBot and anthropic-ai appear with `Disallow: /`.
```shell
curl -A "ClaudeBot" -I https://yoursite.com/robots.txt
# Expect 200, then no further requests from ClaudeBot
```

```shell
# Note: -E enables extended regex so the | alternation works
grep -Ei "claudebot|anthropic" /var/log/nginx/access.log | tail -20
# After the block: only /robots.txt requests, nothing else
```
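To sanity-check your rules before deploying, you can evaluate them locally. The snippet below is a simplified sketch, not a full RFC 9309 parser (it ignores `Allow:` precedence and wildcard groups), and `isDisallowed` is a hypothetical helper, not part of any library:

```typescript
// Simplified robots.txt check: is `path` disallowed for `userAgent`?
// Note: ignores Allow: rules and `*` groups — a sketch, not a spec-complete parser.
function isDisallowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const lines = robotsTxt.split(/\r?\n/).map(l => l.split('#')[0].trim());
  let applies = false;   // are we inside a group matching this user agent?
  let disallowed = false;
  for (const line of lines) {
    const [rawKey, ...rest] = line.split(':');
    if (rest.length === 0) continue; // blank or malformed line
    const key = rawKey.trim().toLowerCase();
    const value = rest.join(':').trim();
    if (key === 'user-agent') {
      applies = value.toLowerCase() === userAgent.toLowerCase();
    } else if (key === 'disallow' && applies) {
      // Empty Disallow means "allow everything" for this group
      if (value !== '' && path.startsWith(value)) disallowed = true;
    }
  }
  return disallowed;
}

const robots = `User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /`;

console.log(isDisallowed(robots, 'ClaudeBot', '/articles/post')); // true
console.log(isDisallowed(robots, 'anthropic-ai', '/'));           // true
console.log(isDisallowed(robots, 'Googlebot', '/'));              // false
```

For anything production-critical, verify with a spec-compliant checker rather than an ad-hoc parser.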
| | ClaudeBot | Claude.ai browsing |
|---|---|---|
| Triggered by | Anthropic's automated training pipeline | A user asking Claude to visit a URL |
| Purpose | Building training datasets | Real-time information retrieval |
| User agent | ClaudeBot / anthropic-ai | Varies (often headless browser UA) |
| Blocked by robots.txt? | Yes ✓ | Partially (behavior varies) |
| Frequency | Systematic, periodic sweeps | On-demand, triggered by users |
Free AI visibility check — see which training bots have access to your content and generate a custom robots.txt to block them.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.
Scan My Site Free →