How to Block AI Bots on Cloudflare Workers: Complete 2026 Guide
Cloudflare Workers run at the edge — before your origin server ever sees the request. This makes Workers the most efficient place to block AI crawlers: zero origin load, global coverage across 300+ PoPs, and sub-millisecond latency. Workers have no file system, so robots.txt must be served via Workers Static Assets or embedded as a string constant. This guide covers the ES module fetch handler, Hono middleware, KV-based dynamic rules, Pages Functions middleware, and the relationship to Cloudflare's built-in Bot Fight Mode.
Workers Runtime · 2026
Examples use the ES module syntax (export default { async fetch() }) — the current standard. The legacy addEventListener('fetch') Service Worker syntax still works but is not recommended for new projects. TypeScript is supported natively via Wrangler with no extra config.
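For reference, the smallest valid ES-module Worker is just an object with an async fetch method exported as the default export. A minimal sketch (the handler shape is the fixed Workers contract; the response body is arbitrary):

```typescript
// Smallest valid ES-module Worker: an object with an async fetch method,
// exported as the default export. Wrangler deploys this file as-is.
const worker = {
  async fetch(request: Request): Promise<Response> {
    return new Response('Hello from the edge');
  },
};

export default worker;
```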
Methods at a glance
| Method | What it does | Blocks JS-less bots? |
|---|---|---|
| Workers Assets robots.txt | Signals crawlers to stay out | Signal only |
| GET /robots.txt in fetch handler | Dynamic robots.txt, no file system needed | Signal only |
| fetch handler UA check | Hard 403 at edge before origin sees request | ✓ |
| Hono app.use("*") middleware | Hard 403 with framework-style middleware | ✓ |
| KV-based dynamic rules | Update blocked UAs without redeploying | ✓ |
| X-Robots-Tag header | noai on all responses via header mutation | Signal only (header) |
| Pages Functions _middleware.ts | Edge blocking co-deployed with Pages site | ✓ |
| CF Bot Fight Mode | Cloudflare network-layer block (no robots.txt) | ✓ (blunt) |
1. robots.txt — Workers Static Assets
Workers have no file system at runtime — you cannot do fs.readFileSync('robots.txt'). Use Workers Static Assets (the [assets] key in wrangler.toml) to serve static files from a directory. Cloudflare serves assets before your Worker runs — zero Worker invocation for static file requests.
wrangler.toml
name = "my-site"
main = "src/index.ts"
compatibility_date = "2024-09-23"
# Serve everything in ./public as static assets
[assets]
directory = "./public"
public/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: *
Allow: /
With [assets] configured, GET /robots.txt is served directly from Cloudflare's edge cache — your Worker code never runs for this request. If you need a dynamic robots.txt (environment-based rules), handle it explicitly in your fetch handler instead:
Dynamic robots.txt in fetch handler
// src/index.ts
const ROBOTS_TXT = `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /`;
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
if (url.pathname === '/robots.txt') {
return new Response(ROBOTS_TXT, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
// ... rest of handler
return new Response('Hello World');
},
};
Embed robots.txt as a module-level string constant — compiled into the Worker bundle at deploy time. No file system reads at runtime.
2. Hard block — ES module fetch handler
The canonical Workers pattern: compile a regex at module scope (once, not per request), check the User-Agent header, exempt /robots.txt so crawlers can always read your directives, then return 403 for matched bots.
// src/index.ts
const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Diffbot|ImagesiftBot|Omgili|omgilibot|facebookexternalhit.*AI/i;
const ROBOTS_TXT = `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /`;
export interface Env {
// KV namespace (optional — see section 4)
BLOCKED_UAS_KV?: KVNamespace;
}
export default {
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
const url = new URL(request.url);
// Always allow robots.txt through
if (url.pathname === '/robots.txt') {
return new Response(ROBOTS_TXT, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
// Block AI bots
const ua = request.headers.get('user-agent') ?? '';
if (BLOCKED_UAS.test(ua)) {
return new Response('Forbidden', {
status: 403,
headers: { 'Content-Type': 'text/plain' },
});
}
// Pass through to origin (or generate response)
return fetch(request);
},
};
Module scope: BLOCKED_UAS is compiled once per Worker isolate startup — not on every request. V8 isolates are reused across requests on the same PoP.
fetch(request): Passing the original Request object to the global fetch() proxies the request to the origin URL. For a standalone Worker, replace this with your actual response logic.
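The blocking decision is pure enough to sanity-check outside the Workers runtime. A quick sketch, assuming Node 18+ where Request and Response are globals; the decide helper is illustrative, not part of the Worker above:

```typescript
// Extract the UA check into a pure helper so it can be exercised locally.
// A shortened pattern is used here; the full BLOCKED_UAS works the same way.
const BLOCKED_UAS = /GPTBot|ClaudeBot|Google-Extended|Bytespider|CCBot/i;

// Returns a 403 Response for blocked bots, or null to let the request through.
function decide(request: Request): Response | null {
  const ua = request.headers.get('user-agent') ?? '';
  return BLOCKED_UAS.test(ua)
    ? new Response('Forbidden', { status: 403 })
    : null;
}

const bot = new Request('https://example.com/', {
  headers: { 'user-agent': 'Mozilla/5.0 (compatible; GPTBot/1.0)' },
});
const human = new Request('https://example.com/', {
  headers: { 'user-agent': 'Mozilla/5.0 (Windows NT 10.0) Chrome/120' },
});

console.log(decide(bot)?.status); // 403 — matched bot is blocked
console.log(decide(human));       // null — request passes through
```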
Legacy Service Worker syntax (reference only)
// Legacy — avoid for new projects
addEventListener('fetch', (event) => {
event.respondWith(handleRequest(event.request));
});
async function handleRequest(request: Request): Promise<Response> {
const ua = request.headers.get('user-agent') ?? '';
if (BLOCKED_UAS.test(ua)) {
return new Response('Forbidden', { status: 403 });
}
return fetch(request);
}
The Service Worker syntax predates ES modules on Workers. It works, but export default { fetch } is the current standard and required for newer features (Durable Objects, RPC, etc.).
3. Hono middleware on Workers
Hono is the de facto framework for Cloudflare Workers — ultra-lightweight, Workers-native, and the exported app is a valid fetch handler. Register bot-blocking middleware with app.use('*') before any route, and register the /robots.txt route first so it is exempt.
Install
npm create cloudflare@latest my-site -- --template hono
# or add to existing project:
npm install hono
src/index.ts
import { Hono } from 'hono';
type Bindings = {
BLOCKED_UAS_KV?: KVNamespace;
};
const app = new Hono<{ Bindings: Bindings }>();
const BLOCKED_UAS =
/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;
const ROBOTS_TXT = `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /`;
// 1. robots.txt — exempt before bot-blocking middleware
app.get('/robots.txt', (c) =>
c.text(ROBOTS_TXT, 200, { 'Content-Type': 'text/plain; charset=utf-8' }),
);
// 2. Bot-blocking middleware — registered AFTER robots.txt route
// Hono matches routes in registration order; app.use('*') runs for
// all paths that haven't already returned a response.
app.use('*', async (c, next) => {
const ua = c.req.header('user-agent') ?? '';
if (BLOCKED_UAS.test(ua)) {
return c.text('Forbidden', 403);
}
await next();
});
// 3. Add X-Robots-Tag to every response
app.use('*', async (c, next) => {
await next();
c.header('X-Robots-Tag', 'noai, noimageai');
});
// 4. Your routes
app.get('/', (c) => c.text('Hello World'));
app.get('/api/data', (c) => c.json({ ok: true }));
export default app;
Route order matters: In Hono, routes and middleware run in registration order. The /robots.txt GET handler is registered first, so matching requests return immediately without hitting the bot-blocking middleware.
After-next middleware: The X-Robots-Tag middleware calls await next() first, then sets the header after the downstream handler has run — the same pattern as post-response middleware in Express.
4. KV-based dynamic rules
Workers KV lets you update your blocked UA list without redeploying. Store patterns in KV and read them on each request. KV is eventually consistent — global propagation takes up to 60 seconds. That is acceptable for bot rule updates.
wrangler.toml — add KV binding
[[kv_namespaces]]
binding = "BLOCKED_UAS_KV"
id = "your-kv-namespace-id" # from: wrangler kv:namespace create BLOCKED_UAS_KV
Set a value via Wrangler CLI
wrangler kv:key put --binding BLOCKED_UAS_KV "blocked-pattern" "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Google-Extended|Bytespider|CCBot|PerplexityBot"
src/index.ts — KV-aware fetch handler
export interface Env {
BLOCKED_UAS_KV: KVNamespace;
}
// Module-level cache — survives across requests in the same isolate
let cachedPattern: RegExp | null = null;
let cacheExpiry = 0;
const CACHE_TTL_MS = 60_000; // refresh KV value every 60 seconds
async function getBlockedPattern(env: Env): Promise<RegExp> {
const now = Date.now();
if (cachedPattern && now < cacheExpiry) return cachedPattern;
const raw = await env.BLOCKED_UAS_KV.get('blocked-pattern');
if (raw) {
cachedPattern = new RegExp(raw, 'i');
cacheExpiry = now + CACHE_TTL_MS;
}
return cachedPattern ?? /GPTBot|ClaudeBot/i; // fallback
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
if (url.pathname === '/robots.txt') {
return new Response('User-agent: *\nAllow: /', {
headers: { 'Content-Type': 'text/plain' },
});
}
const pattern = await getBlockedPattern(env);
const ua = request.headers.get('user-agent') ?? '';
if (pattern.test(ua)) {
return new Response('Forbidden', { status: 403 });
}
return fetch(request);
},
};
The module-level cache avoids a KV read on every request. Isolates are long-lived on busy PoPs, so the cache is effective. On cold-start isolates the cache is empty — the first request pays the KV latency (~5 ms).
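An alternative to the hand-rolled cache is KV's own cacheTtl read option, which keeps the value in the PoP-local cache for at least 60 seconds (the minimum allowed value). A sketch of a drop-in alternative to the getBlockedPattern above; a minimal interface stands in for the real KVNamespace type from @cloudflare/workers-types:

```typescript
// Minimal slice of the KV interface used here. In a real Worker the binding
// satisfies the full KVNamespace type from @cloudflare/workers-types.
interface KVGetter {
  get(key: string, opts?: { cacheTtl?: number }): Promise<string | null>;
}

// Let KV's edge cache handle the TTL instead of module-level state.
// Reads within the cacheTtl window are served from the local PoP cache.
async function getBlockedPattern(kv: KVGetter): Promise<RegExp> {
  const raw = await kv.get('blocked-pattern', { cacheTtl: 60 });
  return raw ? new RegExp(raw, 'i') : /GPTBot|ClaudeBot/i; // fallback pattern
}
```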
5. X-Robots-Tag on all responses
To opt out of AI training on a per-page basis without modifying HTML, inject X-Robots-Tag: noai, noimageai on every response. In a raw Worker, mutate the response headers before returning:
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
// robots.txt pass-through — no X-Robots-Tag needed
if (url.pathname === '/robots.txt') {
return new Response(ROBOTS_TXT, { headers: { 'Content-Type': 'text/plain' } });
}
// Block bots
const ua = request.headers.get('user-agent') ?? '';
if (BLOCKED_UAS.test(ua)) {
return new Response('Forbidden', { status: 403 });
}
// Fetch from origin and clone response to add header
const originResponse = await fetch(request);
const response = new Response(originResponse.body, originResponse);
response.headers.set('X-Robots-Tag', 'noai, noimageai');
return response;
},
};
Headers on a Response returned by fetch() are immutable in Workers — you cannot set them directly. Re-wrap the response with new Response(body, init) first, then mutate response.headers on the copy.
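The re-wrap pattern in isolation; this also runs under Node 18+, where Response is global and new Response(body, init) accepts an existing Response as the init argument, copying its status and headers:

```typescript
// Stand-in for a response from fetch(); in Workers, that response's
// headers would be immutable.
const upstream = new Response('hello', {
  status: 200,
  headers: { 'Content-Type': 'text/plain' },
});

// Re-wrapping copies status and headers into a fresh Response whose
// headers are mutable, so new headers can be set before returning it.
const wrapped = new Response(upstream.body, upstream);
wrapped.headers.set('X-Robots-Tag', 'noai, noimageai');
```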
6. Pages Functions — _middleware.ts
If your site is deployed on Cloudflare Pages rather than a standalone Worker, use Pages Functions for bot blocking. Create functions/_middleware.ts at the project root — it runs before every request on your Pages site, including static asset requests (except those served directly from the edge cache).
functions/_middleware.ts
import type { PagesFunction } from '@cloudflare/workers-types';
const BLOCKED_UAS =
/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;
export const onRequest: PagesFunction = async ({ request, next }) => {
const url = new URL(request.url);
// Always allow robots.txt through
if (url.pathname === '/robots.txt') {
return next();
}
const ua = request.headers.get('user-agent') ?? '';
if (BLOCKED_UAS.test(ua)) {
return new Response('Forbidden', { status: 403 });
}
const response = await next();
// Add X-Robots-Tag to all non-blocked responses
const newResponse = new Response(response.body, response);
newResponse.headers.set('X-Robots-Tag', 'noai, noimageai');
return newResponse;
};
Deployment: Pages Functions are bundled and deployed automatically with wrangler pages deploy or via the Pages Git integration. No separate Worker deploy step needed. KV bindings are configured in the Pages project settings, not in wrangler.toml.
Scoped middleware (route-specific)
Pages Functions supports middleware scoping — create functions/api/_middleware.ts to only run on /api/* routes:
// functions/api/_middleware.ts — only runs on /api/* paths
import type { PagesFunction } from '@cloudflare/workers-types';
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;
export const onRequest: PagesFunction = async ({ request, next }) => {
const ua = request.headers.get('user-agent') ?? '';
if (BLOCKED_UAS.test(ua)) {
return Response.json({ error: 'Forbidden' }, { status: 403 });
}
return next();
};7. wrangler.toml — full configuration
name = "my-site-bot-blocker"
main = "src/index.ts"
compatibility_date = "2024-09-23"
# Route: custom domain
[[routes]]
pattern = "example.com/*"
zone_name = "example.com"
# Static assets served before Worker runs
[assets]
directory = "./public"
binding = "ASSETS"
# KV namespace for dynamic blocked UA list
[[kv_namespaces]]
binding = "BLOCKED_UAS_KV"
id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Environment variables
[vars]
ENVIRONMENT = "production"
# Development overrides
[env.dev.vars]
ENVIRONMENT = "development"
# Local dev: use preview KV namespace
[[env.dev.kv_namespaces]]
binding = "BLOCKED_UAS_KV"
id = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
preview_id = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
Local development
# Start local dev server (hot reload)
wrangler dev
# Set KV value in local dev storage (used by wrangler dev)
wrangler kv:key put --binding BLOCKED_UAS_KV "blocked-pattern" "GPTBot|ClaudeBot" --local
# Deploy to production
wrangler deploy
8. Cloudflare's built-in Bot Fight Mode
Cloudflare's dashboard has a native “AI Scrapers and Crawlers” toggle under Security → Bots. It blocks known AI bots at the network layer before your Worker runs. Compare it to a Worker-based approach:
| Feature | CF Bot Fight Mode | Worker UA check |
|---|---|---|
| Setup | One toggle in dashboard | Code + deploy |
| robots.txt honoured | ✗ Hard blocks regardless | ✓ You control exemptions |
| Granular path control | ✗ All or nothing | ✓ Per-path logic |
| Allow some AI bots | ✗ Block all or none | ✓ Allowlist specific UAs |
| Custom response | ✗ Cloudflare default page | ✓ Any response/status |
| Requires Cloudflare proxy | ✓ Yes (orange cloud) | ✓ Yes (Worker runs on CF) |
| Cost | Free (all plans) | Workers Free: 100K req/day |
You can use both simultaneously — Bot Fight Mode as a coarse first layer, your Worker as a fine-grained second layer. Bot Fight Mode blocks before your Worker runs, so matched bots never reach your Worker code.
9. Workers vs nginx vs framework edge middleware
| Approach | Where it runs | Latency added | Origin load |
|---|---|---|---|
| CF Bot Fight Mode | CF network layer | None | None |
| Cloudflare Worker | CF edge (300+ PoPs) | < 1 ms | None |
| nginx map block | Your VPS/server | < 1 ms local | Server receives |
| Next.js middleware | Vercel edge / Node.js | 1–5 ms (Vercel edge) | None (Vercel edge) |
| App-level middleware | Origin server process | N/A (same process) | Full origin load |
10. Workers pricing
| Plan | Requests | CPU time | Cost |
|---|---|---|---|
| Workers Free | 100K req/day | 10 ms / req | Free |
| Workers Paid | 10M req/month incl. | 30 ms / req | $5/month |
| Workers Paid (overage) | Beyond 10M | 30 ms / req | $0.50 / 1M req |
| KV reads (Free) | 100K reads/day | — | Free |
| KV reads (Paid) | 10M reads/month incl. | — | $5/month base |
Bot-blocking Workers are CPU-light (regex match + header read) — well within the 10 ms Free tier CPU limit. The 100K req/day Free limit is per account, not per Worker. Most personal sites operate comfortably on the Free tier.
FAQ
How do I serve robots.txt in a Cloudflare Worker?
Two options: (1) Workers Static Assets — add [assets] to wrangler.toml pointing at a public/ directory. Cloudflare serves it before your Worker runs. (2) Explicit route — handle GET /robots.txt in your fetch handler and return the content as a string constant. Workers have no file system at runtime.
How do I block AI bots in a Cloudflare Worker?
Read request.headers.get('user-agent') and match it against a module-level regex. Return new Response('Forbidden', { status: 403 }) for matched bots. Exempt /robots.txt so crawlers can still read your directives.
What is the difference between Workers and Pages Functions?
Workers are standalone scripts deployed with wrangler deploy on a custom domain via Routes. Pages Functions (functions/_middleware.ts) are co-deployed with a Pages static site — no separate Worker deploy. Both run the same V8 runtime with identical APIs. Use Pages Functions if your site is already on Cloudflare Pages.
Can I update the blocked UA list without redeploying?
Yes, with Workers KV. Store the blocked pattern string in KV and read it in your Worker. Add a module-level cache with a 60-second TTL to avoid KV reads on every request. Update via wrangler kv:key put — changes propagate globally within 60 seconds.
Does Cloudflare already block AI bots automatically?
Cloudflare's “AI Scrapers and Crawlers” toggle (Security → Bots) blocks known AI bots at the network layer. But it does not honour your robots.txt — it hard-blocks regardless of crawl directives. A Worker gives granular control: allow some AI bots, block others, and serve different responses by path.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.