Cloudflare Workers · Edge · 9 min read

How to Block AI Bots on Cloudflare Workers: Complete 2026 Guide

Cloudflare Workers run at the edge — before your origin server ever sees the request. This makes Workers the most efficient place to block AI crawlers: zero origin load, global coverage across 300+ PoPs, and sub-millisecond latency. Workers have no file system, so robots.txt must be served via Workers Static Assets or embedded as a string constant. This guide covers the ES module fetch handler, Hono middleware, KV-based dynamic rules, Pages Functions middleware, and the relationship to Cloudflare's built-in Bot Fight Mode.

Workers Runtime · 2026

Examples use the ES module syntax (export default { async fetch() }) — the current standard. The legacy addEventListener('fetch') Service Worker syntax still works but is not recommended for new projects. TypeScript is supported natively via Wrangler with no extra config.

Methods at a glance

| Method | What it does | Blocks JS-less bots? |
| --- | --- | --- |
| Workers Assets robots.txt | Signals crawlers to stay out | Signal only |
| GET /robots.txt in fetch handler | Dynamic robots.txt, no file system needed | Signal only |
| fetch handler UA check | Hard 403 at edge before origin sees request | ✓ |
| Hono app.use("*") middleware | Hard 403 with framework-style middleware | ✓ |
| KV-based dynamic rules | Update blocked UAs without redeploying | ✓ |
| X-Robots-Tag header | noai on all responses via header mutation | ✓ (header) |
| Pages Functions _middleware.ts | Edge blocking co-deployed with Pages site | ✓ |
| CF Bot Fight Mode | Cloudflare network-layer block (no robots.txt) | ✓ (blunt) |

1. robots.txt — Workers Static Assets

Workers have no file system at runtime — you cannot do fs.readFileSync('robots.txt'). Use Workers Static Assets (the [assets] binding in wrangler.toml) to serve static files from a directory. Cloudflare serves assets before your Worker runs — zero Worker invocation for static file requests.

wrangler.toml

name = "my-site"
main = "src/index.ts"
compatibility_date = "2024-09-23"

# Serve everything in ./public as static assets
[assets]
directory = "./public"

public/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /

With [assets] configured, GET /robots.txt is served directly from Cloudflare's edge cache — your Worker code never runs for this request. If you need dynamic robots.txt (environment-based rules), handle it explicitly in your fetch handler instead:

Dynamic robots.txt in fetch handler

// src/index.ts
const ROBOTS_TXT = `User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /`;

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === '/robots.txt') {
      return new Response(ROBOTS_TXT, {
        headers: { 'Content-Type': 'text/plain; charset=utf-8' },
      });
    }

    // ... rest of handler
    return new Response('Hello World');
  },
};

Embed robots.txt as a module-level string constant — compiled into the Worker bundle at deploy time. No file system reads at runtime.
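Before deploying, it is worth sanity-checking that the robots.txt you serve actually disallows the crawlers you intend. A minimal sketch — not a full RFC 9309 parser, and `isDisallowedRoot` is a hypothetical local helper, not a Workers API — that answers whether a crawler token is disallowed from the site root:

```typescript
// Minimal sketch: does this robots.txt body disallow the given crawler
// token from "/"? Handles only exact-token groups and "Disallow: /".
function isDisallowedRoot(robotsTxt: string, botToken: string): boolean {
  let inMatchingGroup = false;
  let disallowed = false;
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === 'user-agent') {
      // Enter the group if the token matches this User-agent line
      inMatchingGroup = value.toLowerCase() === botToken.toLowerCase();
    } else if (field === 'disallow' && inMatchingGroup && value === '/') {
      disallowed = true;
    }
  }
  return disallowed;
}

const robots = `User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /`;

isDisallowedRoot(robots, 'GPTBot');    // → true
isDisallowedRoot(robots, 'Googlebot'); // → false (falls under the * group)
```

A real crawler applies more of the spec (longest-match rules, wildcards), but this is enough to catch a typo in a bot token before it ships.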

2. Hard block — ES module fetch handler

The canonical Workers pattern: compile a regex at module scope (once, not per request), check the User-Agent header, exempt /robots.txt so crawlers can always read your directives, then return 403 for matched bots.

// src/index.ts

const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Diffbot|ImagesiftBot|Omgili|omgilibot|facebookexternalhit.*AI/i;

const ROBOTS_TXT = `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /`;

export interface Env {
  // KV namespace (optional — see section 4)
  BLOCKED_UAS_KV?: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);

    // Always allow robots.txt through
    if (url.pathname === '/robots.txt') {
      return new Response(ROBOTS_TXT, {
        headers: { 'Content-Type': 'text/plain; charset=utf-8' },
      });
    }

    // Block AI bots
    const ua = request.headers.get('user-agent') ?? '';
    if (BLOCKED_UAS.test(ua)) {
      return new Response('Forbidden', {
        status: 403,
        headers: { 'Content-Type': 'text/plain' },
      });
    }

    // Pass through to origin (or generate response)
    return fetch(request);
  },
};

Module scope: BLOCKED_UAS is compiled once per Worker isolate startup — not on every request. V8 isolates are reused across requests on the same PoP.

fetch(request): Passing the original Request object to the global fetch() proxies the request to the origin URL. For a standalone Worker, replace this with your actual response logic.
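A quick way to validate the blocking regex before deploying is to run it against sample User-Agent strings locally. The UA strings below are illustrative, modeled on real crawler UAs:

```typescript
// Sanity-check the blocking regex against sample UA strings.
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;

const samples: Array<[string, boolean]> = [
  ['Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)', true],
  ['Mozilla/5.0 (compatible; ClaudeBot/1.0)', true],
  ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36', false],
  ['Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', false],
  ['', false], // missing UA header → empty string after the ?? fallback
];

for (const [ua, expected] of samples) {
  console.assert(BLOCKED_UAS.test(ua) === expected, `unexpected result for: ${ua}`);
}
```

Note that plain Googlebot does not match — only Google-Extended (the training crawler token) is in the list, so search indexing is unaffected.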

Legacy Service Worker syntax (reference only)

// Legacy — avoid for new projects
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  const ua = request.headers.get('user-agent') ?? '';
  if (BLOCKED_UAS.test(ua)) {
    return new Response('Forbidden', { status: 403 });
  }
  return fetch(request);
}

The Service Worker syntax predates ES modules on Workers. It works, but export default { fetch } is the current standard and required for newer features (Durable Objects, RPC, etc.).

3. Hono middleware on Workers

Hono is the de facto standard framework for Cloudflare Workers — ultra-lightweight, Workers-native, and the exported app is itself a valid fetch handler. Register bot-blocking middleware with app.use('*') before any route, and register the /robots.txt route first so it is exempt.

Install

npm create cloudflare@latest my-site -- --template hono
# or add to existing project:
npm install hono

src/index.ts

import { Hono } from 'hono';

type Bindings = {
  BLOCKED_UAS_KV?: KVNamespace;
};

const app = new Hono<{ Bindings: Bindings }>();

const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;

const ROBOTS_TXT = `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /`;

// 1. robots.txt — exempt before bot-blocking middleware
app.get('/robots.txt', (c) =>
  c.text(ROBOTS_TXT, 200, { 'Content-Type': 'text/plain; charset=utf-8' }),
);

// 2. Bot-blocking middleware — registered AFTER robots.txt route
//    Hono matches routes in registration order; app.use('*') runs for
//    all paths that haven't already returned a response.
app.use('*', async (c, next) => {
  const ua = c.req.header('user-agent') ?? '';
  if (BLOCKED_UAS.test(ua)) {
    return c.text('Forbidden', 403);
  }
  await next();
});

// 3. Add X-Robots-Tag to every response
app.use('*', async (c, next) => {
  await next();
  c.header('X-Robots-Tag', 'noai, noimageai');
});

// 4. Your routes
app.get('/', (c) => c.text('Hello World'));
app.get('/api/data', (c) => c.json({ ok: true }));

export default app;

Route order matters: In Hono, routes and middleware are processed in registration order. The /robots.txt GET handler is registered first — matching requests return immediately without hitting the bot-blocking middleware.

After-next middleware: The X-Robots-Tag middleware calls await next() first, then sets the header after the route handler runs — Hono follows the Koa-style onion model, where code placed after next() executes on the way back out.
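The ordering is easiest to see in a dependency-free sketch of the same onion model — synchronous here for brevity, where Hono's real middleware is async:

```typescript
// Synchronous sketch of the onion model: middleware registered earlier
// wraps middleware registered later, so code after next() runs on the
// way back out.
type Middleware = (next: () => void) => void;

function run(middlewares: Middleware[]): void {
  const dispatch = (i: number): void => {
    if (i < middlewares.length) middlewares[i](() => dispatch(i + 1));
  };
  dispatch(0);
}

const order: string[] = [];
run([
  (next) => {
    order.push('header:before');
    next();
    order.push('header:after'); // runs after the handler, like c.header() after await next()
  },
  (next) => {
    order.push('handler');
    next();
  },
]);
// order: ['header:before', 'handler', 'header:after']
```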

4. KV-based dynamic rules

Workers KV lets you update your blocked UA list without redeploying. Store patterns in KV and read them on each request. KV is eventually consistent — global propagation takes up to 60 seconds. That is acceptable for bot rule updates.

wrangler.toml — add KV binding

[[kv_namespaces]]
binding = "BLOCKED_UAS_KV"
id = "your-kv-namespace-id"  # wrangler kv:namespace create BLOCKED_UAS_KV

Set a value via Wrangler CLI

wrangler kv:key put --binding BLOCKED_UAS_KV "blocked-pattern" "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Google-Extended|Bytespider|CCBot|PerplexityBot"

src/index.ts — KV-aware fetch handler

export interface Env {
  BLOCKED_UAS_KV: KVNamespace;
}

// Module-level cache — survives across requests in the same isolate
let cachedPattern: RegExp | null = null;
let cacheExpiry = 0;
const CACHE_TTL_MS = 60_000; // refresh KV value every 60 seconds

async function getBlockedPattern(env: Env): Promise<RegExp> {
  const now = Date.now();
  if (cachedPattern && now < cacheExpiry) return cachedPattern;

  const raw = await env.BLOCKED_UAS_KV.get('blocked-pattern');
  if (raw) {
    cachedPattern = new RegExp(raw, 'i');
    cacheExpiry = now + CACHE_TTL_MS;
  }
  return cachedPattern ?? /GPTBot|ClaudeBot/i; // fallback
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname === '/robots.txt') {
      return new Response('User-agent: *\nAllow: /', {
        headers: { 'Content-Type': 'text/plain' },
      });
    }

    const pattern = await getBlockedPattern(env);
    const ua = request.headers.get('user-agent') ?? '';
    if (pattern.test(ua)) {
      return new Response('Forbidden', { status: 403 });
    }

    return fetch(request);
  },
};

The module-level cache avoids a KV read on every request. Isolates are long-lived on busy PoPs, so the cache is effective. On cold-start isolates the cache is empty — the first request pays the KV latency (~5 ms).
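The caching effect can be demonstrated with a stub binding. `StubKV` below is a test double standing in for the real KVNamespace — it counts reads so the cache's behavior is observable:

```typescript
// Demonstrates the module-level cache with a stub KV binding.
interface KVLike {
  get(key: string): Promise<string | null>;
}

class StubKV implements KVLike {
  reads = 0;
  async get(_key: string): Promise<string | null> {
    this.reads++;
    return 'GPTBot|ClaudeBot';
  }
}

let cachedPattern: RegExp | null = null;
let cacheExpiry = 0;
const CACHE_TTL_MS = 60_000;

async function getBlockedPattern(kv: KVLike): Promise<RegExp> {
  const now = Date.now();
  if (cachedPattern && now < cacheExpiry) return cachedPattern;
  const raw = await kv.get('blocked-pattern');
  if (raw) {
    cachedPattern = new RegExp(raw, 'i');
    cacheExpiry = now + CACHE_TTL_MS;
  }
  return cachedPattern ?? /GPTBot/i; // fallback if KV is empty
}
```

Calling getBlockedPattern a hundred times in quick succession performs exactly one StubKV read — the same behavior the real Worker shows on a warm isolate.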

5. X-Robots-Tag on all responses

To opt out of AI training on a per-page basis without modifying HTML, inject X-Robots-Tag: noai, noimageai on every response. In a raw Worker, mutate the response headers before returning:

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // robots.txt pass-through — no X-Robots-Tag needed
    if (url.pathname === '/robots.txt') {
      return new Response(ROBOTS_TXT, { headers: { 'Content-Type': 'text/plain' } });
    }

    // Block bots
    const ua = request.headers.get('user-agent') ?? '';
    if (BLOCKED_UAS.test(ua)) {
      return new Response('Forbidden', { status: 403 });
    }

    // Fetch from origin and clone response to add header
    const originResponse = await fetch(request);
    const response = new Response(originResponse.body, originResponse);
    response.headers.set('X-Robots-Tag', 'noai, noimageai');
    return response;
  },
};

Responses returned by fetch() are immutable in Workers — you cannot set headers on them directly. Construct a new Response(body, originResponse) to copy the status and headers into a fresh, mutable response, then set the header on that copy.
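The clone-then-mutate pattern uses the standard Fetch API Response, so it can be sketched outside Workers too (this also runs on Node 18+; a locally constructed Response is mutable, but the pattern is the same one needed for the immutable responses fetch() returns):

```typescript
// Clone-then-mutate: copy an upstream response, then add a header.
function withRobotsTag(upstream: Response): Response {
  // new Response(body, init) accepts another Response as init and copies
  // its status, statusText, and headers into a fresh, mutable response.
  const response = new Response(upstream.body, upstream);
  response.headers.set('X-Robots-Tag', 'noai, noimageai');
  return response;
}

const upstream = new Response('<html></html>', {
  status: 200,
  headers: { 'Content-Type': 'text/html' },
});
const tagged = withRobotsTag(upstream);
// tagged keeps the original status and Content-Type, plus the new header
```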

6. Pages Functions — _middleware.ts

If your site is deployed on Cloudflare Pages rather than a standalone Worker, use Pages Functions for bot blocking. Create functions/_middleware.ts at the project root — it runs before every request on your Pages site, including static asset requests (except those served directly from the edge cache).

functions/_middleware.ts

// PagesFunction is an ambient type from @cloudflare/workers-types —
// add the package to compilerOptions.types in tsconfig.json.

const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;

export const onRequest: PagesFunction = async ({ request, next }) => {
  const url = new URL(request.url);

  // Always allow robots.txt through
  if (url.pathname === '/robots.txt') {
    return next();
  }

  const ua = request.headers.get('user-agent') ?? '';
  if (BLOCKED_UAS.test(ua)) {
    return new Response('Forbidden', { status: 403 });
  }

  const response = await next();

  // Add X-Robots-Tag to all non-blocked responses
  const newResponse = new Response(response.body, response);
  newResponse.headers.set('X-Robots-Tag', 'noai, noimageai');
  return newResponse;
};

Deployment: Pages Functions are bundled and deployed automatically with wrangler pages deploy or via the Pages Git integration. No separate Worker deploy step needed. KV bindings are configured in the Pages project settings, not in wrangler.toml.

Scoped middleware (route-specific)

Pages Functions supports middleware scoping — create functions/api/_middleware.ts to only run on /api/* routes:

// functions/api/_middleware.ts — only runs on /api/* paths
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot/i;

export const onRequest: PagesFunction = async ({ request, next }) => {
  const ua = request.headers.get('user-agent') ?? '';
  if (BLOCKED_UAS.test(ua)) {
    return Response.json({ error: 'Forbidden' }, { status: 403 });
  }
  return next();
};

7. wrangler.toml — full configuration

name = "my-site-bot-blocker"
main = "src/index.ts"
compatibility_date = "2024-09-23"

# Route: custom domain
[[routes]]
pattern = "example.com/*"
zone_name = "example.com"

# Static assets served before Worker runs
[assets]
directory = "./public"
binding = "ASSETS"

# KV namespace for dynamic blocked UA list
[[kv_namespaces]]
binding = "BLOCKED_UAS_KV"
id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Environment variables
[vars]
ENVIRONMENT = "production"

# Development overrides
[env.dev.vars]
ENVIRONMENT = "development"

# Local dev: use preview KV namespace
[[env.dev.kv_namespaces]]
binding = "BLOCKED_UAS_KV"
id = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
preview_id = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"

Local development

# Start local dev server (hot reload)
wrangler dev

# Set KV value in preview namespace
wrangler kv:key put --binding BLOCKED_UAS_KV "blocked-pattern" "GPTBot|ClaudeBot" --local

# Deploy to production
wrangler deploy

8. Cloudflare's built-in Bot Fight Mode

Cloudflare's dashboard has a native “AI Scrapers and Crawlers” toggle under Security → Bots. It blocks known AI bots at the network layer before your Worker runs. Compare it to a Worker-based approach:

| Feature | CF Bot Fight Mode | Worker UA check |
| --- | --- | --- |
| Setup | One toggle in dashboard | Code + deploy |
| robots.txt honoured | ✗ Hard blocks regardless | ✓ You control exemptions |
| Granular path control | ✗ All or nothing | ✓ Per-path logic |
| Allow some AI bots | ✗ Block all or none | ✓ Allowlist specific UAs |
| Custom response | ✗ Cloudflare default page | ✓ Any response/status |
| Requires Cloudflare proxy | ✓ Yes (orange cloud) | ✓ Yes (Worker runs on CF) |
| Cost | Free (Pro+ plans) | Workers Free: 100K req/day |

You can use both simultaneously — Bot Fight Mode as a coarse first layer, your Worker as a fine-grained second layer. Bot Fight Mode blocks before your Worker runs, so matched bots never reach your Worker code.

9. Workers vs nginx vs framework edge middleware

| Approach | Where it runs | Latency added | Origin load |
| --- | --- | --- | --- |
| CF Bot Fight Mode | CF network layer | None | None |
| Cloudflare Worker | CF edge (300+ PoPs) | < 1 ms | None |
| nginx map block | Your VPS/server | < 1 ms local | Server receives request |
| Next.js middleware | Vercel edge / Node.js | 1–5 ms (Vercel edge) | None (Vercel edge) |
| App-level middleware | Origin server process | N/A (same process) | Full origin load |

10. Workers pricing

| Plan | Requests | CPU time | Cost |
| --- | --- | --- | --- |
| Workers Free | 100K req/day | 10 ms / req | Free |
| Workers Paid | 10M req/month incl. | 30 ms / req | $5/month |
| Workers Paid (overage) | Beyond 10M | 30 ms / req | $0.50 / 1M req |
| KV reads (Free) | 100K reads/day | — | Free |
| KV reads (Paid) | 10M reads/month incl. | — | $5/month base |

Bot-blocking Workers are CPU-light (regex match + header read) — well within the 10 ms Free tier CPU limit. The 100K req/day Free limit is per account, not per Worker. Most personal sites operate comfortably on the Free tier.
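The paid-tier numbers above reduce to a simple cost formula — a sketch, assuming the quoted $5 base, 10M included requests, and $0.50 per extra million:

```typescript
// Back-of-envelope monthly cost on Workers Paid, using the figures
// quoted in the pricing table above.
function monthlyWorkersCostUSD(requests: number): number {
  const BASE_USD = 5;
  const INCLUDED = 10_000_000;
  const OVERAGE_PER_MILLION = 0.5;
  const extra = Math.max(0, requests - INCLUDED);
  return BASE_USD + (extra / 1_000_000) * OVERAGE_PER_MILLION;
}

monthlyWorkersCostUSD(10_000_000); // → 5
monthlyWorkersCostUSD(30_000_000); // → 15  ($5 + 20M overage × $0.50/M)
```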

FAQ

How do I serve robots.txt in a Cloudflare Worker?

Two options: (1) Workers Static Assets — add [assets] to wrangler.toml pointing at a public/ directory. Cloudflare serves it before your Worker runs. (2) Explicit route — handle GET /robots.txt in your fetch handler and return the content as a string constant. Workers have no file system at runtime.

How do I block AI bots in a Cloudflare Worker?

Read request.headers.get('user-agent') and match it against a module-level regex. Return new Response('Forbidden', { status: 403 }) for matched bots. Exempt /robots.txt so crawlers can still read your directives.

What is the difference between Workers and Pages Functions?

Workers are standalone scripts deployed with wrangler deploy on a custom domain via Routes. Pages Functions (functions/_middleware.ts) are co-deployed with a Pages static site — no separate Worker deploy. Both run the same V8 runtime with identical APIs. Use Pages Functions if your site is already on Cloudflare Pages.

Can I update the blocked UA list without redeploying?

Yes, with Workers KV. Store the blocked pattern string in KV and read it in your Worker. Add a module-level cache with a 60-second TTL to avoid KV reads on every request. Update via wrangler kv:key put — changes propagate globally within 60 seconds.

Does Cloudflare already block AI bots automatically?

Cloudflare's “AI Scrapers and Crawlers” toggle (Security → Bots) blocks known AI bots at the network layer. But it does not honour your robots.txt — it hard-blocks regardless of crawl directives. A Worker gives granular control: allow some AI bots, block others, and serve different responses by path.
