How to Block AI Bots on Docusaurus: Complete 2026 Guide
Docusaurus generates a static React site — like MkDocs, it has no server process. Bot blocking splits across the content layer (robots.txt in static/, noai meta via headTags in docusaurus.config.js) and the hosting platform layer (X-Robots-Tag headers, hard 403 via Edge Functions). The headTags config approach is the simplest — no theme swizzling required.
headTags is all you need for noai meta — no swizzling
Docusaurus v2/v3 supports a headTags array in docusaurus.config.js that injects arbitrary HTML tags into every page's <head>. This is simpler than swizzling the Root or Layout component. Reserve swizzling for per-page conditional logic that can't be done with a static config entry.
Methods at a glance
| Method | What it does | Where it lives |
|---|---|---|
| static/robots.txt | Signals bots which paths are off-limits | static/ → build/ |
| headTags in docusaurus.config.js | noai meta on every page | Config file |
| <head> in MDX front matter | noai meta on specific pages | Individual .md/.mdx files |
| netlify.toml / vercel.json / _headers | X-Robots-Tag on all responses | Hosting platform |
| Edge Function | Hard 403 on known AI User-Agents | Netlify / Cloudflare |
| Swizzled Root/Layout | Per-page conditional meta | src/theme/ (advanced) |
1. robots.txt — static/ directory
Place robots.txt in the static/ directory at the root of your Docusaurus project. Docusaurus copies the entire static/ directory into build/ unchanged. No config needed.
# static/robots.txt
# Copied to build/robots.txt by Docusaurus — no config needed
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: *
Allow: /

# Verify after build:
npm run build
ls build/robots.txt  # should exist

static/ vs docs/ vs src/
Only files in static/ are copied to the build output as-is. Files in docs/ are Markdown content processed by Docusaurus. Files in src/ are React components. robots.txt goes in static/ — not in docs/ or src/.
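A quick way to audit which agents a robots.txt fully disallows is a small awk pipeline. This is a sketch with a sample file inlined as a shell variable; in practice, point the same awk at build/robots.txt after a build:

```shell
# Sample robots.txt inlined for illustration; in practice run:
#   awk '/^User-agent:/ {ua=$2} /^Disallow: \/$/ {print ua}' build/robots.txt
robots='User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /'

# Print each user agent whose record contains a blanket "Disallow: /"
printf '%s\n' "$robots" | awk '/^User-agent:/ {ua=$2} /^Disallow: \/$/ {print ua}'
# prints:
# GPTBot
# CCBot
```

Note this only surfaces blanket disallows; per-path rules would need a fuller parser.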
2. noai meta tag — headTags config
The headTags array in docusaurus.config.js (or docusaurus.config.ts) injects HTML tags into the <head> of every generated page. This is the cleanest approach — no component swizzling required.
// docusaurus.config.ts (for a plain docusaurus.config.js, drop the type import and annotations)
import type { Config } from '@docusaurus/types';
const config: Config = {
title: 'My Documentation',
url: 'https://docs.example.com',
baseUrl: '/',
// Inject meta tags into every page's <head>
headTags: [
{
tagName: 'meta',
attributes: {
name: 'robots',
content: 'noai, noimageai',
},
},
],
// ... rest of config
};
export default config;

headTags is available in Docusaurus v2.4+ and v3.x. For older versions, use the theme swizzling approach (Section 4).
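To confirm the tag actually reached every generated page, you can list any built HTML files that lack it. A sketch against a simulated build directory; in a real project, skip the setup lines and point grep at build/ after npm run build:

```shell
# Simulated build output: one page with the tag, one without.
# In a real project, replace "$dir" with build/ and drop this setup.
dir=$(mktemp -d)
mkdir -p "$dir/docs"
echo '<head><meta name="robots" content="noai, noimageai"></head>' > "$dir/index.html"
echo '<head><title>oops</title></head>' > "$dir/docs/unprotected.html"

# -r recurse, -L list files WITHOUT a match, i.e. pages missing the tag
grep -rL 'name="robots"' --include='*.html' "$dir"
# prints only the path of docs/unprotected.html
```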
3. Per-page meta — MDX head block
In any .md or .mdx file, add a <head> block directly in the file to inject page-specific meta tags. These override or supplement the global headTags config.
---
# docs/some-page.mdx
title: My Page
description: Page description
---
<head>
{/* Block AI indexing on this specific page */}
<meta name="robots" content="noindex, noai, noimageai" />
</head>
# My Page
Content here...

The <head> block in MDX uses JSX syntax — self-closing tags need />. It is processed by Docusaurus's MDX pipeline, not treated as raw HTML.
4. Theme swizzling — conditional per-page meta
For conditional logic (e.g. different robots values based on doc category or front matter), swizzle the Root component to wrap every page in a custom component that injects the correct meta tag.
# Eject the Root component (safe — wraps the whole app):
npx docusaurus swizzle @docusaurus/theme-classic Root --eject

// src/theme/Root.tsx — after swizzling
import React from 'react';
import Head from '@docusaurus/Head';
import { useLocation } from '@docusaurus/router';
import OriginalRoot from '@theme-original/Root';
// Pages that should not be indexed
const NOINDEX_PATHS = ['/internal/', '/draft/'];
export default function Root(props: { children: React.ReactNode }) {
const { pathname } = useLocation();
const isNoIndex = NOINDEX_PATHS.some((p) => pathname.startsWith(p));
return (
<>
<Head>
<meta
name="robots"
content={isNoIndex ? 'noindex, noai, noimageai' : 'noai, noimageai'}
/>
</Head>
<OriginalRoot {...props} />
</>
);
}

Note that the component above imports @theme-original/Root and delegates to it — the same pattern --wrap generates automatically, so wrapping also works for Root. --eject gives you a full copy and more control; whichever mode you use, keep the delegation so the original Root behavior is preserved.
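One subtlety in the prefix check above: startsWith('/internal/') does not match the bare path /internal, which can occur when trailingSlash is disabled. A shell sketch of the same matching semantics (the case patterns mirror startsWith):

```shell
# case-pattern equivalent of NOINDEX_PATHS.some((p) => pathname.startsWith(p))
matches_noindex() {
  case "$1" in
    /internal/*|/draft/*) return 0 ;;
    *) return 1 ;;
  esac
}

matches_noindex '/internal/runbook' && echo 'noindexed'
matches_noindex '/internal' || echo 'missed: no trailing slash'
# prints:
# noindexed
# missed: no trailing slash
```

If your site serves both forms, add the slash-free prefixes to NOINDEX_PATHS as well.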
5. X-Robots-Tag — hosting platform
Docusaurus produces a static site — HTTP headers come from your hosting platform.
Netlify — netlify.toml
# netlify.toml
[build]
command = "npm run build"
publish = "build"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json
{
"buildCommand": "npm run build",
"outputDirectory": "build",
"headers": [
{
"source": "/(.*)",
"headers": [
{ "key": "X-Robots-Tag", "value": "noai, noimageai" }
]
}
]
}

Cloudflare Pages — _headers file
# static/_headers — copied to build/_headers by Docusaurus
/*
X-Robots-Tag: noai, noimageai

GitHub Pages — no custom headers
GitHub Pages does not support custom HTTP headers. Use the headTags noai meta approach (Section 2) as your only option, or migrate to Cloudflare Pages for header + edge function support.
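On any of the header-capable platforms above, the post-deploy check is the same one-liner. Shown here against a canned response; in practice replace the printf with curl -sI https://your-docs-domain/ (the domain is a placeholder):

```shell
# Canned headers standing in for: curl -sI https://<your-domain>/
response='HTTP/2 200
content-type: text/html; charset=utf-8
x-robots-tag: noai, noimageai'

printf '%s\n' "$response" | grep -i '^x-robots-tag:'
# prints: x-robots-tag: noai, noimageai
```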
6. Hard 403 — edge functions
For hard User-Agent blocking before content is served:
Netlify Edge Function
// netlify/edge-functions/block-ai-bots.js
const BLOCKED_UA = /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot/i;
export default async function handler(request, context) {
const ua = request.headers.get("user-agent") || "";
const path = new URL(request.url).pathname;
if (path !== "/robots.txt" && BLOCKED_UA.test(ua)) {
return new Response("Forbidden", { status: 403 });
}
return context.next();
}
export const config = { path: "/*" };

# netlify.toml — declare edge function
[[edge_functions]]
path = "/*"
function = "block-ai-bots"
[build]
command = "npm run build"
publish = "build"

Cloudflare Pages — _middleware.js
// functions/_middleware.js — Cloudflare Pages picks up Functions from a functions/ directory at the project root, not from static/ or the build output
const BLOCKED_UA = /GPTBot|ClaudeBot|CCBot|PerplexityBot|Amazonbot|Bytespider/i;
export async function onRequest(context) {
const { request, next } = context;
const ua = request.headers.get("user-agent") || "";
const path = new URL(request.url).pathname;
if (path !== "/robots.txt" && BLOCKED_UA.test(ua)) {
return new Response("Forbidden", { status: 403 });
}
return next();
}

7. Full docusaurus.config.js example
Complete config with headTags for noai meta, robots.txt reference, and standard Docusaurus v3 structure.
// docusaurus.config.ts
import type { Config } from '@docusaurus/types';
import type * as Preset from '@docusaurus/preset-classic';
const config: Config = {
title: 'My Documentation',
tagline: 'Project docs',
url: 'https://docs.example.com',
baseUrl: '/',
onBrokenLinks: 'throw',
onBrokenMarkdownLinks: 'warn',
favicon: 'img/favicon.ico',
// ── AI training opt-out ────────────────────────────────────────────
// Injects into <head> of every generated page
headTags: [
{
tagName: 'meta',
attributes: {
name: 'robots',
content: 'noai, noimageai',
},
},
],
presets: [
[
'classic',
{
docs: {
sidebarPath: './sidebars.ts',
routeBasePath: '/',
},
blog: false,
theme: {
customCss: './src/css/custom.css',
},
} satisfies Preset.Options,
],
],
themeConfig: {
navbar: {
title: 'My Docs',
items: [
{ to: '/', label: 'Docs', position: 'left' },
],
},
} satisfies Preset.ThemeConfig,
};
export default config;

8. Deployment quick reference
| Platform | Build command | Publish dir | Headers |
|---|---|---|---|
| Netlify | npm run build | build | netlify.toml [[headers]] |
| Vercel | npm run build | build | vercel.json headers |
| Cloudflare Pages | npm run build | build | static/_headers file |
| GitHub Pages | npm run build | build | ❌ no custom headers |
| AWS S3 + CloudFront | npm run build | build | CloudFront response headers policy |
Frequently asked questions
How do I add robots.txt to a Docusaurus site?
Place robots.txt in the static/ directory — Docusaurus copies everything from static/ to build/ unchanged. Do not put it in docs/ or src/ — only static/ is copied as-is.
How do I add noai meta tags to Docusaurus?
Use headTags in docusaurus.config.js: headTags: [{ tagName: "meta", attributes: { name: "robots", content: "noai, noimageai" } }]. This injects the meta tag into every page with no swizzling. Available in Docusaurus v2.4+ and v3.
What is theme swizzling and when should I use it?
Swizzling copies a Docusaurus theme component into your project so you can modify it. For a global noai meta tag, use headTags in config — no swizzling needed. Swizzle only when you need per-page conditional logic (e.g. different robots values on /internal/ paths). Use --eject on the Root component for the safest swizzling target.
How do I add X-Robots-Tag headers to a Docusaurus site?
Headers come from your host, not Docusaurus. Netlify: [[headers]] in netlify.toml. Vercel: headers in vercel.json. Cloudflare Pages: _headers file in static/ (copied to build/). GitHub Pages: not supported — use noai meta tag.
Does Docusaurus support per-page robots meta tags?
Yes. In any MDX file, add a <head> block with <meta name="robots" content="noindex, noai, noimageai" />. For programmatic control across many pages, swizzle the Root component and conditionally set the robots value based on the current path.
Can I block AI bots on GitHub Pages with Docusaurus?
GitHub Pages doesn't support custom headers or edge functions. Use headTags for the noai meta tag as your primary defense. For hard 403 blocking, migrate to Netlify or Cloudflare Pages (both free, both support edge functions). Docusaurus deploys cleanly to both with no config changes beyond the build command and publish directory.
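The edge-function blocklist regex from Section 6 can be sanity-checked locally before a migration; for patterns this simple, grep -E's case-insensitive alternation behaves like the JavaScript /.../i regex (the User-Agent strings below are made up):

```shell
# Subset of the Section 6 blocklist, reused as an ERE pattern
BLOCKED='GPTBot|ClaudeBot|CCBot|PerplexityBot|Amazonbot|Bytespider'
is_blocked() { printf '%s' "$1" | grep -Eqi "$BLOCKED"; }

is_blocked 'Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2)' && echo blocked
is_blocked 'Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0' || echo allowed
# prints:
# blocked
# allowed
```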
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.