How to Block AI Bots on MkDocs: Complete 2026 Guide
MkDocs generates a static site — there is no server process to add middleware to. Bot blocking happens at two layers: the content layer (robots.txt in docs/, noai meta via custom HTML overrides) and the hosting platform layer (X-Robots-Tag headers and hard 403 blocking via Netlify, Vercel, Cloudflare Pages, or nginx). GitHub Pages can only use the content layer — it doesn't support custom headers.
MkDocs is static — headers come from your host
MkDocs builds to a site/ directory of HTML, CSS, and JS files. It has no server process. The mkdocs serve dev server is not used in production. Everything you can do in MkDocs itself (robots.txt, noai meta tags) is static — for hard UA blocking and custom HTTP headers, you need your hosting platform's configuration.
Methods at a glance
| Method | What it does | Where it lives |
|---|---|---|
| docs/robots.txt | Signals bots which paths are off-limits | docs/ directory → site/ |
| noai meta via custom_dir override | AI training opt-out on every page | overrides/main.html |
| netlify.toml [[headers]] | X-Robots-Tag on all responses | Netlify hosting |
| vercel.json headers | X-Robots-Tag on all responses | Vercel hosting |
| _headers file | X-Robots-Tag on all responses | Cloudflare Pages |
| Edge Function / hook | Hard 403 on known UA patterns | Netlify/Cloudflare Edge |
1. robots.txt — docs/ directory
Place robots.txt in your docs/ directory. MkDocs copies all files from docs/ to site/ during build — your robots.txt will be at the root of the generated site with no additional configuration.
```txt
# docs/robots.txt
# This file is automatically copied to site/robots.txt by MkDocs

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
```

Verify the file is included in your build output:

```shell
mkdocs build
ls site/robots.txt  # should exist after the build
```

If you use exclude_docs patterns in mkdocs.yml, ensure robots.txt is not accidentally excluded.
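If you do use exclude_docs, a gitignore-style negation pattern (supported since MkDocs 1.5) keeps robots.txt in the build even when a broader pattern would otherwise catch it. The drafts/ and *.txt patterns here are illustrative:

```yaml
# mkdocs.yml — exclude drafts, but keep robots.txt explicitly
exclude_docs: |
  drafts/
  *.txt
  !robots.txt
```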
2. noai meta tag — custom_dir override
MkDocs supports HTML template overrides via the custom_dir setting. Create an override for the extrahead block to inject meta tags into every page's <head>.
Step 1 — configure mkdocs.yml:
```yaml
# mkdocs.yml
site_name: My Documentation

theme:
  name: material  # or 'mkdocs' for the default theme
  custom_dir: overrides  # path relative to mkdocs.yml

# Optional: expose the value under extra so templates can read it
extra:
  robots: noai, noimageai
```

Step 2 — create the override template:
```html
{# overrides/main.html — MkDocs Material theme #}
{% extends "base.html" %}

{% block extrahead %}
{{ super() }}
{# noai meta tag — added to every page's head #}
{# Per-page override: set 'robots' in page front matter to change the value #}
{# e.g. ---\nrobots: index, follow, noai, noimageai\n--- #}
{% if page and page.meta and page.meta.robots %}
<meta name="robots" content="{{ page.meta.robots }}">
{% else %}
<meta name="robots" content="noai, noimageai">
{% endif %}
{% endblock %}
```

The if/else ensures only one robots meta tag is ever emitted — rendering a default tag and a per-page tag side by side would leave crawlers with two conflicting directives.

For the default MkDocs theme (not Material):
```html
{# overrides/main.html — default mkdocs theme #}
{% extends "base.html" %}

{% block extrahead %}
<meta name="robots" content="{{ page.meta.robots if page and page.meta and page.meta.robots else 'noai, noimageai' }}">
{% endblock %}
```

The {{ super() }} call is important
Always call {{ super() }} in the extrahead block when using MkDocs Material. Without it, Material's built-in head content (favicons, theme color, Open Graph tags) is dropped. Put your meta tags after {{ super() }} to append rather than replace.
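With the Material override above in place, a built page's <head> should contain Material's own tags followed by the injected one — an abridged illustration of the expected output, assuming no per-page override:

```html
<!-- site/index.html (abridged) -->
<head>
  <!-- ...Material's favicons, theme color, Open Graph tags from super()... -->
  <meta name="robots" content="noai, noimageai">
</head>
```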
Per-page override via front matter:
```markdown
---
# In any .md page — override robots for this page only
title: Public API Reference
robots: index, follow, noai, noimageai
---

# Public API Reference
...
```

3. Hook — dynamic robots.txt generation
MkDocs hooks let you run Python code at build lifecycle events. Use the on_post_build hook to generate robots.txt after the site is built — useful for staging vs production environments.
```python
# hooks.py — place next to mkdocs.yml
import os

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
    "anthropic-ai", "CCBot", "Google-Extended", "PerplexityBot",
    "Amazonbot", "Bytespider", "YouBot", "DuckAssistBot",
    "meta-externalagent", "MistralAI-Spider", "oai-searchbot",
]

def on_post_build(config, **kwargs):
    """Generate robots.txt after every build."""
    site_dir = config["site_dir"]
    robots_path = os.path.join(site_dir, "robots.txt")
    # Check env var to allow crawling on staging
    if os.environ.get("MKDOCS_ENV") == "staging":
        content = "User-agent: *\nAllow: /\n"
    else:
        lines = []
        for bot in AI_BOTS:
            lines.append(f"User-agent: {bot}")
            lines.append("Disallow: /")
            lines.append("")
        lines.append("User-agent: *")
        lines.append("Allow: /")
        content = "\n".join(lines)
    with open(robots_path, "w") as f:
        f.write(content)
    print(f"Generated robots.txt at {robots_path}")
```

```yaml
# mkdocs.yml — register the hook
hooks:
  - hooks.py
```

4. X-Robots-Tag — hosting platform config
MkDocs produces static files. HTTP headers come from your hosting platform, not MkDocs. Here is the configuration for the most common hosts:
Netlify — netlify.toml
```toml
# netlify.toml (place next to mkdocs.yml — deployed with your site)
[build]
  command = "mkdocs build"
  publish = "site"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
    X-Frame-Options = "SAMEORIGIN"
    X-Content-Type-Options = "nosniff"
```

Vercel — vercel.json
```json
{
  "buildCommand": "mkdocs build",
  "outputDirectory": "site",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Robots-Tag", "value": "noai, noimageai" }
      ]
    }
  ]
}
```

Cloudflare Pages — _headers file
```txt
# docs/_headers — copied to site/_headers during build
/*
  X-Robots-Tag: noai, noimageai
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff
```

GitHub Pages — headers not supported
GitHub Pages does not allow custom HTTP headers. You cannot add X-Robots-Tag or set up hard 403 blocking. Use the noai meta tag (Section 2) as your only option, or migrate to Cloudflare Pages (free) for full header control.
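Since the meta tag is the only lever on GitHub Pages, it is worth verifying that it survives the build. A minimal stdlib-only sketch — the inline sample stands in for a built site/ page; in practice you would loop over site/**/*.html:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == "meta" and d.get("name") == "robots":
            self.robots.append(d.get("content", ""))

def has_noai(html: str) -> bool:
    """True if any robots meta tag in the page contains 'noai'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noai" in content for content in finder.robots)

# Stand-in for site/index.html
sample = '<head><meta name="robots" content="noai, noimageai"></head>'
print(has_noai(sample))  # → True
```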
5. Hard 403 — edge functions
For hard User-Agent blocking (returning 403 before serving any content), use an Edge Function on your hosting platform:
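Because the blocklist is just a regular expression, it can be sanity-checked offline before deploying. This Python sketch mirrors the pattern used in the Netlify edge function below; the sample UA strings are illustrative:

```python
import re

# Python mirror of the BLOCKED_UA regex from the edge function
BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|"
    r"Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|"
    r"DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot",
    re.IGNORECASE,
)

samples = {
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)": True,
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": False,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0": False,
}
for ua, should_block in samples.items():
    assert bool(BLOCKED_UA.search(ua)) == should_block, ua
print("blocklist behaves as expected")
```

Note that ordinary search crawlers like Googlebot fall through to the allow path — only the listed AI user agents match.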
Netlify Edge Function
```javascript
// netlify/edge-functions/block-ai-bots.js
const BLOCKED_UA = /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot/i;

export default async function handler(request, context) {
  const ua = request.headers.get("user-agent") || "";
  const path = new URL(request.url).pathname;
  // Always allow robots.txt
  if (path === "/robots.txt") {
    return context.next();
  }
  if (BLOCKED_UA.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }
  return context.next();
}

export const config = { path: "/*" };
```

```toml
# netlify.toml — declare the edge function
[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

[build]
  command = "mkdocs build"
  publish = "site"
```

Cloudflare Pages — _middleware.ts
```typescript
// functions/_middleware.ts — place the functions/ directory at the repo root;
// Cloudflare Pages picks it up automatically (it is not part of the MkDocs build)
const BLOCKED_UA = /GPTBot|ClaudeBot|CCBot|PerplexityBot|Amazonbot|Bytespider/i;

export async function onRequest(context) {
  const { request, next } = context;
  const ua = request.headers.get("user-agent") ?? "";
  const path = new URL(request.url).pathname;
  // Always allow robots.txt so bots can still read the crawl policy
  if (path !== "/robots.txt" && BLOCKED_UA.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }
  return next();
}
```

6. MkDocs Material — full mkdocs.yml
Complete mkdocs.yml for a MkDocs Material site with custom HTML overrides, hooks, and Netlify deployment.
```yaml
# mkdocs.yml
site_name: My Documentation
site_url: https://docs.example.com
docs_dir: docs
site_dir: site

theme:
  name: material
  custom_dir: overrides  # HTML template overrides
  features:
    - navigation.tabs
    - navigation.instant
    - search.highlight

# Custom Python hooks
hooks:
  - hooks.py  # generates robots.txt on build

# Extra CSS/JS (optional)
extra_css:
  - stylesheets/extra.css

# Plugins
plugins:
  - search
  - minify:
      minify_html: true
```

File structure:

```txt
.
├── mkdocs.yml
├── hooks.py              # on_post_build hook
├── netlify.toml          # headers + edge function config
├── netlify/
│   └── edge-functions/
│       └── block-ai-bots.js
├── overrides/
│   └── main.html         # custom_dir template override
└── docs/
    ├── index.md
    └── ...               # no robots.txt needed — the hook generates it
```

Frequently asked questions
How do I add robots.txt to a MkDocs site?
Place robots.txt in the docs/ directory — MkDocs copies all files from docs/ to site/ during build. No config needed. For dynamic generation (staging vs production), use an on_post_build hook in a Python hooks file registered under hooks: in mkdocs.yml.
How do I add noai meta tags to MkDocs?
Use a custom_dir override. Set custom_dir: overrides in mkdocs.yml, create overrides/main.html extending the base theme, and add your meta tag in the {% block extrahead %} section. Always call {{ super() }} first when using MkDocs Material to preserve its built-in head content.
How do I add X-Robots-Tag to a MkDocs site?
MkDocs generates static files — headers come from your host. Netlify: [[headers]] in netlify.toml. Vercel: headers in vercel.json. Cloudflare Pages: _headers file in docs/ (copied to site/ by MkDocs). GitHub Pages: not supported — use noai meta tag instead.
Can I block AI bots on GitHub Pages with MkDocs?
GitHub Pages doesn't support custom HTTP headers, so X-Robots-Tag and hard 403 blocking are impossible. Your only option is the noai meta tag via a custom_dir HTML override. For anything stronger, migrate to Cloudflare Pages (free tier, supports _headers and _middleware.ts).
Does MkDocs Material support meta tag customisation?
Yes. Set custom_dir: overrides in mkdocs.yml, create overrides/main.html with {% extends "base.html" %} and a {% block extrahead %} section, and call {{ super() }} to preserve Material's built-in head content. Per-page overrides work via a robots: front-matter key read with a Jinja conditional.
How do I use a MkDocs hook to generate robots.txt?
Create a Python file (e.g. hooks.py) with a def on_post_build(config, **kwargs) function. Write robots.txt to config["site_dir"]. Register it in mkdocs.yml under hooks: - hooks.py. The hook runs after every mkdocs build — use environment variables to generate different content for staging vs production.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.