MkDocs · Python · Static Site · 9 min read

How to Block AI Bots on MkDocs: Complete 2026 Guide

MkDocs generates a static site — there is no server process to add middleware to. Bot blocking happens at two layers: the content layer (robots.txt in docs/, noai meta via custom HTML overrides) and the hosting platform layer (X-Robots-Tag headers and hard 403 blocking via Netlify, Vercel, Cloudflare Pages, or nginx). GitHub Pages can only use the content layer — it doesn't support custom headers.

MkDocs is static — headers come from your host

MkDocs builds to a site/ directory of HTML, CSS, and JS files. It has no server process. The mkdocs serve dev server is not used in production. Everything you can do in MkDocs itself (robots.txt, noai meta tags) is static — for hard UA blocking and custom HTTP headers, you need your hosting platform's configuration.

Methods at a glance

| Method | What it does | Where it lives |
| --- | --- | --- |
| docs/robots.txt | Signals bots which paths are off-limits | docs/ directory → site/ |
| noai meta via custom_dir override | AI training opt-out on every page | overrides/main.html |
| netlify.toml [[headers]] | X-Robots-Tag on all responses | Netlify hosting |
| vercel.json headers | X-Robots-Tag on all responses | Vercel hosting |
| _headers file | X-Robots-Tag on all responses | Cloudflare Pages |
| Edge Function / hook | Hard 403 on known UA patterns | Netlify/Cloudflare Edge |

1. robots.txt — docs/ directory

Place robots.txt in your docs/ directory. MkDocs copies all files from docs/ to site/ during build — your robots.txt will be at the root of the generated site with no additional configuration.

# docs/robots.txt
# This file is automatically copied to site/robots.txt by MkDocs

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

Verify the file is included in your build output:

mkdocs build
ls site/robots.txt   # Should exist after build

If you use exclude_docs patterns in mkdocs.yml, ensure robots.txt is not accidentally excluded.
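To check that the rules behave as intended, not just that the file exists, you can parse the built file with Python's standard-library urllib.robotparser. A minimal sketch (the blocked helper is illustrative, not part of MkDocs):

```python
# Sanity-check robots.txt rules with the stdlib parser.
from urllib.robotparser import RobotFileParser

def blocked(robots_lines, agent, path="/"):
    """True if `agent` may NOT fetch `path` under these robots.txt lines."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return not rp.can_fetch(agent, path)

# After `mkdocs build`:
#   lines = open("site/robots.txt").read().splitlines()
#   blocked(lines, "GPTBot")      -> should be True
#   blocked(lines, "Googlebot")   -> should be False (final wildcard Allow)
```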

2. noai meta tag — custom_dir override

MkDocs supports HTML template overrides via the custom_dir setting. Create an override for the extrahead block to inject meta tags into every page's <head>.

Step 1 — configure mkdocs.yml:

# mkdocs.yml
site_name: My Documentation
theme:
  name: material          # or 'mkdocs' for the default theme
  custom_dir: overrides   # path relative to mkdocs.yml

# Optional: expose the value to templates as config.extra.robots
extra:
  robots: noai, noimageai

Step 2 — create the override template:

{# overrides/main.html — MkDocs Material theme #}
{% extends "base.html" %}

{% block extrahead %}
  {{ super() }}
  {# Default: noai on every page. Override per page by setting 'robots' #}
  {# in front matter, e.g. ---\nrobots: index, follow, noai, noimageai\n--- #}
  {% if page and page.meta and page.meta.robots %}
  <meta name="robots" content="{{ page.meta.robots }}">
  {% else %}
  <meta name="robots" content="noai, noimageai">
  {% endif %}
{% endblock %}

For the default MkDocs theme (not Material):

{# overrides/main.html — default mkdocs theme #}
{% extends "base.html" %}

{% block extrahead %}
  <meta name="robots" content="{{ page.meta.robots if page and page.meta and page.meta.robots else 'noai, noimageai' }}">
{% endblock %}

The {{ super() }} call is important

Always call {{ super() }} in the extrahead block when using MkDocs Material. Without it, Material's built-in head content (favicons, theme color, Open Graph tags) is dropped. Put your meta tags after {{ super() }} to append rather than replace.

Per-page override via front matter:

---
# In any .md page — override robots for this page only
title: Public API Reference
robots: index, follow, noai, noimageai
---

# Public API Reference
...
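After a build, it is worth confirming that every generated page actually carries the tag. A small sketch, assuming the attribute order produced by the override above (name before content):

```python
# Find the robots meta tag in built HTML.
import re

META_RE = re.compile(r'<meta\s+name="robots"\s+content="([^"]*)"', re.IGNORECASE)

def robots_content(html):
    """Return the content of the first robots meta tag, or None if absent."""
    m = META_RE.search(html)
    return m.group(1) if m else None

# After `mkdocs build`:
#   from pathlib import Path
#   for page in Path("site").rglob("*.html"):
#       value = robots_content(page.read_text(encoding="utf-8"))
#       if not value or "noai" not in value:
#           print("missing noai:", page)
```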

3. Hook — dynamic robots.txt generation

MkDocs hooks let you run Python code at build lifecycle events. Use the on_post_build hook to generate robots.txt after the site is built — useful for staging vs production environments.

# hooks.py — place next to mkdocs.yml
import os

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
    "anthropic-ai", "CCBot", "Google-Extended", "PerplexityBot",
    "Amazonbot", "Bytespider", "YouBot", "DuckAssistBot",
    "meta-externalagent", "MistralAI-Spider", "oai-searchbot",
]

def on_post_build(config, **kwargs):
    """Generate robots.txt after every build."""
    site_dir = config["site_dir"]
    robots_path = os.path.join(site_dir, "robots.txt")

    # Check env var to allow crawling on staging
    if os.environ.get("MKDOCS_ENV") == "staging":
        content = "User-agent: *\nAllow: /\n"
    else:
        lines = []
        for bot in AI_BOTS:
            lines.append(f"User-agent: {bot}")
            lines.append("Disallow: /")
            lines.append("")
        lines.append("User-agent: *")
        lines.append("Allow: /")
        content = "\n".join(lines)

    with open(robots_path, "w") as f:
        f.write(content)
    print(f"Generated robots.txt at {robots_path}")

Register the hook in mkdocs.yml:

# mkdocs.yml
hooks:
  - hooks.py
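The staging/production branch is the part most worth unit-testing, since getting it backwards ships the wrong robots.txt. A sketch with the generation logic pulled into a pure function (robots_txt_for is an illustrative name, and the bot list is trimmed):

```python
# The hook's branching as a pure, testable function.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]  # use the full list from hooks.py

def robots_txt_for(env):
    """Permissive robots.txt on staging, AI-blocking everywhere else."""
    if env == "staging":
        return "User-agent: *\nAllow: /\n"
    lines = []
    for bot in AI_BOTS:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines)
```

With this in place, on_post_build reduces to writing robots_txt_for(os.environ.get("MKDOCS_ENV")) into config["site_dir"].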

4. X-Robots-Tag — hosting platform config

MkDocs produces static files. HTTP headers come from your hosting platform, not MkDocs. Here is the configuration for the most common hosts:

Netlify — netlify.toml

# netlify.toml (place next to mkdocs.yml — deployed with your site)
[build]
  command = "mkdocs build"
  publish = "site"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
    X-Frame-Options = "SAMEORIGIN"
    X-Content-Type-Options = "nosniff"

Vercel — vercel.json

{
  "buildCommand": "mkdocs build",
  "outputDirectory": "site",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Robots-Tag", "value": "noai, noimageai" }
      ]
    }
  ]
}

Cloudflare Pages — _headers file

# docs/_headers — copied to site/_headers during build
/*
  X-Robots-Tag: noai, noimageai
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff

GitHub Pages — headers not supported

GitHub Pages does not allow custom HTTP headers. You cannot add X-Robots-Tag or set up hard 403 blocking. Use the noai meta tag (Section 2) as your only option, or migrate to Cloudflare Pages (free) for full header control.
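On hosts that do support headers, verify the deployed result rather than trusting the config. A quick standard-library check (docs.example.com is a placeholder; has_noai is an illustrative helper):

```python
# Check the X-Robots-Tag header on a deployed page.
import urllib.request

def has_noai(headers):
    """True if an X-Robots-Tag header opts the response out of AI training."""
    value = headers.get("X-Robots-Tag") or ""
    return "noai" in {token.strip().lower() for token in value.split(",")}

# After deploying:
#   req = urllib.request.Request("https://docs.example.com/", method="HEAD")
#   with urllib.request.urlopen(req) as resp:
#       print("noai header present:", has_noai(resp.headers))
```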

5. Hard 403 — edge functions

For hard User-Agent blocking (returning 403 before serving any content), use an Edge Function on your hosting platform:

Netlify Edge Function

// netlify/edge-functions/block-ai-bots.js
const BLOCKED_UA = /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot/i;

export default async function handler(request, context) {
  const ua = request.headers.get("user-agent") || "";
  const path = new URL(request.url).pathname;

  // Always allow robots.txt
  if (path === "/robots.txt") {
    return context.next();
  }

  if (BLOCKED_UA.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }

  return context.next();
}

export const config = { path: "/*" };

You can declare the route either with the inline config export above or in netlify.toml (one is enough; both are shown here):

# netlify.toml — declare the edge function
[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

[build]
  command = "mkdocs build"
  publish = "site"

Cloudflare Pages _middleware.ts

// functions/_middleware.ts — Pages Functions live in a functions/ directory at the project root, not inside site/
const BLOCKED_UA = /GPTBot|ClaudeBot|CCBot|PerplexityBot|Amazonbot|Bytespider/i;

export async function onRequest(context) {
  const { request, next } = context;
  const ua = request.headers.get("user-agent") ?? "";
  const path = new URL(request.url).pathname;

  if (path !== "/robots.txt" && BLOCKED_UA.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }

  return next();
}
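Before deploying either function, it helps to sanity-check the UA pattern itself; the same regex can be exercised from Python (the sample UA strings are illustrative):

```python
import re

# Same alternation as the Netlify edge function, case-insensitive.
BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|"
    r"Google-Extended|PerplexityBot|Amazonbot|Bytespider",
    re.IGNORECASE,
)

def is_blocked(user_agent):
    """True if the User-Agent string matches the blocklist."""
    return bool(BLOCKED_UA.search(user_agent))

# is_blocked("Mozilla/5.0 (compatible; GPTBot/1.2)")        -> True
# is_blocked("Mozilla/5.0 ... Chrome/120 Safari/537.36")    -> False
```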

6. MkDocs Material — full mkdocs.yml

Complete mkdocs.yml for a MkDocs Material site with custom HTML overrides, hooks, and Netlify deployment.

# mkdocs.yml
site_name: My Documentation
site_url: https://docs.example.com
docs_dir: docs
site_dir: site

theme:
  name: material
  custom_dir: overrides    # HTML template overrides
  features:
    - navigation.tabs
    - navigation.instant
    - search.highlight

# Custom Python hooks
hooks:
  - hooks.py               # generates robots.txt on build

# Extra CSS/JS (optional)
extra_css:
  - stylesheets/extra.css

# Plugins
plugins:
  - search
  - minify:
      minify_html: true

The resulting file structure:
.
├── mkdocs.yml
├── hooks.py               # on_post_build hook
├── netlify.toml           # headers + edge function config
├── netlify/
│   └── edge-functions/
│       └── block-ai-bots.ts
├── overrides/
│   └── main.html          # custom_dir template override
└── docs/
    ├── index.md
    └── ...                # No robots.txt needed — hook generates it

Frequently asked questions

How do I add robots.txt to a MkDocs site?

Place robots.txt in the docs/ directory — MkDocs copies all files from docs/ to site/ during build. No config needed. For dynamic generation (staging vs production), use an on_post_build hook in a Python hooks file registered under hooks: in mkdocs.yml.

How do I add noai meta tags to MkDocs?

Use a custom_dir override. Set custom_dir: overrides in mkdocs.yml, create overrides/main.html extending the base theme, and add your meta tag in the {% block extrahead %} section. Always call {{ super() }} first when using MkDocs Material to preserve its built-in head content.

How do I add X-Robots-Tag to a MkDocs site?

MkDocs generates static files — headers come from your host. Netlify: [[headers]] in netlify.toml. Vercel: headers in vercel.json. Cloudflare Pages: _headers file in docs/ (copied to site/ by MkDocs). GitHub Pages: not supported — use noai meta tag instead.

Can I block AI bots on GitHub Pages with MkDocs?

GitHub Pages doesn't support custom HTTP headers, so X-Robots-Tag and hard 403 blocking are impossible. Your only option is the noai meta tag via a custom_dir HTML override. For anything stronger, migrate to Cloudflare Pages (free tier, supports _headers and _middleware.ts).

Does MkDocs Material support meta tag customisation?

Yes. Set custom_dir: overrides in mkdocs.yml, create overrides/main.html with {% extends "base.html" %} and a {% block extrahead %} section. Use {{ super() }} to preserve Material's built-in head content. Per-page overrides via front matter robots: key and Jinja conditionals.

How do I use a MkDocs hook to generate robots.txt?

Create a Python file (e.g. hooks.py) with a def on_post_build(config, **kwargs) function. Write robots.txt to config["site_dir"]. Register it in mkdocs.yml under hooks: - hooks.py. The hook runs after every mkdocs build — use environment variables to generate different content for staging vs production.
