How to Block AI Bots on Sphinx: Complete 2026 Guide
Sphinx generates a static HTML documentation site — no server process at runtime. Used by CPython, Django, NumPy, and thousands of open-source projects, it's the de facto standard for Python documentation. Bot blocking splits across the Sphinx build layer (conf.py, templates, _static/) and the hosting platform layer.
robots.txt via html_extra_path
Sphinx does not place files from _static/ at the output root — it copies them to _build/html/_static/. To get robots.txt at the root of your built site, use html_extra_path in conf.py.
Putting robots.txt in source/_static/ places it at _build/html/_static/robots.txt, not at _build/html/robots.txt. Crawlers only look for it at the site root, so use html_extra_path instead.

conf.py
# conf.py
html_extra_path = ['robots.txt']
# Alternative: point to a directory
# html_extra_path = ['_extra']  # then create _extra/robots.txt

Create robots.txt in the same directory as conf.py (usually docs/robots.txt or source/robots.txt):
robots.txt
User-agent: *
Allow: /
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
Sitemap: https://docs.example.com/sitemap.xml

After make html or sphinx-build -b html source _build/html, verify that the file exists at _build/html/robots.txt.
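If the blocklist grows, it can be easier to regenerate robots.txt from a single Python list than to edit the file by hand. A minimal sketch; generate_robots.py is a hypothetical helper name, and the bot list mirrors the file above:

```python
# generate_robots.py -- regenerate robots.txt from one bot list (hypothetical helper)
AI_BOTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
    "Google-Extended", "AhrefsBot", "Bytespider", "Amazonbot",
    "Diffbot", "FacebookBot", "cohere-ai", "PerplexityBot", "YouBot",
]

def render_robots(bots, sitemap=None):
    """Allow everyone by default, then disallow each listed bot."""
    lines = ["User-agent: *", "Allow: /", ""]
    for bot in bots:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Redirect this into robots.txt next to conf.py so html_extra_path picks it up
    print(render_robots(AI_BOTS, "https://docs.example.com/sitemap.xml"))
```

Run it as python generate_robots.py > robots.txt from the directory containing conf.py, then rebuild.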
noai meta tag via _templates/layout.html
The most reliable way to add custom meta tags to every Sphinx page is to override the theme's base layout template. Create source/_templates/layout.html (the path is relative to your Sphinx source directory) and register the _templates directory in conf.py via templates_path.
Register the templates directory in conf.py
# conf.py
templates_path = ['_templates']

source/_templates/layout.html (sphinx_rtd_theme / most themes)
{% extends "!layout.html" %}
{% block extrahead %}
{{ super() }}
<meta name="robots" content="noai, noimageai">
{% endblock %}

The ! prefix is critical: {% extends "!layout.html" %} tells Sphinx to use the original theme's layout as the parent. Without the !, Sphinx looks for layout.html in your own _templates/ directory, causing infinite recursion. Always use the ! prefix when overriding theme templates.

{{ super() }}: always call {{ super() }} in the extrahead block to preserve the theme's existing head content (favicons, OG tags, theme stylesheets). Omitting it can strip critical theme assets.

Furo theme — layout.html
{% extends "!layout.html" %}
{% block extrahead %}
{{ super() }}
<meta name="robots" content="noai, noimageai">
{% endblock %}

Same syntax — Furo also uses the extrahead block name. This template override works with sphinx_rtd_theme, Furo, the PyData Sphinx Theme, and the default alabaster theme.
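To confirm the tag landed on every page (a spot-check of one file can miss pages built from other templates), a small stdlib script can sweep the whole build output. A sketch; check_noai.py is a hypothetical name, and the regex assumes name="robots" precedes the content attribute, as Sphinx normally emits it:

```python
# check_noai.py -- list built HTML pages missing the noai robots meta tag
import re
import sys
from pathlib import Path

# Assumes the name attribute appears before content in the emitted <meta> tag
NOAI = re.compile(r'<meta[^>]+name="robots"[^>]+noai', re.IGNORECASE)

def pages_missing_noai(build_dir="_build/html"):
    """Return HTML files under build_dir that lack the noai meta tag."""
    return [
        p for p in sorted(Path(build_dir).rglob("*.html"))
        if not NOAI.search(p.read_text(encoding="utf-8", errors="ignore"))
    ]

if __name__ == "__main__" and len(sys.argv) > 1:
    missing = pages_missing_noai(sys.argv[1])
    for page in missing:
        print(f"missing noai meta: {page}")
    sys.exit(1 if missing else 0)
```

Run it as python check_noai.py _build/html after a build; a non-zero exit makes it easy to wire into CI.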
After adding the template
make html
# or
sphinx-build -b html source _build/html
# Verify the meta tag is present in the output
grep 'noai' _build/html/index.html

Global meta via html_meta in conf.py
Sphinx supports a html_meta dict in conf.py to inject meta tags into all pages. Support varies by theme, but it works with most modern themes.
conf.py
# conf.py
html_meta = {
    'robots': 'noai, noimageai',
}

html_meta is processed by Sphinx core and added to each page's <head> via the theme's metatags block. It works reliably with alabaster and the PyData Sphinx Theme; for sphinx_rtd_theme, the _templates/layout.html override is more reliable. Confirm with grep 'noai' _build/html/index.html.

Per-page robots directives
For per-page control, use the .. meta:: directive in individual RST files or the :robots: front matter key in MyST Markdown files.
RST — per-page meta directive
.. meta::
   :robots: noai, noimageai
My Page Title
=============
Page content here.

MyST Markdown — front matter (with myst-parser)
---
myst:
  html_meta:
    robots: "noai, noimageai"
---
# My Page Title
Page content here.

Override to allow everything on a specific page
.. meta::
   :robots: index, follow
Public Page
===========

Set the global default via html_meta in conf.py or the _templates/layout.html override, then use .. meta:: directives to override it on specific pages. The per-page directive replaces the global value for that page.

X-Robots-Tag via hosting platform
X-Robots-Tag is an HTTP response header. Sphinx outputs static HTML files — no server to inject headers at runtime. Add the header at the hosting layer.
Netlify — netlify.toml
[build]
command = "make html"
publish = "_build/html"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json
{
  "buildCommand": "make html",
  "outputDirectory": "_build/html",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages — _headers (via html_extra_path)
Cloudflare Pages reads a _headers file from the root of the published directory. Use html_extra_path to copy it there:
# conf.py
html_extra_path = ['robots.txt', '_headers']

Create _headers alongside conf.py:
/*
  X-Robots-Tag: noai, noimageai

Read the Docs specifics
Read the Docs (RTD) is the most common hosting platform for Sphinx documentation. It has specific capabilities and limitations for bot blocking.
.readthedocs.yaml
# .readthedocs.yaml
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
sphinx:
  configuration: docs/conf.py
python:
  install:
    - requirements: docs/requirements.txt

RTD capabilities
| Feature | RTD Free | RTD Business |
|---|---|---|
| noai meta tag (via template) | ✅ Yes | ✅ Yes |
| robots.txt via html_extra_path | ✅ Yes | ✅ Yes |
| Custom HTTP headers (X-Robots-Tag) | 🚫 No | ✅ Yes |
| Hard 403 UA blocking | 🚫 No | ⚠️ Limited |
| Custom domain | ✅ Yes | ✅ Yes |
On the free tier, use _templates/layout.html for the noai meta tag and html_extra_path for robots.txt; these are your only options. For X-Robots-Tag or hard 403 blocking, upgrade to RTD Business or migrate to Netlify or Cloudflare Pages.

RTD addons — inject meta tags without template override
RTD Business accounts can inject custom HTML via the RTD addons system in .readthedocs.yaml. For free-tier projects, the template override is the only option.
Hard 403 via edge functions
For hard UA-based blocking (403 before any content is served), use an edge function. This requires hosting on Netlify or Cloudflare Pages.
Netlify Edge Function
Create netlify/edge-functions/block-ai-bots.ts in your project root (not inside the docs/ or Sphinx source directory):
import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

Register it in netlify.toml:
[build]
command = "make html"
publish = "_build/html"
[[edge_functions]]
path = "/*"
function = "block-ai-bots"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"

Cloudflare Pages Functions
Create functions/_middleware.ts in your project root:
import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};

Cloudflare Pages build config (dashboard)
Build command: make html
Build output directory: _build/html

Deployment quick-reference
| Platform | Build command | Publish dir | Custom headers | Edge functions |
|---|---|---|---|---|
| Read the Docs (free) | Auto (RTD builds) | Auto | 🚫 No | 🚫 No |
| Read the Docs (Business) | Auto (RTD builds) | Auto | ✅ Yes | ⚠️ Limited |
| Netlify | make html | _build/html | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | make html | _build/html | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | make html | _build/html | ✅ _headers via html_extra_path | ✅ functions/_middleware.ts |
| GitHub Pages | CI: make html | _build/html | 🚫 No | 🚫 No |
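The GitHub Pages row assumes you build in CI. A minimal workflow sketch using the official Pages actions; the file path, branch name, and docs/ layout are assumptions about your repository:

```yaml
# .github/workflows/docs.yml -- build with Sphinx, deploy to GitHub Pages
name: docs
on:
  push:
    branches: [main]
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  build-deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r docs/requirements.txt
      - run: make -C docs html
      - uses: actions/upload-pages-artifact@v3
        with:
          path: docs/_build/html
      - uses: actions/deploy-pages@v4
```

Note that GitHub Pages still cannot set custom headers or run edge functions; this only automates the build, so robots.txt and the noai meta tag remain your protections there.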
Full conf.py example
# conf.py
import os
import sys
project = 'My Project'
author = 'My Team'
release = '1.0.0'
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'myst_parser',  # if using Markdown
]
templates_path = ['_templates']
html_extra_path = ['robots.txt'] # copied to _build/html/robots.txt
html_theme = 'furo' # or 'sphinx_rtd_theme', 'pydata_sphinx_theme'
# Global meta tags (works with most themes)
html_meta = {
    'robots': 'noai, noimageai',
}
# Theme options (theme-specific)
html_theme_options = {}
# Static files (CSS, JavaScript, images) — goes to _build/html/_static/
html_static_path = ['_static']

Makefile (standard)
# Minimal Makefile for Sphinx documentation
SPHINXOPTS  ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR   = source
BUILDDIR    = _build

.PHONY: help Makefile

%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

html: Makefile
	@$(SPHINXBUILD) -b html "$(SOURCEDIR)" "$(BUILDDIR)/html"
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

clean:
	rm -rf $(BUILDDIR)/*

FAQ
How do I add robots.txt to a Sphinx site?
Use html_extra_path in conf.py: html_extra_path = ['robots.txt']. Create robots.txt alongside conf.py. This copies it to _build/html/robots.txt — the root of your deployed site. Do not place it in _static/ — that copies to _build/html/_static/robots.txt, which crawlers will not find.
How do I add the noai meta tag to every Sphinx page?
Create source/_templates/layout.html with {% extends "!layout.html" %} and an extrahead block containing <meta name="robots" content="noai, noimageai">. Always call {{ super() }} in the block to preserve theme assets. Register the directory with templates_path = ['_templates'] in conf.py.
Can I add the noai meta tag without overriding the theme layout?
Yes — use html_meta = {"robots": "noai, noimageai"} in conf.py. Works with most modern themes. Test by checking the built HTML: grep noai _build/html/index.html.
How do I add X-Robots-Tag on a Sphinx site hosted on Read the Docs?
RTD free tier does not support custom HTTP headers. Use the noai meta tag (via template override or html_meta) as your primary protection. For X-Robots-Tag, upgrade to RTD Business or migrate to Netlify, Vercel, or Cloudflare Pages.
How do I block AI bots with hard 403 on a Sphinx site?
Use a Netlify Edge Function or Cloudflare Pages functions/_middleware.ts that checks User-Agent and returns 403 for known AI crawlers. Not available on Read the Docs or GitHub Pages.
Does the Sphinx html_meta conf.py option add noai tags?
Yes, but with caveats. html_meta works reliably with alabaster and PyData Sphinx Theme. For sphinx_rtd_theme, the _templates/layout.html override is more reliable. Always verify with grep noai _build/html/index.html after building.
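As a final sanity check, Python's stdlib urllib.robotparser can confirm that your robots.txt rules parse the way you expect, with no network access required. A sketch; docs.example.com is a placeholder, and the inline ROBOTS string stands in for your real file:

```python
# Verify robots.txt rules locally with the stdlib parser
import urllib.robotparser

ROBOTS = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Regular browsers stay allowed; listed AI crawlers are disallowed
print(rp.can_fetch("Mozilla/5.0", "https://docs.example.com/page.html"))
print(rp.can_fetch("GPTBot", "https://docs.example.com/page.html"))
```

Point rp.set_url at your deployed https://docs.example.com/robots.txt and call rp.read() instead of rp.parse() to test the live file.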