How to Block AI Bots on Sphinx: Complete 2026 Guide
Sphinx generates a static HTML documentation site — no server process at runtime. Used by CPython, Django, NumPy, and thousands of open-source projects, it's the de facto standard for Python documentation. Bot blocking splits across the Sphinx build layer (conf.py, templates, _static/) and the hosting platform layer.
robots.txt via html_extra_path
Sphinx does not place files from _static/ at the output root — it copies them to _build/html/_static/. To get robots.txt at the root of your built site, use html_extra_path in conf.py.
Putting robots.txt in source/_static/ places it at _build/html/_static/robots.txt, not at _build/html/robots.txt. Crawlers only look for it at the site root, so use html_extra_path instead.

conf.py
# conf.py
html_extra_path = ['robots.txt']
# Alternative: point to a directory
# html_extra_path = ['_extra']  # then create _extra/robots.txt

Create robots.txt in the same directory as conf.py (usually docs/robots.txt or source/robots.txt):
robots.txt
User-agent: *
Allow: /
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
Sitemap: https://docs.example.com/sitemap.xml

After make html or sphinx-build -b html source _build/html, verify that the file exists at _build/html/robots.txt.
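If the blocklist grows, it can be easier to regenerate robots.txt from a single Python list than to edit the file by hand. A minimal sketch; generate_robots.py is a hypothetical helper name, and the bot list mirrors the file above:

```python
# generate_robots.py -- regenerate robots.txt from one bot list (hypothetical helper)
AI_BOTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
    "Google-Extended", "AhrefsBot", "Bytespider", "Amazonbot",
    "Diffbot", "FacebookBot", "cohere-ai", "PerplexityBot", "YouBot",
]

def render_robots(bots, sitemap=None):
    """Allow everyone by default, then disallow each listed bot."""
    lines = ["User-agent: *", "Allow: /", ""]
    for bot in bots:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Redirect this into robots.txt next to conf.py so html_extra_path picks it up
    print(render_robots(AI_BOTS, "https://docs.example.com/sitemap.xml"))
```

Run it as python generate_robots.py > robots.txt from the directory containing conf.py, then rebuild.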
noai meta tag via _templates/layout.html
The most reliable way to add custom meta tags to every Sphinx page is to override the theme's base layout template. Create source/_templates/layout.html (the path is relative to your Sphinx source directory) and register the _templates directory in conf.py via templates_path.
Register the templates directory in conf.py
# conf.py
templates_path = ['_templates']

source/_templates/layout.html (sphinx_rtd_theme / most themes)
{% extends "!layout.html" %}
{% block extrahead %}
{{ super() }}
<meta name="robots" content="noai, noimageai">
{% endblock %}

The ! prefix is critical: {% extends "!layout.html" %} tells Sphinx to use the original theme's layout as the parent. Without the !, Sphinx looks for layout.html in your own _templates/ directory, causing infinite recursion. Always use the ! prefix when overriding theme templates.

{{ super() }}: always call {{ super() }} in the extrahead block to preserve the theme's existing head content (favicons, OG tags, theme stylesheets). Omitting it can strip critical theme assets.

Furo theme — layout.html
{% extends "!layout.html" %}
{% block extrahead %}
{{ super() }}
<meta name="robots" content="noai, noimageai">
{% endblock %}

Same syntax — Furo also uses the extrahead block name. This template override works with sphinx_rtd_theme, Furo, the PyData Sphinx Theme, and the default alabaster theme.
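To confirm the tag landed on every page (a spot-check of one file can miss pages built from other templates), a small stdlib script can sweep the whole build output. A sketch; check_noai.py is a hypothetical name, and the regex assumes name="robots" precedes the content attribute, as Sphinx normally emits it:

```python
# check_noai.py -- list built HTML pages missing the noai robots meta tag
import re
import sys
from pathlib import Path

# Assumes the name attribute appears before content in the emitted <meta> tag
NOAI = re.compile(r'<meta[^>]+name="robots"[^>]+noai', re.IGNORECASE)

def pages_missing_noai(build_dir="_build/html"):
    """Return HTML files under build_dir that lack the noai meta tag."""
    return [
        p for p in sorted(Path(build_dir).rglob("*.html"))
        if not NOAI.search(p.read_text(encoding="utf-8", errors="ignore"))
    ]

if __name__ == "__main__" and len(sys.argv) > 1:
    missing = pages_missing_noai(sys.argv[1])
    for page in missing:
        print(f"missing noai meta: {page}")
    sys.exit(1 if missing else 0)
```

Run it as python check_noai.py _build/html after a build; a non-zero exit makes it easy to wire into CI.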
After adding the template
make html
# or
sphinx-build -b html source _build/html
# Verify the meta tag is present in the output
grep 'noai' _build/html/index.html

Global meta via html_meta in conf.py
Sphinx supports a html_meta dict in conf.py to inject meta tags into all pages. Support varies by theme, but it works with most modern themes.
conf.py
# conf.py
html_meta = {
    'robots': 'noai, noimageai',
}

html_meta is processed by Sphinx core and added to each page's <head> via the theme's metatags block. It works reliably with alabaster and the PyData Sphinx Theme; for sphinx_rtd_theme, the _templates/layout.html override is more reliable. Confirm with grep 'noai' _build/html/index.html.

Per-page robots directives
For per-page control, use the .. meta:: directive in individual RST files or the :robots: front matter key in MyST Markdown files.
RST — per-page meta directive
.. meta::
   :robots: noai, noimageai
My Page Title
=============
Page content here.

MyST Markdown — front matter (with myst-parser)
---
myst:
  html_meta:
    robots: "noai, noimageai"
---
# My Page Title
Page content here.

Override to allow everything on a specific page
.. meta::
   :robots: index, follow
Public Page
===========

Set the global default via html_meta in conf.py or the _templates/layout.html override, then use .. meta:: directives to override it on specific pages. The per-page directive replaces the global value for that page.

X-Robots-Tag via hosting platform
X-Robots-Tag is an HTTP response header. Sphinx outputs static HTML files — no server to inject headers at runtime. Add the header at the hosting layer.
Netlify — netlify.toml
[build]
command = "make html"
publish = "_build/html"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json
{
  "buildCommand": "make html",
  "outputDirectory": "_build/html",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages — _headers (via html_extra_path)
Cloudflare Pages reads a _headers file from the root of the published directory. Use html_extra_path to copy it there:
# conf.py
html_extra_path = ['robots.txt', '_headers']

Create _headers alongside conf.py:
/*
  X-Robots-Tag: noai, noimageai

Read the Docs specifics
Read the Docs (RTD) is the most common hosting platform for Sphinx documentation. It has specific capabilities and limitations for bot blocking.
.readthedocs.yaml
# .readthedocs.yaml
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
sphinx:
  configuration: docs/conf.py
python:
  install:
    - requirements: docs/requirements.txt

RTD capabilities
| Feature | RTD Free | RTD Business |
|---|---|---|
| noai meta tag (via template) | ✅ Yes | ✅ Yes |
| robots.txt via html_extra_path | ✅ Yes | ✅ Yes |
| Custom HTTP headers (X-Robots-Tag) | 🚫 No | ✅ Yes |
| Hard 403 UA blocking | 🚫 No | ⚠️ Limited |
| Custom domain | ✅ Yes | ✅ Yes |
On the free tier, use _templates/layout.html for the noai meta tag and html_extra_path for robots.txt; these are your only options. For X-Robots-Tag or hard 403 blocking, upgrade to RTD Business or migrate to Netlify or Cloudflare Pages.

RTD addons — inject meta tags without template override
RTD Business accounts can inject custom HTML via the RTD addons system in .readthedocs.yaml. For free-tier projects, the template override is the only option.
Hard 403 via edge functions
For hard UA-based blocking (403 before any content is served), use an edge function. This requires hosting on Netlify or Cloudflare Pages.
Netlify Edge Function
Create netlify/edge-functions/block-ai-bots.ts in your project root (not inside the docs/ or Sphinx source directory):
import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

Register it in netlify.toml:
[build]
command = "make html"
publish = "_build/html"
[[edge_functions]]
path = "/*"
function = "block-ai-bots"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"

Cloudflare Pages Functions
Create functions/_middleware.ts in your project root:
import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};

Cloudflare Pages build config (dashboard)
Build command: make html
Build output directory: _build/html

Deployment quick-reference
| Platform | Build command | Publish dir | Custom headers | Edge functions |
|---|---|---|---|---|
| Read the Docs (free) | Auto (RTD builds) | Auto | 🚫 No | 🚫 No |
| Read the Docs (Business) | Auto (RTD builds) | Auto | ✅ Yes | ⚠️ Limited |
| Netlify | make html | _build/html | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | make html | _build/html | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | make html | _build/html | ✅ _headers via html_extra_path | ✅ functions/_middleware.ts |
| GitHub Pages | CI: make html | _build/html | 🚫 No | 🚫 No |
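The GitHub Pages row assumes you build in CI. A minimal workflow sketch using the official Pages actions; the file path, branch name, and docs/ layout are assumptions about your repository:

```yaml
# .github/workflows/docs.yml -- build with Sphinx, deploy to GitHub Pages
name: docs
on:
  push:
    branches: [main]
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  build-deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r docs/requirements.txt
      - run: make -C docs html
      - uses: actions/upload-pages-artifact@v3
        with:
          path: docs/_build/html
      - uses: actions/deploy-pages@v4
```

Note that GitHub Pages still cannot set custom headers or run edge functions; this only automates the build, so robots.txt and the noai meta tag remain your protections there.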
Full conf.py example
# conf.py
import os
import sys
project = 'My Project'
author = 'My Team'
release = '1.0.0'
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'myst_parser',  # if using Markdown
]
templates_path = ['_templates']
html_extra_path = ['robots.txt'] # copied to _build/html/robots.txt
html_theme = 'furo' # or 'sphinx_rtd_theme', 'pydata_sphinx_theme'
# Global meta tags (works with most themes)
html_meta = {
    'robots': 'noai, noimageai',
}
# Theme options (theme-specific)
html_theme_options = {}
# Static files (CSS, JavaScript, images) — goes to _build/html/_static/
html_static_path = ['_static']

Makefile (standard)
# Minimal Makefile for Sphinx documentation
SPHINXOPTS  ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR   = source
BUILDDIR    = _build

.PHONY: help Makefile

%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

html: Makefile
	@$(SPHINXBUILD) -b html "$(SOURCEDIR)" "$(BUILDDIR)/html"
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

clean:
	rm -rf $(BUILDDIR)/*

FAQ
How do I add robots.txt to a Sphinx site?
Use html_extra_path in conf.py: html_extra_path = ['robots.txt']. Create robots.txt alongside conf.py. This copies it to _build/html/robots.txt — the root of your deployed site. Do not place it in _static/ — that copies to _build/html/_static/robots.txt, which crawlers will not find.
How do I add the noai meta tag to every Sphinx page?
Create source/_templates/layout.html with {% extends "!layout.html" %} and an extrahead block containing <meta name="robots" content="noai, noimageai">. Always call {{ super() }} in the block to preserve theme assets. Register the directory with templates_path = ['_templates'] in conf.py.
Can I add the noai meta tag without overriding the theme layout?
Yes — use html_meta = {"robots": "noai, noimageai"} in conf.py. Works with most modern themes. Test by checking the built HTML: grep noai _build/html/index.html.
How do I add X-Robots-Tag on a Sphinx site hosted on Read the Docs?
RTD free tier does not support custom HTTP headers. Use the noai meta tag (via template override or html_meta) as your primary protection. For X-Robots-Tag, upgrade to RTD Business or migrate to Netlify, Vercel, or Cloudflare Pages.
How do I block AI bots with hard 403 on a Sphinx site?
Use a Netlify Edge Function or Cloudflare Pages functions/_middleware.ts that checks User-Agent and returns 403 for known AI crawlers. Not available on Read the Docs or GitHub Pages.
Does the Sphinx html_meta conf.py option add noai tags?
Yes, but with caveats. html_meta works reliably with alabaster and PyData Sphinx Theme. For sphinx_rtd_theme, the _templates/layout.html override is more reliable. Always verify with grep noai _build/html/index.html after building.
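As a final sanity check, Python's stdlib urllib.robotparser can confirm that your robots.txt rules parse the way you expect, with no network access required. A sketch; docs.example.com is a placeholder, and the inline ROBOTS string stands in for your real file:

```python
# Verify robots.txt rules locally with the stdlib parser
import urllib.robotparser

ROBOTS = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Regular browsers stay allowed; listed AI crawlers are disallowed
print(rp.can_fetch("Mozilla/5.0", "https://docs.example.com/page.html"))
print(rp.can_fetch("GPTBot", "https://docs.example.com/page.html"))
```

Point rp.set_url at your deployed https://docs.example.com/robots.txt and call rp.read() instead of rp.parse() to test the live file.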