How to Block AI Bots on Zola (Rust SSG)
Zola is a fast, opinionated static site generator written in Rust. It ships as a single binary with no dependencies — no Node.js, no Ruby, no Go toolchain. Zola uses Tera templates, TOML front matter, and outputs static HTML to public/. Because there is no runtime server in production, AI bot protection combines robots.txt, noai meta tags, host-level response headers, and Edge Functions at the hosting layer.
1. robots.txt
Zola copies everything in the static/ directory to public/ during the build — unchanged, no processing. Place your robots.txt here and it will be served at the root of your deployed site.
Static robots.txt
Create static/robots.txt:
# Block all AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Bytespider
Disallow: /
# Allow legitimate search crawlers
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /

Zola's build copies this verbatim to public/robots.txt. No config.toml entry is needed — unlike some SSGs that require explicit copy directives, Zola's static/ directory is always copied in full.
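Before deploying, it is worth a quick sanity check that every bot you intend to block actually has a Disallow rule. A minimal sketch in shell; the check_blocked helper and the sample file are illustrative, not part of Zola (in a real project, point it at static/robots.txt):

```shell
# Check that a robots.txt gives a bot a "Disallow: /" rule.
# Assumes the directive sits on the line right after the User-agent line.
check_blocked() {
  # $1 = path to robots.txt, $2 = bot user-agent token
  grep -A1 "^User-agent: $2\$" "$1" | grep -q '^Disallow: /$'
}

# Sample file for demonstration only.
tmp=$(mktemp)
cat > "$tmp" << 'EOF'
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: *
Allow: /
EOF

for bot in GPTBot ClaudeBot; do
  check_blocked "$tmp" "$bot" && echo "blocked: $bot"
done
check_blocked "$tmp" Googlebot || echo "not blocked: Googlebot"
```

Run this against static/robots.txt in CI so a dropped rule fails the build instead of shipping silently.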
Zola does not process files in static/ through Tera templates. If you need a dynamic robots.txt that changes based on environment, you cannot use Zola's template engine for this file. Instead, use a build script that generates the file before zola build, or handle it at the hosting layer (e.g., Netlify's _redirects or Edge Functions).

Build script approach for environment-aware robots.txt
Since Zola cannot template files in static/, use a shell script that runs before the build:
#!/bin/bash
# build.sh — generate robots.txt then build
if [ "$DEPLOY_ENV" = "production" ]; then
cat > static/robots.txt << 'EOF'
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: *
Allow: /
EOF
else
cat > static/robots.txt << 'EOF'
# Staging — block all crawlers
User-agent: *
Disallow: /
EOF
fi
zola build

Set your build command to bash build.sh on your hosting platform and set DEPLOY_ENV=production as an environment variable.
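The environment switch is easy to verify locally before wiring it into CI. A sketch that extracts the branching into a function so it can be tested without running zola; write_robots and its trimmed-down rule set are illustrative:

```shell
# Reproduce the DEPLOY_ENV branch from build.sh as a testable function.
# The real script writes static/robots.txt before calling `zola build`.
write_robots() {
  # $1 = deploy environment, $2 = output path
  if [ "$1" = "production" ]; then
    printf 'User-agent: GPTBot\nDisallow: /\nUser-agent: *\nAllow: /\n' > "$2"
  else
    # Staging: keep every crawler out
    printf 'User-agent: *\nDisallow: /\n' > "$2"
  fi
}

out=$(mktemp)
write_robots production "$out"
grep '^User-agent:' "$out"   # production: per-bot rules plus a catch-all
write_robots staging "$out"
cat "$out"                   # staging: blanket Disallow
```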
2. noai meta tags in Tera templates
The noai and noimageai meta values signal to AI crawlers that the page content and images should not be used for training. Compliance is voluntary, so treat them as one layer alongside robots.txt and host-level blocking. Add them to your base template so every page is covered by default.
Base template
Zola uses Tera as its template engine. Edit templates/base.html:
<!DOCTYPE html>
<html lang="{{ lang }}">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{% block title %}{{ config.title }}{% endblock %}</title>
<!-- AI bot protection: default noai, allow override per-page -->
{% if page.extra.robots %}
<meta name="robots" content="{{ page.extra.robots }}">
{% elif section.extra.robots %}
<meta name="robots" content="{{ section.extra.robots }}">
{% elif config.extra.default_robots %}
<meta name="robots" content="{{ config.extra.default_robots }}">
{% else %}
<meta name="robots" content="noai, noimageai">
{% endif %}
{% block head %}{% endblock %}
</head>
<body>
{% block content %}{% endblock %}
</body>
</html>

page.html templates receive a page variable; section.html templates receive a section variable. The base template may be rendered in either context, so check both page.extra and section.extra. Tera's default() filter will still error when the parent object (here, page) is itself undefined, so use {% if %} guards instead.

Simplified with Tera default filter
If you only render the base template from page contexts (not section contexts), you can use a simpler one-liner:
<meta name="robots"
content="{{ page.extra.robots | default(value=config.extra.default_robots | default(value='noai, noimageai')) }}">

Tera's default() filter uses named-parameter syntax: default(value="fallback"). This differs from Jinja2's default("fallback") — positional arguments will cause a template error.

Per-page override via [extra] front matter
Zola uses TOML front matter delimited by +++. Custom fields must go in the [extra] table — they cannot be top-level:
+++
title = "About"
date = 2026-04-18
[extra]
robots = "index, follow"
+++
About page content here...

This overrides the template default for that page only. The meta tag will render content="index, follow" instead of content="noai, noimageai".
Putting robots = "index, follow" at the top level of the front matter (outside [extra]) will cause a Zola build error — Zola's front matter schema is strict, and unknown top-level keys are rejected. Always use [extra] for custom fields.

Section-level override
Sections in Zola use _index.md files. To allow all blog posts to be indexed, set the override in content/blog/_index.md:
+++
title = "Blog"
sort_by = "date"
paginate_by = 10
[extra]
robots = "index, follow"
+++

Access section-level extras in section.html via section.extra.robots. Individual pages within the section can still override with their own [extra] values.
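Because a misplaced custom key fails the whole build, a pre-build lint can catch it early. A rough sketch; lint_front_matter and its awk rules are illustrative and only check for a top-level robots key:

```shell
# Flag a "robots" key sitting at the front matter's top level instead of
# under [extra]. Illustrative helper, not part of Zola.
lint_front_matter() {
  # $1 = content file with +++-delimited TOML front matter
  awk '/^\+\+\+$/ { n++; next } n == 1' "$1" |
    awk '/^\[extra\]$/ { inextra = 1 }
         !inextra && /^robots[ \t]*=/ {
           print "ERROR: robots must go under [extra]"; exit 1
         }'
}

bad=$(mktemp)
cat > "$bad" << 'EOF'
+++
title = "About"
robots = "index, follow"
+++
Content...
EOF
lint_front_matter "$bad" || echo "lint failed (as expected)"
```

Loop it over content/**/*.md in a pre-build step so the error surfaces before zola build does.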
3. Site-wide defaults via config.toml
Unlike Hugo's _default/baseof.html cascade or Jekyll's defaults: in _config.yml, Zola has no built-in front matter defaults system. The recommended pattern is to define defaults in config.toml under [extra] and reference them with fallbacks in your templates.
# config.toml
base_url = "https://yoursite.com"
title = "Your Site"
compile_sass = true
build_search_index = false
generate_feeds = true
[extra]
# Default robots value — used as fallback in base.html template
default_robots = "noai, noimageai"

The template from Section 2 checks page.extra.robots first, then falls back to config.extra.default_robots, then to the hardcoded noai, noimageai string. This three-tier cascade gives you:
- Per-page control: set robots in any page's [extra] front matter
- Site-wide default: set default_robots in config.toml [extra]
- Ultimate fallback: hardcoded in the template as a safety net
4. X-Robots-Tag via host headers
Zola outputs static HTML — there is no application server adding HTTP headers in production. Set X-Robots-Tag at your hosting layer.
Netlify
In netlify.toml at the project root:
[build]
command = "zola build"
publish = "public"
[build.environment]
ZOLA_VERSION = "0.19.2"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"
X-Content-Type-Options = "nosniff"
X-Frame-Options = "SAMEORIGIN"

Set ZOLA_VERSION in [build.environment] to pin the version. The publish directory is public (Zola's default output directory).

Vercel
In vercel.json at the project root:
{
"buildCommand": "zola build",
"outputDirectory": "public",
"headers": [
{
"source": "/(.*)",
"headers": [
{
"key": "X-Robots-Tag",
"value": "noai, noimageai"
}
]
}
]
}

Set the build command and output directory in vercel.json or project settings, and ensure the Zola binary is available in your build environment (install via a build script or use a Docker build).

Cloudflare Pages
Create static/_headers. Zola copies all files in static/ to public/, placing this at public/_headers where Cloudflare Pages reads it:
/*
X-Robots-Tag: noai, noimageai

Set the build command to zola build and the output directory to public in your Pages project settings. The _headers file in static/ is automatically placed at the output root.

GitHub Pages
GitHub Pages does not support custom HTTP headers. Use the noai meta tag approach (Section 2) for GitHub Pages deployments. For header-level control, use Netlify, Vercel, or Cloudflare Pages.
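Whichever host you choose, verify the header after deploying. A minimal sketch; check_headers is an illustrative helper, and in practice you would feed it the output of curl -sI against your deployed URL:

```shell
# Check a raw HTTP response-header dump for X-Robots-Tag.
# In real use: check_headers "$(curl -sI https://yoursite.com)"
check_headers() {
  # $1 = raw response headers, one per line
  printf '%s\n' "$1" | grep -i '^x-robots-tag:' || echo "header missing"
}

# Sample header dumps standing in for real curl output.
with_tag='HTTP/2 200
content-type: text/html; charset=utf-8
x-robots-tag: noai, noimageai'
without_tag='HTTP/2 200
content-type: text/html; charset=utf-8'

check_headers "$with_tag"
check_headers "$without_tag"
```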
5. Hard 403 via Edge Functions
A hard 403 blocks the AI bot before it reads any content — more effective than signals that a crawler can choose to ignore. Requires server-side execution at the edge.
Netlify Edge Function
Create netlify/edge-functions/bot-block.ts:
import type { Config, Context } from "@netlify/edge-functions";
const AI_BOTS = [
"GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
"CCBot", "Google-Extended", "PerplexityBot",
"Applebot-Extended", "Amazonbot", "meta-externalagent",
"Bytespider", "DuckAssistBot", "YouBot",
];
export default async function handler(req: Request, _ctx: Context) {
const ua = req.headers.get("user-agent") ?? "";
const isBot = AI_BOTS.some((bot) => ua.includes(bot));
if (isBot) {
return new Response("Forbidden", {
status: 403,
headers: { "content-type": "text/plain" },
});
}
}
export const config: Config = {
path: "/*",
};

Register in netlify.toml:
[build]
command = "zola build"
publish = "public"
[build.environment]
ZOLA_VERSION = "0.19.2"
[[edge_functions]]
path = "/*"
function = "bot-block"

Vercel middleware
Create middleware.ts at the project root (same level as vercel.json, not inside public/ or templates/):
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
const AI_BOTS = [
"GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
"CCBot", "Google-Extended", "PerplexityBot",
"Applebot-Extended", "Amazonbot", "meta-externalagent",
"Bytespider",
];
export function middleware(request: NextRequest) {
const ua = request.headers.get("user-agent") ?? "";
const isBot = AI_BOTS.some((bot) => ua.includes(bot));
if (isBot) {
return new NextResponse("Forbidden", { status: 403 });
}
return NextResponse.next();
}
export const config = {
matcher: ["/((?!_next/static|favicon.ico).*)"],
};

Cloudflare Pages middleware
Create functions/_middleware.ts at the project root (the functions/ directory is separate from Zola's source and is not processed by the build):
// functions/_middleware.ts
const AI_BOTS = [
"GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
"PerplexityBot", "Applebot-Extended", "Amazonbot",
"meta-externalagent", "Bytespider",
];
export async function onRequest(context: EventContext<any, any, any>) {
const ua = context.request.headers.get("user-agent") ?? "";
const isBot = AI_BOTS.some((bot) => ua.includes(bot));
if (isBot) {
return new Response("Forbidden", { status: 403 });
}
return context.next();
}

6. Full config.toml example
A complete Zola config.toml with AI bot protection defaults and standard settings:
# config.toml
base_url = "https://yoursite.com"
title = "Your Site"
description = "Your site description"
# Build settings
compile_sass = true
build_search_index = false
generate_feeds = true
feed_filenames = ["atom.xml"]
# Minification
minify_html = true
# Taxonomies (optional — common setup)
taxonomies = [
{ name = "tags", feed = true },
{ name = "categories" },
]
[markdown]
highlight_code = true
highlight_theme = "css"
[extra]
# AI bot protection default — referenced in templates/base.html
default_robots = "noai, noimageai"
# Your site-specific extras
author = "Your Name"
twitter = "@yourhandle"

Pair this with the base template from Section 2. The template checks page.extra.robots → section.extra.robots → config.extra.default_robots → hardcoded fallback, giving you granular control at every level.
Zola project structure with AI protection
yoursite/
├── config.toml # [extra] default_robots
├── content/
│ ├── _index.md # Homepage
│ └── blog/
│ ├── _index.md # [extra] robots = "index, follow"
│ └── first-post.md # Inherits from section or overrides
├── static/
│ ├── robots.txt # Copied to public/robots.txt
│ └── _headers # For Cloudflare Pages
├── templates/
│ ├── base.html # noai meta tag with cascade
│ ├── index.html # Homepage template
│ ├── page.html # Content pages
│ └── section.html # Section listings
├── netlify.toml # X-Robots-Tag + Edge Function config
└── netlify/
└── edge-functions/
└── bot-block.ts # Hard 403 for AI bots

7. Deployment comparison
Zola's build command is zola build and its output directory is public. Here is how each host handles AI bot protection:
| Host | robots.txt | noai meta | X-Robots-Tag | Hard 403 |
|---|---|---|---|---|
| Netlify | static/robots.txt → public/ ✓ | Tera template ✓ | netlify.toml [[headers]] ✓ | Edge Function ✓ |
| Vercel | static/robots.txt → public/ ✓ | Tera template ✓ | vercel.json headers ✓ | middleware.ts ✓ |
| Cloudflare Pages | static/robots.txt → public/ ✓ | Tera template ✓ | static/_headers → public/ ✓ | functions/_middleware.ts ✓ |
| GitHub Pages | static/robots.txt → public/ ✓ | Tera template ✓ | Not supported ✗ | Not supported ✗ |
| Fly.io | static/robots.txt → public/ ✓ | Tera template ✓ | Dockerfile static server ✓ | Custom server ✓ |
For full protection — robots.txt + meta tags + X-Robots-Tag + hard 403 — deploy to Netlify (best native support), Cloudflare Pages, or Vercel. GitHub Pages lacks header-level and edge-level bot blocking.
FAQ
How do I add robots.txt to a Zola site?
Place robots.txt in your static/ directory. Zola copies everything in static/ to the public/ output directory during build — no configuration required. The file will be available at yoursite.com/robots.txt automatically.
How do I add noai meta tags to Zola templates?
In your Tera base template (templates/base.html), use a conditional chain:
{% if page.extra.robots %}
<meta name="robots" content="{{ page.extra.robots }}">
{% elif config.extra.default_robots %}
<meta name="robots" content="{{ config.extra.default_robots }}">
{% else %}
<meta name="robots" content="noai, noimageai">
{% endif %}

The page.extra object contains fields from the [extra] section in TOML front matter. Use config.extra for site-wide defaults defined in config.toml.
How do I set a site-wide robots default?
Add to config.toml:
[extra]
default_robots = "noai, noimageai"

Access it in templates via config.extra.default_robots. Unlike Jekyll or Hugo, Zola has no built-in front matter defaults cascade — the config.extra approach combined with Tera template fallbacks is the standard pattern.
What is the [extra] section in Zola front matter?
Zola's front matter is strict TOML between +++ delimiters. Standard fields (title, date, description, taxonomies) go at the top level. Custom fields like robots must go inside the [extra] table — putting them at the top level will cause a build error. Access them in templates as page.extra.field_name.
How is Zola different from Hugo for AI bot blocking?
Key differences:
- Zola uses Tera templates ({{ page.extra.robots }}); Hugo uses Go templates ({{ .Params.robots }})
- Zola has no front matter defaults cascade — define defaults in config.toml [extra] and use Tera's default() filter
- Both use static/ for static files copied to the output directory
- Both output to public/ by default
- Zola is a single binary — no Go toolchain needed, simpler CI setup
Will blocking AI bots affect my SEO?
Blocking AI-specific crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) does not affect standard search engine indexing. Googlebot and Bingbot are separate user agents from Google-Extended and are not blocked by the configurations in this guide. Always include explicit Allow rules for Googlebot and Bingbot in your robots.txt to make your intent unambiguous.
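The distinction is easy to test locally: the edge functions in Section 5 do plain substring matching on the User-Agent header, which a shell sketch can mirror. is_ai_bot and the sample UA strings are illustrative (Google-Extended, in particular, is primarily a robots.txt token rather than a crawling user agent):

```shell
# Mirror the AI_BOTS.some((bot) => ua.includes(bot)) check from Section 5.
# Extend the token list to match your own block list.
is_ai_bot() {
  ua="$1"
  for bot in GPTBot ClaudeBot CCBot Google-Extended PerplexityBot Bytespider; do
    case "$ua" in *"$bot"*) return 0 ;; esac
  done
  return 1
}

# Googlebot passes through; a training crawler like GPTBot is blocked.
is_ai_bot "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  && echo "403" || echo "pass"
is_ai_bot "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)" \
  && echo "403" || echo "pass"
```

Running your real UA logs through a check like this confirms that search crawlers are never caught by the block list.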