How to Block AI Bots on Bridgetown (Ruby SSG)
Bridgetown is a fast, modern static site generator built on Ruby — often described as “Jekyll evolved.” It outputs static HTML to _site/ and supports ERB, Liquid, Serbea, and Markdown. Because Bridgetown produces static files with no runtime server, AI bot protection combines robots.txt, noai meta tags, host-level response headers, and Edge Functions at the hosting layer.
1. robots.txt
Bridgetown copies every file in src/ to _site/ during the build. Place robots.txt directly in src/ and it will appear at the root of your deployed site — no configuration required.
Static robots.txt
Create src/robots.txt:
# Block all AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Bytespider
Disallow: /
# Allow legitimate search crawlers
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /

Bridgetown's build copies this verbatim to _site/robots.txt. No bridgetown.config.yml entry is needed — unlike Lume's site.copy(), Bridgetown copies all files in src/ automatically.
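To sanity-check the rules above before deploying, a small Ruby sketch can parse a robots.txt string and report whether a given user agent is fully disallowed. This is a hypothetical helper for local testing, not part of Bridgetown, and it handles only the simple one-agent-per-group layout used here — real crawlers implement the full Robots Exclusion Protocol with path matching:

```ruby
# Minimal robots.txt check: is `agent` disallowed from the whole site?
def blocked_everywhere?(robots_txt, agent)
  group = []        # user-agents named in the current record
  in_rules = false  # whether we've seen a rule line since the last UA line
  blocked = false
  robots_txt.each_line do |raw|
    line = raw.sub(/#.*/, "").strip
    next if line.empty?
    field, _, value = line.partition(":")
    field = field.strip.downcase
    value = value.strip
    case field
    when "user-agent"
      group = [] if in_rules  # a UA line after rules starts a new record
      in_rules = false
      group << value.downcase
    when "disallow"
      in_rules = true
      blocked = true if value == "/" && group.include?(agent.downcase)
    when "allow"
      in_rules = true
    end
  end
  blocked
end

SAMPLE = <<~ROBOTS
  User-agent: GPTBot
  Disallow: /

  User-agent: Googlebot
  Allow: /
ROBOTS

puts blocked_everywhere?(SAMPLE, "GPTBot")    # true
puts blocked_everywhere?(SAMPLE, "Googlebot") # false
```

Point it at File.read("src/robots.txt") to verify your actual file before a build.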
Dynamic robots.txt (environment-aware)
To serve a different robots.txt in staging vs production, create a template file. Bridgetown renders any file with a recognised template extension:
# src/robots.txt.erb
---
permalink: /robots.txt
---
<% if Bridgetown.env.production? %>
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: *
Allow: /
<% else %>
# Staging — block all crawlers
User-agent: *
Disallow: /
<% end %>

The permalink: /robots.txt front matter tells Bridgetown to output this file as _site/robots.txt rather than _site/robots.txt/index.html. Set BRIDGETOWN_ENV=production in your host environment variables to enable production mode.
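You can preview how the conditional renders without running a full build using Ruby's stdlib ERB. This is a rough simulation only — a plain boolean stands in for Bridgetown.env.production?, and the template body is inlined rather than read from src/:

```ruby
require "erb"

# Stand-in template; `production` replaces Bridgetown.env.production?
TEMPLATE = <<~TPL
  <% if production %>
  User-agent: GPTBot
  Disallow: /

  User-agent: *
  Allow: /
  <% else %>
  # Staging — block all crawlers
  User-agent: *
  Disallow: /
  <% end %>
TPL

def render_robots(production:)
  # trim_mode "<>" drops the newline left behind by tag-only lines
  ERB.new(TEMPLATE, trim_mode: "<>").result(binding)
end

puts render_robots(production: true)
puts "---"
puts render_robots(production: false)
```

The production rendering names GPTBot explicitly; the staging rendering blocks every crawler with a single wildcard record.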
If both src/robots.txt (static) and src/robots.txt.erb (template) exist, Bridgetown will raise a conflict error during build. Use one or the other.

2. noai meta tags in layouts
The noai and noimageai meta values signal to AI crawlers that the page content and images should not be used for training. Add them to your base layout so every page is covered by default.
ERB layout (default in Bridgetown)
Bridgetown's default layout engine is ERB. Edit src/_layouts/default.erb:
<!DOCTYPE html>
<html lang="<%= site.metadata.locale || 'en' %>">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title><%= resource.data.title || site.metadata.title %></title>
<!-- AI bot protection: default noai, allow override per-page -->
<meta name="robots" content="<%= resource.data.robots || 'noai, noimageai' %>">
<%= liquid_render "head" %>
</head>
<body>
<%= liquid_render "navbar" %>
<%= yield %>
<%= liquid_render "footer" %>
</body>
</html>

Use resource.data.field_name, not {{ page.field_name }}. The resource object is Bridgetown's equivalent of Jekyll's page.

Liquid layout
If you use a Liquid layout (src/_layouts/default.liquid or .html):
<meta name="robots"
content="{{ resource.data.robots | default: 'noai, noimageai' }}">

Use resource.data.robots, not page.robots. The resource object is always the correct accessor in Bridgetown templates regardless of engine.

Serbea layout
Bridgetown also supports Serbea (.serb), a Ruby-native template format:
<meta name="robots"
content="{{ resource.data.robots || 'noai, noimageai' }}">

Per-page override
To allow a specific page to be indexed normally, set in its front matter:
---
title: About
robots: "index, follow"
---

This overrides the layout default for that page only. The meta tag will render content="index, follow" instead of content="noai, noimageai".
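The fallback-plus-override behaviour can be sketched in plain Ruby. Here an OpenStruct stands in for Bridgetown's resource object — this is an illustration of the precedence, not Bridgetown's internals:

```ruby
require "erb"
require "ostruct"

# The same expression the ERB layout uses
META = %(<meta name="robots" content="<%= resource.data.robots || 'noai, noimageai' %>">)

def render_meta(front_matter)
  # Fake resource: .data.robots is nil unless front matter sets it
  resource = OpenStruct.new(data: OpenStruct.new(front_matter))
  ERB.new(META).result(binding)
end

puts render_meta({})                        # layout default applies
puts render_meta(robots: "index, follow")   # per-page override wins
```

With no robots key in front matter the || fallback emits noai, noimageai; with one set, the page's own value is rendered.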
3. front_matter_defaults in bridgetown.config.yml
Rather than relying on a layout fallback, you can declare a site-wide default for the robots field in bridgetown.config.yml. This ensures the value is always present in resource.data even if the layout uses a strict resource.data.robots accessor with no fallback.
# bridgetown.config.yml
url: "https://yoursite.com"
title: "Your Site"
front_matter_defaults:
- scope:
path: "" # applies to all content in src/
values:
robots: "noai, noimageai"
# Allow the blog index and individual posts to be indexed
- scope:
path: "_posts"
values:
robots: "index, follow"

With this config, every page and post gets robots: noai, noimageai by default, and posts under _posts/ get index, follow instead. Individual pages can still override with their own front matter.
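How the scopes combine can be illustrated with a short Ruby sketch — an approximation of the resolution order (broadest path first, file front matter last), not Bridgetown's actual implementation:

```ruby
# Hypothetical model of front_matter_defaults resolution
DEFAULTS = [
  { path: "",       values: { "robots" => "noai, noimageai" } },
  { path: "_posts", values: { "robots" => "index, follow" } },
].freeze

def effective_data(resource_path, front_matter = {})
  # Scopes whose path prefixes the resource path apply, broadest first...
  merged = DEFAULTS
    .select { |d| resource_path.start_with?(d[:path]) }
    .sort_by { |d| d[:path].length }
    .reduce({}) { |acc, d| acc.merge(d[:values]) }
  # ...and the file's own front matter always wins
  merged.merge(front_matter)
end

puts effective_data("about.md")["robots"]                          # "noai, noimageai"
puts effective_data("_posts/hello.md")["robots"]                   # "index, follow"
puts effective_data("_posts/x.md", "robots" => "none")["robots"]   # "none"
```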
path: "" matches all content. path: "_posts" matches only files inside src/_posts/. More specific scopes take precedence over broader ones. Front matter in individual files always wins.

Using a Bridgetown plugin (Builder API)
Bridgetown's Builder API lets you inject data into every resource programmatically. Create plugins/builders/robots_defaults.rb:
# plugins/builders/robots_defaults.rb
class RobotsDefaults < SiteBuilder
def build
hook :resources, :pre_render do |resource|
# Set default only if not already set in front matter
resource.data.robots ||= "noai, noimageai"
end
end
end

Bridgetown auto-loads builders in plugins/builders/. This approach works for any resource type (pages, posts, collections) and does not require editing bridgetown.config.yml.
4. X-Robots-Tag via host headers
Bridgetown outputs static HTML — there is no application server adding HTTP headers in production. Set X-Robots-Tag at your hosting layer.
Netlify
In netlify.toml at the project root (not inside _site/):
[build]
command = "bin/bridgetown build"
publish = "_site"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"
X-Content-Type-Options = "nosniff"
X-Frame-Options = "SAMEORIGIN"

Vercel
In vercel.json at the project root. Set the build command to bin/bridgetown build and output directory to _site:
{
"buildCommand": "bin/bridgetown build",
"outputDirectory": "_site",
"headers": [
{
"source": "/(.*)",
"headers": [
{
"key": "X-Robots-Tag",
"value": "noai, noimageai"
}
]
}
]
}

Cloudflare Pages
Create src/_headers (note the leading underscore). Bridgetown copies all files from src/ to _site/, so this will be placed at _site/_headers:
/*
X-Robots-Tag: noai, noimageai

No special configuration is needed for the _headers file in Bridgetown — it's copied automatically because everything in src/ is output. In Lume, files starting with _ are skipped unless you call site.copy("_headers").

GitHub Pages
GitHub Pages does not support custom HTTP headers. The best you can do is the noai meta tag approach (Section 2). For header-level control, use a supported host (Netlify, Vercel, Cloudflare Pages).
5. Hard 403 via Edge Functions
A hard 403 blocks the AI bot before it reads any content — more effective than signals that a crawler can ignore. Requires server-side execution at the edge.
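All the host-specific snippets below share the same core check: a user-agent substring match against a blocklist. For testing that logic locally, here is an illustrative Rack-style Ruby lambda — in production the blocking runs at the host's edge, not in your app:

```ruby
AI_BOTS = %w[
  GPTBot ClaudeBot Claude-Web anthropic-ai CCBot Google-Extended
  PerplexityBot Applebot-Extended Amazonbot meta-externalagent Bytespider
].freeze

# Rack-compatible app: 403 for AI bot user agents, 200 otherwise
BOT_BLOCK = lambda do |env|
  ua = env["HTTP_USER_AGENT"].to_s
  if AI_BOTS.any? { |bot| ua.include?(bot) }
    [403, { "content-type" => "text/plain" }, ["Forbidden"]]
  else
    [200, { "content-type" => "text/plain" }, ["OK"]]
  end
end

status, = BOT_BLOCK.call("HTTP_USER_AGENT" => "Mozilla/5.0 (compatible; GPTBot/1.0)")
puts status  # 403
```

The edge functions below implement this same check in TypeScript for each host's runtime.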
Netlify Edge Function
Create netlify/edge-functions/bot-block.ts (outside src/):
import type { Config, Context } from "@netlify/edge-functions";
const AI_BOTS = [
"GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
"CCBot", "Google-Extended", "PerplexityBot",
"Applebot-Extended", "Amazonbot", "meta-externalagent",
"Bytespider", "DuckAssistBot", "YouBot",
];
export default async function handler(req: Request, _ctx: Context) {
const ua = req.headers.get("user-agent") ?? "";
const isBot = AI_BOTS.some((bot) => ua.includes(bot));
if (isBot) {
return new Response("Forbidden", {
status: 403,
headers: { "content-type": "text/plain" },
});
}
}
export const config: Config = {
path: "/*",
};

Register in netlify.toml:
[build]
command = "bin/bridgetown build"
publish = "_site"
[[edge_functions]]
path = "/*"
function = "bot-block"

Vercel middleware
Create middleware.ts at the project root (same level as vercel.json, not inside _site/):
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
const AI_BOTS = [
"GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
"CCBot", "Google-Extended", "PerplexityBot",
"Applebot-Extended", "Amazonbot", "meta-externalagent",
"Bytespider",
];
export function middleware(request: NextRequest) {
const ua = request.headers.get("user-agent") ?? "";
const isBot = AI_BOTS.some((bot) => ua.includes(bot));
if (isBot) {
return new NextResponse("Forbidden", { status: 403 });
}
return NextResponse.next();
}
export const config = {
matcher: ["/((?!_next/static|favicon.ico).*)"],
};

The middleware.ts file must be at the project root — not inside _site/ or src/.

Cloudflare Pages middleware
Create functions/_middleware.ts at the project root (the functions/ directory is outside src/ and is never compiled into _site/):
// functions/_middleware.ts
const AI_BOTS = [
"GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
"PerplexityBot", "Applebot-Extended", "Amazonbot",
"meta-externalagent", "Bytespider",
];
export async function onRequest(context: EventContext<any, any, any>) {
const ua = context.request.headers.get("user-agent") ?? "";
const isBot = AI_BOTS.some((bot) => ua.includes(bot));
if (isBot) {
return new Response("Forbidden", { status: 403 });
}
return context.next();
}

6. Full bridgetown.config.yml
A complete bridgetown.config.yml with AI bot protection defaults, scoped overrides, and relevant settings:
# bridgetown.config.yml
url: "https://yoursite.com"
title: "Your Site"
# Default all content to noai — crawlers cannot use it for training
front_matter_defaults:
- scope:
path: ""
values:
robots: "noai, noimageai"
# Allow legitimate indexing of blog posts
- scope:
path: "_posts"
values:
robots: "index, follow"
# Allow sitemap.xml to be read by all crawlers
- scope:
path: "sitemap.xml"
values:
robots: "index, follow"
# Standard Bridgetown config
permalink: pretty
timezone: UTC
# Exclude from build output (not needed — these are not in src/)
exclude:
- node_modules
- vendor
- Gemfile
- Gemfile.lock
- netlify.toml
- vercel.json

7. Deployment comparison
Bridgetown's build command is bin/bridgetown build and its publish directory is _site. Here is how each host handles AI bot protection:
| Host | robots.txt | noai meta | X-Robots-Tag | Hard 403 |
|---|---|---|---|---|
| Netlify | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | netlify.toml [[headers]] ✓ | Edge Function ✓ |
| Vercel | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | vercel.json headers ✓ | middleware.ts at project root ✓ |
| Cloudflare Pages | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | src/_headers → _site/_headers ✓ | functions/_middleware.ts ✓ |
| GitHub Pages | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | Not supported ✗ | Not supported ✗ |
| Render | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | render.yaml headers ✓ | Not native ✗ |
For full protection — robots.txt + meta tags + X-Robots-Tag + hard 403 — deploy to Netlify, Vercel, or Cloudflare Pages. GitHub Pages and Render lack edge-level bot blocking capability.
FAQ
How do I add robots.txt to a Bridgetown site?
Place src/robots.txt in your source directory. Bridgetown copies all files from src/ to _site/ automatically — no configuration required. For an environment-aware robots.txt, create src/robots.txt.erb with permalink: /robots.txt in front matter and conditionally render content based on Bridgetown.env.production?.
How do I add noai meta tags to a Bridgetown ERB layout?
In your base layout at src/_layouts/default.erb:
<meta name="robots"
content="<%= resource.data.robots || 'noai, noimageai' %>">

Use resource.data.robots — not page.robots or {{ page.robots }}. The resource object is Bridgetown's equivalent of Jekyll's page.
How do I set a global default without editing every layout?
Use front_matter_defaults in bridgetown.config.yml:
front_matter_defaults:
- scope:
path: ""
values:
robots: "noai, noimageai"

This injects robots: noai, noimageai into every resource's data before rendering, so your layout can access it via resource.data.robots without needing the || fallback.
How is Bridgetown different from Jekyll for AI bot blocking?
Key differences to keep in mind:
- Config file is bridgetown.config.yml, not _config.yml
- Default layout engine is ERB — access front matter via resource.data.field
- Liquid layouts use resource.data.robots, not page.robots
- Files starting with _ in src/ are copied to _site/ (the _headers file for Cloudflare works without special config — unlike Lume)
- Bridgetown's Builder API allows injecting robots defaults programmatically via a plugin
Do I need a custom server for Bridgetown to block AI bots?
No. Bridgetown generates static HTML files. Hard 403 bot blocking runs at the hosting layer via Edge Functions (Netlify, Vercel, Cloudflare Pages) — there is no application server to configure. The robots.txt and noai meta approach requires no hosting-layer changes at all.
Will blocking AI bots affect my SEO?
Blocking AI-specific crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) does not affect standard search engine indexing. Googlebot and Bingbot are separate user agents from Google-Extended and are not blocked by the configurations in this guide. Always include explicit Allow rules for Googlebot and Bingbot in your robots.txt to make your intent unambiguous.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.