
How to Block AI Bots on Bridgetown (Ruby SSG)

Bridgetown is a fast, modern static site generator built on Ruby — often described as “Jekyll evolved.” It outputs static HTML to _site/ and supports ERB, Liquid, Serbea, and Markdown. Because Bridgetown produces static files with no runtime server, AI bot protection combines robots.txt, noai meta tags, host-level response headers, and Edge Functions at the hosting layer.

8 min read · Updated April 2026 · Bridgetown 2.x

1. robots.txt

Bridgetown copies every file in src/ to _site/ during the build. Place robots.txt directly in src/ and it will appear at the root of your deployed site — no configuration required.

Static robots.txt

Create src/robots.txt:

# Block all AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow legitimate search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Bridgetown's build copies this verbatim to _site/robots.txt. No bridgetown.config.yml entry is needed — unlike Lume's site.copy(), Bridgetown copies all files in src/ automatically.
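To sanity-check a blocklist like this, a short Ruby script can emulate the User-agent group lookup. This is a deliberately simplified sketch (no wildcard patterns, no longest-match path precedence), not a spec-complete robots.txt parser:

```ruby
# Simplified robots.txt check: does the group that applies to
# `user_agent` disallow the site root? Not spec-complete; it only
# does exact agent-name lookup with a "*" fallback.
def blocked?(robots_txt, user_agent)
  groups = Hash.new { |h, k| h[k] = [] }   # agent name => Disallow paths
  agents = []                              # agents of the group being built
  in_rules = false                         # seen a rule line for this group?

  robots_txt.each_line do |raw|
    line = raw.split("#").first.to_s.strip # drop comments and whitespace
    next if line.empty?

    field, _sep, value = line.partition(":")
    field = field.strip.downcase
    value = value.strip

    case field
    when "user-agent"
      agents = [] if in_rules              # a rule line closed the last group
      in_rules = false
      agents << value
      groups[value]                        # register agent even with no rules
    when "disallow"
      in_rules = true
      agents.each { |a| groups[a] << value } unless value.empty?
    when "allow"
      in_rules = true                      # ends the agent list, blocks nothing
    end
  end

  rules = groups.key?(user_agent) ? groups[user_agent] : groups["*"]
  rules.any? { |path| "/".start_with?(path) }
end
```

Crawlers that honour robots.txt pick the most specific User-agent group matching their name, which is why the blanket `User-agent: *` Allow rule does not undo the per-bot Disallow groups above.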

Dynamic robots.txt (environment-aware)

To serve a different robots.txt in staging vs production, create a template file. Bridgetown renders any file with a recognised template extension:

# src/robots.txt.erb
---
permalink: /robots.txt
---
<% if Bridgetown.env.production? %>
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
<% else %>
# Staging — block all crawlers
User-agent: *
Disallow: /
<% end %>

The permalink: /robots.txt front matter tells Bridgetown to output this file as _site/robots.txt rather than _site/robots.txt/index.html. Set BRIDGETOWN_ENV=production in your host environment variables to enable production mode.

Note: If both src/robots.txt (static) and src/robots.txt.erb (template) exist, Bridgetown will raise a conflict error during build. Use one or the other.
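If you prefer generating the file from Ruby (for example inside a builder) rather than an ERB template, the same branching looks like this. A sketch, assuming BRIDGETOWN_ENV drives the environment exactly as described above:

```ruby
# Sketch: choose robots.txt content the way the ERB template does,
# based on BRIDGETOWN_ENV. Bridgetown defaults to "development".
def robots_txt(env = ENV.fetch("BRIDGETOWN_ENV", "development"))
  if env == "production"
    <<~ROBOTS
      User-agent: GPTBot
      Disallow: /

      User-agent: ClaudeBot
      Disallow: /

      User-agent: *
      Allow: /
    ROBOTS
  else
    <<~ROBOTS
      # Staging: block all crawlers
      User-agent: *
      Disallow: /
    ROBOTS
  end
end
```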

2. noai meta tags in layouts

The noai and noimageai meta values signal to AI crawlers that the page content and images should not be used for training. Add them to your base layout so every page is covered by default.

ERB layout (default in Bridgetown)

Bridgetown's default layout engine is ERB. Edit src/_layouts/default.erb:

<!DOCTYPE html>
<html lang="<%= site.metadata.locale || 'en' %>">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title><%= resource.data.title || site.metadata.title %></title>

  <!-- AI bot protection: default noai, allow override per-page -->
  <meta name="robots" content="<%= resource.data.robots || 'noai, noimageai' %>">

  <%= liquid_render "head" %>
</head>
<body>
  <%= liquid_render "navbar" %>
  <%= yield %>
  <%= liquid_render "footer" %>
</body>
</html>

Bridgetown ERB vs Jekyll Liquid: In Bridgetown ERB layouts, access front matter via resource.data.field_name, not {{ page.field_name }}. The resource object is Bridgetown's equivalent of Jekyll's page.

Liquid layout

If you use a Liquid layout (src/_layouts/default.liquid or .html):

<meta name="robots"
  content="{{ resource.data.robots | default: 'noai, noimageai' }}">

Bridgetown Liquid vs Jekyll Liquid: Bridgetown uses resource.data.robots, not page.robots. The resource object is always the correct accessor in Bridgetown templates regardless of engine.

Serbea layout

Bridgetown also supports Serbea (.serb), a Ruby-native template format:

<meta name="robots"
  content="{{ resource.data.robots || 'noai, noimageai' }}">

Per-page override

To allow a specific page to be indexed normally, set in its front matter:

---
title: About
robots: "index, follow"
---

This overrides the layout default for that page only. The meta tag will render content="index, follow" instead of content="noai, noimageai".

3. front_matter_defaults in bridgetown.config.yml

Rather than relying on a layout fallback, you can declare a site-wide default for the robots field in bridgetown.config.yml. This ensures the value is always present in resource.data even if the layout uses a strict resource.data.robots accessor with no fallback.

# bridgetown.config.yml
url: "https://yoursite.com"
title: "Your Site"

front_matter_defaults:
  - scope:
      path: ""          # applies to all content in src/
    values:
      robots: "noai, noimageai"

  # Allow the blog index and individual posts to be indexed
  - scope:
      path: "_posts"
    values:
      robots: "index, follow"

With this config, every page and post gets robots: noai, noimageai by default, and posts under _posts/ get index, follow instead. Individual pages can still override with their own front matter.

Scope matching: path: "" matches all content. path: "_posts" matches only files inside src/_posts/. More specific scopes take precedence over broader ones. Front matter in individual files always wins.
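As a sketch of those precedence rules (broad scope applied first, narrower scope overriding it, file front matter winning overall), with plain hashes standing in for Bridgetown's internals:

```ruby
# Sketch of front_matter_defaults resolution: apply every matching
# scope in declaration order, then let the file's own front matter win.
DEFAULTS = [
  { scope: { path: "" },       values: { "robots" => "noai, noimageai" } },
  { scope: { path: "_posts" }, values: { "robots" => "index, follow" } },
].freeze

def resolved_robots(relative_path, front_matter = {})
  data = {}
  DEFAULTS.each do |rule|
    prefix = rule[:scope][:path]
    # An empty path matches everything; otherwise match by prefix.
    data.merge!(rule[:values]) if prefix.empty? || relative_path.start_with?(prefix)
  end
  data.merge(front_matter)["robots"]   # file front matter always wins
end
```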

Using a Bridgetown plugin (Builder API)

Bridgetown's Builder API lets you inject data into every resource programmatically. Create plugins/builders/robots_defaults.rb:

# plugins/builders/robots_defaults.rb
class RobotsDefaults < SiteBuilder
  def build
    hook :resources, :pre_render do |resource|
      # Set default only if not already set in front matter
      resource.data.robots ||= "noai, noimageai"
    end
  end
end

Bridgetown auto-loads builders in plugins/builders/. This approach works for any resource type (pages, posts, collections) and does not require editing bridgetown.config.yml.
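Because the hook uses ||=, a resource that already sets robots in its front matter keeps its value. A minimal standalone illustration of that semantics, with hashes standing in for Bridgetown resources:

```ruby
# Sketch: `||=` sets the default only where robots is unset (nil),
# mirroring what the builder hook does to each resource's data.
resources = [
  { title: "Home",  robots: nil },
  { title: "About", robots: "index, follow" },
]

resources.each do |resource|
  resource[:robots] ||= "noai, noimageai"
end
```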

4. X-Robots-Tag via host headers

Bridgetown outputs static HTML — there is no application server adding HTTP headers in production. Set X-Robots-Tag at your hosting layer.

Netlify

In netlify.toml at the project root (not inside _site/):

[build]
  command = "bin/bridgetown build"
  publish = "_site"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
    X-Content-Type-Options = "nosniff"
    X-Frame-Options = "SAMEORIGIN"

Vercel

In vercel.json at the project root. Set the build command to bin/bridgetown build and output directory to _site:

{
  "buildCommand": "bin/bridgetown build",
  "outputDirectory": "_site",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages

Create src/_headers (note the leading underscore). Bridgetown copies all files from src/ to _site/, so this will be placed at _site/_headers:

/*
  X-Robots-Tag: noai, noimageai

Cloudflare vs Lume: Unlike Lume, you do not need to explicitly copy the _headers file in Bridgetown — it's copied automatically because everything in src/ is output. In Lume, files starting with _ are skipped unless you call site.copy("_headers").

GitHub Pages

GitHub Pages does not support custom HTTP headers or edge functions. You are limited to robots.txt (Section 1) and the noai meta tag approach (Section 2). For header-level control, use a host that supports it (Netlify, Vercel, Cloudflare Pages).

5. Hard 403 via Edge Functions

A hard 403 blocks the AI bot before it reads any content — more effective than signals that a crawler can ignore. Requires server-side execution at the edge.
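Each implementation below applies the same rule: a case-sensitive substring match of the User-Agent header against a list of AI-crawler tokens. The equivalent check in Ruby:

```ruby
# The matching rule shared by the edge functions in this section:
# case-sensitive substring test against known AI crawler tokens.
AI_BOTS = %w[
  GPTBot ClaudeBot Claude-Web anthropic-ai CCBot Google-Extended
  PerplexityBot Applebot-Extended Amazonbot meta-externalagent Bytespider
].freeze

def ai_bot?(user_agent)
  ua = user_agent.to_s            # tolerate a missing header (nil)
  AI_BOTS.any? { |bot| ua.include?(bot) }
end
```

Substring matching is deliberate: real bot User-Agent values embed the token inside a longer Mozilla-compatible string.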

Netlify Edge Function

Create netlify/edge-functions/bot-block.ts (outside src/):

import type { Config, Context } from "@netlify/edge-functions";

const AI_BOTS = [
  "GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
  "CCBot", "Google-Extended", "PerplexityBot",
  "Applebot-Extended", "Amazonbot", "meta-externalagent",
  "Bytespider", "DuckAssistBot", "YouBot",
];

export default async function handler(req: Request, _ctx: Context) {
  const ua = req.headers.get("user-agent") ?? "";
  const isBot = AI_BOTS.some((bot) => ua.includes(bot));

  if (isBot) {
    return new Response("Forbidden", {
      status: 403,
      headers: { "content-type": "text/plain" },
    });
  }
}

export const config: Config = {
  path: "/*",
};

Register in netlify.toml:

[build]
  command = "bin/bridgetown build"
  publish = "_site"

[[edge_functions]]
  path = "/*"
  function = "bot-block"

Vercel middleware

Create middleware.ts at the project root (same level as vercel.json, not inside _site/). A Bridgetown project is not a Next.js app, so use the framework-agnostic @vercel/edge helpers (add the @vercel/edge package to your dependencies):

import { next } from "@vercel/edge";

const AI_BOTS = [
  "GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
  "CCBot", "Google-Extended", "PerplexityBot",
  "Applebot-Extended", "Amazonbot", "meta-externalagent",
  "Bytespider",
];

export default function middleware(request: Request) {
  const ua = request.headers.get("user-agent") ?? "";
  const isBot = AI_BOTS.some((bot) => ua.includes(bot));

  if (isBot) {
    return new Response("Forbidden", { status: 403 });
  }
  return next();
}

export const config = {
  matcher: ["/((?!favicon.ico).*)"],
};

Vercel static site middleware: Vercel Edge Middleware runs for static deployments as well. The middleware.ts file must be at the project root, not inside _site/ or src/.

Cloudflare Pages middleware

Create functions/_middleware.ts at the project root (the functions/ directory is outside src/ and is never compiled into _site/):

// functions/_middleware.ts
const AI_BOTS = [
  "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
  "PerplexityBot", "Applebot-Extended", "Amazonbot",
  "meta-externalagent", "Bytespider",
];

export async function onRequest(context: EventContext<any, any, any>) {
  const ua = context.request.headers.get("user-agent") ?? "";
  const isBot = AI_BOTS.some((bot) => ua.includes(bot));

  if (isBot) {
    return new Response("Forbidden", { status: 403 });
  }
  return context.next();
}

6. Full bridgetown.config.yml

A complete bridgetown.config.yml with AI bot protection defaults, scoped overrides, and relevant settings:

# bridgetown.config.yml
url: "https://yoursite.com"
title: "Your Site"

# Default all content to noai — crawlers cannot use it for training
front_matter_defaults:
  - scope:
      path: ""
    values:
      robots: "noai, noimageai"

  # Allow legitimate indexing of blog posts
  - scope:
      path: "_posts"
    values:
      robots: "index, follow"

  # Allow sitemap.xml to be read by all crawlers
  - scope:
      path: "sitemap.xml"
    values:
      robots: "index, follow"

# Standard Bridgetown config
permalink: pretty
timezone: UTC

# Defensive excludes (these live at the project root, not in src/, so they are never output anyway)
exclude:
  - node_modules
  - vendor
  - Gemfile
  - Gemfile.lock
  - netlify.toml
  - vercel.json

7. Deployment comparison

Bridgetown's build command is bin/bridgetown build and its publish directory is _site. Here is how each host handles AI bot protection:

Host | robots.txt | noai meta | X-Robots-Tag | Hard 403
Netlify | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | netlify.toml [[headers]] ✓ | Edge Function ✓
Vercel | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | vercel.json headers ✓ | middleware.ts at project root ✓
Cloudflare Pages | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | src/_headers → _site/_headers ✓ | functions/_middleware.ts ✓
GitHub Pages | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | Not supported ✗ | Not supported ✗
Render | src/robots.txt → _site/robots.txt ✓ | ERB/Liquid layout ✓ | render.yaml headers ✓ | Not native ✗

For full protection — robots.txt + meta tags + X-Robots-Tag + hard 403 — deploy to Netlify, Vercel, or Cloudflare Pages. GitHub Pages and Render lack edge-level bot blocking capability.

FAQ

How do I add robots.txt to a Bridgetown site?

Place src/robots.txt in your source directory. Bridgetown copies all files from src/ to _site/ automatically — no configuration required. For an environment-aware robots.txt, create src/robots.txt.erb with permalink: /robots.txt in front matter and conditionally render content based on Bridgetown.env.production?.

How do I add noai meta tags to a Bridgetown ERB layout?

In your base layout at src/_layouts/default.erb:

<meta name="robots"
  content="<%= resource.data.robots || 'noai, noimageai' %>">

Use resource.data.robots — not page.robots or {{ page.robots }}. The resource object is Bridgetown's equivalent of Jekyll's page.

How do I set a global default without editing every layout?

Use front_matter_defaults in bridgetown.config.yml:

front_matter_defaults:
  - scope:
      path: ""
    values:
      robots: "noai, noimageai"

This injects robots: noai, noimageai into every resource's data before rendering, so your layout can access it via resource.data.robots without needing the || fallback.

How is Bridgetown different from Jekyll for AI bot blocking?

Key differences to keep in mind:

- Templates read front matter from resource.data.field, not page.field; resource is Bridgetown's equivalent of Jekyll's page.
- Source lives in src/, and everything in it is copied or rendered to _site/, so src/robots.txt needs no config entry.
- ERB is the default layout engine (Liquid and Serbea are also supported).
- Environment checks use Bridgetown.env.production?, controlled by the BRIDGETOWN_ENV variable.
- The build command is bin/bridgetown build and the output directory is _site.

Do I need a custom server for Bridgetown to block AI bots?

No. Bridgetown generates static HTML files. Hard 403 bot blocking runs at the hosting layer via Edge Functions (Netlify, Vercel, Cloudflare Pages) — there is no application server to configure. The robots.txt and noai meta approach requires no hosting-layer changes at all.

Will blocking AI bots affect my SEO?

Blocking AI-specific crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) does not affect standard search engine indexing. Googlebot and Bingbot are separate user agents from Google-Extended and are not blocked by the configurations in this guide. Always include explicit Allow rules for Googlebot and Bingbot in your robots.txt to make your intent unambiguous.
