How to Block AI Bots on Lume: Complete 2026 Guide
Lume is a fast, flexible static site generator built on Deno. It supports Nunjucks, Markdown, JSX, TypeScript, and more — and outputs plain static HTML to _site/. Because Lume has no server process in production, AI bot protection uses a combination of robots.txt, noai meta tags in layouts, host-level header config, and Edge Functions for hard blocking.
Contents
robots.txt — static copy & dynamic route
noai meta tag in layouts
X-Robots-Tag via host headers config
Hard 403 via Edge Functions
Deno Deploy bot-blocking entrypoint
Full _config.ts example
Deployment comparison
FAQ
Option 1: static copy (simplest)
Place robots.txt in your source directory root and tell Lume to copy it as-is via site.copy() in _config.ts:
// _config.ts
import lume from "lume/mod.ts";
const site = lume();
site.copy("robots.txt"); // copies src/robots.txt → _site/robots.txt
export default site;
# robots.txt (in your source root)
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: *
Allow: /
The file's location depends on your source directory (the project root by default, or the src value in lume() options). If your source is src/, place the file at src/robots.txt and call site.copy("robots.txt"). Lume copies it verbatim to _site/robots.txt — no template processing.
Option 2: dynamic robots.txt page
For environment-based content (different rules for staging vs production), create a Lume page that generates the file:
// src/robots.txt.ts
export const url = "/robots.txt";
const isProduction = Deno.env.get("LUME_ENV") !== "staging";
const aiBlockRules = `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
`;
const stagingRules = `User-agent: *
Disallow: /
`;
export default function () {
if (!isProduction) {
// On staging: block all crawlers including Google
return stagingRules;
}
return `${aiBlockRules}
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml`;
}
If you have both a robots.txt file copied via site.copy() and a src/robots.txt.ts page, Lume's page wins — it overwrites the copied file in _site/. Use one approach, not both.
noai meta tag in layouts
Lume supports multiple template engines. Add the robots meta tag to your base layout in _includes/:
Nunjucks layout (_includes/layout.njk)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ title }}</title>
{# Block AI training by default; override per-page with robots: index, follow #}
<meta name="robots" content="{{ robots | default('noai, noimageai') }}">
</head>
<body>
{{ content | safe }}
</body>
</html>
JSX/TSX layout (_includes/Layout.tsx)
interface Props {
title: string;
robots?: string;
children: unknown;
}
export default ({ title, robots = "noai, noimageai", children }: Props) => (
<html lang="en">
<head>
<meta charSet="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>{title}</title>
<meta name="robots" content={robots} />
</head>
<body>{children}</body>
</html>
);
Liquid layout (_includes/layout.liquid)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>{{ title }}</title>
<meta name="robots" content="{{ robots | default: 'noai, noimageai' }}">
</head>
<body>
{{ content }}
</body>
</html>
Per-page override (front matter)
---
title: About This Site
layout: layout.njk
robots: index, follow, max-image-preview:large
---
Page content here.
Alternatively, use the _data.yml / _data.ts cascade. Set a global default for all pages in _data.yml at the root level: robots: "noai, noimageai". Per-page front matter overrides it. This avoids updating every layout file individually.
Global default via _data.yml
# _data.yml (applies to all pages in the directory and subdirectories)
robots: "noai, noimageai"
layout: layout.njk
X-Robots-Tag via host headers config
Lume outputs a static site — there is no server process in production. Response headers must be configured at the hosting layer.
Netlify (netlify.toml)
# netlify.toml
[build]
publish = "_site"
command = "deno task build"
[[headers]]
for = "/*"
[headers.values]
X-Robots-Tag = "noai, noimageai"
X-Content-Type-Options = "nosniff"
X-Frame-Options = "SAMEORIGIN"
Vercel (vercel.json)
{
"outputDirectory": "_site",
"buildCommand": "deno task build",
"headers": [
{
"source": "/(.*)",
"headers": [
{ "key": "X-Robots-Tag", "value": "noai, noimageai" },
{ "key": "X-Content-Type-Options", "value": "nosniff" }
]
}
]
}
Cloudflare Pages (_headers file)
# _headers (place in your source root, copy via site.copy("_headers"))
/*
X-Robots-Tag: noai, noimageai
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
// _config.ts — copy _headers to _site/
site.copy("_headers");
Cloudflare Pages reads the _headers file from your publish directory (_site/). Since Lume won't automatically copy files starting with _, you must explicitly include site.copy("_headers") in _config.ts.
Hard 403 via Edge Functions
Robots.txt and meta tags are advisory — determined bots ignore them. Edge Functions enforce hard 403 responses before any HTML is served.
Netlify Edge Function
// netlify/edge-functions/bot-block.ts
import type { Context } from "https://edge.netlify.com/";
const BOT_PATTERN =
/GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;
export default async (request: Request, context: Context): Promise<Response> => {
const ua = request.headers.get("user-agent") ?? "";
if (BOT_PATTERN.test(ua)) {
return new Response("Forbidden", {
status: 403,
headers: { "Content-Type": "text/plain" },
});
}
return context.next();
};
export const config = { path: "/*" };
Vercel middleware (project root)
// middleware.ts (place in project root — NOT in _site/)
// Plain Request/Response: no Next.js dependency is needed, since Vercel
// Edge Middleware supports the standard Web APIs in framework-less projects.
const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;
export default function middleware(req: Request) {
  const ua = req.headers.get("user-agent") ?? "";
  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }
  // Returning nothing lets the request continue to the static file.
}
export const config = { matcher: ["/((?!favicon.ico).*)"] };
This needs no changes to _config.ts or vercel.json. Lume's build output in _site/ is the static site — the middleware is a Vercel-specific file that Vercel picks up from the project root. Never put middleware.ts inside _site/.
Cloudflare Pages (_middleware.ts)
// functions/_middleware.ts
import type { PagesFunction } from "@cloudflare/workers-types";
const BOT_PATTERN =
/GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;
export const onRequest: PagesFunction = async (context) => {
const ua = context.request.headers.get("user-agent") ?? "";
if (BOT_PATTERN.test(ua)) {
return new Response("Forbidden", {
status: 403,
headers: { "Content-Type": "text/plain" },
});
}
return context.next();
};
Cloudflare Pages expects the functions/ directory at the project root — not inside your publish directory. The _middleware.ts file in functions/ intercepts all requests before the static files are served.
Deno Deploy bot-blocking entrypoint
Lume has a first-class integration with Deno Deploy. When using a custom server entrypoint, you can implement bot blocking before serving static files:
// server.ts (Deno Deploy entrypoint)
import { serveDir } from "jsr:@std/http/file-server";
const BOT_PATTERN =
/GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;
Deno.serve(async (req: Request): Promise<Response> => {
const ua = req.headers.get("user-agent") ?? "";
// Block AI bots before serving any file
if (BOT_PATTERN.test(ua)) {
return new Response("Forbidden", {
status: 403,
headers: {
"Content-Type": "text/plain",
"X-Robots-Tag": "noai, noimageai",
},
});
}
// Inject X-Robots-Tag on all served responses
const res = await serveDir(req, {
fsRoot: "_site",
urlRoot: "",
quiet: true,
});
const headers = new Headers(res.headers);
headers.set("X-Robots-Tag", "noai, noimageai");
return new Response(res.body, {
status: res.status,
statusText: res.statusText,
headers,
});
});
# deno.json — deployment tasks
{
"tasks": {
"build": "deno run -A https://deno.land/x/lume/ci.ts",
"serve": "deno run -A https://deno.land/x/lume/cli.ts --serve",
"deploy": "deployctl deploy --project=my-lume-site server.ts"
}
}
serveDir() returns a finished Response; rather than mutating its headers in place, copy them with new Headers(res.headers), set your header, then wrap the original res.body in a new Response. The body stream passes through without buffering.
Deploy to Deno Deploy with GitHub Actions
# .github/workflows/deploy.yml
name: Deploy to Deno Deploy
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- uses: actions/checkout@v4
- uses: denoland/setup-deno@v2
with:
deno-version: v2.x
- name: Build Lume site
run: deno task build
- name: Deploy to Deno Deploy
uses: denoland/deployctl@v1
with:
project: my-lume-site
entrypoint: server.ts
Full _config.ts example
// _config.ts
import lume from "lume/mod.ts";
import nunjucks from "lume/plugins/nunjucks.ts";
import markdown from "lume/plugins/markdown.ts";
import jsx from "lume/plugins/jsx.ts";
import sitemap from "lume/plugins/sitemap.ts";
import minifyHTML from "lume/plugins/minify_html.ts";
const site = lume({
src: "./src",
dest: "./_site",
location: new URL("https://yourdomain.com"),
});
// Plugins
site.use(nunjucks());
site.use(markdown());
site.use(jsx());
site.use(sitemap());
site.use(minifyHTML());
// Copy static files as-is to _site/
site.copy("robots.txt");
site.copy("_headers"); // Cloudflare Pages headers
site.copy("favicon.ico");
site.copy("static", "."); // src/static/ → _site/ (merge at root)
// Global data available in all templates
site.data("robots", "noai, noimageai"); // default robots value
site.data("site", {
title: "My Lume Site",
url: "https://yourdomain.com",
});
export default site;
site.data("robots", "noai, noimageai") sets a global data value available in all templates as {{ robots }} (Nunjucks) or page.data.robots (JSX). Per-page front matter overrides it. Alternatively, use _data.yml in the source root — both approaches work, but site.data() is more visible and IDE-friendly.
Deployment comparison
| Platform | robots.txt | X-Robots-Tag | Hard 403 | Notes |
|---|---|---|---|---|
| Deno Deploy | serveDir auto-serves | server.ts response headers | server.ts UA check | Native Lume target; full control |
| Netlify | Copied to _site/ | netlify.toml [[headers]] | Edge Function | netlify.toml publish = "_site" |
| Vercel | Copied to _site/ | vercel.json headers() | middleware.ts (project root) | outputDirectory: "_site" |
| Cloudflare Pages | Copied to _site/ | _headers file (copy it) | functions/_middleware.ts | _headers needs site.copy("_headers") |
| GitHub Pages | Copied to _site/ | Not supported | Not supported | No custom headers; noai meta only |
| Firebase Hosting | Copied to _site/ | firebase.json headers | Cloud Functions rewrite | public: "_site" in firebase.json |
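Whichever platform you deploy to, it's worth verifying that the protection actually took effect. A minimal TypeScript sketch: the classify() helper is illustrative (not a Lume or platform API), and the commented fetch shows one way to run it against a live domain.

```typescript
// Hypothetical post-deploy check. classify() is a pure helper so the
// decision logic can be tested without a network call.
type CheckResult = "hard-blocked" | "header-only" | "unprotected";

export function classify(status: number, xRobotsTag: string | null): CheckResult {
  if (status === 403) return "hard-blocked"; // Edge Function / middleware fired
  if (xRobotsTag !== null && xRobotsTag.includes("noai")) {
    return "header-only"; // advisory X-Robots-Tag present, no hard block
  }
  return "unprotected"; // neither a 403 nor a noai header
}

// Usage sketch (requires network access; replace the domain with yours):
// const res = await fetch("https://yourdomain.com/", {
//   headers: { "user-agent": "GPTBot/1.0" },
// });
// console.log(classify(res.status, res.headers.get("x-robots-tag")));
```

"header-only" is the expected result on GitHub Pages-style hosts that can't send custom headers or run functions; "hard-blocked" means the 403 layer is working.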
FAQ
How do I add robots.txt to a Lume site?
Two options: (1) Static copy — place robots.txt in your source directory and add site.copy("robots.txt") in _config.ts. Lume copies it verbatim to _site/. (2) Dynamic route — create src/robots.txt.ts that exports url: "/robots.txt" and a default function returning the file content as a string. If both exist, the page route wins.
How do I add noai meta tags to a Lume layout?
Add <meta name="robots" content="{{ robots | default('noai, noimageai') }}"> to your base layout in _includes/. Override per-page with robots: index, follow in front matter. For a global default without touching every layout, use site.data("robots", "noai, noimageai") in _config.ts.
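The override behaviour described above can be sketched as a tiny resolution function; resolveRobots and GLOBAL_DEFAULT are illustrative names, not Lume APIs.

```typescript
// Illustrative only: mirrors how the data cascade resolves the robots value.
// Page front matter wins over the global default from site.data() / _data.yml.
const GLOBAL_DEFAULT = "noai, noimageai";

export function resolveRobots(frontMatterRobots?: string): string {
  return frontMatterRobots ?? GLOBAL_DEFAULT;
}

// resolveRobots(undefined) → "noai, noimageai"
// resolveRobots("index, follow") → "index, follow"
```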
How do I add X-Robots-Tag to a Lume static site?
Lume has no server process in production — headers are set at the hosting layer. Netlify: [[headers]] in netlify.toml. Vercel: headers() in vercel.json. Cloudflare Pages: _headers file copied via site.copy("_headers"). Deno Deploy: set headers in your server.ts entrypoint.
How do I hard-block AI bots with a 403 on a Lume site?
Hard 403 needs server-side execution. Netlify: Edge Function in netlify/edge-functions/. Vercel: middleware.ts in the project root (not inside _site/). Cloudflare Pages: functions/_middleware.ts. Deno Deploy: check User-Agent in server.ts before calling serveDir().
What is the output directory in Lume?
_site/ by default. Change with lume({ dest: "dist" }) in _config.ts. Point your hosting platform's publish directory to whichever value you set — Netlify: publish = "_site", Vercel: outputDirectory: "_site".
How does bot blocking work on Deno Deploy with a Lume site?
Use a custom server.ts entrypoint that checks req.headers.get("user-agent") before calling serveDir(req, { fsRoot: "_site" }). Matching bots return new Response("Forbidden", { status: 403 }) immediately. Legitimate requests pass through to the static file server.
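That check is easy to exercise offline. A minimal sketch that factors the same User-Agent test used in the server.ts and Edge Function examples into a pure function:

```typescript
// Same pattern as the server.ts / edge-function examples above.
const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

// Returns the HTTP status the entrypoint would answer with for a given User-Agent.
export function statusFor(userAgent: string): number {
  return BOT_PATTERN.test(userAgent) ? 403 : 200;
}

// statusFor("Mozilla/5.0 (compatible; GPTBot/1.0)") → 403
// statusFor("Mozilla/5.0 (Windows NT 10.0; rv:130.0) Firefox/130.0") → 200
```

The /i flag makes the match case-insensitive, so lowercase variants like "ccbot" are caught too.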