How to Block AI Bots on Astro
Astro's zero-JS-by-default output produces the kind of clean, fast HTML that AI crawlers love. Block them with a static robots.txt, Astro's native src/pages/robots.txt.ts endpoint, and noai tags in your base layout — no extra dependencies needed.
SSG (Static) mode
- ✓ public/robots.txt
- ✓ src/pages/robots.txt.ts (pre-rendered)
- ✓ noai meta tag in BaseHead.astro
- ✓ public/_headers (Cloudflare/Netlify)
- ✓ Platform WAF rules
- ✗ Astro middleware (not available in SSG)
SSR mode (with adapter)
- ✓ public/robots.txt
- ✓ src/pages/robots.txt.ts (dynamic)
- ✓ noai meta tag in BaseHead.astro
- ✓ Astro middleware (defineMiddleware)
- ✓ Platform WAF rules
- ✓ All SSG methods above
Quick fix — create public/robots.txt
The public/ folder sits at the project root, next to src/ and astro.config.mjs. Astro copies the file to dist/robots.txt at build time.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
Available Methods
public/robots.txt (Recommended for SSG)
Easy · SSG + SSR
public/robots.txt (copied as-is to dist/robots.txt)
Astro copies everything in public/ directly to dist/ unchanged. A plain robots.txt here requires no configuration and works in all Astro modes and hosting platforms.
No frontmatter, no imports — plain text only. Works with any Astro adapter.
src/pages/robots.txt.ts — API Endpoint
Easy · SSG + SSR
src/pages/robots.txt.ts
Astro's native pattern for generating non-HTML files. Export a GET function returning a Response with text/plain. Lets you reference env vars, site config, or change rules per environment.
In SSG mode, Astro pre-renders this to a static robots.txt at build time. In SSR mode, it runs on each request.
noai meta tag in BaseHead.astro
Easy · SSG + SSR
src/layouts/BaseHead.astro (or Layout.astro)
Add <meta name="robots" content="noai, noimageai" /> in your base layout's <head> block. Applies globally to every page using that layout.
For per-page control, pass a prop from the page component and conditionally render in the layout.
Astro Middleware (defineMiddleware)
Intermediate · SSR only
src/middleware.ts (requires SSR adapter)
Intercept requests at the server level and return 403 for matched AI bot user agents. Only works in SSR mode (with @astrojs/node, @astrojs/vercel, or @astrojs/netlify adapter).
Does NOT work in SSG (static) mode — there is no server to run middleware. For SSG, use platform WAF instead.
public/_headers + Cloudflare/Netlify WAF
Intermediate · SSG + SSR
public/_headers (or platform dashboard)
Edge-level blocking via HTTP headers or WAF rules. Astro copies public/_headers to dist/_headers, which Cloudflare Pages and Netlify read natively. Only method that stops robots.txt violators.
The most reliable blocking method regardless of SSG/SSR mode.
Method 1: public/robots.txt
Astro copies every file in public/ to dist/ unchanged at build time. Create public/robots.txt in your project root — same level as src/ and astro.config.mjs.
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /
Build and verify:
npx astro build
cat dist/robots.txt | head -6
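To sanity-check the generated file programmatically, a small parser sketch can confirm a given bot is disallowed. This is an illustration, not a full robots.txt parser; it assumes one User-agent line per rule group, as in the file above:

```typescript
// Sketch: does this robots.txt body disallow the given bot from "/"?
// Assumes one User-agent line per rule group.
function isBotBlocked(robotsTxt: string, botName: string): boolean {
  let inGroup = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === 'user-agent') {
      // Start of a new group: track whether it targets our bot
      inGroup = value.toLowerCase() === botName.toLowerCase();
    } else if (inGroup && key === 'disallow' && value === '/') {
      return true;
    }
  }
  return false;
}

const robots = 'User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /';
console.log(isBotBlocked(robots, 'GPTBot'));    // true
console.log(isBotBlocked(robots, 'Googlebot')); // false
```

Run it against the contents of dist/robots.txt after a build to confirm the rules survived.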
Method 2: src/pages/robots.txt.ts — Astro Endpoint
Astro's native approach for generating any non-HTML file. Create a src/pages/robots.txt.ts file exporting a GET function. Astro generates dist/robots.txt at build time (SSG) or serves it dynamically (SSR). Useful for environment-based rules.
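In an SSR project you can still pin this one route to build-time output with Astro's per-route `prerender` export. A minimal sketch (the rules here are truncated for illustration):

```typescript
// src/pages/robots.txt.ts (sketch): pre-render this route even when
// the rest of the site uses output: 'server'
export const prerender = true;

export const GET = () =>
  new Response('User-agent: GPTBot\nDisallow: /', {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
```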
If both public/robots.txt and src/pages/robots.txt.ts exist, Astro will warn about a conflict. Use one or the other. The endpoint approach takes precedence in SSR mode.
// src/pages/robots.txt.ts
import type { APIRoute } from 'astro';
const AI_BOT_RULES = `
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: xAI-Bot
Disallow: /
User-agent: DeepSeekBot
Disallow: /
User-agent: MistralBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: AI2Bot
Disallow: /
User-agent: Ai2Bot-Dolma
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: DuckAssistBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: webzio-extended
Disallow: /
User-agent: gemini-deep-research
Disallow: /
`.trim();
export const GET: APIRoute = ({ site }) => {
  // Fallback is a placeholder; set `site` in astro.config.mjs for real builds
  const siteUrl = site ?? 'https://yourdomain.com';
  // new URL() joins the path correctly whether or not siteUrl ends in "/"
  const sitemapUrl = new URL('sitemap-index.xml', siteUrl);
  const content = `User-agent: *\nAllow: /\n\n${AI_BOT_RULES}\n\nSitemap: ${sitemapUrl}`;
  return new Response(content, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
    },
  });
};
Method 3: noai Meta Tag in BaseHead.astro
Add the noai meta tag to your base layout component. In Astro projects, this is typically src/layouts/BaseHead.astro or src/layouts/Layout.astro — the component that renders the <head> block.
<!-- src/layouts/BaseHead.astro -->
---
interface Props {
title: string;
description: string;
}
const { title, description } = Astro.props;
---
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width" />
<title>{title}</title>
<meta name="description" content={description} />
<!-- AI bot opt-out -->
<meta name="robots" content="noai, noimageai" />
Per-page control via props:
<!-- src/layouts/BaseHead.astro -->
---
interface Props {
title: string;
description: string;
noai?: boolean;
}
const { title, description, noai = false } = Astro.props;
---
<!-- Only renders on pages that pass noai={true} -->
{noai && <meta name="robots" content="noai, noimageai" />}
<!-- In a specific page (src/pages/my-page.astro): -->
---
import BaseHead from '../layouts/BaseHead.astro';
---
<head>
<BaseHead title="Protected Page" description="..." noai={true} />
</head>
Method 4: Astro Middleware (SSR Only)
For SSR deployments — intercept requests before any page renders and return 403 for matched bot user agents. Requires an SSR adapter (@astrojs/node, @astrojs/vercel, @astrojs/netlify, or @astrojs/cloudflare).
Middleware does NOT run in SSG (output: 'static') mode. Check your astro.config.mjs — you need output: 'server' or output: 'hybrid'.
// src/middleware.ts
import { defineMiddleware } from 'astro:middleware';
const BLOCKED_BOTS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|DeepSeekBot|MistralBot|cohere-ai|AI2Bot|YouBot|omgili|omgilibot|webzio-extended|gemini-deep-research|xAI-Bot|Applebot-Extended|Amazonbot/i;
export const onRequest = defineMiddleware((context, next) => {
  const ua = context.request.headers.get('user-agent') ?? '';
  if (BLOCKED_BOTS.test(ua)) {
    return new Response('Forbidden', {
      status: 403,
      headers: { 'Content-Type': 'text/plain' },
    });
  }
  return next();
});
Method 5: public/_headers + WAF
Create public/_headers — Astro copies it to dist/_headers, which Cloudflare Pages and Netlify read natively to set HTTP response headers on every response.
public/_headers
/*
  X-Robots-Tag: noai, noimageai
Works on Cloudflare Pages and Netlify. More authoritative than HTML meta tag.
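The difference from the meta tag is that X-Robots-Tag travels as an HTTP header on every response, including non-HTML assets. A minimal sketch of what the rule produces, using the standard Fetch API Response (Node 18+ or any modern runtime):

```typescript
// Sketch: every response served under the _headers rule above
// carries this header, regardless of content type.
const res = new Response('<!doctype html><title>Page</title>', {
  headers: {
    'Content-Type': 'text/html; charset=utf-8',
    'X-Robots-Tag': 'noai, noimageai',
  },
});

console.log(res.headers.get('X-Robots-Tag')); // "noai, noimageai"
```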
Cloudflare WAF rule
Field: User Agent
Contains: GPTBot
Action: Block
(repeat for each bot)
Edge-level 403 — stops bots before they hit your origin. Free plan available.
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot")
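The expression above is a series of case-sensitive substring checks on the User-Agent header. Its matching behavior can be sketched locally for testing (bot list copied from the expression; this is an illustration, not Cloudflare's implementation):

```typescript
// Sketch of the WAF expression's logic: case-sensitive substring
// match on the User-Agent value, block (403) on any hit.
const WAF_BOTS = [
  'GPTBot', 'ClaudeBot', 'CCBot', 'Bytespider',
  'Google-Extended', 'Diffbot', 'meta-externalagent', 'DeepSeekBot',
];

function wafWouldBlock(userAgent: string): boolean {
  return WAF_BOTS.some((bot) => userAgent.includes(bot));
}

console.log(wafWouldBlock('Mozilla/5.0 (compatible; GPTBot/1.1)')); // true
console.log(wafWouldBlock('Mozilla/5.0 Chrome/120.0'));             // false
```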
Full AI Bot Reference
All 25 AI bots covered by the robots.txt block list above: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, Bytespider, CCBot, PerplexityBot, meta-externalagent, Amazonbot, Applebot-Extended, xAI-Bot, DeepSeekBot, MistralBot, Diffbot, cohere-ai, AI2Bot, Ai2Bot-Dolma, YouBot, DuckAssistBot, omgili, omgilibot, webzio-extended, and gemini-deep-research.
Frequently Asked Questions
Where do I put robots.txt in an Astro site?
Two options in Astro: (1) public/robots.txt — Astro copies everything in public/ directly to the dist/ output directory unchanged. This is the simplest approach and works in both SSG and SSR modes. (2) src/pages/robots.txt.ts — create a TypeScript API endpoint that returns a Response with Content-Type: text/plain. This is Astro's native way to generate any file dynamically, and lets you reference environment variables, site config, or conditionally change rules per deployment environment.
How do I add a noai meta tag to every Astro page?
Add the noai meta tag to your base layout component — typically src/layouts/BaseHead.astro or src/layouts/Layout.astro (the component that renders the <head> block). Add <meta name='robots' content='noai, noimageai' /> inside the <head>. Since Astro layouts are used by all pages that import them, this applies globally. For per-page control, add the meta tag to specific page components or pass a prop from the page to conditionally render it in the layout.
What is src/pages/robots.txt.ts in Astro?
In Astro, any file in src/pages/ that exports a GET function becomes a server endpoint (or static route at build time in SSG mode). Creating src/pages/robots.txt.ts and exporting a GET APIRoute function that returns a Response with Content-Type: text/plain causes Astro to generate a robots.txt file at /robots.txt. This is Astro's idiomatic way to generate non-HTML files like robots.txt, sitemap.xml, or RSS feeds — it's more flexible than a static file in public/ and supports dynamic content.
Does Astro middleware work for blocking AI bots in SSG mode?
Astro middleware (src/middleware.ts using defineMiddleware) only runs at request time — it works in SSR mode (with a server adapter like @astrojs/node, @astrojs/vercel, or @astrojs/netlify) but NOT in pure SSG (static) mode. In SSG mode, there is no server to intercept requests, so middleware is ignored. For SSG deployments, rely on robots.txt + noai meta tags, and use platform-level tools (Cloudflare WAF, Netlify Edge Functions, Vercel Middleware) for runtime bot blocking.
How do I block AI bots on Astro deployed to Cloudflare Pages?
Three options for Cloudflare Pages: (1) public/robots.txt — simplest, works immediately; (2) public/_headers file — Cloudflare Pages reads this natively to set HTTP response headers like X-Robots-Tag: noai, noimageai; (3) Cloudflare WAF custom rules — create user-agent-based rules in your Cloudflare dashboard to block specific AI crawlers with a 403 before they reach your site. The WAF approach is the most powerful for stopping robots.txt violators like Bytespider.
Will blocking AI bots affect Astro's built-in sitemap?
No. Blocking GPTBot, ClaudeBot, CCBot, and other AI training bots has no effect on Googlebot or Bingbot. Astro's @astrojs/sitemap integration continues generating sitemap.xml normally. Your search engine rankings, sitemap discovery, and canonical URLs are completely unaffected.
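For reference, a typical @astrojs/sitemap setup looks like the sketch below and is unaffected by any of the blocking methods above (the site URL is a placeholder):

```javascript
// astro.config.mjs (sketch): the site URL is a placeholder
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://yourdomain.com',
  integrations: [sitemap()],
});
```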
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.