How to Block AI Bots on Astro
Astro's zero-JS-by-default output produces the kind of clean, fast HTML that AI crawlers love. Block them with a static robots.txt, Astro's native src/pages/robots.txt.ts endpoint, and noai tags in your base layout — no extra dependencies needed.
SSG (Static) mode
- ✓ public/robots.txt
- ✓ src/pages/robots.txt.ts (pre-rendered)
- ✓ noai meta tag in BaseHead.astro
- ✓ public/_headers (Cloudflare/Netlify)
- ✓ Platform WAF rules
- ✗ Astro middleware (not available in SSG)
SSR mode (with adapter)
- ✓ public/robots.txt
- ✓ src/pages/robots.txt.ts (dynamic)
- ✓ noai meta tag in BaseHead.astro
- ✓ Astro middleware (defineMiddleware)
- ✓ Platform WAF rules
- ✓ All SSG methods above
Quick fix — create public/robots.txt
The public/ folder sits at the project root, next to src/ and astro.config.mjs. Astro copies the file to dist/robots.txt at build time.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
Available Methods
public/robots.txt (Recommended for SSG)
Easy · SSG + SSR
public/robots.txt (copied as-is to dist/robots.txt)
Astro copies everything in public/ directly to dist/ unchanged. A plain robots.txt here requires no configuration and works in all Astro modes and hosting platforms.
No frontmatter, no imports — plain text only. Works with any Astro adapter.
src/pages/robots.txt.ts — API Endpoint
Easy · SSG + SSR
src/pages/robots.txt.ts
Astro's native pattern for generating non-HTML files. Export a GET function returning a Response with text/plain. Lets you reference env vars, site config, or change rules per environment.
In SSG mode, Astro pre-renders this to a static robots.txt at build time. In SSR mode, it runs on each request.
noai meta tag in BaseHead.astro
Easy · SSG + SSR
src/layouts/BaseHead.astro (or Layout.astro)
Add <meta name="robots" content="noai, noimageai" /> in your base layout's <head> block. Applies globally to every page using that layout.
For per-page control, pass a prop from the page component and conditionally render in the layout.
Astro Middleware (defineMiddleware)
Intermediate · SSR only
src/middleware.ts (requires SSR adapter)
Intercept requests at the server level and return 403 for matched AI bot user agents. Only works in SSR mode (with @astrojs/node, @astrojs/vercel, or @astrojs/netlify adapter).
Does NOT work in SSG (static) mode — there is no server to run middleware. For SSG, use platform WAF instead.
public/_headers + Cloudflare/Netlify WAF
Intermediate · SSG + SSR
public/_headers (or platform dashboard)
Edge-level blocking via HTTP headers or WAF rules. Astro copies public/_headers to dist/_headers, which Cloudflare Pages and Netlify read natively. Only method that stops robots.txt violators.
The most reliable blocking method regardless of SSG/SSR mode.
Method 1: public/robots.txt
Astro copies every file in public/ to dist/ unchanged at build time. Create public/robots.txt in your project root — same level as src/ and astro.config.mjs.
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /
Build and verify:
npx astro build
cat dist/robots.txt | head -6
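To sanity-check the generated file programmatically, a small parser sketch can confirm a given bot is disallowed. This is an illustration, not a full robots.txt parser; it assumes one User-agent line per rule group, as in the file above:

```typescript
// Sketch: does this robots.txt body disallow the given bot from "/"?
// Assumes one User-agent line per rule group.
function isBotBlocked(robotsTxt: string, botName: string): boolean {
  let inGroup = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === 'user-agent') {
      // Start of a new group: track whether it targets our bot
      inGroup = value.toLowerCase() === botName.toLowerCase();
    } else if (inGroup && key === 'disallow' && value === '/') {
      return true;
    }
  }
  return false;
}

const robots = 'User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /';
console.log(isBotBlocked(robots, 'GPTBot'));    // true
console.log(isBotBlocked(robots, 'Googlebot')); // false
```

Run it against the contents of dist/robots.txt after a build to confirm the rules survived.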
Method 2: src/pages/robots.txt.ts — Astro Endpoint
Astro's native approach for generating any non-HTML file. Create a src/pages/robots.txt.ts file exporting a GET function. Astro generates dist/robots.txt at build time (SSG) or serves it dynamically (SSR). Useful for environment-based rules.
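In an SSR project you can still pin this one route to build-time output with Astro's per-route `prerender` export. A minimal sketch (the rules here are truncated for illustration):

```typescript
// src/pages/robots.txt.ts (sketch): pre-render this route even when
// the rest of the site uses output: 'server'
export const prerender = true;

export const GET = () =>
  new Response('User-agent: GPTBot\nDisallow: /', {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
```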
If both public/robots.txt and src/pages/robots.txt.ts exist, Astro will warn about a conflict. Use one or the other. The endpoint approach takes precedence in SSR mode.
// src/pages/robots.txt.ts
import type { APIRoute } from 'astro';
const AI_BOT_RULES = `
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: xAI-Bot
Disallow: /
User-agent: DeepSeekBot
Disallow: /
User-agent: MistralBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: AI2Bot
Disallow: /
User-agent: Ai2Bot-Dolma
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: DuckAssistBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: webzio-extended
Disallow: /
User-agent: gemini-deep-research
Disallow: /
`.trim();
export const GET: APIRoute = ({ site }) => {
  // Fallback is a placeholder; set `site` in astro.config.mjs for real builds
  const siteUrl = site ?? 'https://yourdomain.com';
  // new URL() joins the path correctly whether or not siteUrl ends in "/"
  const sitemapUrl = new URL('sitemap-index.xml', siteUrl);
  const content = `User-agent: *\nAllow: /\n\n${AI_BOT_RULES}\n\nSitemap: ${sitemapUrl}`;
  return new Response(content, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
    },
  });
};
Method 3: noai Meta Tag in BaseHead.astro
Add the noai meta tag to your base layout component. In Astro projects, this is typically src/layouts/BaseHead.astro or src/layouts/Layout.astro — the component that renders the <head> block.
<!-- src/layouts/BaseHead.astro -->
---
interface Props {
title: string;
description: string;
}
const { title, description } = Astro.props;
---
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width" />
<title>{title}</title>
<meta name="description" content={description} />
<!-- AI bot opt-out -->
<meta name="robots" content="noai, noimageai" />
Per-page control via props:
<!-- src/layouts/BaseHead.astro -->
---
interface Props {
title: string;
description: string;
noai?: boolean;
}
const { title, description, noai = false } = Astro.props;
---
<!-- Only renders on pages that pass noai={true} -->
{noai && <meta name="robots" content="noai, noimageai" />}
<!-- In a specific page (src/pages/my-page.astro): -->
---
import BaseHead from '../layouts/BaseHead.astro';
---
<head>
<BaseHead title="Protected Page" description="..." noai={true} />
</head>
Method 4: Astro Middleware (SSR Only)
For SSR deployments — intercept requests before any page renders and return 403 for matched bot user agents. Requires an SSR adapter (@astrojs/node, @astrojs/vercel, @astrojs/netlify, or @astrojs/cloudflare).
Middleware does NOT run in SSG (output: 'static') mode. Check your astro.config.mjs — you need output: 'server' or output: 'hybrid'.
// src/middleware.ts
import { defineMiddleware } from 'astro:middleware';
const BLOCKED_BOTS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|DeepSeekBot|MistralBot|cohere-ai|AI2Bot|YouBot|omgili|omgilibot|webzio-extended|gemini-deep-research|xAI-Bot|Applebot-Extended|Amazonbot/i;
export const onRequest = defineMiddleware((context, next) => {
  const ua = context.request.headers.get('user-agent') ?? '';
  if (BLOCKED_BOTS.test(ua)) {
    return new Response('Forbidden', {
      status: 403,
      headers: { 'Content-Type': 'text/plain' },
    });
  }
  return next();
});
Method 5: public/_headers + WAF
Create public/_headers — Astro copies it to dist/_headers, which Cloudflare Pages and Netlify read natively to set HTTP response headers on every response.
public/_headers
/*
  X-Robots-Tag: noai, noimageai
Works on Cloudflare Pages and Netlify. More authoritative than HTML meta tag.
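The difference from the meta tag is that X-Robots-Tag travels as an HTTP header on every response, including non-HTML assets. A minimal sketch of what the rule produces, using the standard Fetch API Response (Node 18+ or any modern runtime):

```typescript
// Sketch: every response served under the _headers rule above
// carries this header, regardless of content type.
const res = new Response('<!doctype html><title>Page</title>', {
  headers: {
    'Content-Type': 'text/html; charset=utf-8',
    'X-Robots-Tag': 'noai, noimageai',
  },
});

console.log(res.headers.get('X-Robots-Tag')); // "noai, noimageai"
```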
Cloudflare WAF rule
Field: User Agent
Contains: GPTBot
Action: Block
(repeat for each bot)
Edge-level 403 — stops bots before they hit your origin. Free plan available.
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot")
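The expression above is a series of case-sensitive substring checks on the User-Agent header. Its matching behavior can be sketched locally for testing (bot list copied from the expression; this is an illustration, not Cloudflare's implementation):

```typescript
// Sketch of the WAF expression's logic: case-sensitive substring
// match on the User-Agent value, block (403) on any hit.
const WAF_BOTS = [
  'GPTBot', 'ClaudeBot', 'CCBot', 'Bytespider',
  'Google-Extended', 'Diffbot', 'meta-externalagent', 'DeepSeekBot',
];

function wafWouldBlock(userAgent: string): boolean {
  return WAF_BOTS.some((bot) => userAgent.includes(bot));
}

console.log(wafWouldBlock('Mozilla/5.0 (compatible; GPTBot/1.1)')); // true
console.log(wafWouldBlock('Mozilla/5.0 Chrome/120.0'));             // false
```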
Full AI Bot Reference
All 25 AI bots covered by the robots.txt block list above: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, Bytespider, CCBot, PerplexityBot, meta-externalagent, Amazonbot, Applebot-Extended, xAI-Bot, DeepSeekBot, MistralBot, Diffbot, cohere-ai, AI2Bot, Ai2Bot-Dolma, YouBot, DuckAssistBot, omgili, omgilibot, webzio-extended, and gemini-deep-research.
Frequently Asked Questions
Where do I put robots.txt in an Astro site?
Two options in Astro: (1) public/robots.txt — Astro copies everything in public/ directly to the dist/ output directory unchanged. This is the simplest approach and works in both SSG and SSR modes. (2) src/pages/robots.txt.ts — create a TypeScript API endpoint that returns a Response with Content-Type: text/plain. This is Astro's native way to generate any file dynamically, and lets you reference environment variables, site config, or conditionally change rules per deployment environment.
How do I add a noai meta tag to every Astro page?
Add the noai meta tag to your base layout component — typically src/layouts/BaseHead.astro or src/layouts/Layout.astro (the component that renders the <head> block). Add <meta name='robots' content='noai, noimageai' /> inside the <head>. Since Astro layouts are used by all pages that import them, this applies globally. For per-page control, add the meta tag to specific page components or pass a prop from the page to conditionally render it in the layout.
What is src/pages/robots.txt.ts in Astro?
In Astro, any file in src/pages/ that exports a GET function becomes a server endpoint (or static route at build time in SSG mode). Creating src/pages/robots.txt.ts and exporting a GET APIRoute function that returns a Response with Content-Type: text/plain causes Astro to generate a robots.txt file at /robots.txt. This is Astro's idiomatic way to generate non-HTML files like robots.txt, sitemap.xml, or RSS feeds — it's more flexible than a static file in public/ and supports dynamic content.
Does Astro middleware work for blocking AI bots in SSG mode?
Astro middleware (src/middleware.ts using defineMiddleware) only runs at request time — it works in SSR mode (with a server adapter like @astrojs/node, @astrojs/vercel, or @astrojs/netlify) but NOT in pure SSG (static) mode. In SSG mode, there is no server to intercept requests, so middleware is ignored. For SSG deployments, rely on robots.txt + noai meta tags, and use platform-level tools (Cloudflare WAF, Netlify Edge Functions, Vercel Middleware) for runtime bot blocking.
How do I block AI bots on Astro deployed to Cloudflare Pages?
Three options for Cloudflare Pages: (1) public/robots.txt — simplest, works immediately; (2) public/_headers file — Cloudflare Pages reads this natively to set HTTP response headers like X-Robots-Tag: noai, noimageai; (3) Cloudflare WAF custom rules — create user-agent-based rules in your Cloudflare dashboard to block specific AI crawlers with a 403 before they reach your site. The WAF approach is the most powerful for stopping robots.txt violators like Bytespider.
Will blocking AI bots affect Astro's built-in sitemap?
No. Blocking GPTBot, ClaudeBot, CCBot, and other AI training bots has no effect on Googlebot or Bingbot. Astro's @astrojs/sitemap integration continues generating sitemap.xml normally. Your search engine rankings, sitemap discovery, and canonical URLs are completely unaffected.
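For reference, a typical @astrojs/sitemap setup looks like the sketch below and is unaffected by any of the blocking methods above (the site URL is a placeholder):

```javascript
// astro.config.mjs (sketch): the site URL is a placeholder
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://yourdomain.com',
  integrations: [sitemap()],
});
```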
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.