
How to Block AI Bots on Gatsby

Gatsby's pre-rendered HTML and fast load times make it an easy target for AI crawlers. Here's how to lock them out with a plain robots.txt, global noai tags via gatsby-ssr.js, and edge-level blocking via Netlify or Cloudflare — all without additional build complexity.

Quick fix — create static/robots.txt

Create it inside static/, which sits at your project root alongside gatsby-config.js. Gatsby copies it to public/robots.txt at build — no plugins needed.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Available Methods

static/robots.txt (Recommended)

Difficulty: Easy · Location: static/robots.txt (copied as-is to public/robots.txt at build)

Gatsby copies every file in static/ directly to public/ unchanged. A plain robots.txt here works on all Gatsby deployments with no plugins or configuration.

No JSX, no imports — just plain text. Gatsby never processes files in static/.

gatsby-plugin-robots-txt

Difficulty: Easy · Location: gatsby-config.js → plugins array

npm plugin that generates robots.txt from your gatsby-config.js at build time. Useful for referencing siteUrl, environment-specific rules, and keeping config in one place.

Requires: npm install gatsby-plugin-robots-txt. Adds a build dependency — for simple blocking, static/ is cleaner.

gatsby-ssr.js — global noai tag

Difficulty: Easy · Location: gatsby-ssr.js (project root)

Use the onRenderBody SSR hook to inject <meta name="robots" content="noai, noimageai"> into every page's <head> at build time. Works with Gatsby 4 and 5.

This is the most reliable global meta tag injection method in Gatsby — runs before React hydration.

Gatsby 5 Head API (per-page)

Difficulty: Easy · Location: export function Head() in any page component

Gatsby 5 introduced the Head export — a React component for page-level <head> management. Use it in specific pages or a shared layout for per-page or global noai tags.

Gatsby 5+ only. For Gatsby 4, use gatsby-plugin-react-helmet with the Helmet component.

Netlify/Cloudflare headers + WAF

Difficulty: Intermediate · Location: static/_headers or netlify.toml (or Cloudflare WAF)

Edge-level blocking via HTTP headers or WAF rules. The only method that stops robots.txt violators like Bytespider. static/_headers works natively on both Netlify and Cloudflare Pages.

static/_headers is copied to public/_headers by Gatsby and read natively by Netlify and Cloudflare Pages.

Method 1: static/robots.txt (Recommended)

Gatsby copies every file in static/ directly to public/ during the build — no processing, no plugins. Create static/robots.txt in your project root (same level as gatsby-config.js).

1. Create static/robots.txt in your Gatsby project root with the full AI bot block list:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /
2. Build and verify locally:

    gatsby build
    head -6 public/robots.txt
3. Commit and deploy. Netlify, Vercel, and Cloudflare Pages all serve public/ files automatically.
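As a sanity check on lists like this, a short Node script can parse robots.txt text and report every user-agent that receives a blanket Disallow. This is a hypothetical helper, not part of Gatsby; blockedAgents and the sample string are illustrative:

```javascript
// blocked-agents.js -- hypothetical helper: list user-agents with "Disallow: /"
function blockedAgents(robotsTxt) {
  const blocked = [];
  let currentAgents = []; // User-agent lines accumulate until a rule line appears
  let sawRule = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue; // skip blanks and comments
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === 'user-agent') {
      // A rule line followed by a new User-agent line starts a new group
      if (sawRule) { currentAgents = []; sawRule = false; }
      currentAgents.push(value);
    } else {
      sawRule = true;
      if (key === 'disallow' && value === '/') blocked.push(...currentAgents);
    }
  }
  return blocked;
}

const sample = [
  'User-agent: *',
  'Allow: /',
  '',
  'User-agent: GPTBot',
  'Disallow: /',
  '',
  'User-agent: CCBot',
  'Disallow: /',
].join('\n');

console.log(blockedAgents(sample)); // [ 'GPTBot', 'CCBot' ]
```

Point it at public/robots.txt after a build (swap an fs.readFileSync call in for the sample) to confirm every bot you meant to block actually made it into the output.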

Method 2: gatsby-plugin-robots-txt

For dynamic robots.txt generation tied to your Gatsby config — useful for staging environments or referencing siteUrl in the Sitemap directive.

# Install
npm install gatsby-plugin-robots-txt

// gatsby-config.js
module.exports = {
  siteMetadata: {
    siteUrl: 'https://yourdomain.com',
  },
  plugins: [
    {
      resolve: 'gatsby-plugin-robots-txt',
      options: {
        host: 'https://yourdomain.com',
        sitemap: 'https://yourdomain.com/sitemap/sitemap-index.xml',
        policy: [
          { userAgent: '*', allow: '/' },
          { userAgent: 'GPTBot', disallow: '/' },
          { userAgent: 'ChatGPT-User', disallow: '/' },
          { userAgent: 'OAI-SearchBot', disallow: '/' },
          { userAgent: 'ClaudeBot', disallow: '/' },
          { userAgent: 'anthropic-ai', disallow: '/' },
          { userAgent: 'Google-Extended', disallow: '/' },
          { userAgent: 'Bytespider', disallow: '/' },
          { userAgent: 'CCBot', disallow: '/' },
          { userAgent: 'PerplexityBot', disallow: '/' },
          { userAgent: 'meta-externalagent', disallow: '/' },
          { userAgent: 'Diffbot', disallow: '/' },
          { userAgent: 'DeepSeekBot', disallow: '/' },
          { userAgent: 'MistralBot', disallow: '/' },
          { userAgent: 'cohere-ai', disallow: '/' },
          { userAgent: 'AI2Bot', disallow: '/' },
          // Add full list...
        ],
      },
    },
  ],
};

Method 3: noai Meta Tag via gatsby-ssr.js

gatsby-ssr.js runs at build time and injects elements into every generated page's <head>. This is the most reliable way to add a global noai meta tag — it runs before React hydration and applies to every page including dynamically generated ones.

Create gatsby-ssr.js in your project root:

// gatsby-ssr.js
const React = require('react');

exports.onRenderBody = ({ setHeadComponents }) => {
  setHeadComponents([
    React.createElement('meta', {
      key: 'robots-noai',
      name: 'robots',
      content: 'noai, noimageai',
    }),
  ]);
};

TypeScript version (gatsby-ssr.tsx):

// gatsby-ssr.tsx
import React from 'react';
import type { GatsbySSR } from 'gatsby';

export const onRenderBody: GatsbySSR['onRenderBody'] = ({
  setHeadComponents,
}) => {
  setHeadComponents([
    <meta
      key="robots-noai"
      name="robots"
      content="noai, noimageai"
    />,
  ]);
};
Restart gatsby develop after creating or editing gatsby-ssr.js — hot reload does not pick up SSR file changes. Run gatsby build && gatsby serve to verify the tag appears in the generated HTML.
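That verification can also be scripted. The helper below is hypothetical (hasNoaiMeta is not a Gatsby API); run it against public/index.html after gatsby build:

```javascript
// check-noai.js -- hypothetical helper: does this HTML carry a robots noai meta tag?
function hasNoaiMeta(html) {
  // Collect every <meta ...> tag, then test name/content regardless of attribute order
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  return metaTags.some(
    (tag) => /name=["']robots["']/i.test(tag) && /\bnoai\b/i.test(tag)
  );
}

// In practice: const html = require('fs').readFileSync('public/index.html', 'utf8');
const builtPage = '<head><meta name="robots" content="noai, noimageai"/></head>';
console.log(hasNoaiMeta(builtPage)); // true
```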

Method 4: Gatsby 5 Head Export (Per-Page)

Gatsby 5 introduced the Head export — a React component for page-level <head> management without react-helmet. Add it to your root layout or specific pages for global or per-page control.

// src/pages/index.tsx (or any page — Gatsby 5+)
import React from 'react';

// The Head export is rendered into the page's <head>
export function Head() {
  return (
    <>
      <meta name="robots" content="noai, noimageai" />
    </>
  );
}

export default function IndexPage() {
  return <main>...</main>;
}

For Gatsby 4 — use gatsby-plugin-react-helmet:

// In your layout component (Gatsby 4)
import { Helmet } from 'react-helmet';

export function Layout({ children }) {
  return (
    <>
      <Helmet>
        <meta name="robots" content="noai, noimageai" />
      </Helmet>
      {children}
    </>
  );
}

Method 5: Netlify / Cloudflare Pages Headers

For edge-level blocking. Place a _headers file in your static/ directory — Gatsby copies it to public/_headers, which Netlify and Cloudflare Pages read natively to set HTTP response headers.

static/_headers (Netlify + Cloudflare Pages)

/*
  X-Robots-Tag: noai, noimageai

Sets the HTTP X-Robots-Tag header on every response. More authoritative than the HTML meta tag.

netlify.toml

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Cloudflare WAF for hard bot blocking:

(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot") or (http.user_agent contains "CCBot") or (http.user_agent contains "Bytespider") or (http.user_agent contains "Google-Extended") or (http.user_agent contains "Diffbot") or (http.user_agent contains "meta-externalagent") or (http.user_agent contains "DeepSeekBot")

Action: Block. Stops robots.txt violators before they reach your origin.
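On Netlify, where WAF rules aren't available on every plan, the same user-agent check can run as an Edge Function. This is a hypothetical sketch: the file path follows Netlify's netlify/edge-functions/ convention, and AI_BOTS is a shortened illustrative list.

```javascript
// netlify/edge-functions/block-ai-bots.js -- hypothetical sketch
const AI_BOTS = ['GPTBot', 'ClaudeBot', 'CCBot', 'Bytespider', 'Google-Extended'];

export function isBlockedUA(userAgent) {
  return AI_BOTS.some((bot) => userAgent.includes(bot));
}

export default async (request, context) => {
  const ua = request.headers.get('user-agent') || '';
  if (isBlockedUA(ua)) {
    // Hard 403 before the request reaches the static Gatsby assets
    return new Response('Forbidden', { status: 403 });
  }
  return context.next(); // pass through for everyone else
};

// Apply the function to every route
export const config = { path: '/*' };
```

Netlify picks up files in netlify/edge-functions/ automatically at deploy; on Cloudflare the equivalent would be a Worker or the WAF expression above.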

Full AI Bot Reference

All 25 AI bots covered by the robots.txt block list above:

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, Bytespider, CCBot, PerplexityBot, meta-externalagent, Amazonbot, Applebot-Extended, xAI-Bot, DeepSeekBot, MistralBot, Diffbot, cohere-ai, AI2Bot, Ai2Bot-Dolma, YouBot, DuckAssistBot, omgili, omgilibot, webzio-extended, gemini-deep-research

Frequently Asked Questions

Where do I put robots.txt in a Gatsby site?

The simplest approach: put robots.txt in the static/ directory at your Gatsby project root. Gatsby copies everything in static/ to the public/ output directory unchanged during the build. So static/robots.txt becomes public/robots.txt and is served at yourdomain.com/robots.txt. Alternatively, use gatsby-plugin-robots-txt to generate robots.txt from your gatsby-config.js — this lets you use site metadata and environment variables in your rules. Both methods work with Gatsby 4 and 5.
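In a default Gatsby project the placement looks like this (my-site is a placeholder name):

```
my-site/
├── gatsby-config.js
├── static/
│   └── robots.txt    ← you create this
└── public/
    └── robots.txt    ← build output, served at /robots.txt
```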

How do I add a noai meta tag to every Gatsby page?

The recommended method for Gatsby is gatsby-ssr.js: export the onRenderBody function and use setHeadComponents to inject a <meta name='robots' content='noai, noimageai'> element into every page's <head>. This runs at build time (SSG) or on the server (SSR) and is the most reliable global approach. Alternatively, use gatsby-plugin-react-helmet (Gatsby 4) or the built-in Head export (Gatsby 5) in your base layout component.

What is gatsby-plugin-robots-txt and do I need it?

gatsby-plugin-robots-txt is an npm package that generates robots.txt from your gatsby-config.js configuration during the Gatsby build. It's useful if you want to: reference your siteUrl from gatsby-config.js in the Sitemap directive, use different rules per environment (e.g. block everything on staging), or manage rules alongside your other Gatsby config. For simple static AI bot blocking, putting a plain robots.txt in static/ is simpler and doesn't require an additional dependency.

How do I use gatsby-ssr.js to inject meta tags?

Create gatsby-ssr.js (or gatsby-ssr.ts for TypeScript) in your project root. Export the onRenderBody function: exports.onRenderBody = ({ setHeadComponents }) => { setHeadComponents([ React.createElement('meta', { name: 'robots', content: 'noai, noimageai', key: 'robots-noai' }) ]); }. This injects the meta tag into every generated page's <head> at build time. It works with both Gatsby 4 and Gatsby 5.

Does blocking AI bots affect Gatsby's built-in SEO features?

No. Blocking GPTBot, ClaudeBot, CCBot, and other AI training bots has zero effect on Googlebot or Bingbot. Gatsby's built-in sitemap plugin (gatsby-plugin-sitemap), canonical URLs, and meta robots settings are completely unaffected. Your Gatsby site's search engine visibility remains the same.

How do I block AI bots on Gatsby deployed to Netlify?

Two approaches work on Netlify: (1) static/robots.txt — Gatsby copies it to public/ and Netlify serves it automatically; (2) netlify.toml headers — add [[headers]] with X-Robots-Tag: noai, noimageai for all routes, or use Netlify's Edge Functions (similar to Vercel Middleware) to block specific user agents with a 403. The netlify.toml approach is most powerful for stopping bots that ignore robots.txt. Netlify also reads a _headers file in your static/ directory (copied to public/_headers) as an alternative to netlify.toml.
