How to Block AI Bots on Gatsby
Gatsby's pre-rendered HTML and fast load times make it an easy target for AI crawlers. Here's how to lock them out with a plain robots.txt, global noai tags via gatsby-ssr.js, and edge-level blocking via Netlify or Cloudflare, all without additional build complexity.
Quick fix — create static/robots.txt
Same folder as gatsby-config.js. Gatsby copies it to public/robots.txt at build — no plugins needed.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
Available Methods
static/robots.txt (Recommended)
Easy · static/robots.txt (copied as-is to public/robots.txt at build)
Gatsby copies every file in static/ directly to public/ unchanged. A plain robots.txt here works on all Gatsby deployments with no plugins or configuration.
No JSX, no imports — just plain text. Gatsby never processes files in static/.
gatsby-plugin-robots-txt
Easy · gatsby-config.js → plugins array
npm plugin that generates robots.txt from your gatsby-config.js at build time. Useful for referencing siteUrl, environment-specific rules, and keeping config in one place.
Requires: npm install gatsby-plugin-robots-txt. Adds a build dependency — for simple blocking, static/ is cleaner.
gatsby-ssr.js — global noai tag
Easy · gatsby-ssr.js (project root)
Use the onRenderBody SSR hook to inject <meta name="robots" content="noai, noimageai"> into every page's <head> at build time. Works with Gatsby 4 and 5.
This is the most reliable global meta tag injection method in Gatsby — runs before React hydration.
Gatsby 5 Head API (per-page)
Easy · export function Head() in any page component
Gatsby 5 introduced the Head export — a React component for page-level <head> management. Use it in specific pages or a shared layout for per-page or global noai tags.
Gatsby 5+ only. For Gatsby 4, use gatsby-plugin-react-helmet with the Helmet component.
Netlify/Cloudflare headers + WAF
Intermediate · static/_headers or netlify.toml (or Cloudflare WAF)
Edge-level blocking via HTTP headers or WAF rules. The only method that stops robots.txt violators like Bytespider. static/_headers works natively on both Netlify and Cloudflare Pages.
static/_headers is copied to public/_headers by Gatsby and read natively by Netlify and Cloudflare Pages.
Method 1: static/robots.txt (Recommended)
Gatsby copies every file in static/ directly to public/ during the build — no processing, no plugins. Create static/robots.txt in your project root (same level as gatsby-config.js).
1. Create static/robots.txt in your Gatsby project root with the full AI bot block list:
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /
2. Build and verify locally:

gatsby build
cat public/robots.txt | head -6
3. Commit and deploy. Netlify, Vercel, Cloudflare Pages, and Gatsby Cloud all serve public/ files automatically.
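After the build, you can sanity-check that a given bot is fully disallowed with a few lines of Node. This is a simplified sketch, not a full robots.txt parser; parseDisallowedAgents is an illustrative helper, not a Gatsby API:

```javascript
// Sketch: collect user agents that are fully disallowed (Disallow: /)
// from a robots.txt string. Simplified; ignores partial-path rules.
function parseDisallowedAgents(robotsTxt) {
  const blocked = new Set();
  let currentAgents = [];
  for (const line of robotsTxt.split('\n')) {
    const [field, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(field.trim())) {
      currentAgents.push(value);
    } else if (/^disallow$/i.test(field.trim())) {
      if (value === '/') currentAgents.forEach((a) => blocked.add(a));
      currentAgents = [];
    }
  }
  return blocked;
}

const sample = 'User-agent: GPTBot\nDisallow: /\n\nUser-agent: ClaudeBot\nDisallow: /\n';
const blocked = parseDisallowedAgents(sample);
console.log(blocked.has('GPTBot')); // true
```

Feed it the contents of public/robots.txt (e.g. via fs.readFileSync) to confirm every bot in your list made it into the build output.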
Method 2: gatsby-plugin-robots-txt
For dynamic robots.txt generation tied to your Gatsby config — useful for staging environments or referencing siteUrl in the Sitemap directive.
# Install
npm install gatsby-plugin-robots-txt
// gatsby-config.js
module.exports = {
siteMetadata: {
siteUrl: 'https://yourdomain.com',
},
plugins: [
{
resolve: 'gatsby-plugin-robots-txt',
options: {
host: 'https://yourdomain.com',
sitemap: 'https://yourdomain.com/sitemap/sitemap-index.xml',
policy: [
{ userAgent: '*', allow: '/' },
{ userAgent: 'GPTBot', disallow: '/' },
{ userAgent: 'ChatGPT-User', disallow: '/' },
{ userAgent: 'OAI-SearchBot', disallow: '/' },
{ userAgent: 'ClaudeBot', disallow: '/' },
{ userAgent: 'anthropic-ai', disallow: '/' },
{ userAgent: 'Google-Extended', disallow: '/' },
{ userAgent: 'Bytespider', disallow: '/' },
{ userAgent: 'CCBot', disallow: '/' },
{ userAgent: 'PerplexityBot', disallow: '/' },
{ userAgent: 'meta-externalagent', disallow: '/' },
{ userAgent: 'Diffbot', disallow: '/' },
{ userAgent: 'DeepSeekBot', disallow: '/' },
{ userAgent: 'MistralBot', disallow: '/' },
{ userAgent: 'cohere-ai', disallow: '/' },
{ userAgent: 'AI2Bot', disallow: '/' },
// Add full list...
],
},
},
],
};

Method 3: noai Meta Tag via gatsby-ssr.js
gatsby-ssr.js runs at build time and injects elements into every generated page's <head>. This is the most reliable way to add a global noai meta tag — it runs before React hydration and applies to every page including dynamically generated ones.
Create gatsby-ssr.js in your project root:
// gatsby-ssr.js
const React = require('react');
exports.onRenderBody = ({ setHeadComponents }) => {
setHeadComponents([
React.createElement('meta', {
key: 'robots-noai',
name: 'robots',
content: 'noai, noimageai',
}),
]);
};

TypeScript version (gatsby-ssr.tsx):
// gatsby-ssr.tsx
import React from 'react';
import type { GatsbySSR } from 'gatsby';
export const onRenderBody: GatsbySSR['onRenderBody'] = ({
setHeadComponents,
}) => {
setHeadComponents([
<meta
key="robots-noai"
name="robots"
content="noai, noimageai"
/>,
]);
};

Changes to gatsby-ssr.js require a fresh development server start: gatsby develop hot reload does not pick up SSR file changes. Run gatsby build && gatsby serve to verify the tag appears in the generated HTML.

Method 4: Gatsby 5 Head Export (Per-Page)
Gatsby 5 introduced the Head export — a React component for page-level <head> management without react-helmet. Add it to your root layout or specific pages for global or per-page control.
// src/pages/index.tsx (or any page — Gatsby 5+)
import React from 'react';
// The Head export is rendered into the page's <head>
export function Head() {
return (
<>
<meta name="robots" content="noai, noimageai" />
</>
);
}
export default function IndexPage() {
return <main>...</main>;
}

For Gatsby 4, use gatsby-plugin-react-helmet:
// In your layout component (Gatsby 4)
import { Helmet } from 'react-helmet';
export function Layout({ children }) {
return (
<>
<Helmet>
<meta name="robots" content="noai, noimageai" />
</Helmet>
{children}
</>
);
}

Method 5: Netlify / Cloudflare Pages Headers
For edge-level blocking. Place a _headers file in your static/ directory — Gatsby copies it to public/_headers, which Netlify and Cloudflare Pages read natively to set HTTP response headers.
static/_headers (Netlify + Cloudflare Pages)
/*
  X-Robots-Tag: noai, noimageai
Sets the HTTP X-Robots-Tag header on every response. More authoritative than the HTML meta tag.
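The _headers format is simple: an unindented path pattern, then indented Header: value lines beneath it. As a sanity check, it can be parsed with a short Node script (a sketch; parseHeadersFile is an illustrative helper, not part of any Netlify or Gatsby API):

```javascript
// Sketch: parse a Netlify/Cloudflare Pages _headers file into rules of
// the form { pattern, headers }. Unindented lines start a new rule;
// indented lines add headers to the most recent rule.
function parseHeadersFile(text) {
  const rules = [];
  for (const raw of text.split('\n')) {
    if (!raw.trim() || raw.trim().startsWith('#')) continue;
    if (!/^\s/.test(raw)) {
      rules.push({ pattern: raw.trim(), headers: {} });
    } else if (rules.length) {
      const [name, ...rest] = raw.trim().split(':');
      rules[rules.length - 1].headers[name.trim()] = rest.join(':').trim();
    }
  }
  return rules;
}

const parsed = parseHeadersFile('/*\n  X-Robots-Tag: noai, noimageai\n');
console.log(parsed[0].headers['X-Robots-Tag']); // "noai, noimageai"
```

Running this against static/_headers before deploying catches indentation mistakes that would silently drop the header.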
netlify.toml
[[headers]]
for = "/*"
[headers.values]
  X-Robots-Tag = "noai, noimageai"

Cloudflare WAF for hard bot blocking:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot")
Action: Block. Stops robots.txt violators before they reach your origin.
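The same substring match the WAF expression performs can be mirrored in JavaScript to test user-agent strings locally before enabling the rule (a sketch; the bot list simply copies the expression above, and Cloudflare's contains operator is case-sensitive, as is String.prototype.includes):

```javascript
// Mirrors the Cloudflare WAF expression above: block when the
// User-Agent contains any of these substrings (case-sensitive).
const WAF_BOTS = [
  'GPTBot', 'ClaudeBot', 'CCBot', 'Bytespider',
  'Google-Extended', 'Diffbot', 'meta-externalagent', 'DeepSeekBot',
];

function wafWouldBlock(userAgent) {
  return WAF_BOTS.some((bot) => (userAgent || '').includes(bot));
}

console.log(wafWouldBlock('Mozilla/5.0; GPTBot/1.2')); // true
console.log(wafWouldBlock('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0')); // false
```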
Full AI Bot Reference
All 25 AI bots are covered by the robots.txt block list above.
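For build scripts or audits, the same 25 user-agent strings can be kept as one constant and used to regenerate the block section (a sketch; the array simply copies the names from the robots.txt in Method 1):

```javascript
// The 25 AI crawler user-agents from the robots.txt block list above.
const AI_BOT_USER_AGENTS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot', 'ClaudeBot', 'anthropic-ai',
  'Google-Extended', 'Bytespider', 'CCBot', 'PerplexityBot', 'meta-externalagent',
  'Amazonbot', 'Applebot-Extended', 'xAI-Bot', 'DeepSeekBot', 'MistralBot',
  'Diffbot', 'cohere-ai', 'AI2Bot', 'Ai2Bot-Dolma', 'YouBot',
  'DuckAssistBot', 'omgili', 'omgilibot', 'webzio-extended', 'gemini-deep-research',
];

// Regenerate the robots.txt block section from the list.
const blockSection = AI_BOT_USER_AGENTS
  .map((bot) => `User-agent: ${bot}\nDisallow: /`)
  .join('\n\n');

console.log(AI_BOT_USER_AGENTS.length); // 25
```

Writing blockSection out to static/robots.txt (after a leading `User-agent: *` / `Allow: /` group) keeps the file and the list from drifting apart.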
Frequently Asked Questions
Where do I put robots.txt in a Gatsby site?
The simplest approach: put robots.txt in the static/ directory at your Gatsby project root. Gatsby copies everything in static/ to the public/ output directory unchanged during the build. So static/robots.txt becomes public/robots.txt and is served at yourdomain.com/robots.txt. Alternatively, use gatsby-plugin-robots-txt to generate robots.txt from your gatsby-config.js — this lets you use site metadata and environment variables in your rules. Both methods work with Gatsby 4 and 5.
How do I add a noai meta tag to every Gatsby page?
The recommended method for Gatsby is gatsby-ssr.js: export the onRenderBody function and use setHeadComponents to inject a <meta name='robots' content='noai, noimageai'> element into every page's <head>. This runs at build time (SSG) or on the server (SSR) and is the most reliable global approach. Alternatively, use gatsby-plugin-react-helmet (Gatsby 4) or the built-in Head export (Gatsby 5) in your base layout component.
What is gatsby-plugin-robots-txt and do I need it?
gatsby-plugin-robots-txt is an npm package that generates robots.txt from your gatsby-config.js configuration during the Gatsby build. It's useful if you want to: reference your siteUrl from gatsby-config.js in the Sitemap directive, use different rules per environment (e.g. block everything on staging), or manage rules alongside your other Gatsby config. For simple static AI bot blocking, putting a plain robots.txt in static/ is simpler and doesn't require an additional dependency.
How do I use gatsby-ssr.js to inject meta tags?
Create gatsby-ssr.js (or gatsby-ssr.ts for TypeScript) in your project root. Export the onRenderBody function: exports.onRenderBody = ({ setHeadComponents }) => { setHeadComponents([ React.createElement('meta', { name: 'robots', content: 'noai, noimageai', key: 'robots-noai' }) ]); }. This injects the meta tag into every generated page's <head> at build time. It works with both Gatsby 4 and Gatsby 5.
Does blocking AI bots affect Gatsby's built-in SEO features?
No. Blocking GPTBot, ClaudeBot, CCBot, and other AI training bots has zero effect on Googlebot or Bingbot. Gatsby's built-in sitemap plugin (gatsby-plugin-sitemap), canonical URLs, and meta robots settings are completely unaffected. Your Gatsby site's search engine visibility remains the same.
How do I block AI bots on Gatsby deployed to Netlify?
Two approaches work on Netlify: (1) static/robots.txt — Gatsby copies it to public/ and Netlify serves it automatically; (2) netlify.toml headers — add [[headers]] with X-Robots-Tag: noai, noimageai for all routes, or use Netlify's Edge Functions (similar to Vercel Middleware) to block specific user agents with a 403. The netlify.toml approach is most powerful for stopping bots that ignore robots.txt. Netlify also reads a _headers file in your static/ directory (copied to public/_headers) as an alternative to netlify.toml.
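The Edge Function approach mentioned above could be sketched as follows. This is an assumption-laden illustration: the handler name and the shortened bot list are hypothetical, and a real deployment exports the handler as the default export of a file under netlify/edge-functions/ with its route mapped in netlify.toml:

```javascript
// Sketch of the blocking logic a Netlify Edge Function could run.
// Bot list shortened for illustration; reuse the full list above.
const BLOCKED_BOTS = ['GPTBot', 'ClaudeBot', 'CCBot', 'Bytespider'];

async function blockAiBots(request) {
  const ua = request.headers.get('user-agent') || '';
  if (BLOCKED_BOTS.some((bot) => ua.includes(bot))) {
    return new Response('Forbidden', { status: 403 });
  }
  // Returning undefined lets the request fall through to the static site.
  return undefined;
}
```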