How to Block AI Bots on Payload CMS: Complete 2026 Guide
Payload CMS v3 runs on Next.js App Router — the same techniques that work for Next.js work here, with one critical addition: you must never accidentally block the /admin panel or /api routes. This guide covers robots.txt via the Next.js Metadata API, hard 403 blocking in middleware.ts with the admin path exemption, noai meta tags via generateMetadata, and Payload v2 (Express-based) patterns for legacy projects.
Payload v3 (Next.js) vs v2 (Express)
Payload v3 (released 2024, current stable) is built on Next.js App Router. All bot-blocking techniques from the Next.js guide apply directly. Payload v2 is Express-based — see the Express.js guide for the middleware pattern. This guide covers v3 in full, with a v2 section at the end.
Methods at a glance
| Method | What it does | Blocks JS-less bots? |
|---|---|---|
| public/robots.txt | Signals crawlers to stay out | Signal only |
| app/robots.ts (Metadata API) | Dynamic robots.txt with env rules | Signal only |
| generateMetadata robots field | noai meta per page or globally | ✓ (server-rendered) |
| X-Robots-Tag in next.config.mjs | noai header site-wide | ✓ (header) |
| middleware.ts hard block | Hard 403 — before page render | ✓ |
| Payload SEO plugin | Per-document robots meta | ✓ (server-rendered) |
| nginx / Vercel WAF | Hard 403 at infrastructure layer | ✓ |
1. robots.txt — static or Metadata API
Two options in Payload v3. Static public/robots.txt is simplest. The app/robots.ts Metadata API approach is better for environment-based rules.
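Both options encode the same rule set, so it can help to keep the bot names in one list and derive the output from it. The following is a minimal sketch, assuming a hypothetical `AI_BOTS` constant and a hypothetical `renderRobotsTxt` helper (neither is part of Next.js or Payload). The user-agent names are the ones used throughout this guide.

```typescript
// Hypothetical shared list; names taken from the rules used in this guide.
const AI_BOTS: readonly string[] = [
  "GPTBot",
  "ChatGPT-User",
  "OAI-SearchBot",
  "ClaudeBot",
  "Claude-Web",
  "anthropic-ai",
  "Google-Extended",
  "Bytespider",
  "CCBot",
  "PerplexityBot",
  "Applebot-Extended",
];

// Hypothetical helper: renders one Disallow group per bot,
// then a final group that allows every other crawler.
function renderRobotsTxt(bots: readonly string[]): string {
  const blocked = bots.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  const fallback = "User-agent: *\nAllow: /";
  return [...blocked, fallback].join("\n\n") + "\n";
}
```

The same list could feed the `rules` array of the Metadata API variant, for example `AI_BOTS.map((userAgent) => ({ userAgent, disallow: "/" }))`, so the static and dynamic versions cannot drift apart.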
Option A — public/robots.txt (static)
# public/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
# Every other crawler may access the whole site, including /admin
# (they can't log in, so there is no harm in allowing it)
User-agent: *
Allow: /
Option B — src/app/robots.ts (Metadata API, env-aware)
Delete public/robots.txt first — static files take precedence over Next.js route handlers.
// src/app/robots.ts
import type { MetadataRoute } from "next";
export default function robots(): MetadataRoute.Robots {
const isProd = process.env.NODE_ENV === "production";
if (!isProd) {
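// Block everything outside production so no crawler indexes non-production content.
// Caveat: `next start` also sets NODE_ENV to "production", so a staging server needs its own flag.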
return {
rules: { userAgent: "*", disallow: "/" },
};
}
return {
rules: [
{ userAgent: "GPTBot", disallow: "/" },
{ userAgent: "ChatGPT-User", disallow: "/" },
{ userAgent: "OAI-SearchBot", disallow: "/" },
{ userAgent: "ClaudeBot", disallow: "/" },
{ userAgent: "Claude-Web", disallow: "/" },
{ userAgent: "anthropic-ai", disallow: "/" },
{ userAgent: "Google-Extended", disallow: "/" },
{ userAgent: "Bytespider", disallow: "/" },
{ userAgent: "CCBot", disallow: "/" },
{ userAgent: "PerplexityBot", disallow: "/" },
{ userAgent: "Applebot-Extended", disallow: "/" },
{ userAgent: "*", allow: "/" },
],
sitemap: `${process.env.NEXT_PUBLIC_SERVER_URL}/sitemap.xml`,
};
}
Dynamic robots.txt via Payload Global (CMS-editable)
To let content editors manage robots.txt from the Payload admin panel without a code deploy, create a Global and query it in robots.ts.
// src/globals/SiteSettings.ts — Payload Global
import type { GlobalConfig } from "payload";
export const SiteSettings: GlobalConfig = {
slug: "site-settings",
fields: [
{
name: "blockAiBots",
type: "checkbox",
label: "Block AI training bots (GPTBot, ClaudeBot, etc.)",
defaultValue: true,
},
],
};
// src/app/robots.ts — query the Global
import { getPayload } from "payload";
import config from "@payload-config";
import type { MetadataRoute } from "next";
export default async function robots(): Promise<MetadataRoute.Robots> {
const payload = await getPayload({ config });
const settings = await payload.findGlobal({ slug: "site-settings" });
if (!settings.blockAiBots) {
return { rules: { userAgent: "*", allow: "/" } };
}
return {
rules: [
{ userAgent: "GPTBot", disallow: "/" },
{ userAgent: "ChatGPT-User", disallow: "/" },
{ userAgent: "ClaudeBot", disallow: "/" },
{ userAgent: "Google-Extended", disallow: "/" },
{ userAgent: "Bytespider", disallow: "/" },
{ userAgent: "CCBot", disallow: "/" },
{ userAgent: "*", allow: "/" },
],
};
}
2. Hard 403 blocking — middleware.ts
Next.js edge middleware runs before any page or API handler. The critical Payload-specific requirement: never block /admin, /api, or /_next. The matcher config is the safest way to enforce this — middleware will not run at all for those paths.
src/middleware.ts
// src/middleware.ts
import { NextRequest, NextResponse } from "next/server";
// Compiled once at module load — not per-request
const BLOCKED_UAS =
/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended|meta-externalagent|Diffbot|ImagesiftBot/i;
export function middleware(request: NextRequest) {
const ua = request.headers.get("user-agent") ?? "";
if (BLOCKED_UAS.test(ua)) {
return new NextResponse("Forbidden", {
status: 403,
headers: { "Content-Type": "text/plain" },
});
}
return NextResponse.next();
}
export const config = {
// Never run on: admin panel, Payload API, Next.js internals, static files
matcher: [
"/((?!admin|api|_next/static|_next/image|favicon\.ico|robots\.txt|sitemap\.xml).*)",
],
};
Admin panel gotcha
Payload's admin panel at /admin loads its own JavaScript and makes API calls to /api. If your middleware runs on these paths, a user-agent match can block legitimate traffic, for example a headless-browser test run against the admin UI or internal requests made by the Payload admin app. The matcher pattern above excludes both paths.
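To see which paths the exclusion actually covers, the matcher can be approximated as a plain anchored regular expression. This is only a sketch: Next.js compiles matchers with its own path matching, so the `^...$` anchoring here is an assumption about whole-path matching, and `middlewareRunsOn` is a hypothetical helper name.

```typescript
// Approximation of the matcher from this guide as a plain, anchored RegExp.
// Assumption: Next.js matches the whole pathname, which the ^...$ anchors imitate.
const MATCHER =
  /^\/((?!admin|api|_next\/static|_next\/image|favicon\.ico|robots\.txt|sitemap\.xml).*)$/;

// Hypothetical helper: true when the middleware would run for the given pathname.
function middlewareRunsOn(pathname: string): boolean {
  return MATCHER.test(pathname);
}

// Frontend routes: middleware runs.
const frontendPaths = ["/", "/blog/hello"];
// Payload routes and crawler files: middleware is skipped.
const exemptPaths = ["/admin", "/admin/collections/pages", "/api/pages", "/robots.txt"];
```

One consequence of the negative lookahead is that it is a prefix check: under this approximation a frontend route such as /api-docs or /administrators would also be skipped. If you have such routes, tightening the pattern (for example `admin/|api/`) may be worth testing.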
Adding X-Robots-Tag in middleware
// src/middleware.ts — with X-Robots-Tag on all frontend responses
import { NextRequest, NextResponse } from "next/server";
const BLOCKED_UAS =
/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;
export function middleware(request: NextRequest) {
const ua = request.headers.get("user-agent") ?? "";
if (BLOCKED_UAS.test(ua)) {
return new NextResponse("Forbidden", { status: 403 });
}
const response = NextResponse.next();
response.headers.set("X-Robots-Tag", "noai, noimageai");
return response;
}
export const config = {
matcher: [
"/((?!admin|api|_next/static|_next/image|favicon\.ico|robots\.txt|sitemap\.xml).*)",
],
};
Alternative: next.config.mjs headers (simpler for X-Robots-Tag only)
// next.config.mjs
import { withPayload } from "@payloadcms/next/withPayload";
/** @type {import('next').NextConfig} */
const nextConfig = {
async headers() {
return [
{
// Apply to all frontend routes — not admin/api
source: "/((?!admin|api).*)",
headers: [
{
key: "X-Robots-Tag",
value: "noai, noimageai",
},
],
},
];
},
};
export default withPayload(nextConfig);
3. noai meta tags — generateMetadata
In Payload v3 (Next.js App Router), meta tags are server-rendered by default — AI crawlers see them on the initial HTML response without JavaScript.
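Because the tags are part of the initial HTML, a plain string check on the response body is enough to confirm they are present. The sketch below assumes two hypothetical helpers, `robotsMetaContents` and `hasNoAi`; it uses regular expressions for a quick check and is not a substitute for a real HTML parser.

```typescript
// Hypothetical helper: collects the content of every <meta name="robots"> tag
// found in an HTML string. Good enough for a smoke test, not a full parser.
function robotsMetaContents(html: string): string[] {
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  const contents: string[] = [];
  for (const tag of tags) {
    const name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    if (name?.toLowerCase() === "robots" && content !== undefined) {
      contents.push(content);
    }
  }
  return contents;
}

// Hypothetical helper: true when any robots meta tag carries the noai directive.
function hasNoAi(html: string): boolean {
  return robotsMetaContents(html).some((content) =>
    content
      .split(",")
      .map((directive) => directive.trim().toLowerCase())
      .includes("noai"),
  );
}
```

In practice you would pass in the body of a page fetched without executing JavaScript (for example the output of `curl`), which is what a simple crawler sees.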
Global default — root layout or root page
// src/app/(frontend)/layout.tsx (or src/app/layout.tsx)
import type { Metadata } from "next";
export const metadata: Metadata = {
// Global noai default — applies to every page that doesn't override it
robots: {
index: true,
follow: true,
},
// Next.js Metadata API: use the 'other' field for non-standard robots directives
other: {
robots: "noai, noimageai",
},
};
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html lang="en">
<body>{children}</body>
</html>
);
}
Per-page override via generateMetadata
// src/app/(frontend)/[slug]/page.tsx — dynamic Payload page
import type { Metadata } from "next";
import { getPayload } from "payload";
import config from "@payload-config";
type Props = {
params: Promise<{ slug: string }>;
};
export async function generateMetadata({ params }: Props): Promise<Metadata> {
const { slug } = await params;
const payload = await getPayload({ config });
const { docs } = await payload.find({
collection: "pages",
where: { slug: { equals: slug } },
limit: 1,
});
const page = docs[0];
if (!page) return {};
return {
title: page.title,
description: page.meta?.description,
// Per-page robots: driven by the blockAiTraining checkbox in the Payload admin.
// The field defaults to true, so only an explicit false lifts the noai directive.
other: {
robots: page.meta?.blockAiTraining === false ? "index, follow" : "noai, noimageai",
},
};
}
export default async function Page({ params }: Props) {
const { slug } = await params;
// ... render page
}
Payload SEO plugin — per-document robots field
The official @payloadcms/plugin-seo adds SEO fields to collections. Extend it with a custom blockAiTraining checkbox:
// payload.config.ts
import { seoPlugin } from "@payloadcms/plugin-seo";
import { buildConfig } from "payload";
export default buildConfig({
plugins: [
seoPlugin({
collections: ["pages", "posts"],
uploadsCollection: "media",
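// Add a custom field to the SEO group. In the v3 plugin, `fields` is a function
// that receives the default fields and returns the final list.
fields: ({ defaultFields }) => [
...defaultFields,
{
name: "blockAiTraining",
type: "checkbox",
label: "Block AI training bots (noai)",
defaultValue: true,
},
],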
generateTitle: ({ doc }) => doc?.title ?? "",
generateDescription: ({ doc }) => doc?.meta?.description ?? "",
}),
],
// ... rest of config
});
4. Payload v2 — Express middleware
Payload v2 exposes an Express app. Add bot-blocking middleware before the Payload handler. Same admin path exemption applies — exclude /admin and /api.
// server.js (Payload v2 custom server)
import express from "express";
import payload from "payload";
const BLOCKED_UAS =
/GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;
const app = express();
// Bot-blocking middleware — runs before Payload
app.use((req, res, next) => {
// Never block admin or API routes
if (req.path.startsWith("/admin") || req.path.startsWith("/api")) {
return next();
}
// Never block robots.txt
if (req.path === "/robots.txt") {
return next();
}
const ua = req.headers["user-agent"] ?? "";
if (BLOCKED_UAS.test(ua)) {
return res.status(403).type("text/plain").send("Forbidden");
}
next();
});
// Serve robots.txt statically
app.use(express.static("public"));
await payload.init({
secret: process.env.PAYLOAD_SECRET,
express: app,
onInit: () => {
payload.logger.info(`Payload Admin URL: ${payload.getAdminURL()}`);
},
});
app.listen(3000);
5. Deployment
Payload v3 deploys like any Next.js app. The database connection (MongoDB or Postgres) and Payload secret are the only extra environment variables.
| Platform | Notes | middleware.ts runs? |
|---|---|---|
| Vercel | Auto-detect Next.js, add PAYLOAD_SECRET + DB_URI env vars | ✓ Edge Function |
| Railway | Docker or Nixpacks auto-build, add Postgres addon | ✓ per request |
| Render | Web Service with Dockerfile or auto-build | ✓ per request |
| Docker + VPS | Multi-stage build, NODE_ENV=production, nginx in front | ✓ per request |
| Payload Cloud | Managed hosting by Payload — add env vars in dashboard | ✓ Edge |
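Since a missing secret or database URI tends to surface only at runtime, a small startup check can make deployments fail fast. This is a sketch under assumptions: the variable names are the two mentioned in the table above (your project may use different ones), and `missingEnv` is a hypothetical helper.

```typescript
// Names taken from the deployment table above; your project may use different ones.
const REQUIRED_ENV = ["PAYLOAD_SECRET", "DB_URI"] as const;

// Hypothetical helper: returns the names of required variables that are
// missing or contain only whitespace.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]?.trim());
}

// Usage sketch (run once at startup, e.g. at the top of payload.config.ts):
// const missing = missingEnv(process.env);
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(", ")}`);
// }
```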
Docker — multi-stage build
# Dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npx next build
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]
# next.config.mjs must include: output: "standalone"
Frequently asked questions
How do I serve robots.txt in Payload CMS v3?
Place robots.txt in public/ for a static file, or create src/app/robots.ts using the Next.js Metadata API for environment-based rules. For CMS-editable rules, create a Payload Global and query it from robots.ts. Delete public/robots.txt if you use the API route — static files take precedence.
How do I block AI bots in Payload CMS without breaking the admin panel?
In src/middleware.ts, use the matcher config to exclude /admin, /api, and /_next paths. The middleware will not run for those paths at all — safer than an in-function path check because it avoids edge cases where path checks might not match all admin sub-routes.
How do I add noai meta tags to Payload CMS v3 pages?
Export a metadata constant or generateMetadata function from your page component. Use the other: { robots: "noai, noimageai" } field for non-standard robots directives. Set a global default in the root layout and override per-page based on a Payload field.
What is different about blocking bots in Payload v2 vs v3?
Payload v2 is Express-based — add an app.use() middleware before the Payload handler. Payload v3 is Next.js App Router — use src/middleware.ts with the matcher config. Both require the same admin path exemption.
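Since both versions apply the same rule, the decision itself can live in one framework-independent function. The sketch below assumes a hypothetical `shouldBlock` helper and reuses the user-agent pattern from the v2 example in this guide; it is an illustration of the shared logic, not code from Payload.

```typescript
// Same user-agent pattern as the v2 example in this guide.
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

// Paths that must never be blocked, per the admin path exemption.
const EXEMPT_PREFIXES = ["/admin", "/api"];

// Hypothetical helper: pure decision function, independent of Express or Next.js.
function shouldBlock(pathname: string, userAgent: string | null | undefined): boolean {
  // Crawlers must always be able to read robots.txt.
  if (pathname === "/robots.txt") return false;
  // Admin panel and Payload API are exempt.
  if (EXEMPT_PREFIXES.some((prefix) => pathname.startsWith(prefix))) return false;
  // A missing User-Agent header is treated as not blocked.
  return BLOCKED_UAS.test(userAgent ?? "");
}
```

In v2 this could be called as `shouldBlock(req.path, req.headers["user-agent"])` inside the `app.use()` callback, and in v3 as `shouldBlock(request.nextUrl.pathname, request.headers.get("user-agent"))` inside `middleware()`.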
Does Payload CMS have a robots.txt field in the admin panel?
Not natively, but you can build it. Create a Payload Global (e.g., SiteSettings) with a blockAiBots checkbox, then query it in app/robots.ts. Content editors can toggle AI bot blocking from the Payload admin without a code deploy.
How do I add X-Robots-Tag headers in Payload CMS v3?
Two options: (1) in src/middleware.ts via response.headers.set("X-Robots-Tag", "noai, noimageai"); (2) in next.config.mjs via the headers() export. Use a path pattern that excludes /admin and /api in both cases.
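Whichever option you use, it is worth confirming what a response actually carries. The sketch below assumes a hypothetical `describeProtection` helper and a minimal `ResponseLike` type covering only the parts of a fetch `Response` it needs.

```typescript
// Minimal shape of the parts of a fetch Response this check needs.
type ResponseLike = {
  status: number;
  headers: { get(name: string): string | null };
};

// Hypothetical helper: summarises what a response tells a crawler.
function describeProtection(
  res: ResponseLike,
): "blocked" | "noai-header" | "unprotected" {
  // A 403 means the middleware hard block fired.
  if (res.status === 403) return "blocked";
  // Otherwise look for the noai directive in X-Robots-Tag.
  const tag = res.headers.get("x-robots-tag") ?? "";
  const directives = tag.split(",").map((d) => d.trim().toLowerCase());
  return directives.includes("noai") ? "noai-header" : "unprotected";
}

// Usage sketch (assumes Node 18+ global fetch and a placeholder URL):
// const res = await fetch("https://example.com/", {
//   headers: { "User-Agent": "GPTBot" },
// });
// console.log(describeProtection(res));
```

Requesting a frontend page once with a blocked user agent and once with a regular browser user agent should, under the setup in this guide, give "blocked" and "noai-header" respectively.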