Payload CMS · Next.js · TypeScript · 9 min read

How to Block AI Bots on Payload CMS: Complete 2026 Guide

Payload CMS v3 runs on Next.js App Router — the same techniques that work for Next.js work here, with one critical addition: you must never accidentally block the /admin panel or /api routes. This guide covers robots.txt via the Next.js Metadata API, hard 403 blocking in middleware.ts with the admin path exemption, noai meta tags via generateMetadata, and Payload v2 (Express-based) patterns for legacy projects.

Payload v3 (Next.js) vs v2 (Express)

Payload v3 (released 2024, current stable) is built on Next.js App Router. All bot-blocking techniques from the Next.js guide apply directly. Payload v2 is Express-based — see the Express.js guide for the middleware pattern. This guide covers v3 in full, with a v2 section at the end.

Methods at a glance

Method | What it does | Blocks JS-less bots?
public/robots.txt | Signals crawlers to stay out | Signal only
app/robots.ts (Metadata API) | Dynamic robots.txt with env rules | Signal only
generateMetadata robots field | noai meta per page or globally | ✓ (server-rendered)
X-Robots-Tag in next.config.mjs | noai header site-wide | ✓ (header)
middleware.ts hard block | Hard 403 before page render |
Payload SEO plugin | Per-document robots meta | ✓ (server-rendered)
nginx / Vercel WAF | Hard 403 at infrastructure layer |

1. robots.txt — static or Metadata API

Two options in Payload v3. Static public/robots.txt is simplest. The app/robots.ts Metadata API approach is better for environment-based rules.

Option A — public/robots.txt (static)

# public/robots.txt
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# All other crawlers may access the whole site.
# (/admin is included, but it sits behind a login, so nothing private is exposed.)
User-agent: *
Allow: /

Option B — src/app/robots.ts (Metadata API, env-aware)

Delete public/robots.txt first — static files take precedence over Next.js route handlers.

// src/app/robots.ts
import { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
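  // Note: NODE_ENV is "production" for every `next build` output, including
  // staging deployments. To keep staging out of crawlers, key this check on a
  // deployment-specific variable instead (for example VERCEL_ENV on Vercel).
  const isProd = process.env.NODE_ENV === "production";

  if (!isProd) {
    // Block everything outside production builds (e.g. local dev)
    return {
      rules: { userAgent: "*", disallow: "/" },
    };
  }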

  return {
    rules: [
      { userAgent: "GPTBot", disallow: "/" },
      { userAgent: "ChatGPT-User", disallow: "/" },
      { userAgent: "OAI-SearchBot", disallow: "/" },
      { userAgent: "ClaudeBot", disallow: "/" },
      { userAgent: "Claude-Web", disallow: "/" },
      { userAgent: "anthropic-ai", disallow: "/" },
      { userAgent: "Google-Extended", disallow: "/" },
      { userAgent: "Bytespider", disallow: "/" },
      { userAgent: "CCBot", disallow: "/" },
      { userAgent: "PerplexityBot", disallow: "/" },
      { userAgent: "Applebot-Extended", disallow: "/" },
      { userAgent: "*", allow: "/" },
    ],
    sitemap: `${process.env.NEXT_PUBLIC_SERVER_URL}/sitemap.xml`,
  };
}
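
The NODE_ENV check above treats every built deployment as production, because `next build` and `next start` set NODE_ENV to "production" on staging as well. A minimal sketch of a stricter check, assuming a Vercel-style `VERCEL_ENV` variable (values "production", "preview", "development"). The helper names `isPublicProduction` and `sitemapUrl` are hypothetical, and on other hosts you would substitute whatever variable marks the live site:

```typescript
// Environment passed in explicitly so the helpers are easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical helper: is this deployment the public production site?
function isPublicProduction(env: Env): boolean {
  // VERCEL_ENV is only set on Vercel; treat it as authoritative when present.
  if (env.VERCEL_ENV !== undefined) {
    return env.VERCEL_ENV === "production";
  }
  // Fallback: NODE_ENV alone cannot tell staging from production,
  // so this branch is only a coarse "built app vs. dev server" check.
  return env.NODE_ENV === "production";
}

// Hypothetical helper: build the sitemap URL only when the base URL is set,
// so robots.txt never contains "undefined/sitemap.xml".
function sitemapUrl(env: Env): string | undefined {
  const base = env.NEXT_PUBLIC_SERVER_URL;
  return base ? `${base.replace(/\/$/, "")}/sitemap.xml` : undefined;
}
```

In robots.ts these would be called as `isPublicProduction(process.env)` and `sitemapUrl(process.env)`.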

Dynamic robots.txt via Payload Global (CMS-editable)

To let content editors manage robots.txt from the Payload admin panel without a code deploy, create a Global and query it in robots.ts.

// src/globals/SiteSettings.ts — Payload Global
import { GlobalConfig } from "payload";

export const SiteSettings: GlobalConfig = {
  slug: "site-settings",
  fields: [
    {
      name: "blockAiBots",
      type: "checkbox",
      label: "Block AI training bots (GPTBot, ClaudeBot, etc.)",
      defaultValue: true,
    },
  ],
};

// src/app/robots.ts — query the Global
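import { getPayload } from "payload";
import config from "@payload-config";
import type { MetadataRoute } from "next";

// robots.ts is cached by default, so without this an admin toggle would only
// show up after the next build. `export const revalidate = <seconds>` is a
// cheaper alternative if a short delay is acceptable.
export const dynamic = "force-dynamic";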

export default async function robots(): Promise<MetadataRoute.Robots> {
  const payload = await getPayload({ config });
  const settings = await payload.findGlobal({ slug: "site-settings" });

  if (!settings.blockAiBots) {
    return { rules: { userAgent: "*", allow: "/" } };
  }

  return {
    rules: [
      { userAgent: "GPTBot", disallow: "/" },
      { userAgent: "ChatGPT-User", disallow: "/" },
      { userAgent: "ClaudeBot", disallow: "/" },
      { userAgent: "Google-Extended", disallow: "/" },
      { userAgent: "Bytespider", disallow: "/" },
      { userAgent: "CCBot", disallow: "/" },
      { userAgent: "*", allow: "/" },
    ],
  };
}
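
The robots.ts variants above each repeat the bot list, and the Global version lists fewer bots than the static file. A minimal sketch of one way to keep them in sync, using a hypothetical `AI_BOTS` constant and `buildRobotsRules` helper. The local `Rule` type only mirrors the fields used in this guide and is not the Next.js type itself:

```typescript
// Hypothetical shared list; the names are the user agents used earlier in this guide.
const AI_BOTS = [
  "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
  "anthropic-ai", "Google-Extended", "Bytespider", "CCBot",
  "PerplexityBot", "Applebot-Extended",
] as const;

// Local stand-in for the rule objects returned from robots.ts.
type Rule = { userAgent: string; allow?: string; disallow?: string };

function buildRobotsRules(blockAiBots: boolean): Rule[] {
  const allowAll: Rule = { userAgent: "*", allow: "/" };
  if (!blockAiBots) return [allowAll];
  // One disallow rule per AI bot, followed by the catch-all allow rule.
  return [
    ...AI_BOTS.map((userAgent): Rule => ({ userAgent, disallow: "/" })),
    allowAll,
  ];
}
```

Inside robots.ts the return value could then shrink to `{ rules: buildRobotsRules(Boolean(settings.blockAiBots)) }`.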

2. Hard 403 blocking — middleware.ts

Next.js edge middleware runs before any page or API handler. The critical Payload-specific requirement: never block /admin, /api, or /_next. The matcher config is the safest way to enforce this — middleware will not run at all for those paths.

src/middleware.ts

// src/middleware.ts
import { NextRequest, NextResponse } from "next/server";

// Compiled once at module load — not per-request
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended|meta-externalagent|Diffbot|ImagesiftBot/i;

export function middleware(request: NextRequest) {
  const ua = request.headers.get("user-agent") ?? "";

  if (BLOCKED_UAS.test(ua)) {
    return new NextResponse("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  return NextResponse.next();
}

export const config = {
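  // Never run on: admin panel, Payload API, Next.js internals, static files.
  // The exclusions are prefix matches, so a frontend route such as /apiary
  // or /administration would also skip this middleware.
  matcher: [
    "/((?!admin|api|_next/static|_next/image|favicon.ico|robots.txt|sitemap.xml).*)",
  ],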
};
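
To sanity-check which paths the matcher excludes, the pattern can be tried out as a plain regular expression. This is only an approximation: Next.js compiles matcher strings itself, and the anchored regex and the `middlewareRunsOn` helper below are assumptions made for illustration.

```typescript
// Approximation of the matcher above as an anchored RegExp (illustrative only).
const MATCHER =
  /^\/((?!admin|api|_next\/static|_next\/image|favicon\.ico|robots\.txt|sitemap\.xml).*)$/;

// Hypothetical helper: true when the middleware would run for this pathname.
function middlewareRunsOn(pathname: string): boolean {
  return MATCHER.test(pathname);
}
```

Under this approximation, `/` and `/blog/hello` reach the middleware, while `/admin/collections/pages`, `/api/posts`, and `/robots.txt` do not. Because the lookahead only checks the start of the path, `/apiary` is skipped too, so avoid frontend routes that begin with `admin` or `api`.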

Admin panel gotcha

Payload's admin panel at /admin loads its own JavaScript and makes API calls to /api. If your middleware ran on these paths, any client whose User-Agent happens to match a blocked pattern, such as an automated testing tool or a script calling the API, would get a 403 there as well. The matcher pattern above excludes both paths, so the check never runs for them.

Adding X-Robots-Tag in middleware

// src/middleware.ts — with X-Robots-Tag on all frontend responses
import { NextRequest, NextResponse } from "next/server";

const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

export function middleware(request: NextRequest) {
  const ua = request.headers.get("user-agent") ?? "";

  if (BLOCKED_UAS.test(ua)) {
    return new NextResponse("Forbidden", { status: 403 });
  }

  const response = NextResponse.next();
  response.headers.set("X-Robots-Tag", "noai, noimageai");
  return response;
}

export const config = {
  matcher: [
    "/((?!admin|api|_next/static|_next/image|favicon.ico|robots.txt|sitemap.xml).*)",
  ],
};
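
The user-agent test used in the middleware can be exercised on its own before deploying. A minimal sketch, with a hypothetical `isBlockedUserAgent` helper and sample user-agent strings chosen for illustration (real crawler strings vary by version):

```typescript
// Same pattern as the middleware above.
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

// Hypothetical helper: headers.get("user-agent") returns null when the header is absent.
function isBlockedUserAgent(ua: string | null): boolean {
  return BLOCKED_UAS.test(ua ?? "");
}

// Illustrative sample strings, not an authoritative list.
const sampleGptBot =
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)";
const sampleChrome =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36";
```

Here `isBlockedUserAgent(sampleGptBot)` is true, and `isBlockedUserAgent(sampleChrome)` is false. Plain `Applebot` does not match `Applebot-Extended`, so regular Apple search crawling is unaffected. Note that Google-Extended is a robots.txt token rather than a crawler user agent, so in practice that entry only has an effect in robots.txt.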

Alternative: next.config.mjs headers (simpler for X-Robots-Tag only)

// next.config.mjs
import { withPayload } from "@payloadcms/next/withPayload";

/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        // Apply to all frontend routes — not admin/api
        source: "/((?!admin|api).*)",
        headers: [
          {
            key: "X-Robots-Tag",
            value: "noai, noimageai",
          },
        ],
      },
    ];
  },
};

export default withPayload(nextConfig);

3. noai meta tags — generateMetadata

In Payload v3 (Next.js App Router), meta tags are server-rendered by default — AI crawlers see them on the initial HTML response without JavaScript.

Global default — root layout or root page

// src/app/(frontend)/layout.tsx (or src/app/layout.tsx)
import type { Metadata } from "next";

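export const metadata: Metadata = {
  // Standard directives: rendered as <meta name="robots" content="index, follow">
  robots: {
    index: true,
    follow: true,
  },
  // Non-standard directives (noai, noimageai) go through the 'other' field,
  // which renders an additional <meta name="robots"> tag.
  // Global default: a child layout or page that sets its own 'other' replaces it.
  other: {
    robots: "noai, noimageai",
  },
};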

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}

Per-page override via generateMetadata

// src/app/(frontend)/[slug]/page.tsx — dynamic Payload page
import type { Metadata } from "next";
import { getPayload } from "payload";
import config from "@payload-config";

type Props = {
  params: Promise<{ slug: string }>;
};

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const { slug } = await params;
  const payload = await getPayload({ config });

  const { docs } = await payload.find({
    collection: "pages",
    where: { slug: { equals: slug } },
    limit: 1,
  });

  const page = docs[0];
  if (!page) return {};

  return {
    title: page.title,
    description: page.meta?.description,
    // Per-page robots: driven by a checkbox editors set in the Payload admin.
    // page.meta?.blockAiTraining defaults to true in the collection config,
    // so only an explicit `false` lifts the noai directives.
    other: {
      robots: page.meta?.blockAiTraining === false ? "index, follow" : "noai, noimageai",
    },
  };
}

export default async function Page({ params }: Props) {
  const { slug } = await params;
  // ... render page
}

Payload SEO plugin — per-document robots field

The official @payloadcms/plugin-seo adds SEO fields to collections. Extend it with a custom blockAiTraining checkbox:

// payload.config.ts
import { seoPlugin } from "@payloadcms/plugin-seo";
import { buildConfig } from "payload";

export default buildConfig({
  plugins: [
    seoPlugin({
      collections: ["pages", "posts"],
      uploadsCollection: "media",
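      // Add a custom field to the SEO group. In the v3 plugin, `fields` is a
      // function that receives the default fields and returns the final list.
      fields: ({ defaultFields }) => [
        ...defaultFields,
        {
          name: "blockAiTraining",
          type: "checkbox",
          label: "Block AI training bots (noai)",
          defaultValue: true,
        },
      ],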
      generateTitle: ({ doc }) => doc?.title ?? "",
      generateDescription: ({ doc }) => doc?.meta?.description ?? "",
    }),
  ],
  // ... rest of config
});
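
Once the checkbox exists, generateMetadata needs to turn it into a robots string. A minimal sketch, assuming the field lands under `meta.blockAiTraining` on each document. The `SeoMeta` type and `robotsContentFor` helper are hypothetical; a real project would use the types Payload generates for its collections:

```typescript
// Minimal shape of the SEO group assumed for this sketch.
type SeoMeta = {
  description?: string | null;
  blockAiTraining?: boolean | null;
};

// Hypothetical helper: content for the robots meta tag set via `other.robots`.
function robotsContentFor(meta: SeoMeta | null | undefined): string {
  // Block unless an editor explicitly unchecked the box, so documents
  // created before the field existed (value undefined) stay protected.
  return meta?.blockAiTraining === false ? "index, follow" : "noai, noimageai";
}
```

In generateMetadata this would be used as `other: { robots: robotsContentFor(page.meta) }`. Treating a missing value as "block" is a design choice made here; flip the comparison if older documents should default to open.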

4. Payload v2 — Express middleware

Payload v2 exposes an Express app. Add bot-blocking middleware before the Payload handler. Same admin path exemption applies — exclude /admin and /api.

// server.js (Payload v2 custom server)
import express from "express";
import payload from "payload";

const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

const app = express();

// Bot-blocking middleware — runs before Payload
app.use((req, res, next) => {
  // Never block admin or API routes
  if (req.path.startsWith("/admin") || req.path.startsWith("/api")) {
    return next();
  }
  // Never block robots.txt
  if (req.path === "/robots.txt") {
    return next();
  }

  const ua = req.headers["user-agent"] ?? "";
  if (BLOCKED_UAS.test(ua)) {
    return res.status(403).type("text/plain").send("Forbidden");
  }

  next();
});

// Serve robots.txt statically
app.use(express.static("public"));

await payload.init({
  secret: process.env.PAYLOAD_SECRET,
  express: app,
  onInit: () => {
    payload.logger.info(`Payload Admin URL: ${payload.getAdminURL()}`);
  },
});

app.listen(3000);
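
The decision made inside the Express middleware can be pulled into a pure function so it is testable without starting a server. A minimal sketch with a hypothetical `shouldBlock` helper. Unlike the `startsWith("/admin")` check above, it only exempts `/admin` and `/api` as whole path segments, which is an assumption about the desired behavior rather than something Payload requires:

```typescript
// Same pattern as the Express example above.
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

// Paths that must never be blocked.
const EXEMPT_PREFIXES = ["/admin", "/api"];

// Hypothetical helper: true when the request should receive a 403.
function shouldBlock(path: string, userAgent: string | undefined): boolean {
  // Exempt "/admin", "/admin/...", "/api", "/api/..." but not "/apiary".
  const exempt = EXEMPT_PREFIXES.some(
    (prefix) => path === prefix || path.startsWith(`${prefix}/`),
  );
  if (exempt || path === "/robots.txt") return false;
  return BLOCKED_UAS.test(userAgent ?? "");
}
```

The middleware body then reduces to `if (shouldBlock(req.path, req.headers["user-agent"])) return res.status(403).type("text/plain").send("Forbidden");` followed by `next()`.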

5. Deployment

Payload v3 deploys like any Next.js app. The database connection (MongoDB or Postgres) and Payload secret are the only extra environment variables.

Platform | Notes | middleware.ts runs?
Vercel | Auto-detect Next.js, add PAYLOAD_SECRET + DB_URI env vars | ✓ Edge Function
Railway | Docker or Nixpacks auto-build, add Postgres addon | ✓ per request
Render | Web Service with Dockerfile or auto-build | ✓ per request
Docker + VPS | Multi-stage build, NODE_ENV=production, nginx in front | ✓ per request
Payload Cloud | Managed hosting by Payload, add env vars in dashboard | ✓ Edge

Docker — multi-stage build

# Dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npx next build

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]

# next.config.mjs must include: output: "standalone"

Frequently asked questions

How do I serve robots.txt in Payload CMS v3?

Place robots.txt in public/ for a static file, or create src/app/robots.ts using the Next.js Metadata API for environment-based rules. For CMS-editable rules, create a Payload Global and query it from robots.ts. Delete public/robots.txt if you use the API route — static files take precedence.

How do I block AI bots in Payload CMS without breaking the admin panel?

In src/middleware.ts, use the matcher config to exclude /admin, /api, and /_next paths. The middleware will not run for those paths at all — safer than an in-function path check because it avoids edge cases where path checks might not match all admin sub-routes.

How do I add noai meta tags to Payload CMS v3 pages?

Export a metadata constant or a generateMetadata function from a layout.tsx or page.tsx file. Use the other: { robots: "noai, noimageai" } field for non-standard robots directives. Set a global default in the root layout and override per-page based on a Payload field.

What is different about blocking bots in Payload v2 vs v3?

Payload v2 is Express-based — add an app.use() middleware before the Payload handler. Payload v3 is Next.js App Router — use src/middleware.ts with the matcher config. Both require the same admin path exemption.

Does Payload CMS have a robots.txt field in the admin panel?

Not natively, but you can build it. Create a Payload Global (e.g., SiteSettings) with a blockAiBots checkbox, then query it in app/robots.ts. Content editors can toggle AI bot blocking from the Payload admin without a code deploy.

How do I add X-Robots-Tag headers in Payload CMS v3?

Two options: (1) in src/middleware.ts via response.headers.set("X-Robots-Tag", "noai, noimageai"); (2) in next.config.mjs via the headers() export. Use a path pattern that excludes /admin and /api in both cases.
