Payload CMS · Next.js · TypeScript · 9 min read

How to Block AI Bots on Payload CMS: Complete 2026 Guide

Payload CMS v3 runs on Next.js App Router — the same techniques that work for Next.js work here, with one critical addition: you must never accidentally block the /admin panel or /api routes. This guide covers robots.txt via the Next.js Metadata API, hard 403 blocking in middleware.ts with the admin path exemption, noai meta tags via generateMetadata, and Payload v2 (Express-based) patterns for legacy projects.

Payload v3 (Next.js) vs v2 (Express)

Payload v3 (released 2024, current stable) is built on Next.js App Router. All bot-blocking techniques from the Next.js guide apply directly. Payload v2 is Express-based — see the Express.js guide for the middleware pattern. This guide covers v3 in full, with a v2 section at the end.

Methods at a glance

Method	What it does	Effect on bots that don't run JavaScript
public/robots.txt	Asks crawlers to stay out	Advisory signal only
app/robots.ts (Metadata API)	Dynamic robots.txt with env rules	Advisory signal only
generateMetadata robots field	noai meta per page or globally	Advisory signal, visible in server-rendered HTML
X-Robots-Tag in next.config.mjs	noai header site-wide	Advisory signal, sent as a response header
middleware.ts hard block	Hard 403 before page render	Blocked
Payload SEO plugin	Per-document robots meta	Advisory signal, visible in server-rendered HTML
nginx / Vercel WAF	Hard 403 at infrastructure layer	Blocked

1. robots.txt — static or Metadata API

Two options in Payload v3. Static public/robots.txt is simplest. The app/robots.ts Metadata API approach is better for environment-based rules.

Option A — public/robots.txt (static)

# public/robots.txt
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# All other crawlers may access the whole site, including /admin
# (the admin panel requires a login, so crawlers only ever reach the login page)
User-agent: *
Allow: /

Option B — src/app/robots.ts (Metadata API, env-aware)

Delete public/robots.txt first. Next.js reports a conflict when a file in public/ and a route in app/ resolve to the same path, so keep only one of the two.

// src/app/robots.ts
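import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  // NODE_ENV is "production" for every production build, staging deploys included.
  // To tell staging apart, also check a deployment-specific variable.
  const isProd = process.env.NODE_ENV === "production";

  if (!isProd) {
    // Block everything outside production builds so no crawler indexes dev content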
    return {
      rules: { userAgent: "*", disallow: "/" },
    };
  }

  return {
    rules: [
      { userAgent: "GPTBot", disallow: "/" },
      { userAgent: "ChatGPT-User", disallow: "/" },
      { userAgent: "OAI-SearchBot", disallow: "/" },
      { userAgent: "ClaudeBot", disallow: "/" },
      { userAgent: "Claude-Web", disallow: "/" },
      { userAgent: "anthropic-ai", disallow: "/" },
      { userAgent: "Google-Extended", disallow: "/" },
      { userAgent: "Bytespider", disallow: "/" },
      { userAgent: "CCBot", disallow: "/" },
      { userAgent: "PerplexityBot", disallow: "/" },
      { userAgent: "Applebot-Extended", disallow: "/" },
      { userAgent: "*", allow: "/" },
    ],
    sitemap: `${process.env.NEXT_PUBLIC_SERVER_URL}/sitemap.xml`,
  };
}
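
The same bot names appear in robots.ts and again in the middleware regex later in this guide, so the two lists can drift apart. One way to avoid that, sketched here under assumptions, is to keep a single list and derive both from it. The names AI_BOTS, buildRobotsRules, and buildBlockedUaPattern are hypothetical helpers, not part of Next.js or Payload; in a real project they would be exported from a shared module.

```typescript
// Hypothetical shared list; the entries mirror the robots.txt example above.
const AI_BOTS = [
  "GPTBot",
  "ChatGPT-User",
  "OAI-SearchBot",
  "ClaudeBot",
  "Claude-Web",
  "anthropic-ai",
  "Google-Extended",
  "Bytespider",
  "CCBot",
  "PerplexityBot",
  "Applebot-Extended",
] as const;

// Same shape as the rule objects returned from robots.ts.
type RobotsRule = { userAgent: string; allow?: string; disallow?: string };

// One disallow rule per bot, followed by a catch-all that allows everyone else.
function buildRobotsRules(bots: readonly string[] = AI_BOTS): RobotsRule[] {
  return [
    ...bots.map((userAgent): RobotsRule => ({ userAgent, disallow: "/" })),
    { userAgent: "*", allow: "/" },
  ];
}

// Escape regex metacharacters so each bot name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive alternation built from the same list, for use in middleware.
function buildBlockedUaPattern(bots: readonly string[] = AI_BOTS): RegExp {
  return new RegExp(bots.map(escapeRegExp).join("|"), "i");
}
```

With this in place, robots.ts could return `{ rules: buildRobotsRules() }` and the middleware could test against `buildBlockedUaPattern()`, so adding a bot is a one-line change.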

Dynamic robots.txt via Payload Global (CMS-editable)

To let content editors manage robots.txt from the Payload admin panel without a code deploy, create a Global and query it in robots.ts.

// src/globals/SiteSettings.ts — Payload Global
import type { GlobalConfig } from "payload";

export const SiteSettings: GlobalConfig = {
  slug: "site-settings",
  fields: [
    {
      name: "blockAiBots",
      type: "checkbox",
      label: "Block AI training bots (GPTBot, ClaudeBot, etc.)",
      defaultValue: true,
    },
  ],
};

// src/app/robots.ts — query the Global
import { getPayload } from "payload";
import config from "@payload-config";
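import type { MetadataRoute } from "next";

// robots.ts output is cached by default; opt out so a change made in the
// admin panel shows up without a rebuild.
export const dynamic = "force-dynamic";

export default async function robots(): Promise<MetadataRoute.Robots> {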
  const payload = await getPayload({ config });
  const settings = await payload.findGlobal({ slug: "site-settings" });

  if (!settings.blockAiBots) {
    return { rules: { userAgent: "*", allow: "/" } };
  }

  return {
    rules: [
      { userAgent: "GPTBot", disallow: "/" },
      { userAgent: "ChatGPT-User", disallow: "/" },
      { userAgent: "ClaudeBot", disallow: "/" },
      { userAgent: "Google-Extended", disallow: "/" },
      { userAgent: "Bytespider", disallow: "/" },
      { userAgent: "CCBot", disallow: "/" },
      { userAgent: "*", allow: "/" },
    ],
  };
}
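
In the listing above, `!settings.blockAiBots` treats a missing or null value the same as an unchecked box, so an unsaved Global would open the site to every crawler. A minimal sketch of a fail-closed variant follows. The helper name rulesFromSettings and the SiteSettingsLike type are assumptions for illustration; only the blockAiBots field comes from the Global defined earlier.

```typescript
// Hypothetical minimal shape of the Global's data.
type SiteSettingsLike = { blockAiBots?: boolean | null };
type SettingsRule = { userAgent: string; allow?: string; disallow?: string };

// Same bots as the CMS-driven robots.ts above.
const CMS_BLOCKED_BOTS = [
  "GPTBot",
  "ChatGPT-User",
  "ClaudeBot",
  "Google-Extended",
  "Bytespider",
  "CCBot",
];

// Only an explicit `false` lifts the block; null, undefined, or missing
// settings keep the AI bots disallowed.
function rulesFromSettings(
  settings: SiteSettingsLike | null | undefined,
): SettingsRule[] {
  const allowAll: SettingsRule = { userAgent: "*", allow: "/" };
  if (settings?.blockAiBots === false) {
    return [allowAll];
  }
  return [
    ...CMS_BLOCKED_BOTS.map(
      (userAgent): SettingsRule => ({ userAgent, disallow: "/" }),
    ),
    allowAll,
  ];
}
```

In robots.ts this would be called as `return { rules: rulesFromSettings(settings) }`, and passing `null` from a catch block keeps the block in place if the database query fails.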

2. Hard 403 blocking — middleware.ts

Next.js edge middleware runs before any page or API handler. The critical Payload-specific requirement: never block /admin, /api, or /_next. The matcher config is the safest way to enforce this — middleware will not run at all for those paths.

src/middleware.ts

// src/middleware.ts
import { NextRequest, NextResponse } from "next/server";

// Compiled once at module load — not per-request
const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended|meta-externalagent|Diffbot|ImagesiftBot/i;

export function middleware(request: NextRequest) {
  const ua = request.headers.get("user-agent") ?? "";

  if (BLOCKED_UAS.test(ua)) {
    return new NextResponse("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  return NextResponse.next();
}

export const config = {
  // Never run on: admin panel, Payload API, Next.js internals, static files
  matcher: [
    "/((?!admin|api|_next/static|_next/image|favicon\\.ico|robots\\.txt|sitemap\\.xml).*)",
  ],
};
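
A quick way to sanity-check the matcher is to test sample paths against the same negative lookahead. This is an approximation: Next.js compiles matcher strings itself, so the pattern is anchored by hand here as a plain RegExp, and middlewareWouldRun is a hypothetical helper name.

```typescript
// The matcher from the config above, written as a string and anchored manually.
const MATCHER_SOURCE =
  "/((?!admin|api|_next/static|_next/image|favicon\\.ico|robots\\.txt|sitemap\\.xml).*)";
const matcherRegex = new RegExp("^" + MATCHER_SOURCE + "$");

// True when the middleware would run for this pathname.
function middlewareWouldRun(pathname: string): boolean {
  return matcherRegex.test(pathname);
}

const samplePaths = [
  "/",
  "/blog/hello-world",
  "/admin",
  "/admin/collections/pages",
  "/api/pages",
  "/_next/static/chunks/main.js",
  "/robots.txt",
  "/api-docs",
  "/administrators",
];

for (const path of samplePaths) {
  console.log(path + " -> " + (middlewareWouldRun(path) ? "middleware runs" : "skipped"));
}
```

The lookahead matches on prefixes, so frontend routes such as /api-docs or /administrators are skipped as well. If the site has routes like these, a stricter alternative such as `admin(?:/|$)` may be worth testing.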

Admin panel gotcha

Payload's admin panel at /admin loads its own JavaScript and makes API calls to /api. If your middleware runs on these paths, you may accidentally block your own requests, for example when driving the admin panel with a headless-browser testing tool or when the Payload admin app makes internal requests. The matcher pattern above excludes both paths.

Adding X-Robots-Tag in middleware

// src/middleware.ts — with X-Robots-Tag on all frontend responses
import { NextRequest, NextResponse } from "next/server";

const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

export function middleware(request: NextRequest) {
  const ua = request.headers.get("user-agent") ?? "";

  if (BLOCKED_UAS.test(ua)) {
    return new NextResponse("Forbidden", { status: 403 });
  }

  const response = NextResponse.next();
  response.headers.set("X-Robots-Tag", "noai, noimageai");
  return response;
}

export const config = {
  matcher: [
    "/((?!admin|api|_next/static|_next/image|favicon\\.ico|robots\\.txt|sitemap\\.xml).*)",
  ],
};
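
To see how the user-agent test behaves before deploying, the regex can be exercised on its own. The sketch below assumes nothing beyond the pattern used in the middleware. isBlockedUserAgent is a hypothetical helper, and the sample strings are illustrative rather than verbatim copies of what each crawler sends.

```typescript
// Same alternation as the middleware above; the `i` flag makes it case-insensitive.
const BLOCKED_UA_PATTERN =
  /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

// A missing header counts as "not a known bot", mirroring the `?? ""` fallback.
function isBlockedUserAgent(userAgent: string | null | undefined): boolean {
  return BLOCKED_UA_PATTERN.test(userAgent ?? "");
}

// Illustrative user-agent strings (assumptions, not captured traffic).
const sampleUserAgents: Array<[string, string | null]> = [
  ["AI crawler", "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"],
  ["lowercase variant", "mozilla/5.0 (compatible; claudebot/1.0)"],
  ["regular browser", "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"],
  ["search crawler", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"],
  ["no header", null],
];

for (const [label, ua] of sampleUserAgents) {
  console.log(label + ": " + (isBlockedUserAgent(ua) ? "403" : "pass"));
}
```

The match is a plain substring test, so it only stops crawlers that identify themselves honestly. Google also documents Google-Extended as a robots.txt control token rather than a user agent that appears in requests, so that entry is effective in robots.txt only.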

Alternative: next.config.mjs headers (simpler for X-Robots-Tag only)

// next.config.mjs
import { withPayload } from "@payloadcms/next/withPayload";

/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        // Apply to all frontend routes — not admin/api
        source: "/((?!admin|api).*)",
        headers: [
          {
            key: "X-Robots-Tag",
            value: "noai, noimageai",
          },
        ],
      },
    ];
  },
};

export default withPayload(nextConfig);

3. noai meta tags — generateMetadata

In Payload v3 (Next.js App Router), meta tags are server-rendered by default — AI crawlers see them on the initial HTML response without JavaScript.

Global default — root layout or root page

// src/app/(frontend)/layout.tsx (or src/app/layout.tsx)
import type { Metadata } from "next";

export const metadata: Metadata = {
  // Global default: applies to every page that doesn't override it
  robots: {
    index: true,
    follow: true,
  },
  // The structured `robots` object has no noai/noimageai keys, so the
  // non-standard directives go through `other`, which renders an
  // additional <meta name="robots"> tag.
  other: {
    robots: "noai, noimageai",
  },
};

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}

Per-page override via generateMetadata

// src/app/(frontend)/[slug]/page.tsx — dynamic Payload page
import type { Metadata } from "next";
import { getPayload } from "payload";
import config from "@payload-config";

type Props = {
  params: Promise<{ slug: string }>;
};

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const { slug } = await params;
  const payload = await getPayload({ config });

  const { docs } = await payload.find({
    collection: "pages",
    where: { slug: { equals: slug } },
    limit: 1,
  });

  const page = docs[0];
  if (!page) return {};

  return {
    title: page.title,
    description: page.meta?.description,
    // Per-page robots: driven by a checkbox in the Payload admin.
    // page.meta?.blockAiTraining defaults to true in your collection config,
    // so only an explicit false lifts the noai directive.
    other: {
      robots: page.meta?.blockAiTraining === false ? "index, follow" : "noai, noimageai",
    },
  };
}

export default async function Page({ params }: Props) {
  const { slug } = await params;
  // ... render page
}

Payload SEO plugin — per-document robots field

The official @payloadcms/plugin-seo adds SEO fields to collections. Extend it with a custom blockAiTraining checkbox:

// payload.config.ts
import { seoPlugin } from "@payloadcms/plugin-seo";
import { buildConfig } from "payload";

export default buildConfig({
  plugins: [
    seoPlugin({
      collections: ["pages", "posts"],
      uploadsCollection: "media",
      // Append a custom field to the plugin's default SEO fields
      fields: ({ defaultFields }) => [
        ...defaultFields,
        {
          name: "blockAiTraining",
          type: "checkbox",
          label: "Block AI training bots (noai)",
          defaultValue: true,
        },
      ],
      generateTitle: ({ doc }) => doc?.title ?? "",
      generateDescription: ({ doc }) => doc?.meta?.description ?? "",
    }),
  ],
  // ... rest of config
});

4. Payload v2 — Express middleware

Payload v2 exposes an Express app. Add bot-blocking middleware before the Payload handler. Same admin path exemption applies — exclude /admin and /api.

// server.js (Payload v2 custom server)
import express from "express";
import payload from "payload";

const BLOCKED_UAS =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

const app = express();

// Bot-blocking middleware — runs before Payload
app.use((req, res, next) => {
  // Never block admin or API routes
  if (req.path.startsWith("/admin") || req.path.startsWith("/api")) {
    return next();
  }
  // Never block robots.txt
  if (req.path === "/robots.txt") {
    return next();
  }

  const ua = req.headers["user-agent"] ?? "";
  if (BLOCKED_UAS.test(ua)) {
    return res.status(403).type("text/plain").send("Forbidden");
  }

  next();
});

// Serve robots.txt statically
app.use(express.static("public"));

await payload.init({
  secret: process.env.PAYLOAD_SECRET,
  express: app,
  onInit: () => {
    payload.logger.info(`Payload Admin URL: ${payload.getAdminURL()}`);
  },
});

app.listen(3000);
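
The `startsWith("/admin")` check above also exempts unrelated routes such as /administrators. A framework-free sketch of a segment-aware version of the same decision is shown below. The helper names isUnderPrefix and shouldBlockRequest are assumptions for illustration; the exempt paths and the bot list are taken from the Express example.

```typescript
// True for "/admin" and "/admin/...", false for "/administrators".
function isUnderPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

const EXEMPT_PREFIXES = ["/admin", "/api"];
const EXEMPT_PATHS = ["/robots.txt"];
const V2_BLOCKED_UAS =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Applebot-Extended/i;

// Decide whether a request should receive a 403, given its path and User-Agent.
function shouldBlockRequest(path: string, userAgent: string | undefined): boolean {
  // robots.txt must stay reachable so crawlers can read the rules.
  if (EXEMPT_PATHS.indexOf(path) !== -1) return false;
  // Never block the admin panel or the Payload API.
  if (EXEMPT_PREFIXES.some((prefix) => isUnderPrefix(path, prefix))) return false;
  return V2_BLOCKED_UAS.test(userAgent ?? "");
}
```

Inside the Express middleware, the body would then reduce to one check: if `shouldBlockRequest(req.path, req.headers["user-agent"])` is true, send the 403, otherwise call `next()`. Keeping the decision in a pure function also makes it easy to unit test without starting a server.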

5. Deployment

Payload v3 deploys like any Next.js app. The database connection (MongoDB or Postgres) and Payload secret are the only extra environment variables.

Platform	Notes	middleware.ts runs?
Vercel	Auto-detect Next.js, add PAYLOAD_SECRET + DB_URI env vars	✓ Edge Function
Railway	Docker or Nixpacks auto-build, add Postgres addon	✓ per request
Render	Web Service with Dockerfile or auto-build	✓ per request
Docker + VPS	Multi-stage build, NODE_ENV=production, nginx in front	✓ per request
Payload Cloud	Managed hosting by Payload — add env vars in dashboard	✓ Edge
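
Whichever platform you choose, it is worth confirming after deploy that the middleware actually runs. The sketch below is one hedged way to do that: it sends three probes and compares status codes. probeBotBlocking and the StatusFetcher type are hypothetical names, the expected codes assume the hard-block middleware from section 2 is deployed and the home page normally returns 200, and the fetch function is injected so the logic can be exercised without network access.

```typescript
// Minimal fetch-like signature, so a real fetch or a stub can be passed in.
type StatusFetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

type ProbeResult = { label: string; status: number; ok: boolean };

// Sends one request per probe and reports whether the status matched.
async function probeBotBlocking(
  baseUrl: string,
  fetcher: StatusFetcher,
): Promise<ProbeResult[]> {
  const origin = baseUrl.replace(/\/$/, ""); // drop a trailing slash
  const probes = [
    { label: "bot on page", path: "/", ua: "GPTBot/1.0", expected: 403 },
    { label: "browser on page", path: "/", ua: "Mozilla/5.0 Firefox/128.0", expected: 200 },
    { label: "bot on robots.txt", path: "/robots.txt", ua: "GPTBot/1.0", expected: 200 },
  ];

  const results: ProbeResult[] = [];
  for (const probe of probes) {
    const response = await fetcher(origin + probe.path, {
      headers: { "user-agent": probe.ua },
    });
    results.push({
      label: probe.label,
      status: response.status,
      ok: response.status === probe.expected,
    });
  }
  return results;
}
```

On Node 18 or later the built-in fetch can be passed as the second argument, with your own deployment URL as the first. Any result with `ok: false` points at a matcher or platform configuration problem.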

Docker — multi-stage build

# Dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npx next build

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]

# next.config.mjs must include: output: "standalone"

Frequently asked questions

How do I serve robots.txt in Payload CMS v3?

Place robots.txt in public/ for a static file, or create src/app/robots.ts using the Next.js Metadata API for environment-based rules. For CMS-editable rules, create a Payload Global and query it from robots.ts. Delete public/robots.txt if you use robots.ts: Next.js reports a conflict when a public file and an app route resolve to the same path.

How do I block AI bots in Payload CMS without breaking the admin panel?

In src/middleware.ts, use the matcher config to exclude /admin, /api, and /_next paths. The middleware will not run for those paths at all — safer than an in-function path check because it avoids edge cases where path checks might not match all admin sub-routes.

How do I add noai meta tags to Payload CMS v3 pages?

Export a metadata constant or a generateMetadata function from a page or layout file. Use the other: { robots: "noai, noimageai" } field for non-standard robots directives. Set a global default in the root layout and override per-page based on a Payload field.

What is different about blocking bots in Payload v2 vs v3?

Payload v2 is Express-based — add an app.use() middleware before the Payload handler. Payload v3 is Next.js App Router — use src/middleware.ts with the matcher config. Both require the same admin path exemption.

Does Payload CMS have a robots.txt field in the admin panel?

Not natively, but you can build it. Create a Global (e.g., SiteSettings) with a blockAiBots checkbox, then query it in app/robots.ts. Content editors can then toggle AI bot blocking from the Payload admin without a code deploy.

How do I add X-Robots-Tag headers in Payload CMS v3?

Two options: (1) in src/middleware.ts via response.headers.set("X-Robots-Tag", "noai, noimageai"); (2) in next.config.mjs via the headers() export. Use a path pattern that excludes /admin and /api in both cases.
