
How to Block AI Bots on Express.js

Express middleware gives you precise control over bot blocking — an app.use() function runs before every route, checks the user agent, and returns 403 before any page logic executes. Combine with a static robots.txt and nginx for layered defence.

Middleware order is everything in Express

Express processes app.use() calls in registration order. Your bot-blocking middleware must be registered before your route definitions — otherwise the routes respond before the bot check runs. The recommended order:

express.static() → blockAiBots middleware → your routes
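Why order matters can be illustrated with a toy model of the chain. runChain and the handlers below are illustrative stand-ins (our names, not Express internals): handlers run in registration order, and the first one that responds instead of calling next() stops the chain.

```javascript
// Toy model of the Express middleware chain: handlers run in
// registration order; the first one to respond short-circuits the rest.
function runChain(handlers, req) {
  let response = null;
  const res = { send: (body) => { response = body; } };
  for (const handler of handlers) {
    let calledNext = false;
    handler(req, res, () => { calledNext = true; });
    if (!calledNext) break; // handler responded instead of calling next()
  }
  return response;
}

const chain = [
  (req, res, next) => next(), // 1. express.static: no matching file, pass on
  (req, res, next) =>         // 2. blockAiBots: 403 for matched user agents
    /GPTBot|ClaudeBot|CCBot/i.test(req.ua) ? res.send('403 Forbidden') : next(),
  (req, res) => res.send('200 home page'), // 3. route handler
];

console.log(runChain(chain, { ua: 'GPTBot/1.0' })); // '403 Forbidden'
console.log(runChain(chain, { ua: 'Mozilla/5.0 Chrome/120.0' })); // '200 home page'
```

Swap the first two entries and bots would be blocked from static files too, including robots.txt itself.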

Quick fix — public/robots.txt + express.static

Create public/robots.txt and make sure app.use(express.static('public')) is in your app.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

All Methods

1. express.static robots.txt (Recommended) · Easy · All deployments
Technique: public/robots.txt + app.use(express.static)
Place robots.txt in public/ and serve it with express.static('public'). Express serves the static file before any route handlers — zero application logic for robots.txt requests.
Caveat: robots.txt must be plain text, and express.static must be registered before the bot-blocking middleware so crawlers can still fetch it.

2. GET /robots.txt route handler · Easy · All deployments
Technique: app.get('/robots.txt', ...)
A route handler that generates robots.txt content dynamically. Useful for environment-based rules (block everything in dev/staging) or for pulling the bot list from configuration.
Caveat: register it before other routes. If public/robots.txt also exists, the static file takes precedence.

3. app.use() middleware — X-Robots-Tag · Easy · All deployments
Technique: app.use() — before routes
A global middleware that sets res.setHeader('X-Robots-Tag', 'noai, noimageai') on every response. Applies at the HTTP layer — more authoritative than HTML meta tags.
Caveat: register it after express.static() so the header applies only to Express-handled responses; to cover static files as well, use the setHeaders option of express.static().

4. app.use() middleware — hard bot blocking · Easy · All deployments
Technique: app.use() — before routes
Check req.headers['user-agent'] and return res.status(403) for matched AI bots. Must be registered before route definitions — Express processes middleware in order.
Caveat: call next() for non-bot requests and res.status(403).send() for bots; never call both for the same request.

5. nginx reverse proxy — server-level block · Intermediate · nginx deployments
Technique: nginx server block config
Block AI bots in the nginx config before requests reach Node.js. Most efficient for high-traffic sites — matched bots never invoke Express.
Caveat: requires server access (a VPS, or Docker with nginx). Not available on Heroku, Railway, or other PaaS without custom buildpacks.

Method 1: express.static + public/robots.txt

The simplest approach. Create public/robots.txt in your project root and ensure app.use(express.static('public')) is registered. Express serves static files before route handlers — the request for /robots.txt never reaches your application logic.

// app.js (or server.js / index.js)
const express = require('express');
const app = express();

// Serve public/ directory as static assets — robots.txt served here
app.use(express.static('public'));

// ... your routes and other middleware
app.listen(3000);

Full public/robots.txt content:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

Method 2: GET /robots.txt Route Handler

A route handler gives you dynamic control — vary rules by environment, pull the bot list from a config file, or add a Sitemap: directive using your actual domain. Register it before other routes, and delete public/robots.txt if you have one (the static file takes precedence).

// JavaScript
const AI_BOTS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
  'ClaudeBot', 'anthropic-ai', 'Google-Extended',
  'Bytespider', 'CCBot', 'PerplexityBot',
  'meta-externalagent', 'Amazonbot', 'Applebot-Extended',
  'xAI-Bot', 'DeepSeekBot', 'MistralBot', 'Diffbot',
  'cohere-ai', 'AI2Bot', 'Ai2Bot-Dolma', 'YouBot',
  'DuckAssistBot', 'omgili', 'omgilibot',
  'webzio-extended', 'gemini-deep-research',
];

app.get('/robots.txt', (req, res) => {
  const isProd = process.env.NODE_ENV === 'production';

  const lines = [];

  if (!isProd) {
    // Block everything on non-production
    lines.push('User-agent: *', 'Disallow: /', '');
  } else {
    lines.push('User-agent: *', 'Allow: /', '');
    for (const bot of AI_BOTS) {
      lines.push(`User-agent: ${bot}`, 'Disallow: /', '');
    }
  }

  // Behind a reverse proxy, set app.set('trust proxy', 1) so req.protocol
  // reflects the original client scheme (https vs http)
  lines.push(`Sitemap: ${req.protocol}://${req.hostname}/sitemap.xml`);

  res.type('text/plain');
  res.set('Cache-Control', 'public, max-age=86400');
  res.send(lines.join('\n'));
});
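The body-building logic lends itself to a pure helper that can be unit-tested without starting a server. A sketch (the buildRobotsTxt name and blockAll option are ours, not part of the handler above):

```javascript
// Pure helper mirroring the route handler above: builds a robots.txt
// body from a bot list. blockAll reproduces the non-production branch.
function buildRobotsTxt(bots, { blockAll = false } = {}) {
  const lines = [];
  if (blockAll) {
    lines.push('User-agent: *', 'Disallow: /', '');
  } else {
    lines.push('User-agent: *', 'Allow: /', '');
    for (const bot of bots) {
      lines.push(`User-agent: ${bot}`, 'Disallow: /', '');
    }
  }
  return lines.join('\n');
}
```

The route handler then reduces to res.type('text/plain').send(buildRobotsTxt(AI_BOTS, { blockAll: !isProd })).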

TypeScript version:

// TypeScript
import { Request, Response } from 'express';

app.get('/robots.txt', (req: Request, res: Response): void => {
  const lines = ['User-agent: *', 'Allow: /', ''];
  for (const bot of AI_BOTS) {
    lines.push(`User-agent: ${bot}`, 'Disallow: /', '');
  }
  res.type('text/plain').send(lines.join('\n'));
});

Method 3: X-Robots-Tag Header Middleware

Set X-Robots-Tag: noai, noimageai on every response via a global middleware. This HTTP header is more authoritative than the HTML meta tag — bots that download HTML without executing JavaScript still see it. Note that responses served by express.static() end the chain before later middleware runs, so to stamp static files as well, pass a setHeaders function in the express.static() options.

// Global X-Robots-Tag header — register before routes
app.use((req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noai, noimageai');
  next();
});

// Or alongside helmet (if you're already using it):
// helmet doesn't handle X-Robots-Tag natively, so add it separately:
const helmet = require('helmet');

app.use(helmet());
app.use((req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noai, noimageai');
  next();
});

Method 4: Bot-Blocking Middleware (Hard Block)

An app.use() middleware that inspects the User-Agent header and returns a 403 before any route handler runs. Register it after express.static() (so robots.txt remains accessible) but before all route definitions:

// JavaScript
const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research/i;

function blockAiBots(req, res, next) {
  const ua = req.headers['user-agent'] || '';
  if (BLOCKED_UAS.test(ua)) {
    return res.status(403).send('Forbidden');
  }
  next();
}

// Registration order matters!
app.use(express.static('public'));  // 1. static files first (robots.txt accessible)
app.use(blockAiBots);               // 2. block bots before any routes
app.get('/', (req, res) => { ... }); // 3. routes after

// TypeScript
import { Request, Response, NextFunction } from 'express';

const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research/i;

function blockAiBots(req: Request, res: Response, next: NextFunction): void {
  const ua = req.headers['user-agent'] ?? '';
  if (BLOCKED_UAS.test(ua)) {
    res.status(403).send('Forbidden');
    return;
  }
  next();
}
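The matcher is easy to sanity-check on its own: the regex match is substring-based and case-insensitive, so a bot token is caught anywhere in the UA string. A standalone sketch (isBlockedUa is our name, and the pattern is abbreviated to three bots here):

```javascript
// Abbreviated version of the BLOCKED_UAS pattern above (three bots only)
const BLOCKED_UAS = /GPTBot|ClaudeBot|CCBot/i;

function isBlockedUa(ua) {
  // Missing User-Agent header → treated as non-bot, same as || '' above
  return BLOCKED_UAS.test(ua || '');
}

console.log(isBlockedUa('Mozilla/5.0 (compatible; GPTBot/1.0)')); // true
console.log(isBlockedUa('Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0')); // false
console.log(isBlockedUa(undefined)); // false
```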

TypeScript: return after res.send()

In TypeScript, Express middleware is typed to return void, so call res.status(403).send() and then return on a separate line rather than returning the res.send() call. Do not call next() after sending a response: any later handler that then writes headers will throw "Cannot set headers after they are sent".

Method 5: nginx Reverse Proxy

Production Express apps typically run behind nginx. Add a user agent block to the nginx config to reject AI bots before Node.js is invoked:

# /etc/nginx/sites-available/yourapp.conf
server {
    listen 80;
    server_name yourdomain.com;

    # Block AI training crawlers at the edge — before Node.js
    if ($http_user_agent ~* "(GPTBot|ClaudeBot|anthropic-ai|CCBot|Bytespider|Google-Extended|PerplexityBot|Diffbot|DeepSeekBot|MistralBot|cohere-ai|meta-externalagent|Amazonbot|xAI-Bot|AI2Bot|omgili|webzio-extended|gemini-deep-research|OAI-SearchBot|ChatGPT-User)") {
        return 403;
    }

    location / {
        proxy_pass http://127.0.0.1:3000;  # Express on port 3000
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_cache_bypass $http_upgrade;
    }
}

Docker + Docker Compose

If your Express app runs in Docker, the recommended setup is nginx as a separate container in your Compose file, proxying to the Express container:

# docker-compose.yml (simplified)
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80"]
    volumes: ["./nginx.conf:/etc/nginx/conf.d/default.conf"]
    depends_on: [app]
  app:
    build: .
    expose: ["3000"]

Add the AI bot block to nginx.conf as shown above. For PaaS (Heroku, Railway, Render) without nginx access: use the Express middleware approach.

AI Bots to Block

25 user agents covering AI training crawlers and AI search bots. The robots.txt and middleware patterns above include all of them.

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, Bytespider, CCBot, PerplexityBot, meta-externalagent, Amazonbot, Applebot-Extended, xAI-Bot, DeepSeekBot, MistralBot, Diffbot, cohere-ai, AI2Bot, Ai2Bot-Dolma, YouBot, DuckAssistBot, omgili, omgilibot, webzio-extended, gemini-deep-research

Frequently Asked Questions

How do I serve robots.txt in Express.js?

Two ways: (1) Static file — place robots.txt in your public/ directory and add app.use(express.static('public')) to your app. Express serves it automatically at /robots.txt before any route handlers run. (2) Route handler — add app.get('/robots.txt', (req, res) => { res.type('text/plain'); res.send(content); }) before your other routes. The static file approach is simpler and has lower overhead; the route approach lets you generate rules dynamically or vary them by environment.

How do I block AI bots with Express middleware?

Create an app.use() middleware that checks req.headers['user-agent'] against a regex of AI bot names. If matched, return res.status(403).send('Forbidden') immediately. If not matched, call next() to pass the request to the next middleware or route. Critical: register this middleware before your route definitions — Express processes middleware in the order it's registered. Place it after express.static() if you want static assets to be accessible, or before it if you want to block bots from static assets too.

Does middleware order matter for bot blocking in Express?

Yes, significantly. Express processes middleware in registration order. If you register express.static() before your bot-blocking middleware, static file requests (including robots.txt itself) bypass the bot check. If you register bot-blocking middleware before express.static(), bots are blocked even from static files. For most sites, register bot-blocking middleware after express.static('public') so robots.txt is accessible to crawlers, then block before HTML routes: app.use(express.static('public')); app.use(blockAiBots); app.get('/', ...).

How do I set X-Robots-Tag headers in Express?

Add a middleware that calls res.setHeader('X-Robots-Tag', 'noai, noimageai') before calling next(). Register it globally with app.use() to apply it to all responses. This is more authoritative than an HTML meta tag because it applies at the HTTP layer — bots that fetch pages without executing JavaScript still see the header. For broader security headers, consider the helmet npm package; note that helmet does not set X-Robots-Tag itself, so keep the dedicated middleware alongside it.

Should I block AI bots in Express middleware or in nginx?

For high-traffic production deployments, nginx is more efficient — it rejects matched bots before Node.js is invoked, saving CPU and memory. For simpler setups, PaaS deployments (Heroku, Railway, Render) where you don't control nginx, or when you want to log blocked requests through your application, Express middleware is the right choice. Both approaches can be combined: nginx blocks the most aggressive crawlers at the network layer, while Express middleware handles anything nginx misses.

How do I add noai meta tags when using Express with a template engine?

Add the meta tag to your base layout template. For EJS: include <meta name="robots" content="noai, noimageai"> in your views/layout.ejs or partials/head.ejs. For Pug: add meta(name='robots' content='noai, noimageai') to your layout.pug. For Handlebars: add it to your main.hbs layout template. If you're serving a React/Vue SPA from Express, add the meta tag to the HTML template file that Express serves as the SPA shell.
