
How to Block AI Bots on Traefik: Complete 2026 Guide

Traefik is a cloud-native reverse proxy popular in Docker Compose and Kubernetes deployments. Its bot-blocking model differs from nginx's and Apache's: Traefik has no built-in User-Agent blocking middleware, so the strategy is to combine Traefik's Headers middleware (for X-Robots-Tag), a robots.txt served by your upstream app, and either a Traefik plugin or application-layer middleware for hard UA blocking.

Traefik has no built-in User-Agent blocking

Unlike nginx (map $http_user_agent) and Apache (mod_rewrite), Traefik's middleware library does not include a UA-matching block. Your options: (1) Traefik plugin (e.g. traefik-plugin-bot-blocker from the plugin catalog), (2) application middleware in your upstream service (Next.js middleware.ts, Express middleware, etc.), (3) nginx sidecar in front of Traefik's upstream. X-Robots-Tag and robots.txt work cleanly from Traefik.

Methods at a glance

| Method | What it does | Where |
|---|---|---|
| robots.txt (via upstream app) | Signals bots which paths are off-limits | Your app / nginx sidecar |
| Headers middleware | Adds X-Robots-Tag to all responses | Traefik dynamic config |
| Plugin middleware | Hard UA block at the Traefik layer | Traefik plugin catalog |
| App-layer middleware | Hard UA block in your service | Next.js / Express / nginx |
| IPAllowList middleware | Allows only listed IP ranges, blocking the rest (no UA needed) | Traefik dynamic config |
| Cloudflare WAF | Rule-based UA blocking + rate limiting | Cloudflare dashboard |

1. robots.txt — serve from upstream

Traefik is a proxy — it doesn't serve static files. Serve robots.txt from your upstream application. For a dedicated static file, add a minimal nginx container to your Docker Compose stack and route /robots.txt requests to it:

# docker-compose.yml — robots.txt via nginx sidecar
services:
  traefik:
    image: traefik:v3
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=admin@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./letsencrypt:/letsencrypt

  # Serves robots.txt at /robots.txt
  robots:
    image: nginx:alpine
    volumes:
      - ./robots.txt:/usr/share/nginx/html/robots.txt:ro
      - ./nginx-robots.conf:/etc/nginx/conf.d/default.conf:ro
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.robots.rule=Host(`example.com`) && Path(`/robots.txt`)"
      - "traefik.http.routers.robots.entrypoints=websecure"
      - "traefik.http.routers.robots.tls.certresolver=le"
      - "traefik.http.services.robots.loadbalancer.server.port=80"

  app:
    image: myapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`example.com`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.routers.app.tls.certresolver=le"
      - "traefik.http.services.app.loadbalancer.server.port=3000"

# nginx-robots.conf — minimal config for the robots sidecar
server {
    listen 80;
    location = /robots.txt {
        root /usr/share/nginx/html;
        access_log off;
    }
}

If your upstream app already serves /robots.txt (e.g. Next.js public/robots.txt), skip the sidecar entirely.
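The sidecar above mounts ./robots.txt but the guide doesn't show its contents. A minimal sketch, using bot names from the User-Agent blocklist later in this guide (extend the list as needed — multiple User-agent lines may share one Disallow group):

```shell
# Write a minimal robots.txt disallowing common AI crawlers.
# Bot names mirror the blocklist used in the plugin/app-layer sections below.
cat > robots.txt <<'EOF'
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
EOF
```

Note that robots.txt is advisory only; crawlers that ignore it are caught by the hard-blocking methods below.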

2. X-Robots-Tag — Headers middleware

Traefik's Headers middleware adds, removes, or modifies HTTP headers on requests and responses. customResponseHeaders injects headers into every response from the upstream.

Docker Compose labels:

services:
  app:
    image: myapp:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`example.com`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.routers.app.tls.certresolver=le"
      - "traefik.http.routers.app.middlewares=security-headers@docker"
      - "traefik.http.services.app.loadbalancer.server.port=3000"

      # Headers middleware — adds X-Robots-Tag to all responses
      - "traefik.http.middlewares.security-headers.headers.customresponseheaders.X-Robots-Tag=noai, noimageai"
      - "traefik.http.middlewares.security-headers.headers.customresponseheaders.X-Content-Type-Options=nosniff"

Dynamic file config (dynamic.yml with file provider):

# dynamic.yml — loaded by Traefik file provider
http:
  middlewares:
    security-headers:
      headers:
        customResponseHeaders:
          X-Robots-Tag: "noai, noimageai"
          X-Content-Type-Options: "nosniff"
          X-Frame-Options: "SAMEORIGIN"

  routers:
    app:
      rule: "Host(`example.com`)"
      entryPoints:
        - websecure
      middlewares:
        - security-headers
      service: app-service
      tls:
        certResolver: le

  services:
    app-service:
      loadBalancer:
        servers:
          - url: "http://app:3000"

Provider suffix in middleware references

When referencing a middleware from a router, include the provider suffix: security-headers@docker (defined via Docker labels), security-headers@file (defined via the YAML file provider), security-headers@kubernetescrd (Kubernetes CRD). Without a suffix, Traefik looks for the middleware in the router's own provider, which produces "middleware not found" errors whenever the middleware is defined elsewhere.
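Once the middleware is attached, you can confirm the header from the command line. The live check against your deployment is shown as a comment (example.com is a placeholder); the executed part runs the same grep against a canned HTTP response so it works offline:

```shell
# Live check (replace example.com with your host):
#   curl -sI https://example.com/ | grep -i '^x-robots-tag'
# Offline demo of the same check against a canned response:
printf 'HTTP/1.1 200 OK\r\nX-Robots-Tag: noai, noimageai\r\nContent-Type: text/html\r\n\r\n' \
  | grep -i '^x-robots-tag' | tr -d '\r'
# prints: X-Robots-Tag: noai, noimageai
```

If the header is missing, the usual culprit is a router that doesn't list the middleware, or a missing/incorrect provider suffix in the reference.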

3. User-Agent blocking — plugin middleware

Traefik's plugin system (Traefik Plugin Catalog) lets you install community middleware plugins. The traefik-plugin-bot-blocker plugin blocks requests by User-Agent before they reach your upstream.

# traefik.yml (static config) — enable the plugin
experimental:
  plugins:
    bot-blocker:
      moduleName: "github.com/bots-garden/traefik-plugin-bot-blocker"
      version: "v0.1.0"

entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

providers:
  docker:
    exposedByDefault: false
  file:
    filename: /etc/traefik/dynamic.yml

# dynamic.yml — configure the plugin middleware
http:
  middlewares:
    block-ai-bots:
      plugin:
        bot-blocker:
          # List of User-Agent substrings to block
          botUserAgents:
            - "GPTBot"
            - "ChatGPT-User"
            - "ClaudeBot"
            - "Claude-Web"
            - "anthropic-ai"
            - "CCBot"
            - "Google-Extended"
            - "PerplexityBot"
            - "Amazonbot"
            - "Bytespider"
            - "YouBot"
            - "DuckAssistBot"
            - "meta-externalagent"
            - "MistralAI-Spider"
            - "oai-searchbot"
          blockedStatusCode: 403

  routers:
    app:
      rule: "Host(`example.com`)"
      entryPoints:
        - websecure
      middlewares:
        - block-ai-bots
        - security-headers
      service: app-service
      tls:
        certResolver: le

Plugin availability depends on the Traefik version and whether the plugin is in the official catalog. Check plugins.traefik.io for the current plugin list and versions.
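The matching such plugins perform is substring matching on the User-Agent header (an assumption about this particular plugin; check its README for exact semantics, including case sensitivity). A shell sketch of that logic, useful for reasoning about which UAs a given list catches:

```shell
# Sketch of substring-based UA blocking (assumption: a request is blocked
# when any configured string appears anywhere in its User-Agent header;
# this sketch is case-sensitive, the real plugin may not be).
is_blocked() {
  case "$1" in
    *GPTBot*|*ClaudeBot*|*CCBot*|*PerplexityBot*|*Bytespider*) echo 403 ;;
    *) echo 200 ;;
  esac
}
is_blocked 'Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)'  # prints 403
is_blocked 'Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0'               # prints 200
```

Against a live deployment, the equivalent smoke test is curl -s -o /dev/null -w '%{http_code}' -A 'GPTBot/1.0' https://example.com/ (placeholder domain), which should print 403 once the middleware is active.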

4. Application-layer blocking

The most reliable approach with Traefik is to handle User-Agent blocking in your upstream service. Traefik passes the original User-Agent header through unchanged, so your app sees the real value.

// Next.js middleware.ts — works with any Traefik setup
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCKED_UA = /GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot/i;

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';

  if (BLOCKED_UA.test(ua)) {
    return new NextResponse('Forbidden', { status: 403 });
  }

  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next|robots.txt|favicon.ico).*)'],
};

// Express.js middleware (Node.js)
const express = require('express');
const app = express();

const BLOCKED_UA = /GPTBot|ChatGPT-User|ClaudeBot|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider/i;

app.use((req, res, next) => {
  if (req.path === '/robots.txt') return next(); // always allow robots.txt
  if (BLOCKED_UA.test(req.get('user-agent') || '')) {
    return res.status(403).send('Forbidden');
  }
  next();
});

5. Kubernetes — IngressRoute + Middleware

In Kubernetes, Traefik uses Custom Resource Definitions (CRDs). Define a Middleware resource for the headers, then reference it in your IngressRoute.

# middleware.yaml — X-Robots-Tag via Traefik Middleware CRD
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: default
spec:
  headers:
    customResponseHeaders:
      X-Robots-Tag: "noai, noimageai"
      X-Content-Type-Options: "nosniff"
---
# ingressroute.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app-ingress
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`example.com`)
      kind: Rule
      middlewares:
        - name: security-headers    # references Middleware CRD above
      services:
        - name: app-service
          port: 3000
  tls:
    certResolver: le

Middleware CRDs are namespaced. To reference a middleware in another namespace, set the namespace field next to name in the IngressRoute's middlewares list, and enable the kubernetescrd provider's allowCrossNamespace option (it is off by default). The namespace-name@kubernetescrd form is used when attaching middlewares via Ingress annotations rather than IngressRoute CRDs.
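A sketch of a cross-namespace reference, assuming the middleware lives in a hypothetical traefik-system namespace (this also requires the kubernetescrd provider's allowCrossNamespace option to be enabled):

```yaml
# ingressroute-cross-ns.yaml — IngressRoute referencing a Middleware
# defined in a different namespace (traefik-system is a placeholder)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app-ingress
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`example.com`)
      kind: Rule
      middlewares:
        - name: security-headers
          namespace: traefik-system   # namespace of the Middleware CRD
      services:
        - name: app-service
          port: 3000
  tls:
    certResolver: le
```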

6. Complete traefik.yml (static config)

Static config defines entrypoints, providers, and certificate resolvers. It does not define routers, services, or middlewares — those live in dynamic config (Docker labels or file provider).

# traefik.yml (static config — requires restart to change)

api:
  dashboard: false   # Disable in production

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entrypoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
    http3: {}        # Enable HTTP/3

certificatesResolvers:
  le:
    acme:
      email: admin@example.com
      storage: /letsencrypt/acme.json
      tlsChallenge: {}

providers:
  docker:
    exposedByDefault: false
    network: proxy           # Only route services on this Docker network
  file:
    filename: /etc/traefik/dynamic.yml
    watch: true              # Hot-reload on file changes

log:
  level: INFO

accessLog:
  filePath: /var/log/traefik/access.log
  bufferingSize: 100

7. Full Docker Compose example

Complete Docker Compose stack: Traefik v3 with automatic HTTPS, security headers middleware, and an app service.

# docker-compose.yml
networks:
  proxy:
    external: true   # Create with: docker network create proxy

services:
  traefik:
    image: traefik:v3
    restart: unless-stopped
    networks:
      - proxy
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"   # HTTP/3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.yml:/etc/traefik/traefik.yml:ro
      - ./dynamic.yml:/etc/traefik/dynamic.yml:ro
      - ./letsencrypt:/letsencrypt
    labels:
      - "traefik.enable=true"

  app:
    image: myapp:latest
    restart: unless-stopped
    networks:
      - proxy
    # No ports exposed — Traefik handles all ingress
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"
      - "traefik.http.routers.app.rule=Host(`example.com`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.routers.app.tls.certresolver=le"
      - "traefik.http.routers.app.middlewares=security-headers@file"
      - "traefik.http.services.app.loadbalancer.server.port=3000"

# dynamic.yml (watched by Traefik — no restart needed on changes)
http:
  middlewares:
    security-headers:
      headers:
        customResponseHeaders:
          X-Robots-Tag: "noai, noimageai"
          X-Frame-Options: "SAMEORIGIN"
          X-Content-Type-Options: "nosniff"
        # Force HTTPS in browser for 1 year:
        stsSeconds: 31536000
        stsIncludeSubdomains: true

Frequently asked questions

How do I block bots by User-Agent in Traefik?

Traefik has no built-in UA-blocking middleware. Options: (1) a Traefik plugin from the Plugin Catalog, (2) application-layer middleware in your upstream (Next.js middleware.ts, Express middleware, etc.), (3) an nginx sidecar in front of your service with a map $http_user_agent $bad_bot block. For most setups, app-layer blocking (option 2) is the simplest.

How do I add response headers in Traefik?

Use the Headers middleware with customResponseHeaders. Via Docker labels: traefik.http.middlewares.hdr.headers.customresponseheaders.X-Robots-Tag=noai, noimageai. Then attach to a router: traefik.http.routers.app.middlewares=hdr@docker. The @docker suffix is required.

What is static vs dynamic config in Traefik?

Static config (traefik.yml) defines entrypoints, providers, and certificate resolvers — requires a restart to change. Dynamic config (Docker labels, file provider YAML, Kubernetes CRDs) defines routers, services, and middlewares — updated live with no restart. Always put bot-blocking middleware in dynamic config.

How do I serve robots.txt with Traefik?

Traefik doesn't serve static files. Options: (1) serve from your upstream app (Next.js public/robots.txt, nginx root), (2) add a minimal nginx sidecar container with a router rule Host(`example.com`) && Path(`/robots.txt`) pointing to it. Most apps already serve robots.txt natively, so a sidecar is usually unnecessary.

Does Traefik work with Kubernetes for bot blocking?

Yes. Use Traefik's IngressRoute CRD with a Middleware CRD for customResponseHeaders. For hard UA blocking, use an application-layer middleware (e.g. Next.js middleware.ts) or a Kubernetes NetworkPolicy + WAF solution.

What is the @file vs @docker suffix in Traefik middleware references?

The suffix indicates which provider defined the middleware. @docker — defined via Docker labels; @file — defined in a YAML/TOML file provider; @kubernetescrd — Kubernetes Middleware CRD. Always include the suffix in router middleware references — omitting it causes middleware not found errors.
