
How to Block AI Bots on Caddy: Complete 2026 Guide

Caddy's named matchers and clean Caddyfile syntax make bot blocking concise: define a @bad_bot matcher on User-Agent, then respond @bad_bot 403. Caddy's automatic HTTPS via Let's Encrypt means no SSL config — just point a domain at your server. The key behaviour to understand: Caddy evaluates directives by precedence, not top-to-bottom — respond always fires before reverse_proxy regardless of order in the Caddyfile.

Directive order ≠ execution order in Caddy

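Unlike nginx and Apache, Caddy sorts the directives in a site block into a fixed, built-in order before running them; it does not run them top-to-bottom. respond sits ahead of reverse_proxy and file_server in that order, so you don't need to write respond @bad_bot 403 first for it to take effect. Writing it first is still good practice for readability. Two caveats: directives inside a route block run in the order they are written, and handle is sorted ahead of respond, so a matching handle block runs before a site-level respond.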
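
To make the ordering rule concrete, here is a minimal sketch in which respond is written last but still runs before reverse_proxy. The hostnames and the localhost:3000 upstream are placeholders, and the single-bot matcher is shortened for illustration. The second site block wraps the same directives in route, which keeps the written order instead of sorting.

```caddyfile
# Sorted by Caddy's built-in directive order:
# respond runs before reverse_proxy even though it is written after it
example.com {
    reverse_proxy localhost:3000

    @bad_bot header User-Agent *GPTBot*
    respond @bad_bot 403
}

# Inside route, directives run exactly in the order written,
# so here the 403 has to come first by hand
other.example.com {
    @bad_bot header User-Agent *GPTBot*

    route {
        respond @bad_bot 403
        reverse_proxy localhost:3000
    }
}
```

In the first block nothing needs reordering; in the second, moving respond below reverse_proxy would let blocked bots through to the backend.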

Methods at a glance

| Method | What it does | Caddyfile directive |
|---|---|---|
| robots.txt via file_server | Signals bots which paths are off-limits | file_server |
| @bad_bot header matcher + respond 403 | Hard block on known AI User-Agents | respond @bad_bot 403 |
| header X-Robots-Tag | noai header on all HTTP responses | header |
| noai <meta> tag | AI training opt-out per HTML page | HTML files / SSG layout |
| caddy-ratelimit plugin | Rate-limit to catch UA-rotating bots | rate_limit (plugin) |
| reverse_proxy | Proxy to backend after bot check | reverse_proxy |

1. robots.txt — file_server

Place robots.txt in the directory set by the root directive. Caddy's file_server then serves it at /robots.txt like any other static file, so no dedicated route or nginx-style location block is needed.

# Caddyfile
example.com {
    root * /var/www/html
    file_server

    # robots.txt is served automatically from root directory
    # No special config needed
}

# /var/www/html/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

2. Hard 403 — header matcher

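Define a named matcher (@bad_bot) using the header matcher on User-Agent. Multiple header User-Agent lines inside the same named matcher are OR'ed: the request matches if any one pattern matches. A value with an asterisk on each side, such as *GPTBot*, matches that text anywhere in the header value. Matching is case-sensitive; use the header_regexp matcher with (?i) for case-insensitive matching. Note that Google-Extended is a robots.txt control token rather than a crawler User-Agent string, so the Google-Extended line below does not match real requests; the robots.txt rule in Section 1 is what applies to it.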

example.com {
    root * /var/www/html

    # Named matcher — multiple header lines = OR conditions
    @bad_bot {
        header User-Agent *GPTBot*
        header User-Agent *ChatGPT-User*
        header User-Agent *ClaudeBot*
        header User-Agent *Claude-Web*
        header User-Agent *anthropic-ai*
        header User-Agent *CCBot*
        header User-Agent *Google-Extended*
        header User-Agent *PerplexityBot*
        header User-Agent *Amazonbot*
        header User-Agent *Bytespider*
        header User-Agent *YouBot*
        header User-Agent *Applebot*
        header User-Agent *DuckAssistBot*
        header User-Agent *meta-externalagent*
        header User-Agent *MistralAI-Spider*
        header User-Agent *oai-searchbot*
    }

    # Return 403 for matched bots
    # Caddy evaluates respond before reverse_proxy/file_server regardless of order
    respond @bad_bot 403

    file_server
}
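
One side effect of a site-wide 403 is that blocked bots also receive 403 for /robots.txt and never see the Disallow rules from Section 1. Below is a minimal sketch of one way around that, assuming you want robots.txt to stay readable. Different matcher types inside one named matcher are AND'ed, so adding not path /robots.txt exempts that one path. The shortened bot list is illustrative only.

```caddyfile
example.com {
    root * /var/www/html

    @bad_bot {
        # Same-field header lines are OR'ed with each other...
        header User-Agent *GPTBot*
        header User-Agent *ClaudeBot*
        header User-Agent *CCBot*
        # ...and that result is AND'ed with this path exclusion
        not path /robots.txt
    }

    # 403 for matched bots on every path except /robots.txt
    respond @bad_bot 403

    file_server
}
```

With this matcher, a GPTBot request for /robots.txt falls through to file_server, while the same bot requesting any other path is rejected.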

header vs header_regexp matcher

  • header User-Agent *GPTBot*: wildcard match on the header value, case-sensitive
  • header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot): regular expression match; bot is the name given to the regexp, and (?i) makes it case-insensitive

For a long list of bots, header_regexp with a single pipe-separated regex is more concise. The name bot is an arbitrary label for the regexp, not a capture group; Caddy uses it to expose the pattern's capture groups as placeholders such as {re.bot.1}.

Alternative using header_regexp for a more compact single-matcher approach:

example.com {
    @bad_bot header_regexp bot User-Agent (?i)(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)

    respond @bad_bot 403

    file_server
}
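
The regexp name is only useful if you read the captured value back. As a sketch (the response wording and the shortened bot list are placeholders), the {re.bot.1} placeholder can be echoed in the 403 body so logs and clients show which pattern matched:

```caddyfile
example.com {
    root * /var/www/html

    @bad_bot header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|CCBot)

    # {re.bot.1} expands to the first capture group of the regexp named "bot",
    # i.e. the part of the User-Agent that matched the alternation
    respond @bad_bot "Automated access by {re.bot.1} is not permitted" 403

    file_server
}
```

If you never reference the placeholder, the name has no effect on matching.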

3. X-Robots-Tag — header directive

Caddy's header directive adds, sets, or removes response headers. Place it at the top level of the site block to apply it to every response. nginx's add_header needs the always parameter to cover error responses; Caddy's header directive applies to error responses as well, with no extra keyword.

example.com {
    # The three forms below are alternatives; pick the one that fits.
    # Add X-Robots-Tag to all responses (including 4xx/5xx)
    header X-Robots-Tag "noai, noimageai"

    # Or add multiple security headers at once:
    header {
        X-Robots-Tag    "noai, noimageai"
        X-Frame-Options "SAMEORIGIN"
        X-Content-Type-Options "nosniff"
    }

    # Scope to HTML responses only (using path matcher):
    @html path *.html /
    header @html X-Robots-Tag "noai, noimageai"
}

header vs header_down for reverse proxy

When using reverse_proxy, the header directive sets headers on Caddy's response to the client, but it normally runs before the upstream's own response headers are copied in. If your upstream already sets X-Robots-Tag and you want to override it, use header_down inside the reverse_proxy block, which modifies the upstream response headers before Caddy forwards them.
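
A minimal sketch of the header_down approach, assuming a backend on localhost:3000 (a placeholder) that may or may not send its own X-Robots-Tag:

```caddyfile
example.com {
    reverse_proxy localhost:3000 {
        # Set X-Robots-Tag on the upstream response, replacing any value
        # the backend sent, before Caddy passes the response to the client
        header_down X-Robots-Tag "noai, noimageai"
    }
}
```

Because the header is set on the proxied response itself, this only covers responses that come from the upstream; responses Caddy generates on its own, such as the 403 from respond, still need the header directive.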

4. noai meta tag — static sites

Caddy serves HTML files as-is — it does not inject content. Add the noai meta tag to your HTML files directly, or to your SSG base layout template.

<!-- In your HTML <head> -->
<meta name="robots" content="noai, noimageai">

<!-- Hugo base layout (layouts/_default/baseof.html): -->
<meta name="robots" content="{{ with .Params.robots }}{{ . }}{{ else }}noai, noimageai{{ end }}">

<!-- Eleventy base layout (_includes/base.njk): -->
<meta name="robots" content="{{ robots | default('noai, noimageai') }}">

Use X-Robots-Tag (Section 3) as the HTTP-layer equivalent — no HTML edits needed.

5. Rate limiting — caddy-ratelimit plugin

Caddy does not include rate limiting in the standard distribution. Install the caddy-ratelimit plugin by building Caddy with xcaddy, or use Cloudflare in front of Caddy for free rate limiting without a custom build.

# Build Caddy with rate limit plugin:
go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
xcaddy build --with github.com/mholt/caddy-ratelimit

# Caddyfile with rate limiting:
example.com {
    @bad_bot header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|CCBot|PerplexityBot|Bytespider)
    respond @bad_bot 403

    # Rate limit: 10 requests per second per IP
    rate_limit {
        zone dynamic {
            key {remote_host}
            events 10
            window 1s
        }
    }

    file_server
}

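Without the plugin, stock Caddy has no per-client request throttling. reverse_proxy health checks and load balancing spread traffic across upstreams but do not limit an individual scraper, so rely on the User-Agent block above or proxy through Cloudflare.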

6. Reverse proxy setup

Caddy's reverse_proxy directive forwards requests to an upstream server. The @bad_bot matcher and respond 403 fire before proxying — blocked bots never reach your backend.

example.com {
    # Bot check — fires before reverse_proxy due to directive precedence
    @bad_bot header_regexp bot User-Agent (?i)(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)
    respond @bad_bot 403

    # X-Robots-Tag on all responses
    header X-Robots-Tag "noai, noimageai"

    # Serve robots.txt locally instead of proxying it.
    # handle is sorted ahead of respond, so blocked bots can still fetch this file.
    handle /robots.txt {
        root * /var/www/html
        file_server
    }

    # Proxy everything else to backend
    reverse_proxy localhost:3000 {
        # Caddy already sets X-Forwarded-For, X-Forwarded-Proto and
        # X-Forwarded-Host on proxied requests, so only X-Real-IP is added here
        header_up X-Real-IP {remote_host}
    }
}
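
If the proxy host has no static directory to serve robots.txt from, the file body can live in the Caddyfile itself. This is a sketch assuming Caddy 2.7 or later, which added heredoc syntax; the two rule groups shown are placeholders for your full list.

```caddyfile
example.com {
    # handle is sorted ahead of reverse_proxy, so Caddy answers this path itself
    handle /robots.txt {
        # Heredoc body: the indentation in front of the closing TXT marker
        # is stripped from every line of the response
        respond <<TXT
            User-agent: GPTBot
            Disallow: /
            User-agent: *
            Allow: /
            TXT 200
    }

    # Everything else goes to the backend
    reverse_proxy localhost:3000
}
```

This keeps the robots.txt rules versioned alongside the rest of the server config, at the cost of a reload whenever the list changes.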

7. Full Caddyfile example

A complete production Caddyfile with automatic HTTPS, bot blocking, X-Robots-Tag, and both static site and reverse proxy patterns. Caddy handles Let's Encrypt automatically — no SSL config needed.

# Caddyfile — production ready

# Global options (optional)
{
    email admin@example.com   # Let's Encrypt notifications
    # admin off               # Disable admin API in production
}

example.com {
    # ── AI bot blocking ────────────────────────────────────────────────
    @bad_bot header_regexp bot User-Agent (?i)(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|Applebot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)
    respond @bad_bot 403

    # ── AI training opt-out header ─────────────────────────────────────
    header X-Robots-Tag "noai, noimageai"

    # ── Static site (comment out if using reverse_proxy) ───────────────
    root * /var/www/html
    file_server

    # ── Reverse proxy (comment out file_server above, uncomment below) ─
    # handle /robots.txt {
    #     root * /var/www/html
    #     file_server
    # }
    # reverse_proxy localhost:3000 {
    #     # X-Forwarded-For and X-Forwarded-Proto are set by Caddy automatically
    #     header_up X-Real-IP {remote_host}
    # }

    # ── Logging ────────────────────────────────────────────────────────
    log {
        output file /var/log/caddy/access.log
        format json
    }
}

# Redirect www → apex
www.example.com {
    redir https://example.com{uri} permanent
}

8. Docker deployment

The official caddy Docker image includes automatic HTTPS. Mount your Caddyfile and a data volume (for certificates) — Caddy handles everything else.

# docker-compose.yml
services:
  caddy:
    image: caddy:alpine
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"   # HTTP/3
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - ./dist:/var/www/html:ro
      - caddy_data:/data          # TLS certificates — persist this!
      - caddy_config:/config
    restart: unless-stopped

volumes:
  caddy_data:
  caddy_config:

# Dockerfile: custom build with plugins
FROM caddy:builder AS builder
RUN xcaddy build \
    --with github.com/mholt/caddy-ratelimit

FROM caddy:alpine
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
COPY Caddyfile /etc/caddy/Caddyfile
COPY dist/ /var/www/html/

# Reload config without downtime (replace caddy-container with your container name):
docker exec caddy-container caddy reload --config /etc/caddy/Caddyfile

# Validate config:
docker exec caddy-container caddy validate --config /etc/caddy/Caddyfile

9. Dynamic updates via JSON API

Caddy's admin API accepts live config updates over HTTP — no restart needed. This lets you add bots to a blocklist programmatically. The admin endpoint runs on localhost:2019 by default.

# Get current config:
curl http://localhost:2019/config/

# Reload from Caddyfile (converts to JSON internally):
curl -X POST http://localhost:2019/load \
  -H "Content-Type: text/caddyfile" \
  --data-binary @Caddyfile

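# Add a new blocking route for one more bot via the JSON API:
# (Advanced: use only if you need dynamic updates without a Caddyfile reload.
#  srv0 is the server name the Caddyfile adapter typically generates; confirm it in the GET output above.)
# PUT to an array index inserts a new element, so this places the route first in the list.
# PATCH on .../routes would replace the whole route list instead of adding to it.
# A later reload from the Caddyfile overwrites changes made this way.
curl -X PUT http://localhost:2019/config/apps/http/servers/srv0/routes/0 \
  -H "Content-Type: application/json" \
  -d '{"@id":"bot-block","match":[{"header_regexp":{"User-Agent":{"pattern":"NewBotName","name":"bot"}}}],"handle":[{"handler":"static_response","status_code":403}]}'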

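Disable the admin API in production if you don't need live updates: put admin off in the global options block. Keep in mind that caddy reload also goes through the admin API, so with it disabled, config changes require restarting Caddy.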

Frequently asked questions

How do I match User-Agent in Caddy?

Use the header matcher: header User-Agent *GPTBot*. The * wildcard matches anywhere in the string (case-sensitive). For case-insensitive matching across many bots, use header_regexp with a single pipe-separated pattern: header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|CCBot). Multiple header User-Agent lines in one named matcher are OR conditions.

Does Caddy serve robots.txt automatically?

Yes. With file_server enabled and robots.txt in the root directory, Caddy serves it at /robots.txt with no additional config or dedicated route.

How does Caddy directive ordering work?

Caddy evaluates directives in a fixed precedence order, not top-to-bottom. respond has higher precedence than reverse_proxy and file_server, so respond @bad_bot 403 always fires first regardless of where you write it. Writing it first is still recommended for readability.

How do I add X-Robots-Tag in Caddy?

header X-Robots-Tag "noai, noimageai" in the site block. Caddy adds headers to all responses including error responses by default — no always keyword needed (unlike nginx). For reverse proxy, use header_down inside the reverse_proxy block to modify upstream response headers.

Does Caddy have built-in rate limiting?

No — rate limiting requires the caddy-ratelimit plugin, installed by building Caddy with xcaddy build --with github.com/mholt/caddy-ratelimit. The standard Docker image and binary do not include it. Alternatively, put Caddy behind Cloudflare for free rate limiting without a custom build.

What is the difference between Caddyfile and JSON config?

The Caddyfile is a human-readable format Caddy converts to JSON internally. JSON config is Caddy's native format — it exposes every option and can be updated live via the admin API (POST localhost:2019/load) without restarting. Use Caddyfile for simplicity; use the JSON API for programmatic config updates (e.g. dynamically adding bots to a blocklist). Disable the admin API in production if you don't need it: { admin off }.
