
How to Block AI Bots on OpenResty (Nginx + Lua): Complete 2026 Guide

OpenResty embeds LuaJIT inside nginx, letting you run Lua code at well-defined request-processing phases. access_by_lua_block runs at the access phase — before proxy_pass — so calling ngx.exit(ngx.HTTP_FORBIDDEN) there means the upstream server never receives the request. For user-agent matching, use string.find(ua, pattern, 1, true): the fourth argument true enables plain-text matching, so hyphens in bot names need no escaping (in Lua patterns, the hyphen is a magic quantifier character).
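To see why the plain flag matters, here is a standalone sketch in plain Lua (no nginx required). Without plain=true, the hyphen in "chatgpt-user" is interpreted as a lazy quantifier on the preceding "t", so the pattern cannot consume the literal hyphen in the UA string at all:

```lua
-- Plain Lua demo: Lua-pattern find vs plain-text find.
local ua = string.lower("Mozilla/5.0 (compatible; ChatGPT-User/1.0)")

-- As a Lua pattern, "t-" means "zero or more t, lazily" — the literal "-"
-- in the UA can never be matched, so this find fails:
assert(string.find(ua, "chatgpt-user") == nil)

-- With plain=true the needle is compared byte for byte and matches:
assert(string.find(ua, "chatgpt-user", 1, true) ~= nil)
print("plain find OK")
```

The same applies to every hyphenated bot name in the list (claude-web, oai-searchbot, meta-externalagent), which is why the guide uses plain=true throughout.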

OpenResty request phases — bot check goes in access

1. rewrite
2. access ← HERE
3. content
4. header_filter
5. body_filter
6. log

access_by_lua_block fires before the content phase. ngx.exit(403) here short-circuits all subsequent phases — proxy_pass, header_filter, and body_filter never run. The upstream server never receives the blocked request.

Protection layers

1. robots.txt — location = /robots.txt exact match, served before any Lua check; all crawlers can read it
2. noai meta tag — in HTML content served by your backend: <meta name="robots" content="noai, noimageai">
3. X-Robots-Tag on blocked responses — ngx.header["X-Robots-Tag"] set in access_by_lua_block before ngx.exit, so the header is included in the 403 response
4. X-Robots-Tag on legitimate responses — header_filter_by_lua_block adds the header to all upstream responses that pass the access check
5. Hard 403 at the access phase — ngx.exit(ngx.HTTP_FORBIDDEN); the upstream never receives the blocked request and the 403 is returned immediately

Step 1 — Load bot patterns at startup (init_by_lua_block)

init_by_lua_block runs once in the nginx master process during startup. Globals set here are copy-on-write shared across all worker processes. The bot pattern table is allocated once — not per-request.

# nginx.conf — http block
# init_by_lua_block runs once at nginx start (master process).
# Bot patterns table is available in all worker processes.

http {
  lua_package_path '/etc/nginx/lua/?.lua;;';

  init_by_lua_block {
    AI_BOT_PATTERNS = {
      -- OpenAI
      "gptbot", "chatgpt-user", "oai-searchbot",
      -- Anthropic
      "claudebot", "claude-web",
      -- Common Crawl
      "ccbot",
      -- Bytedance
      "bytespider",
      -- Meta
      "meta-externalagent",
      -- Perplexity
      "perplexitybot",
      -- Google AI
      "google-extended", "googleother",
      -- Cohere
      "cohere-ai",
      -- Amazon
      "amazonbot",
      -- Diffbot
      "diffbot",
      -- AI2
      "ai2bot",
      -- DeepSeek
      "deepseekbot",
      -- Mistral
      "mistralai-user",
      -- xAI
      "xai-bot",
      -- You.com
      "youbot",
      -- DuckDuckGo AI
      "duckassistbot",
    }
  }

  # ... server blocks below
}

Step 2 — Access phase check + response header filter

location = /robots.txt (exact match) has higher nginx priority than location / — it serves robots.txt before any Lua code runs. The access_by_lua_block in location / never sees robots.txt requests.

# nginx.conf — server block

server {
  listen 80;
  server_name example.com;

  # robots.txt — served BEFORE the access_by_lua_block check.
  # location = is an exact match — highest priority in nginx.
  # AI bots must be able to read robots.txt even if they'd be blocked elsewhere.
  location = /robots.txt {
    root /var/www/html;
    # default_type sets the response type directly; add_header would emit a
    # duplicate Content-Type alongside the one nginx derives from mime.types.
    default_type text/plain;
    # Do NOT add access_by_lua_block here — allow all crawlers unconditionally.
  }

  # All other locations — bot check applied
  location / {
    # access_by_lua_block runs at the ACCESS phase.
    # If ngx.exit() is called here, the proxy_pass below NEVER executes.
    # The upstream server never receives the blocked request.
    access_by_lua_block {
      local ua = ngx.var.http_user_agent or ""
      local ua_lower = string.lower(ua)

      for _, pattern in ipairs(AI_BOT_PATTERNS) do
        -- string.find(str, pattern, init, plain=true)
        -- plain=true: literal match — hyphens in bot names need no escaping.
        if string.find(ua_lower, pattern, 1, true) then
          ngx.header["X-Robots-Tag"] = "noai, noimageai"
          ngx.log(ngx.WARN, "AI bot blocked: " .. ua)
          ngx.exit(ngx.HTTP_FORBIDDEN)  -- 403, stops all subsequent phases
        end
      end
    }

    # header_filter_by_lua_block runs during the HEADER FILTER phase.
    # Only reached for requests that passed the access check above.
    # Adds X-Robots-Tag to all legitimate upstream responses.
    header_filter_by_lua_block {
      ngx.header["X-Robots-Tag"] = "noai, noimageai"
    }

    proxy_pass http://backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}

Step 3 — Separate Lua module (/etc/nginx/lua/ai_bots.lua)

For maintainability, extract the bot list into a separate .lua file. Load it via require "ai_bots" in init_by_lua_block. Set lua_package_path in the http block.

-- /etc/nginx/lua/ai_bots.lua — separate Lua module
-- Loaded with: lua_package_path '/etc/nginx/lua/?.lua;;'
-- In nginx.conf: require "ai_bots" in init_by_lua_block

local M = {}

local patterns = {
  "gptbot", "chatgpt-user", "oai-searchbot",
  "claudebot", "claude-web",
  "ccbot",
  "bytespider",
  "meta-externalagent",
  "perplexitybot",
  "google-extended", "googleother",
  "cohere-ai",
  "amazonbot",
  "diffbot",
  "ai2bot",
  "deepseekbot",
  "mistralai-user",
  "xai-bot",
  "youbot",
  "duckassistbot",
}

-- is_ai_bot: returns true if ua matches any known AI bot pattern.
-- ua must be lowercase before calling.
function M.is_ai_bot(ua)
  for _, pattern in ipairs(patterns) do
    if string.find(ua, pattern, 1, true) then
      return true
    end
  end
  return false
end

return M

-- ----------------------------------------------------------------
-- nginx.conf usage:
--
-- init_by_lua_block {
--   ai_bots = require "ai_bots"
-- }
--
-- access_by_lua_block {
--   local ua = string.lower(ngx.var.http_user_agent or "")
--   if ai_bots.is_ai_bot(ua) then
--     ngx.header["X-Robots-Tag"] = "noai, noimageai"
--     ngx.exit(ngx.HTTP_FORBIDDEN)
--   end
-- }
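The module can be sanity-checked outside nginx with plain Lua. The sketch below preloads a minimal stub with the same interface so it is self-contained; to exercise the real file instead, drop the stub and put /etc/nginx/lua on package.path:

```lua
-- Standalone smoke test for the ai_bots interface (plain Lua, no nginx).
-- The preload stub below is illustrative — replace it by pointing
-- package.path at /etc/nginx/lua to test the real module.
package.preload["ai_bots"] = function()
  local M = {}
  local patterns = { "gptbot", "claudebot", "ccbot" }
  function M.is_ai_bot(ua)
    for _, p in ipairs(patterns) do
      if string.find(ua, p, 1, true) then return true end
    end
    return false
  end
  return M
end

local ai_bots = require "ai_bots"

-- Callers must lowercase the UA first — the module matches literally.
assert(ai_bots.is_ai_bot(string.lower("Mozilla/5.0 (compatible; GPTBot/1.1)")))
assert(not ai_bots.is_ai_bot("mozilla/5.0 (windows nt 10.0) firefox/133.0"))
print("ai_bots module OK")
```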

Step 4 — Dynamic updates and metrics with lua_shared_dict

lua_shared_dict is a shared memory zone that all worker processes can read and write; individual operations such as incr are atomic. Use it to track block counts per bot and to add new bot patterns at runtime without an nginx reload.

# lua_shared_dict — dynamic bot list + metrics without nginx restart

http {
  # Shared memory zones — readable/writable from ALL worker processes.
  # Contents survive nginx -s reload (HUP); cleared only on a full stop/start.
  lua_shared_dict bot_metrics 10m;   # counters per bot name
  lua_shared_dict dynamic_bots 1m;   # runtime-added patterns

  init_by_lua_block {
    -- Static patterns (compile-time)
    AI_BOT_PATTERNS = { "gptbot", "claudebot", "ccbot", ... }
  }

  server {
    location / {
      access_by_lua_block {
        local ua = string.lower(ngx.var.http_user_agent or "")

        -- Check static patterns
        for _, pattern in ipairs(AI_BOT_PATTERNS) do
          if string.find(ua, pattern, 1, true) then
            -- Increment counter in shared dict (atomic)
            local metrics = ngx.shared.bot_metrics
            metrics:incr(pattern, 1, 0)
            ngx.header["X-Robots-Tag"] = "noai, noimageai"
            ngx.exit(ngx.HTTP_FORBIDDEN)
          end
        end

        -- Check dynamic patterns added at runtime via /admin endpoint
        local dynamic = ngx.shared.dynamic_bots
        local keys = dynamic:get_keys()
        for _, key in ipairs(keys) do
          if string.find(ua, key, 1, true) then
            ngx.header["X-Robots-Tag"] = "noai, noimageai"
            ngx.exit(ngx.HTTP_FORBIDDEN)
          end
        end
      }

      proxy_pass http://backend;
    }

    # Admin endpoint to add dynamic bot patterns at runtime
    # (Protect this with allow/deny or authentication in production)
    location = /admin/block-bot {
      allow 127.0.0.1;
      deny all;
      content_by_lua_block {
        local pattern = ngx.var.arg_pattern
        if pattern and #pattern > 0 then
          ngx.shared.dynamic_bots:set(pattern, true)
          ngx.say("blocked: " .. pattern)
        else
          ngx.status = 400
          ngx.say("missing ?pattern=")
        end
      }
    }
  }
}
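The counters accumulated in bot_metrics are only useful if you can read them back. A minimal companion endpoint might look like this — the /admin/bot-metrics path is an assumption, and it should be protected the same way as /admin/block-bot:

```nginx
# Sketch: dump per-bot block counters from the bot_metrics zone.
# The /admin/bot-metrics path is illustrative — lock it down in production.
location = /admin/bot-metrics {
  allow 127.0.0.1;
  deny all;
  content_by_lua_block {
    local metrics = ngx.shared.bot_metrics
    -- get_keys(0) returns all keys (the default cap is 1024)
    for _, name in ipairs(metrics:get_keys(0)) do
      ngx.say(name, ": ", metrics:get(name) or 0)
    end
  }
}
```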

Step 5 — Docker deployment

The official openresty/openresty Docker image includes LuaJIT and all standard OpenResty modules. Mount your config and Lua files as volumes for live editing without rebuilding.

# Dockerfile — OpenResty with custom Lua config

FROM openresty/openresty:1.25.3-bookworm

# Copy nginx config and Lua modules
COPY nginx.conf /etc/nginx/nginx.conf
COPY lua/ /etc/nginx/lua/
COPY html/ /var/www/html/

EXPOSE 80
CMD ["/usr/local/openresty/nginx/sbin/nginx", "-g", "daemon off;"]

# ----------------------------------------------------------------
# docker-compose.yml

# version: "3.9"
# services:
#   openresty:
#     build: .
#     ports:
#       - "80:80"
#     volumes:
#       - ./nginx.conf:/etc/nginx/nginx.conf:ro
#       - ./lua:/etc/nginx/lua:ro
#       - ./html:/var/www/html:ro

# ----------------------------------------------------------------
# robots.txt — /var/www/html/robots.txt

# User-agent: *
# Allow: /
#
# User-agent: GPTBot
# Disallow: /
#
# User-agent: ClaudeBot
# Disallow: /
#
# User-agent: CCBot
# Disallow: /
#
# User-agent: Bytespider
# Disallow: /
#
# User-agent: Google-Extended
# Disallow: /
#
# User-agent: PerplexityBot
# Disallow: /
#
# User-agent: Meta-ExternalAgent
# Disallow: /

OpenResty vs plain Nginx vs Nginx Unit vs Caddy

Bot check mechanism
  • OpenResty: access_by_lua_block — LuaJIT code iterates the pattern list with a string.find plain-text match
  • Plain Nginx: map $http_user_agent $is_bot { ... } + if ($is_bot) { return 403; } — static config only
  • Nginx Unit: Python/Ruby/PHP handler script called per request via Unit config routing
  • Caddy: header_regexp matcher + respond directive, or a Caddy Lua module (less common)

Short-circuit
  • OpenResty: ngx.exit(ngx.HTTP_FORBIDDEN) at the access phase — proxy_pass never executes
  • Plain Nginx: return 403 in the if block — but the if directive in nginx is often fragile
  • Nginx Unit: HTTP 403 response returned from the application handler
  • Caddy: respond 403 directive in the Caddyfile — before reverse_proxy

X-Robots-Tag
  • OpenResty: header_filter_by_lua_block on pass-through; ngx.header[] in the access block for the 403
  • Plain Nginx: add_header X-Robots-Tag "noai, noimageai" always — applies to all responses
  • Nginx Unit: set via application framework response headers
  • Caddy: header X-Robots-Tag "noai, noimageai" directive in the Caddyfile

Dynamic bot list
  • OpenResty: lua_shared_dict — update at runtime without reload, atomic incr for metrics
  • Plain Nginx: requires nginx -s reload to pick up map block changes
  • Nginx Unit: reload Unit config via its REST API; application code can read dynamic sources
  • Caddy: requires a config reload via the Admin API or caddy reload

robots.txt serving
  • OpenResty: location = /robots.txt { root ...; } — exact match before the Lua location, no access check
  • Plain Nginx: location = /robots.txt { root ...; } — identical, no Lua needed
  • Nginx Unit: configured as a static route in Unit config or served by the application
  • Caddy: file_server for /robots.txt before the reverse_proxy block

Pattern matching
  • OpenResty: string.find(ua, pattern, 1, true) — plain=true avoids % escaping for hyphens
  • Plain Nginx: PCRE regex in the map block — ~* for case-insensitive; hyphens safe in character classes
  • Nginx Unit: language-native string matching in application code
  • Caddy: header_regexp uses Go's RE2-based regexp — hyphens in character classes are safe

Performance
  • OpenResty: LuaJIT — near-native speed, JIT-compiled Lua, minimal per-request overhead
  • Plain Nginx: fastest — pure C, no scripting overhead; limited flexibility
  • Nginx Unit: language runtime overhead (Go/Python/Ruby); higher memory per worker
  • Caddy: Go runtime — fast, GC pauses possible at scale; simpler ops than OpenResty
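For reference, the plain-nginx approach from the comparison can be realized with a map block like this (a minimal sketch with a subset of the bot list; changing it requires a reload):

```nginx
# Plain-nginx equivalent (no Lua): case-insensitive map + return 403.
# Updating the list requires nginx -s reload.
map $http_user_agent $is_ai_bot {
  default              0;
  "~*gptbot"           1;
  "~*chatgpt-user"     1;
  "~*claudebot"        1;
  "~*ccbot"            1;
  "~*bytespider"       1;
  "~*perplexitybot"    1;
}

server {
  listen 80;
  location / {
    if ($is_ai_bot) {
      add_header X-Robots-Tag "noai, noimageai" always;
      return 403;
    }
    proxy_pass http://backend;
  }
}
```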

Summary

  • Access phase, before upstream — access_by_lua_block runs before proxy_pass, so blocked requests never reach the upstream server.
  • string.find(ua, pattern, 1, true) — the true flag enables plain-text matching. Bot names with hyphens (chatgpt-user, meta-externalagent) are matched literally, with no %- escaping needed.
  • init_by_lua_block — allocates the bot pattern table once at startup, copy-on-write shared across all workers. Not per-request allocation.
  • location = /robots.txt — nginx exact match has higher priority than location /. robots.txt is always served without hitting any Lua code.
  • lua_shared_dict — for runtime updates and metrics without nginx reload. Atomic operations across all worker processes.
