
How to Block AI Bots on OpenResty (Nginx + Lua): Complete 2026 Guide

OpenResty embeds LuaJIT inside nginx, letting you run Lua code at well-defined request-processing phases. access_by_lua_block runs at the access phase — before proxy_pass — so calling ngx.exit(ngx.HTTP_FORBIDDEN) there means the upstream server never receives the request. For user-agent matching, use string.find(ua, pattern, 1, true): the fourth argument true enables plain-text matching, so hyphens in bot names need no escaping (in Lua patterns, the hyphen is a magic quantifier character).
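To see why the plain flag matters, here is a standalone sketch in plain Lua (no nginx required). Without plain=true, the hyphen in "chatgpt-user" is interpreted as a lazy quantifier on the preceding "t", so the pattern cannot consume the literal hyphen in the UA string at all:

```lua
-- Plain Lua demo: Lua-pattern find vs plain-text find.
local ua = string.lower("Mozilla/5.0 (compatible; ChatGPT-User/1.0)")

-- As a Lua pattern, "t-" means "zero or more t, lazily" — the literal "-"
-- in the UA can never be matched, so this find fails:
assert(string.find(ua, "chatgpt-user") == nil)

-- With plain=true the needle is compared byte for byte and matches:
assert(string.find(ua, "chatgpt-user", 1, true) ~= nil)
print("plain find OK")
```

The same applies to every hyphenated bot name in the list (claude-web, oai-searchbot, meta-externalagent), which is why the guide uses plain=true throughout.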

OpenResty request phases — bot check goes in access

1. rewrite
2. access ← HERE
3. content
4. header_filter
5. body_filter
6. log

access_by_lua_block fires before the content phase. ngx.exit(403) here short-circuits all subsequent phases — proxy_pass, header_filter, and body_filter never run. The upstream server never receives the blocked request.

Protection layers

1. robots.txt — location = /robots.txt exact match, served before any Lua check; all crawlers can read it
2. noai meta tag — in HTML content served by your backend: <meta name="robots" content="noai, noimageai">
3. X-Robots-Tag on blocked responses — ngx.header["X-Robots-Tag"] set in access_by_lua_block before ngx.exit, so the header is included in the 403 response
4. X-Robots-Tag on legitimate responses — header_filter_by_lua_block adds the header to all upstream responses that pass the access check
5. Hard 403 at the access phase — ngx.exit(ngx.HTTP_FORBIDDEN); the upstream never receives the blocked request and the 403 is returned immediately

Step 1 — Load bot patterns at startup (init_by_lua_block)

init_by_lua_block runs once in the nginx master process during startup. Globals set here are copy-on-write shared across all worker processes. The bot pattern table is allocated once — not per-request.

# nginx.conf — http block
# init_by_lua_block runs once at nginx start (master process).
# Bot patterns table is available in all worker processes.

http {
  lua_package_path '/etc/nginx/lua/?.lua;;';

  init_by_lua_block {
    AI_BOT_PATTERNS = {
      -- OpenAI
      "gptbot", "chatgpt-user", "oai-searchbot",
      -- Anthropic
      "claudebot", "claude-web",
      -- Common Crawl
      "ccbot",
      -- Bytedance
      "bytespider",
      -- Meta
      "meta-externalagent",
      -- Perplexity
      "perplexitybot",
      -- Google AI
      "google-extended", "googleother",
      -- Cohere
      "cohere-ai",
      -- Amazon
      "amazonbot",
      -- Diffbot
      "diffbot",
      -- AI2
      "ai2bot",
      -- DeepSeek
      "deepseekbot",
      -- Mistral
      "mistralai-user",
      -- xAI
      "xai-bot",
      -- You.com
      "youbot",
      -- DuckDuckGo AI
      "duckassistbot",
    }
  }

  # ... server blocks below
}

Step 2 — Access phase check + response header filter

location = /robots.txt (exact match) has higher nginx priority than location / — it serves robots.txt before any Lua code runs. The access_by_lua_block in location / never sees robots.txt requests.

# nginx.conf — server block

server {
  listen 80;
  server_name example.com;

  # robots.txt — served BEFORE the access_by_lua_block check.
  # location = is an exact match — highest priority in nginx.
  # AI bots must be able to read robots.txt even if they'd be blocked elsewhere.
  location = /robots.txt {
    root /var/www/html;
    # default_type sets the response type directly; add_header would emit a
    # duplicate Content-Type alongside the one nginx derives from mime.types.
    default_type text/plain;
    # Do NOT add access_by_lua_block here — allow all crawlers unconditionally.
  }

  # All other locations — bot check applied
  location / {
    # access_by_lua_block runs at the ACCESS phase.
    # If ngx.exit() is called here, the proxy_pass below NEVER executes.
    # The upstream server never receives the blocked request.
    access_by_lua_block {
      local ua = ngx.var.http_user_agent or ""
      local ua_lower = string.lower(ua)

      for _, pattern in ipairs(AI_BOT_PATTERNS) do
        -- string.find(str, pattern, init, plain=true)
        -- plain=true: literal match — hyphens in bot names need no escaping.
        if string.find(ua_lower, pattern, 1, true) then
          ngx.header["X-Robots-Tag"] = "noai, noimageai"
          ngx.log(ngx.WARN, "AI bot blocked: " .. ua)
          ngx.exit(ngx.HTTP_FORBIDDEN)  -- 403, stops all subsequent phases
        end
      end
    }

    # header_filter_by_lua_block runs during the HEADER FILTER phase.
    # Only reached for requests that passed the access check above.
    # Adds X-Robots-Tag to all legitimate upstream responses.
    header_filter_by_lua_block {
      ngx.header["X-Robots-Tag"] = "noai, noimageai"
    }

    proxy_pass http://backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}

Step 3 — Separate Lua module (/etc/nginx/lua/ai_bots.lua)

For maintainability, extract the bot list into a separate .lua file. Load it via require "ai_bots" in init_by_lua_block. Set lua_package_path in the http block.

-- /etc/nginx/lua/ai_bots.lua — separate Lua module
-- Loaded with: lua_package_path '/etc/nginx/lua/?.lua;;'
-- In nginx.conf: require "ai_bots" in init_by_lua_block

local M = {}

local patterns = {
  "gptbot", "chatgpt-user", "oai-searchbot",
  "claudebot", "claude-web",
  "ccbot",
  "bytespider",
  "meta-externalagent",
  "perplexitybot",
  "google-extended", "googleother",
  "cohere-ai",
  "amazonbot",
  "diffbot",
  "ai2bot",
  "deepseekbot",
  "mistralai-user",
  "xai-bot",
  "youbot",
  "duckassistbot",
}

-- is_ai_bot: returns true if ua matches any known AI bot pattern.
-- ua must be lowercase before calling.
function M.is_ai_bot(ua)
  for _, pattern in ipairs(patterns) do
    if string.find(ua, pattern, 1, true) then
      return true
    end
  end
  return false
end

return M

-- ----------------------------------------------------------------
-- nginx.conf usage:
--
-- init_by_lua_block {
--   ai_bots = require "ai_bots"
-- }
--
-- access_by_lua_block {
--   local ua = string.lower(ngx.var.http_user_agent or "")
--   if ai_bots.is_ai_bot(ua) then
--     ngx.header["X-Robots-Tag"] = "noai, noimageai"
--     ngx.exit(ngx.HTTP_FORBIDDEN)
--   end
-- }
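The module can be sanity-checked outside nginx with plain Lua. The sketch below preloads a minimal stub with the same interface so it is self-contained; to exercise the real file instead, drop the stub and put /etc/nginx/lua on package.path:

```lua
-- Standalone smoke test for the ai_bots interface (plain Lua, no nginx).
-- The preload stub below is illustrative — replace it by pointing
-- package.path at /etc/nginx/lua to test the real module.
package.preload["ai_bots"] = function()
  local M = {}
  local patterns = { "gptbot", "claudebot", "ccbot" }
  function M.is_ai_bot(ua)
    for _, p in ipairs(patterns) do
      if string.find(ua, p, 1, true) then return true end
    end
    return false
  end
  return M
end

local ai_bots = require "ai_bots"

-- Callers must lowercase the UA first — the module matches literally.
assert(ai_bots.is_ai_bot(string.lower("Mozilla/5.0 (compatible; GPTBot/1.1)")))
assert(not ai_bots.is_ai_bot("mozilla/5.0 (windows nt 10.0) firefox/133.0"))
print("ai_bots module OK")
```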

Step 4 — Dynamic updates and metrics with lua_shared_dict

lua_shared_dict is a shared memory zone that all worker processes can read and write; individual operations such as incr are atomic. Use it to track block counts per bot and to add new bot patterns at runtime without an nginx reload.

# lua_shared_dict — dynamic bot list + metrics without nginx restart

http {
  # Shared memory zones — readable/writable from ALL worker processes.
  # Contents survive nginx -s reload (HUP); cleared only on a full stop/start.
  lua_shared_dict bot_metrics 10m;   # counters per bot name
  lua_shared_dict dynamic_bots 1m;   # runtime-added patterns

  init_by_lua_block {
    -- Static patterns (compile-time)
    AI_BOT_PATTERNS = { "gptbot", "claudebot", "ccbot", ... }
  }

  server {
    location / {
      access_by_lua_block {
        local ua = string.lower(ngx.var.http_user_agent or "")

        -- Check static patterns
        for _, pattern in ipairs(AI_BOT_PATTERNS) do
          if string.find(ua, pattern, 1, true) then
            -- Increment counter in shared dict (atomic)
            local metrics = ngx.shared.bot_metrics
            metrics:incr(pattern, 1, 0)
            ngx.header["X-Robots-Tag"] = "noai, noimageai"
            ngx.exit(ngx.HTTP_FORBIDDEN)
          end
        end

        -- Check dynamic patterns added at runtime via /admin endpoint
        local dynamic = ngx.shared.dynamic_bots
        local keys = dynamic:get_keys()
        for _, key in ipairs(keys) do
          if string.find(ua, key, 1, true) then
            ngx.header["X-Robots-Tag"] = "noai, noimageai"
            ngx.exit(ngx.HTTP_FORBIDDEN)
          end
        end
      }

      proxy_pass http://backend;
    }

    # Admin endpoint to add dynamic bot patterns at runtime
    # (Protect this with allow/deny or authentication in production)
    location = /admin/block-bot {
      allow 127.0.0.1;
      deny all;
      content_by_lua_block {
        local pattern = ngx.var.arg_pattern
        if pattern and #pattern > 0 then
          ngx.shared.dynamic_bots:set(pattern, true)
          ngx.say("blocked: " .. pattern)
        else
          ngx.status = 400
          ngx.say("missing ?pattern=")
        end
      }
    }
  }
}
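The counters accumulated in bot_metrics are only useful if you can read them back. A minimal companion endpoint might look like this — the /admin/bot-metrics path is an assumption, and it should be protected the same way as /admin/block-bot:

```nginx
# Sketch: dump per-bot block counters from the bot_metrics zone.
# The /admin/bot-metrics path is illustrative — lock it down in production.
location = /admin/bot-metrics {
  allow 127.0.0.1;
  deny all;
  content_by_lua_block {
    local metrics = ngx.shared.bot_metrics
    -- get_keys(0) returns all keys (the default cap is 1024)
    for _, name in ipairs(metrics:get_keys(0)) do
      ngx.say(name, ": ", metrics:get(name) or 0)
    end
  }
}
```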

Step 5 — Docker deployment

The official openresty/openresty Docker image includes LuaJIT and all standard OpenResty modules. Mount your config and Lua files as volumes for live editing without rebuilding.

# Dockerfile — OpenResty with custom Lua config

FROM openresty/openresty:1.25.3-bookworm

# Copy nginx config and Lua modules
COPY nginx.conf /etc/nginx/nginx.conf
COPY lua/ /etc/nginx/lua/
COPY html/ /var/www/html/

EXPOSE 80
CMD ["/usr/local/openresty/nginx/sbin/nginx", "-g", "daemon off;"]

# ----------------------------------------------------------------
# docker-compose.yml

# version: "3.9"
# services:
#   openresty:
#     build: .
#     ports:
#       - "80:80"
#     volumes:
#       - ./nginx.conf:/etc/nginx/nginx.conf:ro
#       - ./lua:/etc/nginx/lua:ro
#       - ./html:/var/www/html:ro

# ----------------------------------------------------------------
# robots.txt — /var/www/html/robots.txt

# User-agent: *
# Allow: /
#
# User-agent: GPTBot
# Disallow: /
#
# User-agent: ClaudeBot
# Disallow: /
#
# User-agent: CCBot
# Disallow: /
#
# User-agent: Bytespider
# Disallow: /
#
# User-agent: Google-Extended
# Disallow: /
#
# User-agent: PerplexityBot
# Disallow: /
#
# User-agent: Meta-ExternalAgent
# Disallow: /

OpenResty vs plain Nginx vs Nginx Unit vs Caddy

Bot check mechanism
  • OpenResty: access_by_lua_block — LuaJIT code iterates the pattern list with a string.find plain-text match
  • Plain Nginx: map $http_user_agent $is_bot { ... } + if ($is_bot) { return 403; } — static config only
  • Nginx Unit: Python/Ruby/PHP handler script called per request via Unit config routing
  • Caddy: header_regexp matcher + respond directive, or a Caddy Lua module (less common)

Short-circuit
  • OpenResty: ngx.exit(ngx.HTTP_FORBIDDEN) at the access phase — proxy_pass never executes
  • Plain Nginx: return 403 in the if block — but the if directive in nginx is often fragile
  • Nginx Unit: HTTP 403 response returned from the application handler
  • Caddy: respond 403 directive in the Caddyfile — before reverse_proxy

X-Robots-Tag
  • OpenResty: header_filter_by_lua_block on pass-through; ngx.header[] in the access block for the 403
  • Plain Nginx: add_header X-Robots-Tag "noai, noimageai" always — applies to all responses
  • Nginx Unit: set via application framework response headers
  • Caddy: header X-Robots-Tag "noai, noimageai" directive in the Caddyfile

Dynamic bot list
  • OpenResty: lua_shared_dict — update at runtime without reload, atomic incr for metrics
  • Plain Nginx: requires nginx -s reload to pick up map block changes
  • Nginx Unit: reload Unit config via its REST API; application code can read dynamic sources
  • Caddy: requires a config reload via the Admin API or caddy reload

robots.txt serving
  • OpenResty: location = /robots.txt { root ...; } — exact match before the Lua location, no access check
  • Plain Nginx: location = /robots.txt { root ...; } — identical, no Lua needed
  • Nginx Unit: configured as a static route in Unit config or served by the application
  • Caddy: file_server for /robots.txt before the reverse_proxy block

Pattern matching
  • OpenResty: string.find(ua, pattern, 1, true) — plain=true avoids % escaping for hyphens
  • Plain Nginx: PCRE regex in the map block — ~* for case-insensitive; hyphens safe in character classes
  • Nginx Unit: language-native string matching in application code
  • Caddy: header_regexp uses Go's RE2-based regexp — hyphens in character classes are safe

Performance
  • OpenResty: LuaJIT — near-native speed, JIT-compiled Lua, minimal per-request overhead
  • Plain Nginx: fastest — pure C, no scripting overhead; limited flexibility
  • Nginx Unit: language runtime overhead (Go/Python/Ruby); higher memory per worker
  • Caddy: Go runtime — fast, GC pauses possible at scale; simpler ops than OpenResty
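For reference, the plain-nginx approach from the comparison can be realized with a map block like this (a minimal sketch with a subset of the bot list; changing it requires a reload):

```nginx
# Plain-nginx equivalent (no Lua): case-insensitive map + return 403.
# Updating the list requires nginx -s reload.
map $http_user_agent $is_ai_bot {
  default              0;
  "~*gptbot"           1;
  "~*chatgpt-user"     1;
  "~*claudebot"        1;
  "~*ccbot"            1;
  "~*bytespider"       1;
  "~*perplexitybot"    1;
}

server {
  listen 80;
  location / {
    if ($is_ai_bot) {
      add_header X-Robots-Tag "noai, noimageai" always;
      return 403;
    }
    proxy_pass http://backend;
  }
}
```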

Summary

  • Access phase, before upstream — access_by_lua_block runs before proxy_pass, so blocked requests never reach the upstream server.
  • string.find(ua, pattern, 1, true) — the true flag enables plain-text matching. Bot names with hyphens (chatgpt-user, meta-externalagent) are matched literally, with no %- escaping needed.
  • init_by_lua_block — allocates the bot pattern table once at startup, copy-on-write shared across all workers. Not per-request allocation.
  • location = /robots.txt — nginx exact match has higher priority than location /. robots.txt is always served without hitting any Lua code.
  • lua_shared_dict — for runtime updates and metrics without nginx reload. Atomic operations across all worker processes.
