
How to Block AI Bots in Lua Lapis

Lapis is a Lua web framework that runs on OpenResty (nginx + LuaJIT) and provides a higher-level MVC layer over raw OpenResty. It supports both Lua and MoonScript and includes a routing DSL, database models, sessions, and a before_filter system that runs before every action. Lapis sits above the nginx layer: a request reaches Lapis only after nginx has accepted the connection and run its rewrite and access phases, because Lapis itself runs in the content phase. The Lapis-specific detail for bot blocking is how a before_filter stops a request: it calls self:write(...). Lapis ignores the filter's return value; if the filter writes, the action function never runs and the written options (status, headers, body) become the response. The table passed to write can include a headers key, so the status, headers, and body of the blocked response are set in a single call.

1. Bot detection module

A plain Lua module with no external dependencies. The User-Agent is lowercased with :lower(), then each entry is checked with lower:find(pattern, 1, true), i.e. string.find(s, pattern, 1, true). The fourth argument, plain = true, switches off the Lua pattern engine, so each entry is matched as a literal substring.

-- bot_utils.lua — bot detection, no dependencies
local M = {}

-- All lowercase — matched against ua:lower()
local AI_BOT_PATTERNS = {
  "gptbot",
  "chatgpt-user",
  "claudebot",
  "anthropic-ai",
  "ccbot",
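  -- Google-Extended is a robots.txt product token, not a User-Agent: Google
  -- crawls with its normal UA strings, so this entry never matches here.
  "google-extended",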
  "cohere-ai",
  "meta-externalagent",
  "bytespider",
  "omgili",
  "diffbot",
  "imagesiftbot",
  "magpie-crawler",
  "amazonbot",
  "dataprovider",
  "netcraft",
}

--- Returns true if the User-Agent string matches a known AI crawler.
function M.is_ai_bot(ua)
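  -- ngx.req.get_headers() returns a table of values for a repeated header
  if type(ua) == "table" then ua = table.concat(ua, " ") end
  if type(ua) ~= "string" or ua == "" then return false end
  local lower = ua:lower()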
  -- string.find() with plain=true — literal substring, no regex engine
  for _, pattern in ipairs(AI_BOT_PATTERNS) do
    if lower:find(pattern, 1, true) then
      return true
    end
  end
  return false
end

return M
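Plain mode matters here for correctness, not only speed. Six entries (chatgpt-user, anthropic-ai, google-extended, cohere-ai, meta-externalagent, magpie-crawler) contain a hyphen, and in a Lua pattern - is the lazy zero-or-more quantifier, so in pattern mode those entries would not match their own literal text. A standalone check, using a sample UA string chosen for illustration:

```lua
-- Compare Lua pattern mode with plain mode on a hyphenated crawler token.
-- The UA below is a sample string for illustration only.
local ua = "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"
local lower = ua:lower()

-- Pattern mode: "t-" means "zero or more 't', as few as possible",
-- so the pattern cannot match the literal "-" in the UA.
local pattern_hit = lower:find("chatgpt-user")

-- Escaping the hyphen as "%-" makes pattern mode match a literal "-".
local escaped_hit = lower:find("chatgpt%-user")

-- Plain mode (4th argument true): literal substring search, no escaping needed.
local plain_hit = lower:find("chatgpt-user", 1, true)

print(pattern_hit)          -- nil
print(escaped_hit ~= nil)   -- true
print(plain_hit ~= nil)     -- true
```

Escaping every entry would also work, but plain mode keeps the list readable as the literal tokens crawlers send.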

2. before_filter — global bot blocking

app:before_filter(fn) registers a function that runs before every action. To block, call self:write({...}); Lapis skips the action whenever a filter has written. If the filter returns without writing, the request continues to the action, and any return value is ignored. The headers key in the write table sets headers on the blocked response; self.res.headers sets headers on responses that pass through.

-- app.lua — Lapis application
local lapis = require("lapis")
local bot_utils = require("bot_utils")

local app = lapis.Application()

-- ── Global before_filter ──────────────────────────────────────────────────────
-- before_filter fires before every action in this application.
-- Call self:write(...) to cancel the action and use the written table as the response.
-- Returning without writing lets the request continue (return values are ignored).
app:before_filter(function(self)
  -- Path guard: let robots.txt through.
  -- In most deployments, nginx serves robots.txt as a static file before
  -- Lapis runs. This guard covers setups where the request reaches Lapis.
  if self.req.cmd_url == "/robots.txt" then
    return  -- no write, so the request continues
  end

  -- self.req.headers comes from ngx.req.get_headers(), whose lookups are
  -- case-insensitive; the value is nil when the header is absent.
  local ua = self.req.headers["user-agent"] or ""

  if bot_utils.is_ai_bot(ua) then
    -- write() cancels the action. Option keys set status, content type and
    -- headers; the array item "Forbidden" becomes the body.
    -- layout = false stops Lapis wrapping the text in the default HTML layout.
    self:write({
      status = 403,
      layout = false,
      content_type = "text/plain",
      headers = {
        ["X-Robots-Tag"] = "noai, noimageai",
      },
      "Forbidden",
    })
    return
  end

  -- Pass-through: set X-Robots-Tag on every response that is not blocked.
  -- self.res.headers holds headers Lapis adds to the current response.
  self.res.headers["X-Robots-Tag"] = "noai, noimageai"
end)

-- ── Routes ────────────────────────────────────────────────────────────────────
app:get("/", function(self)
  return { json = { message = "Hello" } }
end)

app:get("/api/data", function(self)
  return { json = { data = "value" } }
end)

app:get("/health", function(self)
  return { json = { status = "ok" } }
end)

return app
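The rule behind blocking in a before_filter is that Lapis checks whether the filter called write, not what it returned. The toy model below is not Lapis source; new_request, dispatch, writing_filter and returning_filter are hypothetical names, and the code only mimics that one rule so it runs in plain Lua without OpenResty. It also shows why returning a 403 table from a filter has no effect.

```lua
-- Toy model of before_filter dispatch (hypothetical; not Lapis source).
-- Rule modelled: filters run in order; if a filter calls self:write(...),
-- the action is skipped and what was written becomes the response.
local function new_request(ua)
  local request = { req = { headers = { ["user-agent"] = ua } } }
  function request:write(response)
    self.written = response
  end
  return request
end

local function dispatch(filters, action, request)
  for _, filter in ipairs(filters) do
    filter(request)             -- return value discarded
    if request.written then
      return request.written    -- a filter wrote: the action never runs
    end
  end
  return action(request)
end

local function action(self)
  return { status = 200, "Hello" }
end

-- Blocks by writing (the working pattern).
local function writing_filter(self)
  local ua = (self.req.headers["user-agent"] or ""):lower()
  if ua:find("gptbot", 1, true) then
    self:write({ status = 403, "Forbidden" })
  end
end

-- Tries to block by returning a table: the value is discarded, nothing happens.
local function returning_filter(self)
  local ua = (self.req.headers["user-agent"] or ""):lower()
  if ua:find("gptbot", 1, true) then
    return { status = 403, "Forbidden" }
  end
end

local bot_ua = "Mozilla/5.0 (compatible; GPTBot/1.1)"
local blocked     = dispatch({ writing_filter }, action, new_request(bot_ua))
local not_blocked = dispatch({ returning_filter }, action, new_request(bot_ua))
local human       = dispatch({ writing_filter }, action, new_request("Mozilla/5.0 (X11; Linux x86_64)"))

print(blocked.status, blocked[1])          -- 403  Forbidden
print(not_blocked.status, not_blocked[1])  -- 200  Hello
print(human.status, human[1])              -- 200  Hello
```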

3. MoonScript variant

MoonScript is the language Lapis itself is written in, and it allows a compact, class-based application definition. The @before_filter class method registers the filter. Because the filter is defined with the fat arrow (=>), self is bound inside it and @ is shorthand for self: @req, @res, and @write are the same request properties and methods used in the Lua version.

-- app.moon — MoonScript variant (Lapis's native language)
-- MoonScript compiles to Lua; the pattern is identical but with cleaner syntax.
lapis = require "lapis"
bot_utils = require "bot_utils"

class extends lapis.Application
  @before_filter =>
    return if @req.cmd_url == "/robots.txt"

    ua = @req.headers["user-agent"] or ""

    if bot_utils.is_ai_bot ua
      -- @write cancels the action; a filter's return value is ignored
      @write {
        status: 403
        layout: false
        content_type: "text/plain"
        headers: {
          "X-Robots-Tag": "noai, noimageai"
        }
        "Forbidden"
      }
      return

    @res.headers["X-Robots-Tag"] = "noai, noimageai"

  "/": =>
    json: { message: "Hello" }

  "/api/data": =>
    json: { data: "value" }

4. Scoped protection — action wrapper

When some routes should bypass the filter (health checks, public endpoints), use an action wrapper function instead of a global filter with path guards. The wrapper is applied only to actions that need protection — no special cases needed in the filter logic.

-- Scoped protection with an action wrapper: protect only specific routes.
-- Use this when some routes (health check, public API) should bypass the filter.

local lapis = require("lapis")
local bot_utils = require("bot_utils")

local app = lapis.Application()

-- Helper: wrap an action with bot blocking
local function protected(action)
  return function(self)
    local ua = self.req.headers["user-agent"] or ""
    if bot_utils.is_ai_bot(ua) then
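      -- Actions may return a table; Lapis renders it as the response.
      -- layout = false keeps the plain-text body out of the HTML layout.
      return {
        status = 403,
        layout = false,
        content_type = "text/plain",
        headers = { ["X-Robots-Tag"] = "noai, noimageai" },
        "Forbidden",
      }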
    end
    self.res.headers["X-Robots-Tag"] = "noai, noimageai"
    return action(self)
  end
end

-- Public endpoint — no filter
app:get("/health", function(self)
  return { json = { status = "ok" } }
end)

-- Protected endpoint — bot filter applied via wrapper
app:get("/api/data", protected(function(self)
  return { json = { data = "value" } }
end))

return app
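Because protected is an ordinary higher-order function, it can be exercised outside OpenResty. The sketch below copies the wrapper, replaces bot_utils.is_ai_bot with a hypothetical one-pattern is_ai_bot stub, and calls the wrapped action with hand-built tables carrying only the fields the wrapper reads (self.req.headers and self.res.headers). fake_self and data_action are illustration-only names.

```lua
-- Standalone exercise of the protected() wrapper pattern.
-- is_ai_bot is a hypothetical one-pattern stand-in for bot_utils.is_ai_bot.
local function is_ai_bot(ua)
  return ua:lower():find("claudebot", 1, true) ~= nil
end

local function protected(action)
  return function(self)
    local ua = self.req.headers["user-agent"] or ""
    if is_ai_bot(ua) then
      return {
        status = 403,
        layout = false,
        content_type = "text/plain",
        headers = { ["X-Robots-Tag"] = "noai, noimageai" },
        "Forbidden",
      }
    end
    self.res.headers["X-Robots-Tag"] = "noai, noimageai"
    return action(self)
  end
end

-- Fake request objects with only the fields the wrapper touches.
local function fake_self(ua)
  return { req = { headers = { ["user-agent"] = ua } }, res = { headers = {} } }
end

local data_action = protected(function(self)
  return { json = { data = "value" } }
end)

local bot = fake_self("Mozilla/5.0 (compatible; ClaudeBot/1.0)")
local human = fake_self("Mozilla/5.0 (Macintosh)")

local bot_response = data_action(bot)
local human_response = data_action(human)

print(bot_response.status)                -- 403
print(human_response.json.data)           -- value
print(human.res.headers["X-Robots-Tag"])  -- noai, noimageai
```

The wrapped action only runs for the human request; the bot request gets the 403 table and no X-Robots-Tag is written to its res.headers, because the blocked response carries that header itself.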

5. Direct OpenResty access via ngx.var

Inside a Lapis before_filter you can also use OpenResty's ngx API directly. ngx.var.http_user_agent is the raw nginx variable: nginx builds request-header variable names by lowercasing the header name, replacing hyphens with underscores, and adding the http_ prefix. A response sent with ngx.say() and ended with ngx.exit() bypasses the Lapis response pipeline entirely, so inside a Lapis application self:write(...) is usually the better choice.

-- Lower-level: ngx.var.http_user_agent (direct OpenResty access).
-- Assumes `app` and `bot_utils` from the app.lua example in section 2.
-- Use this inside a before_filter when you want the raw nginx variable
-- rather than the header table Lapis exposes as self.req.headers.
-- ngx.var.http_user_agent is nil when the header is absent.

app:before_filter(function(self)
  if self.req.cmd_url == "/robots.txt" then return end

  -- ngx.var.http_user_agent: nginx converts header name to lowercase,
  -- replaces hyphens with underscores, and prefixes with http_.
  -- "User-Agent" → ngx.var.http_user_agent
  local ua = ngx.var.http_user_agent or ""

  if bot_utils.is_ai_bot(ua) then
    -- Raw OpenResty response: bypasses the Lapis response pipeline.
    -- Inside a Lapis app, self:write(...) is preferred. To stop bots before
    -- Lapis runs at all, use access_by_lua_block in nginx.conf instead.
    ngx.status = 403
    ngx.header["X-Robots-Tag"] = "noai, noimageai"
    ngx.header["Content-Type"] = "text/plain"
    ngx.say("Forbidden")
    -- Status and body are already sent; exit with HTTP_OK to finish the
    -- request without trying to set a new status.
    return ngx.exit(ngx.HTTP_OK)
  end

  self.res.headers["X-Robots-Tag"] = "noai, noimageai"
end)
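The header-to-variable naming rule is mechanical, so it can be written down as a small function. header_to_ngx_var below is a hypothetical helper for illustration, not an OpenResty API; it only reproduces how nginx names $http_ variables.

```lua
-- Hypothetical helper mirroring nginx's naming rule for request-header
-- variables ($http_<name>): lowercase the name, replace "-" with "_",
-- and add the "http_" prefix.
local function header_to_ngx_var(header_name)
  -- gsub returns (string, count); the parentheses keep only the string
  local name = (header_name:lower():gsub("%-", "_"))
  return "http_" .. name
end

print(header_to_ngx_var("User-Agent"))       -- http_user_agent
print(header_to_ngx_var("X-Forwarded-For"))  -- http_x_forwarded_for
print(header_to_ngx_var("Accept-Language"))  -- http_accept_language
```

Inside OpenResty, ngx.var[header_to_ngx_var("User-Agent")] would read the same variable as ngx.var.http_user_agent.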

6. nginx.conf — static robots.txt + Lapis routing

Configure nginx to serve robots.txt as a static file before Lapis handles the request. The Lapis before_filter never fires for statically-served files. All other requests are passed to Lapis via content_by_lua_block.

# nginx.conf — OpenResty configuration for Lapis
# This server block goes inside the http block.

server {
    listen 8080;
    server_name example.com;

    # Serve robots.txt as a static file — Lapis never runs for it.
    # This is the recommended approach: before_filter does not fire.
    location = /robots.txt {
        root /var/www/html;
        add_header X-Robots-Tag "noai, noimageai";
        try_files $uri =404;
    }

    # All other requests: route through Lapis
    location / {
        # content_by_lua_block runs Lapis
        content_by_lua_block {
            require("lapis").serve("app")
        }
    }
}

7. robots.txt

# /var/www/html/robots.txt (the root used in the nginx.conf above; served by nginx before Lapis)
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
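The robots.txt groups and the AI_BOT_PATTERNS list are maintained by hand, so they can drift apart (this file names four crawlers, the module sixteen). Robots.txt product tokens are also not always the string a crawler sends as its User-Agent; Google-Extended, for example, exists only as a robots.txt token. A hypothetical render_robots helper, sketched below with an assumed ROBOTS_TOKENS list, generates the Disallow groups from one list so the static file is rebuilt rather than edited.

```lua
-- Hypothetical generator for the robots.txt above: one list of robots.txt
-- product tokens, rendered as a Disallow group per crawler.
local ROBOTS_TOKENS = { "GPTBot", "ClaudeBot", "CCBot", "Google-Extended" }

local function render_robots(tokens)
  -- Default group: crawlers not listed below may crawl the whole site.
  local lines = { "User-agent: *", "Allow: /", "" }
  for _, token in ipairs(tokens) do
    lines[#lines + 1] = "User-agent: " .. token
    lines[#lines + 1] = "Disallow: /"
    lines[#lines + 1] = ""
  end
  -- The trailing "" entry makes the output end with a newline.
  return table.concat(lines, "\n")
end

local robots = render_robots(ROBOTS_TOKENS)
io.write(robots)
```

Writing this output into the nginx root at deploy time keeps the served file in step with the token list.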

Key points

Framework comparison — Lua / OpenResty web frameworks

| Framework | Hook / phase | Block call | UA header |
|---|---|---|---|
| Lapis | app:before_filter(fn) | self:write({ status = 403, ... }) | self.req.headers["user-agent"] |
| Raw OpenResty | access_by_lua_block | ngx.exit(403) | ngx.var.http_user_agent |
| lua-http (standalone) | request handler function | headers:upsert(":status", "403") | headers:get("user-agent") |
| nginx (config only) | if ($http_user_agent ~* pattern) | return 403; | $http_user_agent |

Lapis sits between raw OpenResty and a full MVC framework — it gives you table-return responses and a routing DSL while retaining direct access to the ngx API. Raw OpenResty's access_by_lua_block fires earlier in the nginx pipeline (before the content phase) and is more efficient for blocking, but has no application context. Lapis before_filter is the right choice when you already have a Lapis application and need access to sessions, models, or Lapis helpers alongside bot detection.

Dependencies

# Install OpenResty (includes LuaJIT)
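# macOS (Homebrew, via the official OpenResty tap)
brew install openresty/brew/openresty

# Ubuntu/Debian (after adding the official OpenResty APT repository)
apt install openresty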

# Install Lapis via LuaRocks
luarocks install lapis

# Install MoonScript (optional, for .moon files)
luarocks install moonscript

# Create a new Lapis project
lapis new --lua    # Lua project
lapis new          # MoonScript project (default)

# Run development server (uses OpenResty internally)
lapis server       # starts on port 8080 by default

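# Production: compile the nginx.conf template with the production settings
# from config.lua, then start OpenResty with the compiled file
# lapis build production           # writes nginx.conf.compiled
# openresty -p "$(pwd)" -c nginx.conf.compiled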