How to Block AI Bots on Crystal Kemal: Complete 2026 Guide
Kemal is a Sinatra-inspired web framework for Crystal — a Ruby-like language that compiles to native binaries. Filters run before route handlers: before_all fires on every request. The key mechanism: halt(env, status_code: 403) stops the request — but after_all filters are not called after a halt. Set headers before halting.
Set headers BEFORE halt — after_all is skipped
halt raises a Kemal::Exceptions::CustomException internally. Once it fires, Kemal sends the response immediately and skips all subsequent filters, including after_all. This differs from Sinatra, where after filters run even after halt. Always set X-Robots-Tag and Content-Type in your before_all block before calling halt.
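To make the ordering concrete, here is a stripped-down sketch (a placeholder app with a hard-coded GPTBot check, not the full filter built in the steps below):
# sketch.cr: header order around halt (illustrative only)
require "kemal"

before_all do |env|
  if env.request.headers["User-Agent"]?.try(&.downcase.includes?("gptbot"))
    # 1. Headers first: after_all will not run for this request.
    env.response.content_type = "text/plain; charset=utf-8"
    env.response.headers["X-Robots-Tag"] = "noai, noimageai"
    # 2. Then halt: the route handler below never runs.
    halt env, status_code: 403, response: "Forbidden"
  end
end

get "/" do
  "Welcome"
end

Kemal.run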
Protection layers
Step 1 — Bot list (src/ai_bots.cr)
The bot list is a Crystal constant array, allocated once and reused for every request. String#includes? handles substring matching and Array#any? short-circuits on the first match. The method signature ai_bot?(ua : String?) accepts both String and Nil — Crystal's union type.
# src/ai_bots.cr — shared bot detection module
module AiBots
PATTERNS = [
# OpenAI
"gptbot", "chatgpt-user", "oai-searchbot",
# Anthropic
"claudebot", "claude-web",
# Common Crawl
"ccbot",
# Bytedance
"bytespider",
# Meta
"meta-externalagent",
# Perplexity
"perplexitybot",
# Google AI
"google-extended", "googleother",
# Cohere
"cohere-ai",
# Amazon
"amazonbot",
# Diffbot
"diffbot",
# AI2
"ai2bot",
# DeepSeek
"deepseekbot",
# Mistral
"mistralai-user",
# xAI
"xai-bot",
# You.com
"youbot",
# DuckDuckGo AI
"duckassistbot",
]
# Crystal nil-safety: ua is String? (String or Nil)
def self.ai_bot?(ua : String?) : Bool
return false if ua.nil? || ua.empty?
lower = ua.downcase
PATTERNS.any? { |pattern| lower.includes?(pattern) }
end
end
Step 2 — Global before_all filter
The next keyword skips the rest of the filter for specific paths. env.request.headers["User-Agent"]? (with ?) is the nil-safe accessor — returns nil if absent, never raises.
# src/app.cr — before_all filter for global bot blocking
require "kemal"
require "./ai_bots"
# Serve static files from public/ — includes robots.txt
# The static file handler runs BEFORE filters, so /robots.txt is always accessible.
public_folder "public"
# before_all: runs before EVERY route handler
# Set headers BEFORE calling halt — after_all is NOT called after halt.
before_all do |env|
# Skip the bot check for paths that must stay open
next if env.request.path == "/robots.txt"
next if env.request.path == "/health"
# env.request.headers["User-Agent"]? returns String? (nil if absent)
# The ? suffix is Crystal nil-safe hash access — never raises KeyError.
ua = env.request.headers["User-Agent"]? || ""
if AiBots.ai_bot?(ua)
# Set headers BEFORE halt — they won't be set after halt raises.
env.response.content_type = "text/plain; charset=utf-8"
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
# halt raises Kemal::Exceptions::CustomException — route block never runs.
halt env, status_code: 403, response: "Forbidden"
end
# For legitimate requests, add X-Robots-Tag here.
# (after_all is skipped for halted requests, so we set it in before_all too)
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
end
# Route handlers — only reached for legitimate requests
get "/" do |env|
env.response.content_type = "text/html; charset=utf-8"
<<-HTML
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noai, noimageai">
<title>My Site</title>
</head>
<body><h1>Welcome</h1></body>
</html>
HTML
end
get "/health" do
"ok"
end
get "/api/data" do |env|
env.response.content_type = "application/json"
{data: "protected"}.to_json
end
Kemal.run
Step 3 — Scoped filtering (before_get, before_all "/api/*")
Kemal supports method-specific filters and glob path patterns. Combine them to protect only specific HTTP methods and path prefixes. The "/api/*" path filter only fires for routes matching that prefix.
# Per-route and per-method filter variants
require "kemal"
require "./ai_bots"
static_folder "public"
# before_get: only runs before GET handlers
# before_post, before_put, before_delete, before_patch also available
before_get do |env|
next if env.request.path == "/robots.txt"
next if env.request.path == "/health"
ua = env.request.headers["User-Agent"]? || ""
if AiBots.ai_bot?(ua)
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
halt env, status_code: 403, response: "Forbidden"
end
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
end
# Path-scoped filter — only fires for /api/* routes
before_all "/api/*" do |env|
ua = env.request.headers["User-Agent"]? || ""
if AiBots.ai_bot?(ua)
env.response.content_type = "application/json"
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
halt env, status_code: 403, response: %({"error":"Forbidden"})
end
end
# Public route — /health is skipped inside before_get via next and never matches the /api/* filter
get "/health" do
"ok"
end
# Protected API routes — filtered by before_all "/api/*"
get "/api/data" do |env|
env.response.content_type = "application/json"
{data: "protected"}.to_json
end
Kemal.run
Step 4 — Using after_all for response headers
after_all only runs for legitimate (non-halted) requests. For blocked requests, headers must be set in before_all before calling halt. This pattern avoids duplicating the header for legitimate requests.
# after_all for X-Robots-Tag (legitimate requests only)
# Note: after_all does NOT run for halted requests.
# For blocked requests, set the header in before_all BEFORE halt.
require "kemal"
require "./ai_bots"
static_folder "public"
before_all do |env|
next if env.request.path == "/robots.txt"
ua = env.request.headers["User-Agent"]? || ""
if AiBots.ai_bot?(ua)
# Must set headers here — after_all won't fire after halt
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
env.response.content_type = "text/plain"
halt env, status_code: 403, response: "Forbidden"
end
end
# after_all only runs for NON-halted requests (legitimate traffic)
after_all do |env|
env.response.headers["X-Robots-Tag"] = "noai, noimageai"
end
get "/" do
"Welcome"
end
Kemal.run
Step 5 — robots.txt via public_folder
static_folder "public" serves all files in public/ at their path. Kemal's static file handler runs before filters — so robots.txt is always accessible, even to AI bots that would otherwise be blocked.
# public/robots.txt — served automatically by public_folder "public"
# Place this file in your public/ directory.
# Kemal's static file handler runs before filters — so AI bots
# can always read robots.txt even if they'd be blocked on other routes.
User-agent: *
Allow: /
# AI training bots — blocked
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: AmazonBot
Disallow: /
User-agent: Diffbot
Disallow: /
# shard.yml — Crystal dependencies
# name: my-app
# dependencies:
# kemal:
# github: kemalcr/kemal
# version: "~> 1.6"Step 6 — noai meta tag in HTML responses
# noai meta tag in Kemal HTML responses
require "kemal"
require "ecr" # Crystal's Embedded Crystal (ERB-like) template engine
# Option A: Inline heredoc
get "/" do |env|
env.response.content_type = "text/html; charset=utf-8"
<<-HTML
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noai, noimageai">
<title>My Site</title>
</head>
<body><h1>Welcome</h1></body>
</html>
HTML
end
# Option B: ECR template file (views/index.ecr)
# In views/index.ecr:
# <meta name="robots" content="noai, noimageai">
#
# In route handler:
# get "/" do |env|
# render "views/index.ecr"
# end
# The X-Robots-Tag header set in before_all/after_all covers non-HTML
# responses (JSON, binary). The meta tag covers scrapers that parse HTML.
Kemal vs Lucky vs Amber vs Ruby Sinatra
| Feature | Kemal (Crystal) | Lucky (Crystal) | Amber (Crystal) | Sinatra (Ruby) |
|---|---|---|---|---|
| Filter type | before_all / before_get / before_all "/path/*" blocks | Pipes — Lucky::Action pipe with call method, include in actions | before_action filter block at controller level | before filter block (Ruby Sinatra) — same concept, different syntax |
| Short-circuit | halt(env, status_code: 403) — raises exception, route block never runs | return redirect_to ... or custom response — halts action pipeline | respond_with { ... } and return — stops before_action chain | halt 403, "Forbidden" — identical concept to Kemal (Kemal is Sinatra-inspired) |
| after_* filters post-halt | NOT called after halt — set headers BEFORE halt in before_all | after pipes run regardless — safe to set headers there | after_action does not run if before_action halts | after filter runs even after halt — safe to set response headers |
| UA header access | env.request.headers["User-Agent"]? || "" (nil-safe Crystal) | context.request.headers["User-Agent"]? || "" (same nil-safe HTTP::Headers access) | request.headers["User-Agent"]? (Amber wraps Crystal HTTP::Request) | request.env["HTTP_USER_AGENT"] || "" (Rack env) |
| robots.txt | public_folder "public" (default: ./public) — serves public/robots.txt before filters run | Lucky::StaticFileHandler or explicit action route | Amber static middleware or explicit route | enable :static; set :public_folder, "public" (Rack::Static) |
| Compiled | Yes — Crystal compiles to a native binary (Boehm GC) | Yes — Crystal compiled, with Lucky's type-safe routing | Yes — Crystal compiled | No — Ruby interpreted, MRI GC |
| Nil safety | Crystal: String? type, [] raises KeyError, []? returns nil | Same Crystal nil safety via Lucky wrappers | Same Crystal nil safety | Ruby: nil is valid, no nil safety enforcement |
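The nil-safety row in concrete terms, using Crystal's standard HTTP::Headers (a standalone snippet, not tied to any framework):
# headers_demo.cr: [] vs []? on HTTP::Headers
require "http/headers"

headers = HTTP::Headers{"Host" => "example.com"}

p headers["Host"]?              # => "example.com"
p headers["User-Agent"]?        # => nil ([]? is the nil-safe accessor, returns String?)
p headers["User-Agent"]? || ""  # => "" (the default-to-empty pattern used in the filters above)
# headers["User-Agent"]         # would raise KeyError: [] on a missing header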
Summary
- Set headers BEFORE halt — `after_all` is skipped after a halt. X-Robots-Tag and Content-Type must be set in `before_all` before calling `halt`.
- `[]?` for nil-safe header access — Crystal's nil safety: always use `env.request.headers["User-Agent"]?` (with `?`), not `[]`.
- Static files before filters — robots.txt is always served, regardless of bot status.
- `before_all "/api/*"` — scoped glob filter. Combine with `next` for path exclusions.
- Compiled binary — Crystal compiles to native code. The bot check runs at native speed with no interpreter overhead.
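To verify the setup end to end, run the app (shards install, then crystal build --release src/app.cr and start the binary) and probe it with different User-Agent strings. A minimal sketch using the standard library's HTTP::Client; the port (3000, Kemal's default) and a server built from Step 2 are assumptions:
# probe.cr: quick check against a locally running instance (assumes the Step 2 app on port 3000)
require "http/client"

def probe(user_agent : String)
  headers = HTTP::Headers{"User-Agent" => user_agent}
  response = HTTP::Client.get("http://localhost:3000/", headers: headers)
  puts "#{user_agent.ljust(45)} -> HTTP #{response.status_code}"
end

probe "Mozilla/5.0 (X11; Linux x86_64)"           # expected: 200
probe "GPTBot/1.2 (+https://openai.com/gptbot)"   # expected: 403
probe "Mozilla/5.0 (compatible; ClaudeBot/1.0)"   # expected: 403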
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.