How to Block AI Bots on Sinatra (Ruby): Complete 2026 Guide
Sinatra is a lightweight Ruby web framework built directly on Rack — the universal Ruby HTTP interface. Bot blocking uses a standard Rack middleware class with call(env): read env["HTTP_USER_AGENT"], return [403, headers, ["Forbidden"]] to block, or call @app.call(env) to pass through.
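That contract can be shown in a minimal sketch — the class name, inner lambda, and single hard-coded pattern here are illustrative, not the guide's full implementation:

```ruby
# Minimal sketch of the Rack middleware contract: block on a matching
# User-Agent, otherwise delegate to the next app in the stack.
class TinyBlocker
  def initialize(app)
    @app = app # the next Rack app
  end

  def call(env)
    ua = env['HTTP_USER_AGENT'].to_s.downcase
    # Return the triplet directly to block; do not call @app here
    return [403, { 'content-type' => 'text/plain' }, ['Forbidden']] if ua.include?('gptbot')

    @app.call(env) # pass through
  end
end

# Any object responding to call(env) works as the inner app
inner = ->(env) { [200, { 'content-type' => 'text/plain' }, ['OK']] }
stack = TinyBlocker.new(inner)

puts stack.call('HTTP_USER_AGENT' => 'GPTBot')[0]      # 403
puts stack.call('HTTP_USER_AGENT' => 'Mozilla/5.0')[0] # 200
```

Note that the middleware never touches Sinatra directly — it only speaks Rack, which is what makes it portable.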
Rack middleware — portable across all Ruby frameworks
Sinatra, Rails, Hanami, Roda, and Grape all sit on Rack. The AiBotBlocker class below works unchanged in all of them: use AiBotBlocker in Sinatra, config.middleware.use AiBotBlocker in Rails, use AiBotBlocker in a plain config.ru. Factor it into a shared gem and reuse it across every Ruby project.
Protection layers
Layer 1: robots.txt
Sinatra automatically serves files from the public/ directory. Place robots.txt there — no route required.
# public/robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /
Sinatra serves static files from public/ itself, checking the directory before route matching — so /robots.txt needs no route. But Rack middleware registered with use wraps the whole Sinatra app and runs before static file serving, which is why the EXEMPT_PATHS check in the middleware below is essential rather than merely belt-and-suspenders: without it, a blocked bot could never fetch robots.txt and would have no rules to honour.
Layer 2: noai meta tag
If your Sinatra app renders HTML with ERB, add the noai meta tag to your base layout and use @robots for per-route overrides:
views/layout.erb
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="<%= @robots || 'noai, noimageai' %>">
  <title><%= @title || 'My App' %></title>
</head>
<body><%= yield %></body>
</html>
Route handler — per-page override
# Default: layout uses "noai, noimageai"
get '/' do
  erb :index  # @robots unset → layout falls back to 'noai, noimageai'
end

# Override for a specific public page
get '/blog' do
  @robots = 'index, follow'
  erb :blog
end
If your Sinatra app is a JSON API, use the X-Robots-Tag header approach (Layer 3) in middleware instead of meta tags.
Layers 3 & 4: Rack middleware
A Rack middleware class needs just two things: an initialize(app) that stores the next app, and a call(env) method that returns the Rack triplet — nothing else.
middleware/ai_bot_blocker.rb
# middleware/ai_bot_blocker.rb
AI_BOT_PATTERNS = %w[
gptbot chatgpt-user oai-searchbot
claudebot anthropic-ai claude-web
google-extended ccbot bytespider
applebot-extended perplexitybot diffbot
cohere-ai facebookbot meta-externalagent
omgili omgilibot amazonbot
deepseekbot mistralbot xai-bot ai2bot
].freeze
EXEMPT_PATHS = %w[/robots.txt /sitemap.xml /favicon.ico].freeze
class AiBotBlocker
def initialize(app)
@app = app
end
def call(env)
# Always pass through exempt paths
path = env['PATH_INFO']
return @app.call(env) if EXEMPT_PATHS.include?(path)
# Read User-Agent — Rack normalises to env['HTTP_USER_AGENT']
ua = env['HTTP_USER_AGENT'].to_s.downcase
if AI_BOT_PATTERNS.any? { |pattern| ua.include?(pattern) }
# Layer 4: hard 403 block — return triplet, do NOT call @app
return [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
end
# Layer 3: pass through, then inject X-Robots-Tag into response
status, headers, body = @app.call(env)
headers['x-robots-tag'] = 'noai, noimageai'
[status, headers, body]
end
end
Key points
- Blocking: [403, {'content-type' => 'text/plain'}, ['Forbidden']] — return the triplet directly. The body is an array of strings (any object with an each method works). Do NOT call @app.call(env).
- Reading User-Agent: env['HTTP_USER_AGENT'] — Rack upcases all HTTP headers and adds the HTTP_ prefix; hyphens become underscores. .to_s guards against nil (bots that send no User-Agent).
- X-Robots-Tag: call @app.call(env) to get the triplet, then mutate the headers hash before returning. Unlike Go's net/http (where headers must be set before writing), Rack headers are a mutable Ruby Hash returned alongside the body — safe to modify after the inner app returns.
- Header case: Rack header keys are lowercase by convention (e.g., 'content-type', 'x-robots-tag'). HTTP/2 requires lowercase; Rack normalises on the way out.
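The post-hoc mutation point is easy to verify in plain Ruby, no framework required — the lambda standing in for the inner app here is illustrative:

```ruby
# The inner app has already returned its triplet...
inner = ->(env) { [200, { 'content-type' => 'text/html' }, ['<p>hi</p>']] }

status, headers, body = inner.call({})

# ...yet the headers Hash is still an ordinary mutable Ruby Hash
headers['x-robots-tag'] = 'noai, noimageai'

puts headers['x-robots-tag'] # noai, noimageai
puts status                  # 200
```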
Registering the middleware
In a classic-style Sinatra app (top-level DSL):
# app.rb (classic style)
require 'sinatra'
require_relative 'middleware/ai_bot_blocker'
use AiBotBlocker
get '/' do
'Hello, World!'
end
get '/api/data' do
content_type :json
'{"status":"ok"}'
end
In a modular-style Sinatra app (subclassing Sinatra::Base):
# app.rb (modular style)
require 'sinatra/base'
require_relative 'middleware/ai_bot_blocker'
class MyApp < Sinatra::Base
use AiBotBlocker
get '/' do
'Hello, World!'
end
run! if app_file == $0
end
In config.ru (rackup — works for both styles and any Rack app):
# config.ru
require_relative 'app'
require_relative 'middleware/ai_bot_blocker'

use AiBotBlocker
run MyApp
Middleware order in Rack
Rack middleware is a stack — use calls are wrapped in order (first use = outermost = runs first). Register AiBotBlocker before auth, sessions, or body parsing so blocked requests are rejected before any expensive processing. This is the same FIFO order as Express and Gin.
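That wrapping can be demonstrated by composing a stack by hand — use A; use B; run endpoint builds A.new(B.new(endpoint)). The Tag class below is a throwaway illustration (not part of the blocker), recording which layer ran first:

```ruby
# Each Tag records its name before delegating inward, so the
# trace reveals the execution order of the stack.
class Tag
  def initialize(app, name)
    @app = app
    @name = name
  end

  def call(env)
    env[:trace] << @name # outermost middleware appends first
    @app.call(env)
  end
end

endpoint = ->(env) { [200, {}, ['OK']] }

# Equivalent of: use Tag, 'blocker'; use Tag, 'auth'; run endpoint
stack = Tag.new(Tag.new(endpoint, 'auth'), 'blocker')

env = { trace: [] }
stack.call(env)
puts env[:trace].join(' -> ') # blocker -> auth
```

The first use call ends up outermost, so registering the bot blocker first guarantees it sees every request before auth or sessions do.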
Route-scoped blocking
Use Rack::Builder to apply the middleware only to a specific path prefix. This lets you block bots on /api while leaving the marketing homepage unblocked:
# config.ru — path-scoped middleware with Rack::Builder
require_relative 'app'
require_relative 'middleware/ai_bot_blocker'
# Public routes — no bot blocking
public_app = Rack::Builder.new do
run ->(env) { [200, {'content-type' => 'text/html'}, ['Welcome']] }
end
# API routes — bot blocking applied
api_app = Rack::Builder.new do
use AiBotBlocker
run ->(env) { [200, {'content-type' => 'application/json'}, ['{"ok":true}']] }
end
run Rack::URLMap.new(
'/' => public_app,
'/api' => api_app
)
Rack::URLMap routes by path prefix. The AiBotBlocker middleware only wraps the /api sub-application. Public marketing pages at / are unaffected.
Comparison: Sinatra vs Rails vs plain Rack
The AiBotBlocker class is identical across all three — only registration differs:
Sinatra (classic or modular)
# app.rb
use AiBotBlocker

# config.ru
use AiBotBlocker
run MyApp
Rails — application.rb
# config/application.rb
module MyApp
class Application < Rails::Application
config.middleware.use AiBotBlocker
# or insert before a specific middleware:
# config.middleware.insert_before Rack::Sendfile, AiBotBlocker
end
end
Plain Rack — config.ru
# config.ru
require_relative 'middleware/ai_bot_blocker'
use AiBotBlocker
run ->(env) { [200, {'content-type' => 'text/plain'}, ['OK']] }
The middleware class file is the same in all cases. Only the use / registration call differs by framework. This portability is Rack's core value.
Deployment with Puma
# Gemfile
source 'https://rubygems.org'
gem 'sinatra'
gem 'puma'

# Run locally
bundle exec ruby app.rb       # Classic style
bundle exec rackup config.ru  # Modular / config.ru

# Production with Puma
bundle exec puma -p 3000 -e production config.ru

# Or via Foreman
# Procfile:
#   web: bundle exec puma -p $PORT -e production config.ru
Deploys to Render, Fly.io, Railway, Heroku, and any VPS with Ruby. For server-level blocking before Ruby runs, place nginx in front and use a map $http_user_agent block. See the nginx guide.
Verification
# Should return 403 (blocked AI bot)
curl -I -A "GPTBot" http://localhost:4567/

# Should return 200 (regular browser)
curl -I -A "Mozilla/5.0" http://localhost:4567/

# robots.txt must always return 200
curl -I -A "GPTBot" http://localhost:4567/robots.txt

# Check X-Robots-Tag on legitimate request
curl -si -A "Mozilla/5.0" http://localhost:4567/ | grep -i x-robots
Default Sinatra port is 4567. Expected: GPTBot → 403. Mozilla/5.0 → 200 with x-robots-tag: noai, noimageai. robots.txt → 200 for any user agent.
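The same expectations can be sanity-checked offline, without a running server, by driving the middleware class directly with hand-built env hashes. This uses a trimmed copy of the blocker with a shortened pattern list — the names here are illustrative:

```ruby
# Trimmed copy of the blocker for an offline check
AI_BOTS = %w[gptbot claudebot].freeze
EXEMPT  = %w[/robots.txt].freeze

class OfflineBlocker
  def initialize(app)
    @app = app
  end

  def call(env)
    # robots.txt is always reachable, even for blocked bots
    return @app.call(env) if EXEMPT.include?(env['PATH_INFO'])

    ua = env['HTTP_USER_AGENT'].to_s.downcase
    return [403, { 'content-type' => 'text/plain' }, ['Forbidden']] if AI_BOTS.any? { |p| ua.include?(p) }

    status, headers, body = @app.call(env)
    headers['x-robots-tag'] = 'noai, noimageai'
    [status, headers, body]
  end
end

app = OfflineBlocker.new(->(env) { [200, {}, ['OK']] })

puts app.call('PATH_INFO' => '/', 'HTTP_USER_AGENT' => 'GPTBot')[0]           # 403
puts app.call('PATH_INFO' => '/', 'HTTP_USER_AGENT' => 'Mozilla/5.0')[0]      # 200
puts app.call('PATH_INFO' => '/robots.txt', 'HTTP_USER_AGENT' => 'GPTBot')[0] # 200
```

The three statuses mirror the three curl checks above; the header check corresponds to grepping for x-robots on the legitimate request.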