
How to Block AI Bots on Sinatra (Ruby): Complete 2026 Guide

Sinatra is a lightweight Ruby web framework built directly on Rack — the universal Ruby HTTP interface. Bot blocking uses a standard Rack middleware class with call(env): read env["HTTP_USER_AGENT"], return [403, headers, ["Forbidden"]] to block, or call @app.call(env) to pass through.
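The contract is small enough to show in full. A minimal sketch, plain Ruby with no gems: a no-op middleware that stores the next app and delegates, exercised against a stub inner app (the `Passthrough` name and the stub lambda are illustrative, not part of any library):

```ruby
# A no-op Rack middleware illustrating the call(env) contract.
class Passthrough
  def initialize(app)
    @app = app   # the next app (or middleware) in the stack
  end

  def call(env)
    # Pass through unchanged. A blocker would instead return its own
    # [status, headers, body] triplet here without calling @app.
    @app.call(env)
  end
end

# Exercise it against a stub inner app — no server required.
inner = ->(env) { [200, { 'content-type' => 'text/plain' }, ['ok']] }
status, headers, body = Passthrough.new(inner).call({ 'PATH_INFO' => '/' })
```

Every middleware in this guide is a variation on this shape.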

Rack middleware — portable across all Ruby frameworks

Sinatra, Rails, Hanami, Roda, and Grape all sit on Rack. The AiBotBlocker class below works unchanged in all of them: use AiBotBlocker in Sinatra, config.middleware.use AiBotBlocker in Rails, use AiBotBlocker in a plain config.ru. Factor it into a shared gem and reuse it across every Ruby project.

Protection layers

1. robots.txt: place at public/robots.txt. Sinatra auto-serves public/ at the web root, so no route is needed.
2. noai meta tag: ERB layout.erb with <%= @robots || "noai, noimageai" %>, with a per-route override via the @robots instance variable.
3. X-Robots-Tag header: inject into the status/headers/body triplet after @app.call(env) returns.
4. Hard 403 block: return [403, {"content-type" => "text/plain"}, ["Forbidden"]] and do not call @app.call(env).

Layer 1: robots.txt

Sinatra automatically serves files from the public/ directory. Place robots.txt there — no route required.

# public/robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /

Sinatra serves static files itself (the static setting, enabled by default when a public folder exists), checking public/ before route matching. Note, however, that Rack middleware runs before the Sinatra app, so a hard 403 in the middleware would block /robots.txt too: the exempt-path check in the middleware below is required, not merely belt-and-suspenders.
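If you disable static serving entirely (set :static, false is common for pure APIs), robots.txt needs an explicit route. A hedged sketch of one way to do it; the file location under settings.public_folder is the Sinatra default, adjust to taste:

```ruby
# app.rb — serve robots.txt explicitly when static serving is off.
require 'sinatra'

set :static, false

get '/robots.txt' do
  content_type 'text/plain'
  # public_folder defaults to "public" next to the app file
  File.read(File.join(settings.public_folder, 'robots.txt'))
end
```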

Layer 2: noai meta tag

If your Sinatra app renders HTML with ERB, add the noai meta tag to your base layout and use @robots for per-route overrides:

views/layout.erb

<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="<%= @robots || 'noai, noimageai' %>">
  <title><%= @title || 'My App' %></title>
</head>
<body><%= yield %></body>
</html>

Route handler — per-page override

# Default: layout uses "noai, noimageai"
get '/' do
  erb :index  # robots = "noai, noimageai"
end

# Override for a specific public page
get '/blog' do
  @robots = 'index, follow'
  erb :blog
end

If your Sinatra app serves a JSON API, there is no HTML to carry a meta tag: use the X-Robots-Tag header approach (Layer 3) in middleware instead.
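If you would rather set that header at the Sinatra layer than in middleware, a before filter works too. A sketch using Sinatra's standard headers helper (the route and payload are illustrative):

```ruby
# app.rb — sketch: send X-Robots-Tag on every response via a before filter,
# an alternative to (or companion for) the Rack middleware in Layer 3.
require 'sinatra'

before do
  headers 'X-Robots-Tag' => 'noai, noimageai'
end

get '/api/data' do
  content_type :json
  '{"status":"ok"}'
end
```

The middleware approach still wins when you want one implementation shared across several Rack apps.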

Layers 3 & 4: Rack middleware

A Rack middleware class needs three things: an initialize(app) that stores the next app, a call(env) method that returns the Rack triplet, and nothing else.

middleware/ai_bot_blocker.rb

# middleware/ai_bot_blocker.rb

AI_BOT_PATTERNS = %w[
  gptbot chatgpt-user oai-searchbot
  claudebot anthropic-ai claude-web
  google-extended ccbot bytespider
  applebot-extended perplexitybot diffbot
  cohere-ai facebookbot meta-externalagent
  omgili omgilibot amazonbot
  deepseekbot mistralbot xai-bot ai2-bot
].freeze

EXEMPT_PATHS = %w[/robots.txt /sitemap.xml /favicon.ico].freeze

class AiBotBlocker
  def initialize(app)
    @app = app
  end

  def call(env)
    # Always pass through exempt paths
    path = env['PATH_INFO']
    return @app.call(env) if EXEMPT_PATHS.include?(path)

    # Read User-Agent — Rack normalises to env['HTTP_USER_AGENT']
    ua = env['HTTP_USER_AGENT'].to_s.downcase

    if AI_BOT_PATTERNS.any? { |pattern| ua.include?(pattern) }
      # Layer 4: hard 403 block — return triplet, do NOT call @app
      return [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
    end

    # Layer 3: pass through, then inject X-Robots-Tag into response
    status, headers, body = @app.call(env)
    headers['x-robots-tag'] = 'noai, noimageai'
    [status, headers, body]
  end
end

Key points

  • Blocking: [403, {'content-type' => 'text/plain'}, ['Forbidden']] — return the triplet directly. The body is an array of strings (any object with each works). Do NOT call @app.call(env).
  • Reading User-Agent: env['HTTP_USER_AGENT'] — Rack upcases all HTTP headers and adds the HTTP_ prefix. Hyphens become underscores. .to_s guards against nil (bots that send no User-Agent).
  • X-Robots-Tag: Call @app.call(env) to get the triplet, then mutate the headers hash before returning. Unlike Go's net/http (headers must be set before writing), Rack headers are a mutable Ruby Hash returned alongside the body — safe to modify after the inner app returns.
  • Header case: Rack header keys are lowercase by convention (e.g., 'content-type', 'x-robots-tag'). HTTP/2 requires lowercase; Rack normalises on the way out.
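Because the middleware is just a Ruby object, you can verify all three behaviours by calling it directly with stub env hashes, with no HTTP server or test gems. A self-contained sketch (the class here is a condensed copy of AiBotBlocker above with a shortened pattern list, so the snippet runs on its own):

```ruby
# Condensed copy of the middleware for a standalone check.
AI_BOT_PATTERNS = %w[gptbot claudebot ccbot].freeze
EXEMPT_PATHS = %w[/robots.txt].freeze

class AiBotBlocker
  def initialize(app)
    @app = app
  end

  def call(env)
    return @app.call(env) if EXEMPT_PATHS.include?(env['PATH_INFO'])

    ua = env['HTTP_USER_AGENT'].to_s.downcase
    if AI_BOT_PATTERNS.any? { |p| ua.include?(p) }
      return [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
    end

    status, headers, body = @app.call(env)
    headers['x-robots-tag'] = 'noai, noimageai'
    [status, headers, body]
  end
end

inner   = ->(env) { [200, { 'content-type' => 'text/plain' }, ['hello']] }
blocker = AiBotBlocker.new(inner)

# Bot UA → blocked; browser UA → passed through with header; exempt path → untouched.
bot_status, = blocker.call('PATH_INFO' => '/', 'HTTP_USER_AGENT' => 'Mozilla/5.0 GPTBot/1.0')
ok_status, ok_headers, = blocker.call('PATH_INFO' => '/', 'HTTP_USER_AGENT' => 'Mozilla/5.0')
exempt_status, = blocker.call('PATH_INFO' => '/robots.txt', 'HTTP_USER_AGENT' => 'GPTBot')
```

The same direct-call technique slots neatly into Minitest or RSpec if you already use them.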

Registering the middleware

In a classic-style Sinatra app (top-level DSL):

# app.rb (classic style)
require 'sinatra'
require_relative 'middleware/ai_bot_blocker'

use AiBotBlocker

get '/' do
  'Hello, World!'
end

get '/api/data' do
  content_type :json
  '{"status":"ok"}'
end

In a modular-style Sinatra app (subclassing Sinatra::Base):

# app.rb (modular style)
require 'sinatra/base'
require_relative 'middleware/ai_bot_blocker'

class MyApp < Sinatra::Base
  use AiBotBlocker

  get '/' do
    'Hello, World!'
  end

  run! if app_file == $0
end

In config.ru (rackup, which works for both styles and any Rack app):

# config.ru
require_relative 'app'
require_relative 'middleware/ai_bot_blocker'

use AiBotBlocker
run MyApp

Middleware order in Rack

Rack middleware is a stack — use calls are wrapped in order (first use = outermost = runs first). Register AiBotBlocker before auth, sessions, or body parsing so blocked requests are rejected before any expensive processing. This is the same FIFO order as Express and Gin.
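The wrapping can be demonstrated without Rack itself, since `use A; use B; run app` is equivalent to constructing `A.new(B.new(app))`. A sketch with two labelled middlewares pushing onto a shared trace (all names here are illustrative):

```ruby
# Demonstrate that the first `use` is the outermost layer and runs first.
TRACE = []

class First
  def initialize(app)
    @app = app
  end

  def call(env)
    TRACE << :first     # outermost: runs before everything else
    @app.call(env)
  end
end

class Second
  def initialize(app)
    @app = app
  end

  def call(env)
    TRACE << :second
    @app.call(env)
  end
end

inner = ->(env) { TRACE << :app; [200, {}, ['ok']] }

# `use First; use Second; run inner` builds this nesting:
stack = First.new(Second.new(inner))
stack.call({})
TRACE  # runs outermost-first: [:first, :second, :app]
```

This is why registering AiBotBlocker first guarantees a bot is rejected before sessions, auth, or body parsing ever run.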

Route-scoped blocking

Use Rack::Builder to apply the middleware only to a specific path prefix. This lets you block bots on /api while leaving the marketing homepage unblocked:

# config.ru — path-scoped middleware with Rack::Builder

require_relative 'app'
require_relative 'middleware/ai_bot_blocker'

# Public routes — no bot blocking
public_app = Rack::Builder.new do
  run ->(env) { [200, {'content-type' => 'text/html'}, ['Welcome']] }
end

# API routes — bot blocking applied
api_app = Rack::Builder.new do
  use AiBotBlocker
  run ->(env) { [200, {'content-type' => 'application/json'}, ['{"ok":true}']] }
end

run Rack::URLMap.new(
  '/'    => public_app,
  '/api' => api_app
)

Rack::URLMap routes by path prefix and strips the matched prefix before calling the sub-app, so a request to /api/data reaches AiBotBlocker with PATH_INFO == '/data'. Bear that in mind if your EXEMPT_PATHS entries assume full paths. The AiBotBlocker middleware only wraps the /api sub-application; public marketing pages at / are unaffected.

Comparison: Sinatra vs Rails vs plain Rack

The AiBotBlocker class is identical across all three — only registration differs:

Sinatra (classic or modular)

# app.rb
use AiBotBlocker

# config.ru
use AiBotBlocker
run MyApp

Rails — application.rb

# config/application.rb
module MyApp
  class Application < Rails::Application
    config.middleware.use AiBotBlocker
    # or insert before a specific middleware:
    # config.middleware.insert_before Rack::Sendfile, AiBotBlocker
  end
end

Plain Rack — config.ru

# config.ru
require_relative 'middleware/ai_bot_blocker'

use AiBotBlocker
run ->(env) { [200, {'content-type' => 'text/plain'}, ['OK']] }

The middleware class file is the same in all cases. Only the use / registration call differs by framework. This portability is Rack's core value.

Deployment with Puma

# Gemfile
source 'https://rubygems.org'
gem 'sinatra'
gem 'puma'

# Run locally
bundle exec ruby app.rb         # Classic style
bundle exec rackup config.ru    # Modular / config.ru

# Production with Puma
bundle exec puma -p 3000 -e production config.ru

# Or via Foreman
# Procfile:
# web: bundle exec puma -p $PORT -e production config.ru

Deploys to Render, Fly.io, Railway, Heroku, and any VPS with Ruby. For server-level blocking before Ruby runs, place nginx in front and use a map $http_user_agent block. See the nginx guide.

Verification

# Should return 403 (blocked AI bot)
curl -I -A "GPTBot" http://localhost:4567/

# Should return 200 (regular browser)
curl -I -A "Mozilla/5.0" http://localhost:4567/

# robots.txt must always return 200
curl -I -A "GPTBot" http://localhost:4567/robots.txt

# Check X-Robots-Tag on legitimate request
curl -si -A "Mozilla/5.0" http://localhost:4567/ | grep -i x-robots

Default Sinatra port is 4567. Expected: GPTBot → 403. Mozilla/5.0 → 200 with x-robots-tag: noai, noimageai. robots.txt → 200 for any user agent.
