How to Block AI Bots in Ruby Roda

Roda is a routing-tree web framework for Ruby built on Rack. Unlike Rails or Sinatra, Roda evaluates routes as a tree: each r.on / r.get call consumes path segments progressively. Bot blocking here uses the before block provided by plugin :hooks to intercept every request before routing begins. request.halt([status, headers, body]) throws :halt; Roda catches it and returns the given Rack response immediately, bypassing the entire routing tree. The User-Agent header is read from request.env['HTTP_USER_AGENT'], because Rack stores request headers in the env hash uppercased, with an HTTP_ prefix and underscores in place of hyphens.
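
The header-to-env-key naming rule described above can be sketched in a few lines of plain Ruby. The helper rack_env_key is hypothetical (it is not part of Rack or Roda), and the env hash is hand-built to stand in for what a Rack server would pass to the app. Note that Content-Type and Content-Length are exceptions in Rack: they appear as CONTENT_TYPE and CONTENT_LENGTH without the HTTP_ prefix.

```ruby
# Hypothetical helper, for illustration only: maps an HTTP request header
# name to the key Rack uses for it in the env hash.
def rack_env_key(header_name)
  "HTTP_#{header_name.upcase.tr('-', '_')}"
end

# Hand-built env hash standing in for a real Rack environment.
env = { 'HTTP_USER_AGENT' => 'Mozilla/5.0 (compatible; GPTBot/1.0)' }

puts rack_env_key('User-Agent')               # HTTP_USER_AGENT
puts env[rack_env_key('User-Agent')]          # the User-Agent string
puts env[rack_env_key('X-Missing')].inspect   # nil, the header was not sent
```

A missing header yields nil, which is why the hook code in this guide falls back to an empty string with || ''.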

1. Bot detection

The detector is pure Ruby with no gems. It lowercases the User-Agent and uses String#include? for literal substring matching (no regex, no external dependencies); Enumerable#any? short-circuits on the first match.

# bot_utils.rb — AI bot detection, no gems required

AI_BOT_PATTERNS = %w[
  gptbot
  chatgpt-user
  claudebot
  anthropic-ai
  ccbot
  google-extended
  cohere-ai
  meta-externalagent
  bytespider
  omgili
  diffbot
  imagesiftbot
  magpie-crawler
  amazonbot
  dataprovider
  netcraft
].freeze

# Returns true if the User-Agent string matches a known AI crawler.
# String#include? — literal substring match, no regex.
def ai_bot?(ua)
  return false if ua.nil? || ua.empty?
  lower = ua.downcase
  AI_BOT_PATTERNS.any? { |p| lower.include?(p) }
end
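
A quick way to sanity-check the matching logic is to run it against a few User-Agent strings. The sketch below repeats the same logic with a shortened pattern list so it runs on its own; SAMPLE_PATTERNS, sample_ai_bot?, and the sample User-Agent values are illustrative assumptions, not real captured traffic.

```ruby
# Shortened copy of the pattern list so this snippet is self-contained.
SAMPLE_PATTERNS = %w[gptbot claudebot ccbot].freeze

# Same logic as ai_bot? above: case-insensitive literal substring match.
def sample_ai_bot?(ua)
  return false if ua.nil? || ua.empty?
  lower = ua.downcase
  SAMPLE_PATTERNS.any? { |p| lower.include?(p) }
end

# Illustrative User-Agent values (assumed, not taken from real logs).
bot_ua     = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0)'
browser_ua = 'Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0'

puts sample_ai_bot?(bot_ua)      # true
puts sample_ai_bot?(browser_ua)  # false
puts sample_ai_bot?(nil)         # false
```

Because the match is a plain substring test, any User-Agent that happens to contain one of the patterns is treated as a bot, so short patterns deserve extra care.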

2. Before hook — plugin :hooks

plugin :hooks must be declared before the before block, because before/after hooks are not part of Roda's minimal core. Inside the hook, use request (not r, which exists only as the route block parameter). Use next to exit the hook early and continue to routing; use request.halt to terminate immediately with a Rack response array.

# app.rb — Roda application with before hook bot blocking
require 'roda'
require_relative 'bot_utils'

class App < Roda
  # plugin :hooks must be declared before using before/after blocks.
  # It is not part of Roda's minimal core — omitting it raises NoMethodError.
  plugin :hooks

  before do
    # Pass robots.txt through — crawlers read it to discover Disallow rules.
    # Ruby next exits the before block; routing continues normally.
    next if request.path == '/robots.txt'

    # Rack env — headers are stored with HTTP_ prefix, uppercased, hyphens → underscores.
    # request.env is the raw Rack environment hash. Returns nil when absent.
    ua = request.env['HTTP_USER_AGENT'] || ''

    if ai_bot?(ua)
      # request.halt takes a Rack-compatible response array: [status, headers, body].
      # The body must respond to each and yield strings; an array of strings is the simplest form.
      # Header names are lowercase because the Rack 3 spec requires it (Rack 2 accepts either case).
      # halt throws :halt, caught by Roda, which stops all routing immediately.
      # Inside before hooks use request, not r (r is the route block parameter).
      request.halt [
        403,
        {
          'content-type' => 'text/plain',
          'x-robots-tag' => 'noai, noimageai',
        },
        ['Forbidden'],
      ]
    else
      # Non-bot: set X-Robots-Tag then fall through to routing.
      # response[] = sets a header on the outgoing response.
      response['X-Robots-Tag'] = 'noai, noimageai'
    end
  end

  route do |r|
    r.get 'robots.txt' do
      response['Content-Type'] = 'text/plain'
      <<~TXT
        User-agent: *
        Allow: /

        User-agent: GPTBot
        Disallow: /

        User-agent: ClaudeBot
        Disallow: /

        User-agent: CCBot
        Disallow: /

        User-agent: Google-Extended
        Disallow: /
      TXT
    end

    r.root do
      response['Content-Type'] = 'application/json'
      '{"message":"Hello"}'
    end

    r.on 'api' do
      r.get 'data' do
        response['Content-Type'] = 'application/json'
        '{"data":"value"}'
      end
    end
  end
end
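
The halt mechanism used in the before hook can be illustrated with Ruby's built-in catch/throw. The following is a simplified stand-in, not Roda's actual source; handle_request and its inline User-Agent check are hypothetical names used only to show the control flow.

```ruby
# Simplified sketch of halt-style control flow using catch/throw.
# This is NOT Roda's implementation; all names here are hypothetical.
def handle_request(user_agent)
  catch(:halt) do
    # "before hook" phase: throwing :halt aborts everything below
    # and makes the thrown array the return value of catch.
    if user_agent.to_s.downcase.include?('gptbot')
      throw :halt, [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
    end

    # "routing" phase: only reached when nothing was thrown.
    [200, { 'content-type' => 'text/plain' }, ['Hello']]
  end
end

status, _headers, body = handle_request('GPTBot/1.0')
puts status       # 403
puts body.first   # Forbidden

status, = handle_request('Mozilla/5.0')
puts status       # 200
```

Because throw unwinds the stack up to the matching catch, no code after the halt call runs, which is why no explicit return is needed after request.halt.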

3. config.ru

Roda is a Rack application. Run with bundle exec rackup (Puma, Falcon, or WEBrick) or pass to any Rack-compatible server.

# config.ru — Rack entry point
require_relative 'app'

run App

4. Routing tree guard — r.halt inside route

If you prefer not to use plugin :hooks, put the check near the top of the route block, after any routes (such as robots.txt) that should stay reachable for bots. Inside the route block, r is the request object, so r.halt and request.halt are the same method. This approach is simpler when you only need one check point.

# Alternative: guard inside the routing tree instead of a before hook.
# No plugin :hooks needed; r.halt short-circuits the routing tree directly.
# Use this when you want path-based scoping (e.g., only block under /api).
require 'roda'
require_relative 'bot_utils'

class App < Roda
  route do |r|
    # Serve robots.txt unconditionally — comes before the bot check
    r.get 'robots.txt' do
      response['Content-Type'] = 'text/plain'
      "User-agent: GPTBot\nDisallow: /\n"
    end

    # Bot check at the top of the routing tree — fires for all remaining paths.
    ua = request.env['HTTP_USER_AGENT'] || ''
    if ai_bot?(ua)
      # Lowercase header names: required by the Rack 3 spec, accepted by Rack 2.
      r.halt [
        403,
        { 'content-type' => 'text/plain', 'x-robots-tag' => 'noai, noimageai' },
        ['Forbidden'],
      ]
    end

    response['X-Robots-Tag'] = 'noai, noimageai'

    r.root { '{"message":"Hello"}' }

    r.on 'api' do
      r.get('data') { '{"data":"value"}' }
    end
  end
end

5. Rack middleware via plugin :middleware

plugin :middleware lets you use a Roda app as Rack middleware, loaded with use BotBlockerMiddleware in config.ru. When the route block finishes without handling the request (no route matches and nothing halts), Roda forwards the request to the downstream app. This is useful for inserting bot blocking in front of a non-Roda Rack application (Rails, Sinatra, Hanami, etc.).

# Roda as Rack middleware via plugin :middleware.
# Useful when embedding bot blocking in front of another Rack application.
# The middleware passes through to the downstream app when no halt is thrown.

require 'roda'
require_relative 'bot_utils'

class BotBlockerMiddleware < Roda
  # plugin :middleware enables use as Rack middleware: use BotBlockerMiddleware
  plugin :middleware

  route do |r|
    ua = request.env['HTTP_USER_AGENT'] || ''

    if ai_bot?(ua) && request.path != '/robots.txt'
      # Lowercase header names: required by the Rack 3 spec, accepted by Rack 2.
      r.halt [
        403,
        { 'content-type' => 'text/plain', 'x-robots-tag' => 'noai, noimageai' },
        ['Forbidden'],
      ]
    end

    # No explicit match — Roda middleware passes the request to the next app.
  end
end

# config.ru with a downstream Rack app:
#
# require_relative 'bot_blocker_middleware'
# require_relative 'main_app'
#
# use BotBlockerMiddleware
# run MainApp
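
For comparison, the same pass-through behaviour can be sketched as a plain Rack middleware with no Roda dependency. This swaps in a different technique from the plugin :middleware approach above; PlainBotBlocker, its shortened PATTERNS list, and the lambda downstream app are hypothetical names for illustration.

```ruby
# Plain Rack middleware sketch (no Roda, no gems). A Rack middleware is any
# object that wraps an app and responds to call(env).
class PlainBotBlocker
  # Shortened pattern list, for illustration only.
  PATTERNS = %w[gptbot claudebot ccbot].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    ua = (env['HTTP_USER_AGENT'] || '').downcase
    blocked = env['PATH_INFO'] != '/robots.txt' &&
              PATTERNS.any? { |p| ua.include?(p) }

    # Blocked: answer directly. Otherwise: delegate to the wrapped app.
    return [403, { 'content-type' => 'text/plain' }, ['Forbidden']] if blocked

    @app.call(env)
  end
end

# Stand-in downstream app: a lambda is a valid Rack app.
downstream = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['OK']] }
stack = PlainBotBlocker.new(downstream)

puts stack.call('PATH_INFO' => '/', 'HTTP_USER_AGENT' => 'GPTBot/1.0').first   # 403
puts stack.call('PATH_INFO' => '/', 'HTTP_USER_AGENT' => 'Mozilla/5.0').first  # 200
```

The Roda version is the better fit when the blocker may later grow its own routes; the plain class is enough when a single check is all that is needed.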

6. robots.txt

# public/robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
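
If the list of disallowed crawlers changes often, the file above could be generated rather than hand-edited. The following is a minimal sketch under that assumption; DISALLOWED_AGENTS and robots_txt are hypothetical names, and the agent list simply mirrors the four entries shown above.

```ruby
# Hypothetical generator for the robots.txt shown above.
DISALLOWED_AGENTS = %w[GPTBot ClaudeBot CCBot Google-Extended].freeze

# Builds one "allow everything" group followed by one Disallow group per agent.
# Groups are separated by a blank line.
def robots_txt(agents)
  groups = ["User-agent: *\nAllow: /\n"]
  agents.each { |agent| groups << "User-agent: #{agent}\nDisallow: /\n" }
  groups.join("\n")
end

puts robots_txt(DISALLOWED_AGENTS)
```

The result can be written to public/robots.txt at deploy time or returned from the robots.txt route.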

Key points

Framework comparison — Ruby web frameworks

Framework | Hook / Filter                | Block                                  | UA header
----------|------------------------------|----------------------------------------|--------------------------------
Roda      | plugin :hooks; before { }    | request.halt([403, h, ['Forbidden']])  | request.env['HTTP_USER_AGENT']
Sinatra   | before do ... end (built-in) | halt 403, 'Forbidden'                  | request.user_agent
Grape     | before do ... end (built-in) | error!('Forbidden', 403)               | headers['User-Agent']
Rails     | before_action :check_bot     | head :forbidden                        | request.user_agent

Roda's request.halt takes a full Rack response array, giving precise control over status, headers, and body. Sinatra's halt accepts a bare status and string body — more convenient but less explicit. The key Roda-specific requirement is plugin :hooks; Sinatra and Grape include before-hook support in their cores.

Dependencies & running

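# Gemfile
source 'https://rubygems.org'

gem 'roda'
gem 'puma'    # recommended production server
gem 'rackup'  # provides the rackup command on Rack 3 (on Rack 2 it ships inside rack itself)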

# Install
bundle install

# Run
bundle exec rackup            # picks an installed server; Puma is preferred when it is in the bundle
bundle exec rackup -s puma    # select Puma explicitly

# Roda version: 3.x (hooks plugin stable since 3.0)
# Ruby: 2.5+ supported; 3.1+ recommended