
How to Block AI Bots on Ruby on Rails

Every new Rails app ships with a public/robots.txt — most developers never edit it. Replace its contents in 30 seconds for the quickest fix, then layer in before_action blocking and Rack middleware for defence-in-depth.

Rails ships public/robots.txt by default

Run ls public/ in any Rails project — you'll find robots.txt already there. The default content is minimal. Just edit it — no new files, no routes, no controllers needed.

Replace public/robots.txt

A request for robots.txt is answered as a static file (by your web server in production, or by the ActionDispatch::Static middleware) before routing runs, so no controller code executes.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

All Methods

public/robots.txt (Recommended)

Easy · All deployments · File: public/robots.txt

Rails ships with a default public/robots.txt in every new app. Edit it: files in public/ are served statically before routing runs. No controller, no route, no config needed.

The default file has generic content. Replace it entirely. Plain text only — no ERB syntax.

RobotsController — dynamic robots.txt

Easy · All deployments · File: app/controllers/robots_controller.rb

A dedicated controller that renders plain text. Useful for environment-based rules (block all in staging) or generating rules from config. Remove public/robots.txt first — it takes precedence.

Use render plain: or a .text.erb template. Register route with format: false to avoid .txt format matching issues.

noai meta tag in application layout

Easy · All deployments · File: app/views/layouts/application.html.erb

Add <meta name="robots" content="noai, noimageai"> to the application layout's <head>. Applies to all pages using the default layout. Use content_for for per-page override.

Works for HTML responses only. Bots that ignore meta tags still need robots.txt or middleware.

before_action in ApplicationController

Easy · All deployments · File: app/controllers/application_controller.rb

A before_action that checks request.user_agent and calls head :forbidden for matched bots. Runs before any controller action — no page is rendered for blocked bots.

Use skip_before_action :block_ai_bots in specific controllers to exclude them.

Rack middleware — pre-Rails blocking

Intermediate · All deployments · File: config/application.rb (config.middleware.insert_before)

A Rack middleware class inserted before the Rails stack. Blocks AI bots before routing, session loading, or any Rails processing — the most efficient Ruby-layer method.

Slightly more complex than before_action but runs earlier in the request lifecycle.

nginx reverse proxy

Intermediate · nginx deployments · File: nginx server block config

Block AI bots in nginx before the request reaches Puma/Unicorn and Rails. Zero Ruby overhead for blocked bots. Standard for VPS deployments via Capistrano or Kamal.

Not available on Heroku without custom buildpacks. Use middleware approach on PaaS.

Method 1: public/robots.txt

Everything in public/ is served as a static file, by your front-end web server in production or by the ActionDispatch::Static middleware, before Rails routing runs. Open public/robots.txt and replace its contents:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /
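Maintaining this list by hand gets tedious as the bot roster changes. As a sketch, a small standalone script (hypothetical, not part of Rails; the file path and bot array are assumptions) can regenerate public/robots.txt from a single list:

```ruby
# bin/generate_robots.rb — hypothetical helper script, not part of Rails
AI_BOTS = %w[
  GPTBot ChatGPT-User OAI-SearchBot ClaudeBot anthropic-ai
  Google-Extended Bytespider CCBot PerplexityBot meta-externalagent
  Amazonbot Applebot-Extended xAI-Bot DeepSeekBot MistralBot Diffbot
  cohere-ai AI2Bot Ai2Bot-Dolma YouBot DuckAssistBot omgili omgilibot
  webzio-extended gemini-deep-research
].freeze

# Build the robots.txt body: allow everyone, then one Disallow block per AI bot
def robots_txt(bots)
  lines = ["User-agent: *", "Allow: /", ""]
  bots.each { |bot| lines << "User-agent: #{bot}" << "Disallow: /" << "" }
  lines.join("\n")
end

# Write straight into public/ so the static file stays authoritative
File.write("public/robots.txt", robots_txt(AI_BOTS)) if Dir.exist?("public")
```

Run it from the app root whenever the bot list changes, and commit the regenerated file.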

Method 2: RobotsController

For dynamic robots.txt — different rules per environment, reading from config — create a dedicated controller. First, delete public/robots.txt (it takes precedence over the route), then:

# Generate the controller
rails generate controller Robots index

# config/routes.rb
Rails.application.routes.draw do
  get '/robots.txt', to: 'robots#index', format: false
  # ... rest of routes
end

# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  skip_before_action :verify_authenticity_token
  skip_before_action :block_ai_bots, raise: false  # don't block robots.txt itself

  AI_BOTS = %w[
    GPTBot ChatGPT-User OAI-SearchBot
    ClaudeBot anthropic-ai Google-Extended
    Bytespider CCBot PerplexityBot
    meta-externalagent Amazonbot Applebot-Extended
    xAI-Bot DeepSeekBot MistralBot Diffbot
    cohere-ai AI2Bot Ai2Bot-Dolma YouBot
    DuckAssistBot omgili omgilibot
    webzio-extended gemini-deep-research
  ].freeze

  def index
    lines = ["User-agent: *", "Allow: /", ""]

    if Rails.env.production?
      AI_BOTS.each do |bot|
        lines << "User-agent: #{bot}" << "Disallow: /" << ""
      end
    else
      # Block all crawlers outside production
      lines = ["User-agent: *", "Disallow: /"]
    end

    lines << "Sitemap: #{request.base_url}/sitemap.xml"

    render plain: lines.join("\n"),
           content_type: "text/plain",
           layout: false
  end
end

format: false in routes

Without format: false, Rails treats the trailing .txt as an optional format segment rather than part of the literal path, which can change request.format and cause unexpected 404s or content-negotiation failures in some configurations. Always add format: false to the robots.txt route so /robots.txt matches as written.

Method 3: noai Meta Tag in Application Layout

Add the noai meta tag to app/views/layouts/application.html.erb. All pages using this layout (the default for all controllers) will include it:

<%# app/views/layouts/application.html.erb (excerpt) %>
<!DOCTYPE html>
<html>
  <head>
    <title><%= content_for?(:title) ? yield(:title) : "My App" %></title>
    <meta name="viewport" content="width=device-width,initial-scale=1">

    <%# Block AI training crawlers on every page; views can override via content_for %>
    <% if content_for?(:robots_meta_override) %>
      <%= yield :robots_meta_override %>
    <% else %>
      <meta name="robots" content="noai, noimageai">
    <% end %>

    <%= csrf_meta_tags %>
    <%= csp_meta_tag %>
    <%= stylesheet_link_tag "application" %>
  </head>
  <body>
    <%= yield %>
  </body>
</html>

To allow AI indexing on a specific page, override in that view:

<%# app/views/blog/show.html.erb %>
<% content_for :robots_meta_override do %>
  <meta name="robots" content="index, follow">
<% end %>

<%# ... rest of view %>

Method 4: before_action in ApplicationController

Add a before_action to app/controllers/application_controller.rb. Since every controller inherits from ApplicationController, this intercepts all requests before any action runs:

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :block_ai_bots

  private

  BLOCKED_UA_PATTERN = /
    GPTBot|ChatGPT-User|OAI-SearchBot|
    ClaudeBot|anthropic-ai|Google-Extended|
    Bytespider|CCBot|PerplexityBot|
    meta-externalagent|Amazonbot|Applebot-Extended|
    xAI-Bot|DeepSeekBot|MistralBot|Diffbot|
    cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|
    DuckAssistBot|omgili|omgilibot|
    webzio-extended|gemini-deep-research
  /xi

  def block_ai_bots
    ua = request.user_agent.to_s
    head :forbidden if BLOCKED_UA_PATTERN.match?(ua)
  end
end
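Because the regex is plain Ruby, you can sanity-check it outside Rails. This standalone sketch reuses the same pattern against a few sample user-agent strings (the UA strings are illustrative; real bot UAs carry extra version and URL details):

```ruby
# Standalone check of the before_action pattern (same regex as above)
BLOCKED_UA_PATTERN = /
  GPTBot|ChatGPT-User|OAI-SearchBot|
  ClaudeBot|anthropic-ai|Google-Extended|
  Bytespider|CCBot|PerplexityBot|
  meta-externalagent|Amazonbot|Applebot-Extended|
  xAI-Bot|DeepSeekBot|MistralBot|Diffbot|
  cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|
  DuckAssistBot|omgili|omgilibot|
  webzio-extended|gemini-deep-research
/xi

# Mirrors the controller's check: nil UA coerces to "" and is allowed through
def blocked?(user_agent)
  BLOCKED_UA_PATTERN.match?(user_agent.to_s)
end

puts blocked?("Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot")  # => true
puts blocked?("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0")           # => false
puts blocked?(nil)                                                                # => false
```

The /i flag makes matching case-insensitive, so "ccbot" and "CCBot" both match; the /x flag lets the pattern span lines for readability.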

To allow AI bots to reach a specific controller (e.g. your robots.txt controller or a public API), use skip_before_action:

# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  skip_before_action :block_ai_bots, raise: false
  # ...
end

# app/controllers/api/v1/base_controller.rb
class Api::V1::BaseController < ApplicationController
  skip_before_action :block_ai_bots, raise: false
  # Public API — let AI bots access if desired
end

Method 5: Rack Middleware

A Rack middleware class runs before the Rails stack — before routing, before session loading, before ActionController. Create the file and insert it in config/application.rb:

# lib/middleware/block_ai_bots.rb
module Middleware
  class BlockAiBots
    BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research/i

    def initialize(app)
      @app = app
    end

    def call(env)
      ua = env['HTTP_USER_AGENT'].to_s
      if BLOCKED_UAS.match?(ua)
        # Lowercase header name is required by Rack 3 (Rack 2 accepts it too)
        [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
      else
        @app.call(env)
      end
    end
  end
end

# config/application.rb
require_relative '../../lib/middleware/block_ai_bots'

module YourApp
  class Application < Rails::Application
    # Insert before the Rails stack
    config.middleware.insert_before 0, Middleware::BlockAiBots
  end
end
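Because a Rack middleware is just an object with a call(env) method, you can exercise it without booting Rails. This self-contained sketch restates an abridged version of the class above and wires it to a stub app standing in for the Rails stack (the stub app, env hashes, and abridged bot list are illustrative):

```ruby
# Abridged re-statement of the middleware above, exercised with a stub app
module Middleware
  class BlockAiBots
    BLOCKED_UAS = /GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot/i  # abridged list

    def initialize(app)
      @app = app
    end

    def call(env)
      ua = env['HTTP_USER_AGENT'].to_s
      if BLOCKED_UAS.match?(ua)
        [403, { 'content-type' => 'text/plain' }, ['Forbidden']]
      else
        @app.call(env)
      end
    end
  end
end

# Stub downstream app standing in for the rest of the Rack/Rails stack
inner_app = ->(env) { [200, { 'content-type' => 'text/html' }, ['OK']] }
stack = Middleware::BlockAiBots.new(inner_app)

bot_status, = stack.call('HTTP_USER_AGENT' => 'GPTBot/1.1')
human_status, = stack.call('HTTP_USER_AGENT' => 'Mozilla/5.0 Chrome/120.0')
puts bot_status    # 403 — short-circuited before the inner app
puts human_status  # 200 — passed through
```

The same pattern works in a real test suite: instantiate the middleware around a lambda and assert on the status tuple it returns.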

Method 6: nginx Reverse Proxy

Production Rails apps typically run Puma behind nginx (deployed via Capistrano, Kamal, or manually). Add a user agent check to nginx — matched bots never reach Puma or Ruby:

# /etc/nginx/sites-available/yourapp
upstream puma {
    server unix:///var/www/yourapp/shared/tmp/sockets/puma.sock;
}

server {
    listen 80;
    server_name yourdomain.com;
    root /var/www/yourapp/current/public;

    # Block AI training crawlers before Puma/Ruby
    # (~* is a case-insensitive substring match: AI2Bot also covers Ai2Bot-Dolma, omgili covers omgilibot)
    if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|YouBot|DuckAssistBot|omgili|webzio-extended|gemini-deep-research)") {
        return 403;
    }

    # Serve public/ directly (including robots.txt) — bypass Rails
    try_files $uri/index.html $uri @puma;

    location @puma {
        proxy_pass http://puma;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

AI Bots to Block

25 user agents covering AI training crawlers and AI search bots. The robots.txt, before_action, and middleware patterns above include all of them.

GPTBot · ChatGPT-User · OAI-SearchBot · ClaudeBot · anthropic-ai · Google-Extended · Bytespider · CCBot · PerplexityBot · meta-externalagent · Amazonbot · Applebot-Extended · xAI-Bot · DeepSeekBot · MistralBot · Diffbot · cohere-ai · AI2Bot · Ai2Bot-Dolma · YouBot · DuckAssistBot · omgili · omgilibot · webzio-extended · gemini-deep-research

Frequently Asked Questions

Does Rails have a robots.txt file by default?

Yes. Every new Rails app generated with rails new includes a public/robots.txt file. The public/ directory is served as static files (by your web server in production, or by the ActionDispatch::Static middleware) before routing runs, so no route or controller is needed. The default file contains only a generic comment. Simply replace its content with your AI bot blocking rules. This is the fastest method and works on all Rails deployment targets.

How do I create a dynamic robots.txt controller in Rails?

Generate a controller with rails generate controller Robots index, then add get '/robots.txt', to: 'robots#index', format: false to config/routes.rb. In the action, call render plain: content to return a text/plain response. Alternatively, create a template at app/views/robots/index.text.erb and render it; the .text handler gives the response the text/plain Content-Type. Remove public/robots.txt first: the static file takes precedence over the controller if both exist.

How do I use before_action to block AI bots in Rails?

Add a private method to ApplicationController that checks request.user_agent against a regex of AI bot names and calls head :forbidden (HTTP 403) if matched. Register it with before_action :block_ai_bots to apply it to all controller actions. Since ApplicationController is the base class for all controllers in a Rails app, this blocks bots before any action runs. You can exclude specific controllers using skip_before_action :block_ai_bots.

What is the difference between before_action and Rack middleware for bot blocking in Rails?

before_action runs inside the Rails stack — after routing, after session handling, but before your controller action. It has access to the full Rails request object. Rack middleware runs before Rails itself — before routing, before session loading, before any Rails processing. Rack middleware is more efficient (slightly faster, uses less memory) because it short-circuits earlier. For most Rails apps, before_action is simpler to implement and maintain. Rack middleware is preferable for high-traffic apps or when you want to block bots before any Rails overhead.

How do I add noai meta tags to every Rails page?

Add the meta tag to your application layout template at app/views/layouts/application.html.erb. Place <meta name="robots" content="noai, noimageai"> inside the <head> section, typically after the existing meta charset and viewport tags. This applies to every page that uses the application layout (all pages by default). For per-page control, use a content_for block: define <%= yield :head %> in the layout, then use <% content_for :head do %><meta name="robots" content="index, follow"><% end %> in specific views.

Does blocking AI bots affect Rails caching or ActionView?

No. Rails page caching, fragment caching, and ActionView are completely unaffected by robots.txt directives or noai meta tags. If you use before_action to return 403, the response bypasses ActionView rendering entirely — no views are rendered, no cache is read or written for blocked requests. If you use public/robots.txt (static file), the Rails stack never runs for that request at all.
