How to Block AI Bots in Perl Dancer2
Dancer2 is a lightweight Perl web framework built around a clean DSL and a PSGI-native architecture. It is widely used in enterprise Perl shops, internal tooling, and APIs where Mojolicious feels too heavyweight. Dancer2's hook keyword lets you register a before hook that fires before every route handler. The Dancer2-specific detail is how to short-circuit inside that hook: call send_error(). That single call creates a Dancer2::Core::Error, marks the response as halted, and skips the route handler. This is simpler than Mojolicious, which needs both $c->render() and return, but produces the same outcome.
1. Bot pattern module
Define the patterns in a separate module and export the matcher with Exporter, so that hooks, route handlers, and tests can all share it. index() is a literal substring search, so there is no regex engine overhead. Apply lc() to the UA string once before iterating.
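To see the matching technique in isolation, here is a minimal sketch that lowercases the UA once and then probes it with index(). It differs from the module listing in one respect: it uses first from List::Util (a core module), so it stops at the first hit and reports which pattern matched, which can be handy for logging. The helper name first_matching_pattern, the shortened pattern list, and the sample UA strings are assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Illustrative subset of lowercase patterns, not the full list.
my @patterns = qw(gptbot claudebot ccbot bytespider);

# Hypothetical helper: returns the first pattern found in the UA, or undef.
sub first_matching_pattern {
    my ($ua) = @_;
    return unless defined $ua && length $ua;
    my $ua_lower = lc $ua;    # lowercase once, outside the search
    # index() returns -1 when the substring is absent
    return first { index($ua_lower, $_) >= 0 } @patterns;
}

# Made-up User-Agent strings for demonstration.
my @samples = (
    'Mozilla/5.0 (compatible; GPTBot/1.0)',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0',
);

for my $ua (@samples) {
    my $hit = first_matching_pattern($ua);
    printf "%-8s %s\n", defined $hit ? $hit : '-', $ua;
}
```

In a boolean context the return value behaves like is_ai_bot(): a pattern string is true and undef is false.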
# lib/MyApp/BotFilter.pm
package MyApp::BotFilter;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = qw(is_ai_bot);
# All lowercase — matched against lc($ua)
my @AI_BOT_PATTERNS = qw(
gptbot
chatgpt-user
claudebot
anthropic-ai
ccbot
google-extended
cohere-ai
meta-externalagent
bytespider
omgili
diffbot
imagesiftbot
magpie-crawler
amazonbot
dataprovider
netcraft
);
sub is_ai_bot {
my ($ua) = @_;
return 0 unless defined $ua && length $ua;
my $ua_lower = lc $ua;
# index() — literal substring, no regex engine, no backtracking
return 1 if grep { index($ua_lower, $_) >= 0 } @AI_BOT_PATTERNS;
return 0;
}
1;
2. hook before + hook after: global protection
Register before and after hooks in the application. The before hook sets the header on the response object before calling send_error() — headers must be set before the error is raised. The after hook injects X-Robots-Tag into every response that was not blocked.
# lib/MyApp.pm — Dancer2 application
package MyApp;
use Dancer2;
use MyApp::BotFilter qw(is_ai_bot);
# ── AI-bot before hook — runs before every route handler ─────────────────────
hook before => sub {
# Path guard: include this for safety across all deployment modes.
# In PSGI deployments with Plack::Middleware::Static, robots.txt is
# served before Dancer2 — this guard is a no-op there.
# With the built-in dev server, robots.txt goes through this hook —
# the guard ensures bots can still read the file and see they're blocked.
return if request->path eq '/robots.txt';
my $ua = request->header('User-Agent') // ''; # undef-safe
if (is_ai_bot($ua)) {
# Set header on the response object before short-circuiting
response->header('X-Robots-Tag' => 'noai, noimageai');
# send_error creates a Dancer2::Core::Error, marks response halted,
# and skips the route handler — a single call stops everything
send_error('Forbidden', 403);
}
# Pass-through: no explicit action needed here — after hook handles header
};
# ── after hook — add X-Robots-Tag to all passing responses ───────────────────
# Fires after the route handler for requests that were not blocked.
# Blocked requests are halted before reaching the route handler,
# so their header is set in the before hook above.
hook after => sub {
response->header('X-Robots-Tag' => 'noai, noimageai');
};
# ── Routes ────────────────────────────────────────────────────────────────────
get '/' => sub {
'Hello'
};
get '/api/data' => sub {
content_type 'application/json';
return '{"data":"value"}';
};
# Explicit robots.txt route (optional — public/robots.txt auto-served in most setups)
get '/robots.txt' => sub {
content_type 'text/plain';
send_file 'robots.txt'; # relative paths resolve against public/; system_path is only for files outside it
};
1;
3. Single-file script variant
Dancer2 also works as a single-file script — the DSL keywords are available at the script level without a class. Call dance at the end to start the server. Useful for quick prototypes or small internal tools.
#!/usr/bin/env perl
# app.pl — Dancer2 script (single-file variant)
use Dancer2;
use strict;
use warnings;
my @AI_BOT_PATTERNS = qw(
gptbot chatgpt-user claudebot anthropic-ai ccbot
google-extended cohere-ai meta-externalagent bytespider
omgili diffbot imagesiftbot magpie-crawler amazonbot
dataprovider netcraft
);
sub is_ai_bot {
my ($ua) = @_;
return 0 unless defined $ua && length $ua;
my $lower = lc $ua;
return grep { index($lower, $_) >= 0 } @AI_BOT_PATTERNS;
}
hook before => sub {
return if request->path eq '/robots.txt';
my $ua = request->header('User-Agent') // '';
if (is_ai_bot($ua)) {
response->header('X-Robots-Tag' => 'noai, noimageai');
send_error('Forbidden', 403);
}
};
hook after => sub {
response->header('X-Robots-Tag' => 'noai, noimageai');
};
get '/' => sub { 'Hello' };
dance;
4. Dancer2 plugin: reusable across apps
Encapsulate the hooks in a Dancer2::Plugin subclass so the bot blocker can be added to any Dancer2 application with a single use statement. The on_plugin_import block runs when the plugin is loaded and registers hooks directly on the app object.
# lib/MyApp/Plugin/BotBlocker.pm — reusable Dancer2 plugin
# Encapsulates the bot-blocker hook for use across multiple Dancer2 apps.
package MyApp::Plugin::BotBlocker;
use Dancer2::Plugin;
use MyApp::BotFilter qw(is_ai_bot);
on_plugin_import {
my $dsl = shift;
$dsl->app->add_hook(
Dancer2::Core::Hook->new(
name => 'before',
code => sub {
return if $dsl->app->request->path eq '/robots.txt';
my $ua = $dsl->app->request->header('User-Agent') // '';
if (is_ai_bot($ua)) {
$dsl->app->response->header('X-Robots-Tag' => 'noai, noimageai');
$dsl->app->send_error('Forbidden', 403);
}
},
)
);
$dsl->app->add_hook(
Dancer2::Core::Hook->new(
name => 'after',
code => sub {
$dsl->app->response->header('X-Robots-Tag' => 'noai, noimageai');
},
)
);
};
register_plugin;
1;
# Usage in any Dancer2 app:
# use MyApp::Plugin::BotBlocker;
5. PSGI deployment with Plack::Middleware::Static
In production, Dancer2 is deployed as a PSGI app via to_app(). Wrapping it with Plack::Middleware::Static serves public/ files at the Plack layer — before Dancer2 handles the request. This means robots.txt is served without triggering Dancer2 hooks at all. The path guard in the before hook becomes a no-op but costs nothing to keep.
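Which requests the static layer intercepts depends entirely on the path pattern handed to the middleware, so it can be worth checking that pattern on its own. The sketch below assumes a pattern that serves /robots.txt, /favicon.ico, and anything under /assets/, with the literal dots escaped so that a path such as /robots_txt does not slip through. The helper name served_statically and the sample paths are illustrative only.

```perl
use strict;
use warnings;

# Assumed pattern: two exact file names plus an /assets/ prefix.
# Dots are escaped so they match a literal "." only.
my $static_paths = qr{^/(?:robots\.txt|favicon\.ico)$|^/assets/};

# Hypothetical helper: true if the static layer would claim this path.
sub served_statically {
    my ($path) = @_;
    return $path =~ $static_paths ? 1 : 0;
}

# Made-up request paths for demonstration.
for my $path ('/robots.txt', '/assets/app.css', '/robots_txt', '/api/data') {
    my $layer = served_statically($path) ? 'Plack static layer' : 'Dancer2 app';
    print "$path -> $layer\n";
}
```

Paths that fall through to the Dancer2 app are the ones that reach the before hook.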
# app.psgi — production PSGI deployment
use strict;
use warnings;
use Plack::Builder;
use MyApp;
# Plack::Middleware::Static serves public/ BEFORE Dancer2 handles the request.
# robots.txt is served here — Dancer2 before hooks never fire for it.
# The path guard in the before hook is a no-op, but costs nothing to keep.
builder {
# Serve public/ directory (includes robots.txt) before Dancer2
enable 'Static',
path => qr{^/(?:robots\.txt|favicon\.ico)$|^/assets/}, # dots escaped to match literally
root => './public';
# Optional: gzip compression
enable 'Deflater',
content_type => ['text/html', 'application/json'];
# Dancer2 application
MyApp->to_app;
};
6. public/robots.txt
Place robots.txt in the public/ directory. In PSGI deployments with Plack::Middleware::Static, it is served at the Plack layer and never reaches Dancer2. When Dancer2 serves public/ itself (for example, plackup without Plack::Middleware::Static, or the single-file script started with perl app.pl), the request passes through the before hook, and the path guard ensures AI crawlers can still fetch the file.
# public/robots.txt
# Served from public/ — accessible to all crawlers even when blocker is active.
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
Key points
- send_error() is a single-call halt: send_error('Forbidden', 403) creates a Dancer2::Core::Error, marks the response halted, and skips the route handler. Unlike Mojolicious, you do not need a separate return: send_error() hands control back to Dancer2's dispatcher itself, so the rest of the hook does not run.
- Set headers before send_error(): Call response->header('X-Robots-Tag' => ...) before calling send_error(). After the error is raised, Dancer2's error handling controls the response, so a header call placed after send_error() will not take effect.
- after hook for pass-through: The after hook fires after the route handler for all requests that completed normally. It does not fire for halted (blocked) requests, so there is no duplication.
- Path guard for dev server: With plackup but no Plack::Middleware::Static, or with the single-file script run as perl app.pl, all requests including /robots.txt go through the before hook. The request->path eq '/robots.txt' guard ensures AI crawlers can fetch the file in all deployment modes.
- request->header() vs request->user_agent(): Both work. request->header('User-Agent') is the generic form; request->user_agent() is a convenience accessor. The // '' defined-or default guards against an absent header; is_ai_bot() also returns 0 for an undefined or empty value.
- PSGI-native: Dancer2 is built on PSGI from the ground up. MyApp->to_app() returns a PSGI coderef, so no adapter is needed. This makes Plack middleware integration first-class.
Framework comparison — Perl web frameworks
| Framework | Hook | Short-circuit | UA header |
|---|---|---|---|
| Dancer2 | hook before => sub | send_error('Forbidden', 403) | request->header('User-Agent') |
| Mojolicious | hook before_dispatch => sub | $c->render(status=>403); return | $c->req->headers->user_agent |
| Plack/PSGI (raw) | middleware closure | return [403, [...], ['Forbidden']] | $env->{HTTP_USER_AGENT} |
| Catalyst | auto action | $c->res->status(403); $c->detach() | $c->req->header('User-Agent') |
Dancer2's send_error() is the most concise short-circuit among these Perl frameworks: one call, compared with Mojolicious's render plus return or Catalyst's status plus detach. Dancer2 and Catalyst run on PSGI directly, and Mojolicious can also be deployed under a PSGI server, so all three support Plack middleware layering. That layer is where Plack::Middleware::Static can bypass hooks for static file requests entirely.
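For the raw Plack/PSGI row of the table, the same check can be sketched as a plain closure around any PSGI app, with no framework involved. This is a minimal sketch under stated assumptions: the wrapper name block_ai_bots, the shortened pattern list, and the stand-in inner app are all hypothetical, and pass-through responses are returned untouched rather than given an X-Robots-Tag header.

```perl
use strict;
use warnings;

# Illustrative subset of lowercase patterns, not the full list.
my @patterns = qw(gptbot claudebot ccbot);

# Hypothetical wrapper: takes a PSGI app (coderef), returns a PSGI app.
sub block_ai_bots {
    my ($app) = @_;
    return sub {
        my ($env) = @_;
        my $ua        = lc($env->{HTTP_USER_AGENT} // '');
        my $is_robots = ($env->{PATH_INFO} // '') eq '/robots.txt';

        # Short-circuit: return a complete PSGI response triplet.
        if (!$is_robots && grep { index($ua, $_) >= 0 } @patterns) {
            return [
                403,
                [ 'Content-Type' => 'text/plain',
                  'X-Robots-Tag' => 'noai, noimageai' ],
                ['Forbidden'],
            ];
        }
        return $app->($env);    # pass through to the wrapped app
    };
}

# Stand-in inner app for demonstration.
my $inner   = sub { [ 200, [ 'Content-Type' => 'text/plain' ], ['Hello'] ] };
my $wrapped = block_ai_bots($inner);

my $res = $wrapped->({ PATH_INFO => '/', HTTP_USER_AGENT => 'GPTBot/1.0' });
print "status: $res->[0]\n";    # prints "status: 403"
```

Because MyApp->to_app returns a PSGI coderef, a wrapper of this shape could in principle take it in place of the stand-in app, which would move the blocking decision ahead of Dancer2's hooks.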
Dependencies
# Install from CPAN
cpanm Dancer2
cpanm Plack # PSGI toolkit: provides plackup and includes Plack::Middleware::Static
cpanm Plack::Middleware::Deflater # optional gzip middleware, shipped as a separate distribution
# cpanfile:
requires 'Dancer2', '>= 1.0.0';
requires 'Plack', '>= 1.0047';
# Run the single-file script with the built-in development server
perl app.pl
# Run with Plack
plackup app.psgi
# Production (Starman multi-process)
cpanm Starman
starman app.psgi --workers 4 --port 8080