How to Block AI Bots on Axum (Rust): Complete 2026 Guide
Axum is the Tokio team's Rust web framework, built on Tower and Hyper, with ergonomic extractors and first-class async support. Its middleware system offers axum::middleware::from_fn() for simple async functions and Tower's full Layer ecosystem for advanced use cases. The key architectural distinction: Router::layer() fires on all requests, including 404s, which is critical for blocking bots that probe non-existent paths.
layer() vs route_layer() — the blocking distinction
Router::layer() applies middleware to every request: matched routes, 404s, and OPTIONS preflights alike. Router::route_layer() only fires on routes that match. For bot blocking, use .layer(), because AI crawlers frequently probe /wp-admin, /sitemap.xml, and other paths that may not exist. In either case the middleware can short-circuit before the handler runs.
Protection layers
Dependencies (Cargo.toml)
# Cargo.toml — required dependencies
[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
tower = "0.4"
tower-http = { version = "0.5", features = ["fs", "trace", "compression-full", "set-header"] }
Step 1 — Shared bot list (src/bots.rs)
A &[&str] slice — zero runtime cost, embedded in the binary. Case-insensitive substring matching handles all User-Agent capitalisation variants.
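The matching semantics can be sketched std-only before looking at the full list. Here `matches_bot_list` is a stand-in for the `is_ai_bot` helper below, and the User-Agent strings are illustrative:

```rust
// Std-only sketch of the matching logic: lowercase the User-Agent
// once, then substring-match every pattern against it.
fn matches_bot_list(user_agent: &str, patterns: &[&str]) -> bool {
    let ua = user_agent.to_lowercase();
    patterns.iter().any(|p| ua.contains(p))
}

fn main() {
    let patterns = &["gptbot", "claudebot"];
    // Capitalisation variants all match after lowercasing
    assert!(matches_bot_list("Mozilla/5.0 (compatible; GPTBot/1.2)", patterns));
    assert!(matches_bot_list("claudebot/1.0", patterns));
    // A normal browser UA does not match
    assert!(!matches_bot_list("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/130.0", patterns));
}
```

Substring matching (rather than exact comparison) is deliberate: real bot User-Agents embed version numbers and URLs around the product token.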
// src/bots.rs — shared AI bot list
pub const AI_BOTS: &[&str] = &[
// OpenAI
"gptbot", "chatgpt-user", "oai-searchbot",
// Anthropic
"claudebot", "claude-web",
// Common Crawl
"ccbot",
// Bytedance
"bytespider",
// Meta
"meta-externalagent",
// Perplexity
"perplexitybot",
// Google AI
"google-extended", "googleother",
// Cohere
"cohere-ai",
// Amazon
"amazonbot",
// Diffbot
"diffbot",
// AI2
"ai2bot",
// DeepSeek
"deepseekbot",
// Mistral
"mistralai-user",
// xAI
"xai-bot",
// You.com
"youbot",
// DuckDuckGo AI
"duckassistbot",
];
pub fn is_ai_bot(user_agent: &str) -> bool {
let ua = user_agent.to_lowercase();
AI_BOTS.iter().any(|bot| ua.contains(bot))
}
Step 2 — Middleware via from_fn (recommended)
The middleware is a plain async function that returns a Response. For AI bots, return immediately with StatusCode::FORBIDDEN; the handler and all downstream Tower layers never run. For legitimate requests, call next.run(req).await and add the X-Robots-Tag header to the response.
// src/middleware.rs — AI bot blocking middleware
use axum::{
body::Body,
extract::Request,
http::{header, HeaderName, HeaderValue, StatusCode},
middleware::Next,
response::{IntoResponse, Response},
};
use crate::bots::is_ai_bot;
/// Global AI bot blocker.
/// Apply with Router::layer(from_fn(ai_bot_blocker)).
pub async fn ai_bot_blocker(req: Request, next: Next) -> Response {
let ua = req
.headers()
.get(header::USER_AGENT)
.and_then(|v| v.to_str().ok())
.unwrap_or("");
if is_ai_bot(ua) {
// Short-circuit — handler never runs, zero computation wasted
return (
StatusCode::FORBIDDEN,
[(
HeaderName::from_static("x-robots-tag"),
HeaderValue::from_static("noai, noimageai"),
)],
"Forbidden",
)
.into_response();
}
// Pass through to the handler
let mut res = next.run(req).await;
// Add X-Robots-Tag to all legitimate responses (belt + suspenders)
res.headers_mut().insert(
HeaderName::from_static("x-robots-tag"),
HeaderValue::from_static("noai, noimageai"),
);
res
}
Step 3 — Router wiring: layer() vs route_layer()
Use Router::layer() on the outermost router for global coverage (including 404 probing). Use nest().route_layer() for scoped protection of specific route groups. You can combine both.
// src/main.rs — global and scoped blocking with Router::layer vs route_layer
use axum::{middleware::from_fn, Router, routing::get};
use tower_http::services::ServeDir;
mod bots;
mod middleware;
mod handlers;
#[tokio::main]
async fn main() {
let app = Router::new()
.route("/", get(handlers::index))
.route("/health", get(handlers::health))
// Protected API routes — scoped middleware via route_layer
// (route_layer only fires on matched routes — not on 404s)
.nest(
"/api",
Router::new()
.route("/data", get(handlers::api_data))
.route("/users", get(handlers::users))
// route_layer: fires on /api/* matches only
// Use route_layer here because /api/* is already isolated
.route_layer(from_fn(middleware::ai_bot_blocker)),
)
// Static files — robots.txt served at /robots.txt.
// Nesting a service at "/" panics in axum 0.7; fallback_service gives
// named routes priority and sends everything else to ServeDir.
.fallback_service(ServeDir::new("./static"))
// GLOBAL: Router::layer fires on ALL requests, including unmatched
// routes that produce 404s — important: bots probe non-existent paths.
// This is the key difference from route_layer.
.layer(from_fn(middleware::ai_bot_blocker));
let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
axum::serve(listener, app).await.unwrap();
}
Step 4 — Dynamic block-list with from_fn_with_state
Use from_fn_with_state() for a block-list that updates without redeploying, fetched from Redis, a database, or a remote API. Wrap the set in Arc<RwLock<...>> and pass it via from_fn_with_state(). The state is cloned cheaply (an Arc clone is just a reference-count increment).
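The runtime-update mechanics reduce to swapping the set behind the lock. A std-only sketch, where `refresh` and the pattern names are illustrative; in the real app the refresh would run in a background tokio task:

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

type Patterns = Arc<RwLock<HashSet<String>>>;

// Replace the whole set under a short write lock; readers (the
// middleware) see the new set on their next request.
fn refresh(patterns: &Patterns, fresh: &[&str]) {
    let new_set: HashSet<String> = fresh.iter().map(|s| s.to_string()).collect();
    *patterns.write().unwrap() = new_set;
}

fn main() {
    let patterns: Patterns = Arc::new(RwLock::new(HashSet::new()));
    refresh(&patterns, &["gptbot", "claudebot"]);
    // Later, a background task pulls an updated list — no redeploy needed:
    refresh(&patterns, &["gptbot", "claudebot", "somebot-2026"]);
    assert!(patterns.read().unwrap().contains("somebot-2026"));
}
```

Replacing the set wholesale keeps the write lock short; the middleware only ever takes the read lock, and never holds it across an await.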
// Dynamic block-list via from_fn_with_state — updated at runtime
use axum::{
extract::{Request, State},
middleware::Next,
response::Response,
http::{header, HeaderName, HeaderValue, StatusCode},
response::IntoResponse,
};
use std::collections::HashSet;
use std::sync::{Arc, RwLock};
// App state with dynamic bot block-list
#[derive(Clone)]
pub struct AppState {
pub bot_patterns: Arc<RwLock<HashSet<String>>>,
}
impl AppState {
pub fn new() -> Self {
let mut patterns = HashSet::new();
// Seed with known bots — can be updated from DB/Redis at runtime
for bot in crate::bots::AI_BOTS {
patterns.insert(bot.to_string());
}
AppState {
bot_patterns: Arc::new(RwLock::new(patterns)),
}
}
pub fn is_bot(&self, user_agent: &str) -> bool {
let ua = user_agent.to_lowercase();
let patterns = self.bot_patterns.read().unwrap();
patterns.iter().any(|p| ua.contains(p.as_str()))
}
}
/// Middleware with access to shared state.
/// Use from_fn_with_state(state.clone(), dynamic_bot_blocker).
pub async fn dynamic_bot_blocker(
State(state): State<AppState>,
req: Request,
next: Next,
) -> Response {
let ua = req
.headers()
.get(header::USER_AGENT)
.and_then(|v| v.to_str().ok())
.unwrap_or("");
if state.is_bot(ua) {
return (StatusCode::FORBIDDEN, "Forbidden").into_response();
}
next.run(req).await
}
// Wiring it up:
use axum::middleware::from_fn_with_state;
let state = AppState::new();
let app = Router::new()
.route("/", get(index))
// State must match what the middleware extracts
.layer(from_fn_with_state(state.clone(), dynamic_bot_blocker))
.with_state(state);
Step 5 — Stacking middleware with ServiceBuilder
ServiceBuilder makes middleware ordering explicit. Outermost layer runs first on requests. Put bot blocking first — blocked requests skip tracing, compression, and all other layers. No wasted CPU.
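The onion ordering can be illustrated with plain closures. This is a conceptual sketch only, not real Tower types: each "layer" wraps the service it receives, so the outermost wrapper touches the request first and the response last.

```rust
// Conceptual onion-ordering sketch (not actual Tower Service/Layer types).
// Each layer records when it sees the request ("in") and the response ("out").
fn layer(
    name: &'static str,
    inner: Box<dyn Fn(&mut Vec<String>)>,
) -> Box<dyn Fn(&mut Vec<String>)> {
    Box::new(move |trace: &mut Vec<String>| {
        trace.push(format!("{} in", name));
        inner(trace);
        trace.push(format!("{} out", name));
    })
}

fn main() {
    let handler: Box<dyn Fn(&mut Vec<String>)> =
        Box::new(|trace: &mut Vec<String>| trace.push("handler".into()));
    // Same order as the ServiceBuilder below: bot_blocker, trace, compression
    let service = layer("bot_blocker", layer("trace", layer("compression", handler)));
    let mut log = Vec::new();
    service(&mut log);
    assert_eq!(log[0], "bot_blocker in"); // outermost runs first on the request
    assert_eq!(log.last().unwrap().as_str(), "bot_blocker out"); // and last on the response
}
```

This is why putting the bot blocker first means a blocked request never reaches tracing or compression at all.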
// ServiceBuilder — stacking multiple middleware in order
use axum::Router;
use axum::middleware::from_fn;
use tower::ServiceBuilder;
use tower_http::trace::TraceLayer;
use tower_http::compression::CompressionLayer;
use tower_http::set_header::SetResponseHeaderLayer;
use axum::http::{HeaderName, HeaderValue};
let app = Router::new()
.route("/", get(index))
.layer(
ServiceBuilder::new()
// Order: outermost layer runs first on request, last on response.
// Bot blocking first — blocked requests skip all other middleware.
.layer(from_fn(ai_bot_blocker))
// Tracing second — only traces requests that passed bot check.
.layer(TraceLayer::new_for_http())
// Compression last — only compress legitimate responses.
.layer(CompressionLayer::new())
// Declarative X-Robots-Tag — alternative to setting it in middleware.
// SetResponseHeaderLayer::overriding() replaces existing values.
// SetResponseHeaderLayer::appending() keeps existing values.
.layer(SetResponseHeaderLayer::overriding(
HeaderName::from_static("x-robots-tag"),
HeaderValue::from_static("noai, noimageai"),
)),
);
Step 6 — robots.txt
Use ServeDir for static file serving or a dedicated route. The include_str!() option bakes the file into the binary at compile time — zero filesystem reads at runtime, works in serverless and container environments with no mounted volumes.
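A further option is generating robots.txt from the same AI_BOTS list the middleware uses, so the two can never drift apart. `build_robots_txt` is a hypothetical helper; the lowercase tokens are acceptable because robots.txt User-agent matching is case-insensitive per RFC 9309:

```rust
// Hypothetical helper: derive robots.txt from the middleware's bot list.
fn build_robots_txt(bots: &[&str]) -> String {
    let mut out = String::from("User-agent: *\nAllow: /\n\n# AI training bots — blocked\n");
    for bot in bots {
        out.push_str(&format!("User-agent: {}\nDisallow: /\n\n", bot));
    }
    out
}

fn main() {
    let txt = build_robots_txt(&["gptbot", "claudebot", "ccbot"]);
    assert!(txt.starts_with("User-agent: *\nAllow: /\n"));
    assert!(txt.contains("User-agent: gptbot\nDisallow: /"));
    print!("{}", txt);
}
```

Serve the result from a handler (as in Option B), or write it to ./static at build time if you prefer ServeDir.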
// Option A: tower_http ServeDir — static file serving
// Cargo.toml: tower-http = { version = "0.5", features = ["fs"] }
// Place robots.txt in ./static/robots.txt
use tower_http::services::ServeDir;
let app = Router::new()
.route("/", get(index))
// Named routes take priority; unmatched paths fall back to ServeDir.
// Nesting a service at "/" panics in axum 0.7, so use fallback_service.
.fallback_service(ServeDir::new("./static"));
// robots.txt is served at GET /robots.txt
// Option B: Route handler — dynamic or compile-time embedded
use axum::{routing::get, response::IntoResponse, http::header};
async fn robots_handler() -> impl IntoResponse {
(
[(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
ROBOTS_TXT,
)
}
const ROBOTS_TXT: &str = "User-agent: *
Allow: /
# AI training bots — blocked
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: AmazonBot
Disallow: /
User-agent: Diffbot
Disallow: /";
// Option C: compile-time embed
async fn robots_embedded() -> impl IntoResponse {
(
[(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
include_str!("../static/robots.txt"),
)
}
// Register in router:
let app = Router::new()
.route("/robots.txt", get(robots_handler))
.route("/", get(index));
Step 7 — noai meta tag in HTML responses
Add <meta name="robots" content="noai, noimageai"> to every HTML page. With Askama or MiniJinja templates, put it in your base layout. For inline responses, use Axum's Html() responder.
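For sites that render HTML without a template engine, a tiny layout helper guarantees the tag appears on every page. `page` is a hypothetical helper, a sketch only:

```rust
// Hypothetical layout helper: every page rendered through it
// carries the noai meta tag, so no handler can forget it.
fn page(title: &str, body: &str) -> String {
    format!(
        "<!DOCTYPE html>\n<html>\n<head>\n<meta name=\"robots\" content=\"noai, noimageai\">\n<title>{}</title>\n</head>\n<body>\n{}\n</body>\n</html>",
        title, body
    )
}

fn main() {
    let html = page("My Site", "<h1>Welcome</h1>");
    assert!(html.contains(r#"<meta name="robots" content="noai, noimageai">"#));
    println!("{}", html);
}
```

With Askama or MiniJinja the same guarantee comes for free by putting the tag in the base template that every page extends.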
// noai meta tag with Askama or MiniJinja templates
// With Askama (compile-time templates) — add to base template:
// <meta name="robots" content="noai, noimageai">
// Or inject directly in a handler:
use axum::response::Html;
async fn index() -> impl IntoResponse {
Html(r#"<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noai, noimageai">
<title>My Site</title>
</head>
<body>
<h1>Welcome</h1>
</body>
</html>"#)
}
// The X-Robots-Tag header (set in middleware) plus the noai meta tag
// in the HTML body gives belt-and-suspenders AI content protection.
// The meta tag applies to scrapers that render JS; the header applies
// to any HTTP client regardless of JS rendering.
Axum vs Actix-web vs Warp vs Rocket
| Feature | Axum | Actix-web | Warp | Rocket |
|---|---|---|---|---|
| Middleware model | Tower Layer/Service — from_fn() wraps async fn as Tower middleware | wrap_fn closure or Transform+Service trait pair | Filter combinators — composable with .and() / .map() / .and_then() | Request Guards (per-route) + Fairings (lifecycle, cannot abort) |
| Can abort request? | Yes — return Response early without calling next.run(req) | Yes — return HttpResponse early without calling next.call(req) | Yes — return Rejection; handle with recover() | Guards: yes. Fairings: no (on_request returns ()). |
| Global vs scoped | Router::layer() = global (incl 404s). route_layer() = matched routes only | App::wrap() = global. Scope::wrap() = per prefix | Compose filters: all share the same filter chain | No native scope concept — add guard per route or use fairing |
| State in middleware | from_fn_with_state(state, fn) — State<T> extractor | web::Data<T> via App::app_data() — accessible in middleware via req.app_data() | Closure capture or warp::any().map(move || state.clone()) | rocket::State<T> managed via manage() — in guards via Request::guard() |
| UA header access | req.headers().get(header::USER_AGENT) | req.headers().get("user-agent") | warp::header::optional("user-agent") | req.headers().get_one("User-Agent") |
| Hard 403 | (StatusCode::FORBIDDEN, "Forbidden").into_response() | HttpResponse::Forbidden().finish() | warp::reject::custom(AiBotRejection) | Outcome::Error((Status::Forbidden, ())) |
| Static files / robots.txt | tower_http::services::ServeDir (nest_service) | actix_files::Files::new("/", "./static") | warp::fs::dir("./static") | FileServer::from("./static") or #[get] route |
| Middleware ordering | ServiceBuilder — outermost runs first on request | App::wrap() calls — last registered runs first (stack order) | Filter composition order — left-to-right evaluation | Attach order for fairings; guard resolution order for guards |
Summary
- Router::layer(from_fn(ai_bot_blocker)) — global, fires on all requests including 404 probes. Use this as your default.
- route_layer() — scoped to matched routes only. Use inside nest() for per-prefix protection without catching 404s at the top level.
- from_fn_with_state() — dynamic block-list via Arc<RwLock<...>>. Update patterns at runtime without redeploying.
- ServiceBuilder — explicit ordering. Bot blocking first = no wasted tracing, compression, or auth work on blocked requests.
- ServeDir — static file serving for robots.txt. Named routes take priority over the static file fallback.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.