
How to Block AI Bots on Axum (Rust): Complete 2026 Guide

Axum is the Tokio team's Rust web framework — built on Tower, ergonomic extractors, and zero-cost async. Its middleware system uses axum::middleware::from_fn() for simple async closures and Tower's full Layer ecosystem for advanced use cases. The key architectural distinction: Router::layer() fires on all requests including 404s — critical for blocking bots that probe non-existent paths.

layer() vs route_layer() — the blocking distinction

Router::layer() applies middleware to every request — matched routes, 404s, and OPTIONS preflight. Router::route_layer() only fires on routes that match. For bot blocking, use .layer() — AI crawlers frequently probe /wp-admin, /sitemap.xml, and other paths that may not exist. In either case the middleware runs before the handler and can short-circuit the request.

Protection layers

1. robots.txt: tower_http::services::ServeDir or a dedicated route. Always unblocked for legitimate crawlers.
2. noai meta tag: in HTML responses, via Askama/MiniJinja templates or inline via the Html() responder.
3. X-Robots-Tag header: set in middleware after next.run(req), present on all non-blocked responses. Or use SetResponseHeaderLayer.
4. Hard 403, Router::layer() (global): from_fn middleware on the router. Fires on ALL requests including 404 probes. Zero-cost abort.
5. Hard 403, route_layer() (scoped): from_fn middleware on a nested scope. Only matched routes within the nest are protected.

Dependencies (Cargo.toml)

# Cargo.toml — required dependencies
[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
tower = "0.4"
tower-http = { version = "0.5", features = ["fs", "trace", "compression-full", "set-header"] }

Step 1 — Shared bot list (src/bots.rs)

A &[&str] slice — zero runtime cost, embedded in the binary. Case-insensitive substring matching handles all User-Agent capitalisation variants.

// src/bots.rs — shared AI bot list
pub const AI_BOTS: &[&str] = &[
    // OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot",
    // Anthropic
    "claudebot", "claude-web",
    // Common Crawl
    "ccbot",
    // Bytedance
    "bytespider",
    // Meta
    "meta-externalagent",
    // Perplexity
    "perplexitybot",
    // Google AI
    "google-extended", "googleother",
    // Cohere
    "cohere-ai",
    // Amazon
    "amazonbot",
    // Diffbot
    "diffbot",
    // AI2
    "ai2bot",
    // DeepSeek
    "deepseekbot",
    // Mistral
    "mistralai-user",
    // xAI
    "xai-bot",
    // You.com
    "youbot",
    // DuckDuckGo AI
    "duckassistbot",
];

pub fn is_ai_bot(user_agent: &str) -> bool {
    let ua = user_agent.to_lowercase();
    AI_BOTS.iter().any(|bot| ua.contains(bot))
}
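
The matcher can be exercised standalone. A minimal sketch with the list abbreviated to two entries — the User-Agent strings below are representative of what these crawlers send, not exact copies of any vendor's documented format:

```rust
// Standalone sketch of the matcher from src/bots.rs (list abbreviated).
const AI_BOTS: &[&str] = &["gptbot", "claudebot"];

fn is_ai_bot(user_agent: &str) -> bool {
    let ua = user_agent.to_lowercase();
    AI_BOTS.iter().any(|bot| ua.contains(bot))
}

fn main() {
    // Substring match catches versioned and mixed-case UA strings.
    assert!(is_ai_bot(
        "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
    ));
    assert!(is_ai_bot("CLAUDEBOT"));
    // A normal browser UA passes through.
    assert!(!is_ai_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"));
    println!("matcher ok");
}
```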

Step 2 — Middleware via from_fn (recommended)

An async function that returns Response. For AI bots, return immediately with StatusCode::FORBIDDEN — the handler and all downstream Tower layers never run. For legitimate requests, call next.run(req).await and add the X-Robots-Tag header to the response.

// src/middleware.rs — AI bot blocking middleware
use axum::{
    extract::Request,
    http::{header, HeaderName, HeaderValue, StatusCode},
    middleware::Next,
    response::{IntoResponse, Response},
};

use crate::bots::is_ai_bot;

/// Global AI bot blocker.
/// Apply with Router::layer(from_fn(ai_bot_blocker)).
pub async fn ai_bot_blocker(req: Request, next: Next) -> Response {
    let ua = req
        .headers()
        .get(header::USER_AGENT)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if is_ai_bot(ua) {
        // Short-circuit — handler never runs, zero computation wasted
        return (
            StatusCode::FORBIDDEN,
            [(
                HeaderName::from_static("x-robots-tag"),
                HeaderValue::from_static("noai, noimageai"),
            )],
            "Forbidden",
        )
            .into_response();
    }

    // Pass through to the handler
    let mut res = next.run(req).await;

    // Add X-Robots-Tag to all legitimate responses (belt + suspenders)
    res.headers_mut().insert(
        HeaderName::from_static("x-robots-tag"),
        HeaderValue::from_static("noai, noimageai"),
    );

    res
}

Step 3 — Router wiring: layer() vs route_layer()

Use Router::layer() on the outermost router for global coverage (including 404 probing). Use nest().route_layer() for scoped protection of specific route groups. You can combine both.

// src/main.rs — global and scoped blocking with Router::layer vs route_layer
use axum::{middleware::from_fn, Router, routing::get};
use tower_http::services::ServeDir;

mod bots;
mod middleware;
mod handlers;

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/", get(handlers::index))
        .route("/health", get(handlers::health))
        // Protected API routes — scoped middleware via route_layer
        // (route_layer only fires on matched routes — not on 404s)
        .nest(
            "/api",
            Router::new()
                .route("/data", get(handlers::api_data))
                .route("/users", get(handlers::users))
                // route_layer: fires on /api/* matches only
                // Use route_layer here because /api/* is already isolated
                .route_layer(from_fn(middleware::ai_bot_blocker)),
        )
        // Static files — robots.txt auto-served at /robots.txt
        // (axum 0.7 panics on nest_service at "/"; use fallback_service instead)
        .fallback_service(ServeDir::new("./static"))
        // GLOBAL: Router::layer fires on ALL requests, including unmatched
        // routes that produce 404s — important: bots probe non-existent paths.
        // This is the key difference from route_layer.
        .layer(from_fn(middleware::ai_bot_blocker));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

Step 4 — Dynamic block-list with from_fn_with_state

For a block-list that updates without redeploying, fetch patterns from Redis, a database, or a remote API. Wrap the set in Arc<RwLock<...>> and pass it via from_fn_with_state(). The state is cloned cheaply (an Arc clone is just a reference-count increment).

// Dynamic block-list via from_fn_with_state — updated at runtime
use axum::{
    extract::{Request, State},
    middleware::Next,
    response::Response,
    http::{header, HeaderName, HeaderValue, StatusCode},
    response::IntoResponse,
};
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

// App state with dynamic bot block-list
#[derive(Clone)]
pub struct AppState {
    pub bot_patterns: Arc<RwLock<HashSet<String>>>,
}

impl AppState {
    pub fn new() -> Self {
        let mut patterns = HashSet::new();
        // Seed with known bots — can be updated from DB/Redis at runtime
        for bot in crate::bots::AI_BOTS {
            patterns.insert(bot.to_string());
        }
        AppState {
            bot_patterns: Arc::new(RwLock::new(patterns)),
        }
    }

    pub fn is_bot(&self, user_agent: &str) -> bool {
        let ua = user_agent.to_lowercase();
        let patterns = self.bot_patterns.read().unwrap();
        patterns.iter().any(|p| ua.contains(p.as_str()))
    }
}

/// Middleware with access to shared state.
/// Use from_fn_with_state(state.clone(), dynamic_bot_blocker).
pub async fn dynamic_bot_blocker(
    State(state): State<AppState>,
    req: Request,
    next: Next,
) -> Response {
    let ua = req
        .headers()
        .get(header::USER_AGENT)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if state.is_bot(ua) {
        return (StatusCode::FORBIDDEN, "Forbidden").into_response();
    }

    next.run(req).await
}

// Wiring it up:
use axum::middleware::from_fn_with_state;

let state = AppState::new();

let app = Router::new()
    .route("/", get(index))
    // State must match what the middleware extracts
    .layer(from_fn_with_state(state.clone(), dynamic_bot_blocker))
    .with_state(state);
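
Because the set lives behind Arc<RwLock<...>>, a background task or admin endpoint can add patterns while the server is running. A stdlib-only sketch of the update path ("NewBot" is a hypothetical crawler name used for illustration):

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

// Same shape as AppState.bot_patterns from Step 4.
type Patterns = Arc<RwLock<HashSet<String>>>;

fn seeded() -> Patterns {
    Arc::new(RwLock::new(HashSet::from(["gptbot".to_string()])))
}

fn matches(patterns: &Patterns, ua: &str) -> bool {
    let ua = ua.to_lowercase();
    patterns.read().unwrap().iter().any(|p| ua.contains(p.as_str()))
}

// Runtime update, e.g. after polling Redis or a DB: take the write lock briefly.
fn add_pattern(patterns: &Patterns, pattern: &str) {
    patterns.write().unwrap().insert(pattern.to_lowercase());
}

fn main() {
    let patterns = seeded();
    assert!(!matches(&patterns, "NewBot/1.0"));

    add_pattern(&patterns, "NewBot"); // hypothetical new crawler

    // Every Arc clone (including the one held by the middleware) sees the update.
    assert!(matches(&patterns, "NewBot/1.0"));
    println!("block-list updated without restart");
}
```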

Step 5 — Stacking middleware with ServiceBuilder

ServiceBuilder makes middleware ordering explicit. Outermost layer runs first on requests. Put bot blocking first — blocked requests skip tracing, compression, and all other layers. No wasted CPU.

// ServiceBuilder — stacking multiple middleware in order
use axum::http::{HeaderName, HeaderValue};
use axum::middleware::from_fn;
use axum::{routing::get, Router};
use tower::ServiceBuilder;
use tower_http::compression::CompressionLayer;
use tower_http::set_header::SetResponseHeaderLayer;
use tower_http::trace::TraceLayer;

let app = Router::new()
    .route("/", get(index))
    .layer(
        ServiceBuilder::new()
            // Order: outermost layer runs first on request, last on response.
            // Bot blocking first — blocked requests skip all other middleware.
            .layer(from_fn(ai_bot_blocker))
            // Tracing second — only traces requests that passed bot check.
            .layer(TraceLayer::new_for_http())
            // Compression last — only compress legitimate responses.
            .layer(CompressionLayer::new())
            // Declarative X-Robots-Tag — alternative to setting it in middleware.
            // SetResponseHeaderLayer::overriding() replaces existing values.
            // SetResponseHeaderLayer::appending() keeps existing values.
            .layer(SetResponseHeaderLayer::overriding(
                HeaderName::from_static("x-robots-tag"),
                HeaderValue::from_static("noai, noimageai"),
            )),
    );

Step 6 — robots.txt

Use ServeDir for static file serving or a dedicated route. The include_str!() option bakes the file into the binary at compile time — zero filesystem reads at runtime, works in serverless and container environments with no mounted volumes.

// Option A: tower_http ServeDir — static file serving
// Cargo.toml: tower-http = { version = "0.5", features = ["fs"] }
// Place robots.txt in ./static/robots.txt

use tower_http::services::ServeDir;

let app = Router::new()
    .route("/", get(index))
    // axum 0.7 panics on nest_service at "/"; serve the static dir as the
    // router fallback instead. Named routes still take priority.
    .fallback_service(ServeDir::new("./static"));
    // robots.txt is auto-served at GET /robots.txt


// Option B: Route handler — dynamic or compile-time embedded
use axum::{routing::get, response::IntoResponse, http::header};

async fn robots_handler() -> impl IntoResponse {
    (
        [(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
        ROBOTS_TXT,
    )
}

const ROBOTS_TXT: &str = "User-agent: *
Allow: /

# AI training bots — blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: AmazonBot
Disallow: /

User-agent: Diffbot
Disallow: /";

// Option C: compile-time embed
async fn robots_embedded() -> impl IntoResponse {
    (
        [(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
        include_str!("../static/robots.txt"),
    )
}

// Register in router:
let app = Router::new()
    .route("/robots.txt", get(robots_handler))
    .route("/", get(index));
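
The hand-written ROBOTS_TXT constant and the Rust-side AI_BOTS list can drift apart. One option (a suggestion, not something Axum requires) is to generate the body from a display-cased name list — robots.txt needs the product token (e.g. "GPTBot"), not the lowercase match pattern, so DISPLAY_NAMES below is a hypothetical companion to AI_BOTS in src/bots.rs:

```rust
// Hypothetical display-cased companion to AI_BOTS (abbreviated).
const DISPLAY_NAMES: &[&str] = &["GPTBot", "ClaudeBot", "CCBot", "Bytespider"];

// Build a robots.txt body: allow everyone, then disallow each AI bot.
fn robots_txt() -> String {
    let mut out = String::from("User-agent: *\nAllow: /\n");
    for name in DISPLAY_NAMES {
        out.push_str(&format!("\nUser-agent: {name}\nDisallow: /\n"));
    }
    out
}

fn main() {
    let body = robots_txt();
    assert!(body.starts_with("User-agent: *\nAllow: /\n"));
    assert!(body.contains("User-agent: GPTBot\nDisallow: /\n"));
    println!("{body}");
}
```

A handler can then return robots_txt() with Content-Type text/plain, or the output can be written to ./static/robots.txt as a build step.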

Step 7 — noai meta tag in HTML responses

Add <meta name="robots" content="noai, noimageai"> to every HTML page. With Askama or MiniJinja templates, put it in your base layout. For inline responses, use Axum's Html() responder.

// noai meta tag with Askama or MiniJinja templates

// With Askama (compile-time templates) — add to base template:
// <meta name="robots" content="noai, noimageai">

// Or inject directly in a handler:
use axum::response::Html;

async fn index() -> impl IntoResponse {
    Html(r#"<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noai, noimageai">
  <title>My Site</title>
</head>
<body>
  <h1>Welcome</h1>
</body>
</html>"#)
}

// The X-Robots-Tag header (set in middleware) plus the noai meta tag
// in the HTML body gives belt-and-suspenders AI content protection.
// The meta tag applies to scrapers that render JS; the header applies
// to any HTTP client regardless of JS rendering.

Axum vs Actix-web vs Warp vs Rocket

Middleware model
  Axum: Tower Layer/Service — from_fn() wraps an async fn as Tower middleware
  Actix-web: wrap_fn closure or Transform+Service trait pair
  Warp: Filter combinators — composable with .and() / .map() / .and_then()
  Rocket: Request Guards (per-route) + Fairings (lifecycle, cannot abort)

Can abort the request?
  Axum: Yes — return Response early without calling next.run(req)
  Actix-web: Yes — return HttpResponse early without calling next.call(req)
  Warp: Yes — return Rejection; handle with recover()
  Rocket: Guards: yes. Fairings: no (on_request returns ()).

Global vs scoped
  Axum: Router::layer() = global (incl. 404s); route_layer() = matched routes only
  Actix-web: App::wrap() = global; Scope::wrap() = per prefix
  Warp: Composed filters all share the same filter chain
  Rocket: No native scope concept — add a guard per route or use a fairing

State in middleware
  Axum: from_fn_with_state(state, fn) — State<T> extractor
  Actix-web: web::Data<T> via App::app_data() — accessible in middleware via req.app_data()
  Warp: Closure capture or warp::any().map(move || state.clone())
  Rocket: rocket::State<T> managed via manage() — in guards via Request::guard()

UA header access
  Axum: req.headers().get(header::USER_AGENT)
  Actix-web: req.headers().get("user-agent")
  Warp: warp::header::optional("user-agent")
  Rocket: req.headers().get_one("User-Agent")

Hard 403
  Axum: (StatusCode::FORBIDDEN, "Forbidden").into_response()
  Actix-web: HttpResponse::Forbidden().finish()
  Warp: warp::reject::custom(AiBotRejection)
  Rocket: Outcome::Error((Status::Forbidden, ()))

Static files / robots.txt
  Axum: tower_http::services::ServeDir (nest_service or fallback_service)
  Actix-web: actix_files::Files::new("/", "./static")
  Warp: warp::fs::dir("./static")
  Rocket: FileServer::from("./static") or a #[get] route

Middleware ordering
  Axum: ServiceBuilder — outermost runs first on the request
  Actix-web: App::wrap() calls — last registered runs first (stack order)
  Warp: Filter composition order — left-to-right evaluation
  Rocket: Attach order for fairings; guard resolution order for guards

Summary

  • Router::layer(from_fn(ai_bot_blocker)) — global, fires on all requests including 404 probes. Use this as your default.
  • route_layer() — scoped to matched routes only. Use inside nest() for per-prefix protection without catching 404s at the top level.
  • from_fn_with_state() — dynamic block-list via Arc<RwLock<...>>. Update patterns at runtime without redeploying.
  • ServiceBuilder — explicit ordering. Bot blocking first = no wasted tracing, compression, or auth work on blocked requests.
  • ServeDir — static file serving for robots.txt. At the root, attach it with fallback_service so named routes keep priority.
