How to Block AI Bots in Rust Salvo

Salvo is a Rust web framework built on hyper and Tokio with a macro-driven handler model. Middleware uses the #[handler] macro and a FlowCtrl argument to control the handler chain. req.headers().get() returns Option<&HeaderValue> — chain .and_then(|v| v.to_str().ok()).unwrap_or("") for a safe &str. To block: set the response then call ctrl.skip_rest() — without skip_rest(), subsequent handlers still run and can overwrite the response. Middleware is registered with .hoop().
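The Option chain above can be sketched without any framework types. In this standalone illustration, to_str is a stand-in for HeaderValue::to_str() (the real method also rejects bytes outside visible ASCII, not just invalid UTF-8):

```rust
// Standalone sketch of the header-extraction chain.
// `to_str` stands in for HeaderValue::to_str().
fn to_str(v: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(v)
}

fn main() {
    let present: Option<&[u8]> = Some(b"GPTBot/1.0");
    let absent: Option<&[u8]> = None;

    // Header present and valid: the chain yields the header text.
    let ua = present.and_then(|v| to_str(v).ok()).unwrap_or("");
    assert_eq!(ua, "GPTBot/1.0");

    // Header absent: and_then short-circuits and unwrap_or supplies "".
    let ua = absent.and_then(|v| to_str(v).ok()).unwrap_or("");
    assert_eq!(ua, "");
}
```

The same shape — map the Option, discard the error with .ok(), fall back to "" — is what keeps the middleware panic-free on malformed or missing User-Agent headers.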

1. Bot detection

Pure Rust, no dependencies. str::contains() for literal substring matching. Iterator::any() short-circuits on first match.

// bot_utils.rs — AI bot detection, no external dependencies

pub const AI_BOT_PATTERNS: &[&str] = &[
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
];

/// Returns true if ua matches a known AI crawler pattern.
/// str::contains() — literal substring match, no regex.
/// to_lowercase() allocates; for hot paths consider a case-fold comparison.
pub fn is_ai_bot(ua: &str) -> bool {
    if ua.is_empty() {
        return false;
    }
    let lower = ua.to_lowercase();
    AI_BOT_PATTERNS.iter().any(|&pat| lower.contains(pat))
}
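The doc comment above flags that to_lowercase() allocates on every call. A minimal sketch of the suggested case-fold alternative — contains_ignore_ascii_case is a hypothetical helper, not part of bot_utils.rs — works because every pattern in AI_BOT_PATTERNS is plain ASCII:

```rust
// Sketch: ASCII case-insensitive substring search with no per-request allocation.
// `contains_ignore_ascii_case` is a hypothetical helper, not part of bot_utils.rs.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let needle = needle.as_bytes();
    if needle.is_empty() {
        return true;
    }
    // Slide a needle-sized window over the haystack; eq_ignore_ascii_case
    // compares byte-for-byte, folding only ASCII letters.
    haystack
        .as_bytes()
        .windows(needle.len())
        .any(|window| window.eq_ignore_ascii_case(needle))
}

fn main() {
    assert!(contains_ignore_ascii_case("Mozilla/5.0 GPTBot/1.0", "gptbot"));
    assert!(contains_ignore_ascii_case("CLAUDEBOT", "claudebot"));
    assert!(!contains_ignore_ascii_case("Mozilla/5.0 Firefox/126.0", "gptbot"));
}
```

This folds only ASCII case, which is sufficient here since the pattern list contains no non-ASCII characters; to_lowercase() would additionally handle Unicode case mappings the patterns never need.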

2. #[handler] middleware with FlowCtrl

The #[handler] macro generates the boilerplate needed to use an async function as a Salvo handler or middleware. ctrl.skip_rest() is the key — it halts all remaining handlers in the chain. Without it, execution falls through to route handlers regardless of the response you set.

// middleware.rs — Salvo bot-blocking handler
use salvo::http::StatusCode;
use salvo::prelude::*;

use crate::bot_utils::is_ai_bot;

/// #[handler] turns this async function into a Salvo Handler.
/// Middleware handlers receive (&mut Request, &mut Response, &mut FlowCtrl).
/// Set the response and call ctrl.skip_rest() to block.
/// Do nothing to ctrl to let the chain continue (pass through).
#[handler]
pub async fn bot_blocker(req: &mut Request, res: &mut Response, ctrl: &mut FlowCtrl) {
    // Path guard: robots.txt must be reachable so bots can read Disallow rules.
    if req.uri().path() == "/robots.txt" {
        // No ctrl.skip_rest() — chain continues to the robots.txt handler.
        return;
    }

    // req.headers().get() returns Option<&HeaderValue>.
    // .and_then(|v| v.to_str().ok()) — to_str() fails if the value contains
    //   bytes outside visible ASCII, so opaque values fall back to "".
    // .unwrap_or("") — safe empty-string fallback when the header is absent.
    // Header-name lookup is case-insensitive (the http crate normalises names to lowercase).
    let ua = req
        .headers()
        .get("user-agent")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if is_ai_bot(ua) {
        // Block: set status, inject headers, write body.
        // ctrl.skip_rest() MUST be called to stop subsequent handlers from running.
        // Without it, the next handler in the chain could overwrite this response.
        res.status_code(StatusCode::FORBIDDEN);
        res.headers_mut().insert(
            "x-robots-tag",
            "noai, noimageai".parse().unwrap(),
        );
        res.headers_mut().insert(
            "content-type",
            "text/plain; charset=utf-8".parse().unwrap(),
        );
        res.render(Text::Plain("Forbidden"));
        ctrl.skip_rest(); // halt the handler chain — return immediately after
        return;
    }

    // Pass: inject X-Robots-Tag and let the chain continue.
    // Do NOT call ctrl.skip_rest() here — the route handler must run.
    res.headers_mut().insert(
        "x-robots-tag",
        "noai, noimageai".parse().unwrap(),
    );
    // No ctrl.skip_rest() — execution continues to the next handler.
}

3. main.rs — global .hoop() registration

Router::new().hoop(bot_blocker) registers the middleware on the root router — it runs for every request before any route handler. Multiple .hoop() calls chain in order.

// main.rs — Salvo app with global bot-blocking middleware
use salvo::prelude::*;

mod bot_utils;
mod middleware;

use middleware::bot_blocker;

#[handler]
async fn robots_txt(res: &mut Response) {
    res.headers_mut().insert(
        "content-type",
        "text/plain; charset=utf-8".parse().unwrap(),
    );
    res.render(Text::Plain(
        "User-agent: *\nAllow: /\n\n\
         User-agent: GPTBot\nDisallow: /\n\n\
         User-agent: ClaudeBot\nDisallow: /\n\n\
         User-agent: CCBot\nDisallow: /\n\n\
         User-agent: Google-Extended\nDisallow: /\n",
    ));
}

#[handler]
async fn index(res: &mut Response) {
    res.render(Json(serde_json::json!({ "message": "Hello" })));
}

#[handler]
async fn api_data(res: &mut Response) {
    res.render(Json(serde_json::json!({ "data": "value" })));
}

#[tokio::main]
async fn main() {
    // .hoop() registers middleware on the router.
    // Middleware runs before route handlers in registration order.
    // bot_blocker is registered globally — applies to ALL routes on this router.
    let router = Router::new()
        .hoop(bot_blocker)          // global — runs for every request
        .push(Router::with_path("robots.txt").get(robots_txt))
        .push(Router::with_path("api/data").get(api_data))
        .get(index);                // GET on the root path "/"

    let acceptor = TcpListener::new("0.0.0.0:8080").bind().await;
    Server::new(acceptor).serve(router).await;
}
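One subtlety in the robots_txt body above: a trailing `\` inside a Rust string literal strips both the line break and the next line's leading whitespace, so the indentation in the source never leaks into the rendered file. A standalone check:

```rust
// Sketch: `\` at end of line in a string literal removes the line break AND
// the leading whitespace of the following line, keeping the body flush-left.
fn main() {
    let body = "User-agent: GPTBot\nDisallow: /\n\n\
                User-agent: ClaudeBot\nDisallow: /\n";
    assert_eq!(
        body,
        "User-agent: GPTBot\nDisallow: /\n\nUser-agent: ClaudeBot\nDisallow: /\n"
    );
}
```

Without the continuation, each record would start with the source indentation and crawlers would see malformed `User-agent` lines.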

4. Scoped middleware — nested Router

Router::with_path("/api").hoop(bot_blocker) scopes the middleware to /api/**. Push it onto the root router with .push(). Routes on the root are unaffected.

// Scoped middleware — protect /api routes using a nested Router.
// Routes on the root router are NOT affected by api-scoped middleware.
// Assumes the handlers from section 3, plus an api_status handler defined the same way.
use salvo::prelude::*;

#[tokio::main]
async fn main() {
    // Root router — no bot blocking
    // Root router — no bot blocking
    let root = Router::new()
        .push(Router::with_path("robots.txt").get(robots_txt))
        .get(index);

    // Nested /api router — bot blocker scoped to /api/**
    // Router::with_path() sets the path prefix for all routes pushed onto this router.
    let api = Router::with_path("api")
        .hoop(bot_blocker)           // only /api/** routes are protected
        .push(Router::with_path("data").get(api_data))
        .push(Router::with_path("status").get(api_status));

    // .push() mounts the nested router onto the root router.
    let router = root.push(api);

    let acceptor = TcpListener::new("0.0.0.0:8080").bind().await;
    Server::new(acceptor).serve(router).await;
}

5. Cargo.toml

# Cargo.toml
[package]
name = "bot-blocker"
version = "0.1.0"
edition = "2021"

[dependencies]
salvo = { version = "0.70", features = ["full"] }
tokio = { version = "1", features = ["full"] }
serde_json = "1"

# Run: cargo run
# Build release: cargo build --release

Framework comparison — Rust web frameworks

Framework    Middleware style             Block                                     Pass
Salvo        #[handler] + FlowCtrl        set response, then ctrl.skip_rest()       return without skip_rest()
Axum         tower Service or from_fn     return Response::builder().status(403)... next.run(req).await
Actix-web    Transform + Service traits   ok(HttpResponse::Forbidden().finish())    self.service.call(req).await
Warp         Filter composition           Err(warp::reject::custom(Blocked))        Ok(req) (filter passes value through)

Salvo's FlowCtrl model is the most distinctive — blocking requires an explicit skip_rest() call rather than a return value or an error type. Axum and Actix-web both use return-value blocking (return a Response directly), while Warp uses filter rejection. Salvo's #[handler] macro avoids the boilerplate of Axum's tower Service trait or Actix-web's Transform + Service pair.