
How to Block AI Bots on Rust (Actix-web & Axum): Complete 2026 Guide

Use include_str!() to embed robots.txt into your binary at compile time — no static directory at runtime, clean for Docker. Store the compiled regex in a LazyLock (stable since Rust 1.80) so it's compiled once and reused on every request. For Actix-web, use App::wrap_fn(); for Axum, use axum::middleware::from_fn().

8 min read·Updated April 2026·Rust 1.80+ · Actix-web 4 · Axum 0.7

Methods overview

| Method | When to use |
|---|---|
| robots.txt via include_str! (compile-time embed) | Always — self-contained binary, ideal for Docker |
| Actix-web wrap_fn middleware | Actix-web servers |
| Axum from_fn middleware (Tower) | Axum servers |
| X-Robots-Tag response header | Complement to robots.txt |
| nginx reverse proxy block | nginx in front of Rust server |

1. robots.txt via include_str!

include_str!() reads a file at compile time and embeds its content as a &'static str in the binary. The path is relative to the current source file. No file I/O at runtime — your binary is self-contained.

The robots.txt file

static/robots.txt — embedded into the binary at compile time
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /

Embed and serve in Actix-web

src/main.rs (Actix-web)
use actix_web::{get, App, HttpResponse, HttpServer, Responder};

// Embedded at compile time — &'static str, no runtime file I/O
const ROBOTS_TXT: &str = include_str!("../static/robots.txt");

#[get("/robots.txt")]
async fn robots_txt() -> impl Responder {
    HttpResponse::Ok()
        .content_type("text/plain; charset=utf-8")
        .body(ROBOTS_TXT)
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(robots_txt)
            // ... other routes
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}

Embed and serve in Axum

src/main.rs (Axum)
use axum::{routing::get, Router};
use axum::response::{IntoResponse, Response};
use axum::http::header;

const ROBOTS_TXT: &str = include_str!("../static/robots.txt");

async fn robots_txt() -> Response {
    (
        [(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
        ROBOTS_TXT,
    ).into_response()
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/robots.txt", get(robots_txt))
        // ... other routes
        ;

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

2. Actix-web wrap_fn middleware

App::wrap_fn() accepts a closure that intercepts every request. The closure receives a ServiceRequest plus a reference to the wrapped service; it inspects the User-Agent header and either short-circuits with a 403 response or forwards the request with srv.call(req).

src/main.rs (Actix-web)
use actix_web::{
    dev::Service,
    get, App, Error, HttpResponse, HttpServer, Responder,
};
use futures_util::future::{self, Either};
use regex::Regex;
use std::sync::LazyLock;

// Compiled once at startup — LazyLock is stable since Rust 1.80
static BLOCKED_UAS: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(
        r"(?i)GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research"
    )
    .expect("invalid regex pattern")
});

const ROBOTS_TXT: &str = include_str!("../static/robots.txt");

#[get("/robots.txt")]
async fn robots_txt() -> impl Responder {
    HttpResponse::Ok()
        .content_type("text/plain; charset=utf-8")
        .body(ROBOTS_TXT)
}

#[get("/")]
async fn index() -> impl Responder {
    HttpResponse::Ok().body("Hello, World!")
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // robots.txt route — exempted inside the middleware by its path check
            .service(robots_txt)
            // Block AI bots on all other routes
            .wrap_fn(|req, srv| {
                // Always let robots.txt through; match the UA everywhere else
                let blocked = req.path() != "/robots.txt"
                    && req
                        .headers()
                        .get("user-agent")
                        .and_then(|v| v.to_str().ok())
                        .is_some_and(|ua| BLOCKED_UAS.is_match(ua));

                if blocked {
                    // Consume the request and answer 403 without calling the service.
                    // Either unifies the two branch futures into one concrete type.
                    let response =
                        req.into_response(HttpResponse::Forbidden().body("Forbidden"));
                    return Either::Left(future::ok::<_, Error>(response));
                }

                Either::Right(srv.call(req))
            })
            .service(index)
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}
Cargo.toml dependencies
[dependencies]
actix-web = "4"
regex = "1"
futures-util = "0.3"

LazyLock vs lazy_static: std::sync::LazyLock is stable since Rust 1.80 and requires no external dependency. If you're on an older toolchain, use the lazy_static or once_cell crate instead. Either way, compile the Regex exactly once — never inside the request handler.
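If you would rather skip the regex dependency entirely, a plain static slice with a case-insensitive substring check covers the same ground. This is a dependency-free sketch, not code from the examples above — `is_blocked_ua` and the shortened bot list are illustrative:

```rust
// Dependency-free alternative: case-insensitive substring matching
// against a static pattern list (shortened here for brevity).
static BLOCKED_BOTS: &[&str] = &[
    "gptbot", "claudebot", "ccbot", "perplexitybot", "bytespider",
];

fn is_blocked_ua(user_agent: &str) -> bool {
    // Lowercase once, then scan the list linearly.
    let ua = user_agent.to_ascii_lowercase();
    BLOCKED_BOTS.iter().any(|bot| ua.contains(bot))
}

fn main() {
    // A GPTBot-style UA string matches; a normal browser UA does not.
    assert!(is_blocked_ua(
        "Mozilla/5.0 AppleWebKit/537.36; compatible; GPTBot/1.2"
    ));
    assert!(!is_blocked_ua("Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"));
    println!("ok");
}
```

This trades the regex crate for a linear scan; for a couple dozen fixed substrings checked once per request, the difference is negligible.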

3. Axum from_fn middleware (Tower)

Axum uses Tower's middleware system. axum::middleware::from_fn() wraps an async function as a middleware layer. Apply it to the whole router with Router::layer(), or to specific routes with route_layer().

src/main.rs (Axum)
use axum::{
    body::Body,
    extract::Request,
    http::{header, StatusCode},
    middleware::{self, Next},
    response::{IntoResponse, Response},
    routing::get,
    Router,
};
use regex::Regex;
use std::sync::LazyLock;

static BLOCKED_UAS: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(
        r"(?i)GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research"
    )
    .expect("invalid regex pattern")
});

const ROBOTS_TXT: &str = include_str!("../static/robots.txt");

async fn block_ai_bots(request: Request, next: Next) -> Response {
    // Always allow robots.txt through
    if request.uri().path() == "/robots.txt" {
        return next.run(request).await;
    }

    let ua = request
        .headers()
        .get(header::USER_AGENT)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if BLOCKED_UAS.is_match(ua) {
        return (StatusCode::FORBIDDEN, "Forbidden").into_response();
    }

    next.run(request).await
}

async fn robots_txt_handler() -> Response {
    (
        [(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
        ROBOTS_TXT,
    )
        .into_response()
}

async fn index() -> &'static str {
    "Hello, World!"
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/robots.txt", get(robots_txt_handler))
        .route("/", get(index))
        // Apply middleware to all routes
        .layer(middleware::from_fn(block_ai_bots));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
Cargo.toml dependencies
[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
regex = "1"

Router::layer vs route_layer: Router::layer() applies middleware to all routes including 404 responses. Router::route_layer() applies only to matched routes — 404 responses bypass it. For bot blocking, layer() is correct: you want to block even requests to non-existent paths.

4. X-Robots-Tag response header

Add X-Robots-Tag: noai, noimageai to all responses via middleware.

Actix-web

src/main.rs (Actix-web)
// Chain with the bot-blocking wrap_fn — or add as a second wrap_fn
.wrap_fn(|req, srv| {
    let fut = srv.call(req);
    async move {
        let mut res = fut.await?;
        res.headers_mut().insert(
            actix_web::http::header::HeaderName::from_static("x-robots-tag"),
            actix_web::http::header::HeaderValue::from_static("noai, noimageai"),
        );
        Ok(res)
    }
})

Axum — via middleware or tower-http

src/main.rs (Axum — inline middleware)
use axum::http::HeaderValue;

async fn add_x_robots_tag(request: Request, next: Next) -> Response {
    let mut response = next.run(request).await;
    response.headers_mut().insert(
        "x-robots-tag",
        HeaderValue::from_static("noai, noimageai"),
    );
    response
}

// Add to your router:
.layer(middleware::from_fn(add_x_robots_tag))

tower-http alternative: The tower-http crate provides a SetResponseHeader layer that adds headers without writing middleware manually: .layer(SetResponseHeaderLayer::overriding(HeaderName::from_static("x-robots-tag"), HeaderValue::from_static("noai, noimageai"))). Note that the http crate has no X_ROBOTS_TAG constant, so build the header name with HeaderName::from_static. Add tower-http = { version = "0.5", features = ["set-header"] } to Cargo.toml.

5. nginx reverse proxy block

In production, nginx typically handles TLS and sits in front of your Rust server. Block at the nginx layer — the request never reaches your Rust process.

/etc/nginx/sites-available/yourapp
map $http_user_agent $block_ai_bot {
    default                 0;
    ~*GPTBot                1;
    ~*ChatGPT-User          1;
    ~*OAI-SearchBot         1;
    ~*ClaudeBot             1;
    ~*anthropic-ai          1;
    ~*Google-Extended       1;
    ~*Bytespider            1;
    ~*CCBot                 1;
    ~*PerplexityBot         1;
    ~*meta-externalagent    1;
    ~*Amazonbot             1;
    ~*Applebot-Extended     1;
    ~*xAI-Bot               1;
    ~*DeepSeekBot           1;
    ~*MistralBot            1;
    ~*Diffbot               1;
    ~*cohere-ai             1;
    ~*AI2Bot                1;
    ~*YouBot                1;
    ~*DuckAssistBot         1;
    ~*omgili                1;
    ~*webzio-extended       1;
    ~*gemini-deep-research  1;
}

server {
    listen 443 ssl;
    server_name yourapp.com;

    ssl_certificate     /etc/letsencrypt/live/yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;

    # Always pass robots.txt to the Rust server
    location = /robots.txt {
        proxy_pass http://127.0.0.1:8080;
    }

    location / {
        if ($block_ai_bot) {
            return 403 "Forbidden";
        }

        proxy_pass         http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header   Host              $host;
        proxy_set_header   X-Real-IP         $remote_addr;
        proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
    }
}

Actix-web vs Axum: middleware comparison

| Aspect | Actix-web | Axum |
|---|---|---|
| Simple middleware | App::wrap_fn(closure) | middleware::from_fn(async fn) |
| Middleware ecosystem | Actix-specific (Transform trait) | Tower (shared across Tower-based stacks) |
| All routes covered | wrap_fn on App wraps the whole app | Router::layer(), including 404s |
| robots.txt exemption | Path check inside the closure | Path check inside the middleware fn |
| Regex storage | static LazyLock&lt;Regex&gt; | static LazyLock&lt;Regex&gt; |
| robots.txt serving | #[get("/robots.txt")] service + include_str! | .route("/robots.txt", get(...)) + include_str! |

Frequently asked questions

How do I embed robots.txt into my Rust binary?

Use the include_str!("../static/robots.txt") macro — the path is relative to your source file (src/main.rs). The file content is embedded as a &'static str at compile time. No file I/O at runtime, no need to ship a static/ directory alongside your binary. This is the Rust equivalent of Go's go:embed.

Where should I compile the regex for bot matching in Rust?

In a static LazyLock<Regex> at module level (stable since Rust 1.80). Never call Regex::new() inside a request handler — it re-compiles on every request. LazyLock initializes on first access and is safe for concurrent use without any locking overhead on subsequent calls.
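The one-time-initialization guarantee is easy to observe with a counter. A self-contained sketch — the String stand-in replaces the real Regex so it runs without external crates:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the init closure actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for LazyLock<Regex>: same pattern, no external dependency.
static PATTERN: LazyLock<String> = LazyLock::new(|| {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    String::from("(?i)GPTBot|ClaudeBot")
});

fn main() {
    let _ = PATTERN.len(); // first access runs the closure
    let _ = PATTERN.len(); // later accesses reuse the cached value
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 1);
    println!("ok");
}
```

The same holds under concurrent first access: LazyLock guarantees exactly one thread runs the closure while the others wait for the result.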

What is the difference between Router::layer and Router::route_layer in Axum?

Router::layer() applies middleware to all routes including 404 responses. Router::route_layer() applies only to matched routes — 404s bypass it. For AI bot blocking, use layer(): you want to block even requests to paths that don't exist in your router.

Does Actix-web wrap_fn apply to all routes?

Yes — wrap_fn applied to App wraps the entire application, including routes registered before and after the wrap_fn call. The middleware runs before any service handler. You can also apply wrap_fn to a Scope for per-scope middleware.

Can I use tower-http for the X-Robots-Tag header in Axum?

Yes. Add tower-http with the set-header feature to Cargo.toml and use SetResponseHeaderLayer to add X-Robots-Tag: noai, noimageai to all responses. This is cleaner than writing a from_fn middleware for headers alone. The layer composes with your bot-blocking middleware via Router::layer().