Use include_str!() to embed robots.txt into your binary at compile time — no static directory at runtime, clean for Docker. Store the compiled regex in a LazyLock (stable since Rust 1.80) so it's compiled once and reused on every request. For Actix-web, use App::wrap_fn(); for Axum, use axum::middleware::from_fn().
| Method | When to use |
|---|---|
| robots.txt via include_str! (compile-time embed) | Always — self-contained binary, ideal for Docker |
| Actix-web wrap_fn middleware | Actix-web servers |
| Axum from_fn middleware (Tower) | Axum servers |
| X-Robots-Tag response header | Complement to robots.txt |
| nginx reverse proxy block | nginx in front of Rust server |
include_str!() reads a file at compile time and embeds its content as a &'static str in the binary. The path is relative to the current source file. No file I/O at runtime — your binary is self-contained.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /
use actix_web::{get, web, App, HttpResponse, HttpServer, Responder};
// Embedded at compile time — &'static str, no runtime file I/O
const ROBOTS_TXT: &str = include_str!("../static/robots.txt");
#[get("/robots.txt")]
async fn robots_txt() -> impl Responder {
HttpResponse::Ok()
.content_type("text/plain; charset=utf-8")
.body(ROBOTS_TXT)
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
HttpServer::new(|| {
App::new()
.service(robots_txt)
// ... other routes
})
.bind("0.0.0.0:8080")?
.run()
.await
}

The same handler in Axum:

use axum::{routing::get, Router};
use axum::response::{IntoResponse, Response};
use axum::http::header;
const ROBOTS_TXT: &str = include_str!("../static/robots.txt");
async fn robots_txt() -> Response {
(
[(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
ROBOTS_TXT,
).into_response()
}
#[tokio::main]
async fn main() {
let app = Router::new()
.route("/robots.txt", get(robots_txt))
// ... other routes
;
let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
axum::serve(listener, app).await.unwrap();
}

App::wrap_fn() accepts a closure that intercepts every request. The closure receives a ServiceRequest and a handle to the wrapped service; it checks the user agent and either returns an early 403 or calls srv.call(req).await to continue.
use actix_web::{
dev::{forward_ready, Service, ServiceRequest, ServiceResponse, Transform},
get, web, App, Error, HttpResponse, HttpServer, Responder,
};
use futures_util::future::LocalBoxFuture;
use regex::Regex;
use std::sync::LazyLock;
// Compiled once at startup — LazyLock is stable since Rust 1.80
static BLOCKED_UAS: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"(?i)GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research"
)
.expect("invalid regex pattern")
});
const ROBOTS_TXT: &str = include_str!("../static/robots.txt");
#[get("/robots.txt")]
async fn robots_txt() -> impl Responder {
HttpResponse::Ok()
.content_type("text/plain; charset=utf-8")
.body(ROBOTS_TXT)
}
#[get("/")]
async fn index() -> impl Responder {
HttpResponse::Ok().body("Hello, World!")
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
HttpServer::new(|| {
App::new()
// robots.txt is also exempted inside the middleware below, so it's always served
.service(robots_txt)
// Block AI bots on all other routes
.wrap_fn(|req, srv| {
// Allow robots.txt through
if req.path() == "/robots.txt" {
return Box::pin(srv.call(req));
}
let ua = req
.headers()
.get("user-agent")
.and_then(|v| v.to_str().ok())
.unwrap_or("");
if BLOCKED_UAS.is_match(ua) {
// into_response consumes the request and builds a ServiceResponse directly
let response = req.into_response(
HttpResponse::Forbidden().body("Forbidden")
);
return Box::pin(async move { Ok(response) });
}
Box::pin(srv.call(req))
})
.service(index)
})
.bind("0.0.0.0:8080")?
.run()
.await
}

[dependencies]
actix-web = "4"
regex = "1"
futures-util = "0.3"
LazyLock vs lazy_static: std::sync::LazyLock is stable since Rust 1.80 and requires no external dependency. If you're on an older toolchain, use the lazy_static or once_cell crate instead. Either way, compile the Regex exactly once — never inside the request handler.
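The once-only guarantee is easy to check. Below is a minimal std-only sketch (it substitutes a plain lowercase substring list for the regex crate so it compiles with no dependencies); the atomic counter confirms the initializer runs exactly once no matter how many times the static is read:

```rust
use std::sync::LazyLock;
use std::sync::atomic::{AtomicU32, Ordering};

// Counts how many times the LazyLock initializer actually runs.
static INIT_COUNT: AtomicU32 = AtomicU32::new(0);

// Initialized on first access; subsequent reads are cheap and lock-free.
static BLOCKED_BOTS: LazyLock<Vec<&'static str>> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    vec!["gptbot", "claudebot", "ccbot", "perplexitybot"]
});

fn is_blocked(user_agent: &str) -> bool {
    let ua = user_agent.to_ascii_lowercase();
    BLOCKED_BOTS.iter().any(|bot| ua.contains(bot))
}

fn main() {
    assert!(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.1)"));
    assert!(!is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0"));
    for _ in 0..1_000 {
        let _ = BLOCKED_BOTS.len(); // repeated access, no re-initialization
    }
    assert_eq!(INIT_COUNT.load(Ordering::SeqCst), 1);
    println!("initializer ran exactly once");
}
```

The same pattern holds for LazyLock<Regex>: the handler only ever pays for a pointer dereference after the first request.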
Axum uses Tower's middleware system. axum::middleware::from_fn() wraps an async function as a middleware layer. Apply it to the whole router with Router::layer(), or to specific routes with route_layer().
use axum::{
body::Body,
extract::Request,
http::{header, StatusCode},
middleware::{self, Next},
response::{IntoResponse, Response},
routing::get,
Router,
};
use regex::Regex;
use std::sync::LazyLock;
static BLOCKED_UAS: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"(?i)GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research"
)
.expect("invalid regex pattern")
});
const ROBOTS_TXT: &str = include_str!("../static/robots.txt");
async fn block_ai_bots(request: Request, next: Next) -> Response {
// Always allow robots.txt through
if request.uri().path() == "/robots.txt" {
return next.run(request).await;
}
let ua = request
.headers()
.get(header::USER_AGENT)
.and_then(|v| v.to_str().ok())
.unwrap_or("");
if BLOCKED_UAS.is_match(ua) {
return (StatusCode::FORBIDDEN, "Forbidden").into_response();
}
next.run(request).await
}
async fn robots_txt_handler() -> Response {
(
[(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
ROBOTS_TXT,
)
.into_response()
}
async fn index() -> &'static str {
"Hello, World!"
}
#[tokio::main]
async fn main() {
let app = Router::new()
.route("/robots.txt", get(robots_txt_handler))
.route("/", get(index))
// Apply middleware to all routes
.layer(middleware::from_fn(block_ai_bots));
let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
axum::serve(listener, app).await.unwrap();
}

[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
regex = "1"

Router::layer vs route_layer: Router::layer() applies middleware to all routes including 404 responses. Router::route_layer() applies only to matched routes — 404 responses bypass it. For bot blocking, layer() is correct: you want to block even requests to non-existent paths.
Add X-Robots-Tag: noai, noimageai to all responses via middleware.
// Chain with the bot-blocking wrap_fn — or add as a second wrap_fn
.wrap_fn(|req, srv| {
let fut = srv.call(req);
async move {
let mut res = fut.await?;
res.headers_mut().insert(
actix_web::http::header::HeaderName::from_static("x-robots-tag"),
actix_web::http::header::HeaderValue::from_static("noai, noimageai"),
);
Ok(res)
}
})

The Axum version:

use axum::http::HeaderValue;
async fn add_x_robots_tag(request: Request, next: Next) -> Response {
let mut response = next.run(request).await;
response.headers_mut().insert(
"x-robots-tag",
HeaderValue::from_static("noai, noimageai"),
);
response
}
// Add to your router:
.layer(middleware::from_fn(add_x_robots_tag))

tower-http alternative: The tower-http crate provides a SetResponseHeaderLayer that adds headers without hand-written middleware: .layer(SetResponseHeaderLayer::overriding(HeaderName::from_static("x-robots-tag"), HeaderValue::from_static("noai, noimageai"))). The http crate has no built-in X_ROBOTS_TAG constant, so construct the header name yourself. Add tower-http = { version = "0.5", features = ["set-header"] } to Cargo.toml.
In production, nginx typically handles TLS and sits in front of your Rust server. Block at the nginx layer — the request never reaches your Rust process.
map $http_user_agent $block_ai_bot {
default 0;
~*GPTBot 1;
~*ChatGPT-User 1;
~*OAI-SearchBot 1;
~*ClaudeBot 1;
~*anthropic-ai 1;
~*Google-Extended 1;
~*Bytespider 1;
~*CCBot 1;
~*PerplexityBot 1;
~*meta-externalagent 1;
~*Amazonbot 1;
~*Applebot-Extended 1;
~*xAI-Bot 1;
~*DeepSeekBot 1;
~*MistralBot 1;
~*Diffbot 1;
~*cohere-ai 1;
~*AI2Bot 1;
~*YouBot 1;
~*DuckAssistBot 1;
~*omgili 1;
~*webzio-extended 1;
~*gemini-deep-research 1;
}
server {
listen 443 ssl;
server_name yourapp.com;
ssl_certificate /etc/letsencrypt/live/yourapp.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;
# Always pass robots.txt to the Rust server
location = /robots.txt {
proxy_pass http://127.0.0.1:8080;
}
location / {
if ($block_ai_bot) {
return 403 "Forbidden";
}
proxy_pass http://127.0.0.1:8080;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}

| Aspect | Actix-web | Axum |
|---|---|---|
| Simple middleware | App::wrap_fn() closure | axum::middleware::from_fn() |
| Middleware ecosystem | Actix-native Transform/Service traits | Tower layers |
| All routes covered | wrap_fn on App | Router::layer() |
| robots.txt exemption | path check inside the closure | path check inside the middleware fn |
| Regex storage | static LazyLock&lt;Regex&gt; | static LazyLock&lt;Regex&gt; |
| robots.txt serving | #[get("/robots.txt")] handler + include_str! | get() route + include_str! |
Use the include_str!("../static/robots.txt") macro — the path is relative to your source file (src/main.rs). The file content is embedded as a &'static str at compile time. No file I/O at runtime, no need to ship a static/ directory alongside your binary. This is the Rust equivalent of Go's go:embed.
In a static LazyLock<Regex> at module level (stable since Rust 1.80). Never call Regex::new() inside a request handler — it re-compiles on every request. LazyLock initializes on first access and is safe for concurrent use without any locking overhead on subsequent calls.
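Since the blocklist is just a case-insensitive alternation of literal names, a dependency-free alternative is a lowercase substring scan. This sketch (helper names are illustrative, std only, and only a subset of the article's list) behaves the same as the regex for these literal patterns:

```rust
// Literal bot tokens, stored lowercase. Subset of the article's list,
// shown here for illustration.
const BLOCKED: &[&str] = &[
    "gptbot", "chatgpt-user", "oai-searchbot", "claudebot", "anthropic-ai",
    "google-extended", "bytespider", "ccbot", "perplexitybot",
];

// Case-insensitive substring match, equivalent to the (?i) alternation
// for literal patterns like these.
fn is_blocked_ua(user_agent: &str) -> bool {
    let ua = user_agent.to_ascii_lowercase();
    BLOCKED.iter().any(|b| ua.contains(b))
}

fn main() {
    assert!(is_blocked_ua("Mozilla/5.0 (compatible; ClaudeBot/1.0)"));
    assert!(is_blocked_ua("CCBot/2.0"));
    // A normal browser, and Google's search crawler, pass through.
    assert!(!is_blocked_ua("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"));
    assert!(!is_blocked_ua("Googlebot/2.1"));
    println!("substring matcher agrees with the regex on these samples");
}
```

Note that "Googlebot" is not matched even though "Google-Extended" is in the list, which mirrors the intended behavior: block AI training crawlers, keep search indexing.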
Router::layer() applies middleware to all routes including 404 responses. Router::route_layer() applies only to matched routes — 404s bypass it. For AI bot blocking, use layer(): you want to block even requests to paths that don't exist in your router.
Yes — wrap_fn applied to App wraps the entire application, including routes registered before and after the wrap_fn call. The middleware runs before any service handler. You can also apply wrap_fn to a Scope for per-scope middleware.
Yes. Add tower-http with the set-header feature to Cargo.toml and use SetResponseHeaderLayer to add X-Robots-Tag: noai, noimageai to all responses. This is cleaner than writing a from_fn middleware for headers alone. The layer composes with your bot-blocking middleware via Router::layer().