How to Block AI Bots on Warp (Rust): Complete 2026 Guide
Warp is a composable, type-safe Rust web framework built on top of hyper. It has no traditional middleware — everything is a Filter. Bot blocking uses Warp's filter system: warp::reject::custom(AiBotRejection) short-circuits the filter chain before any handler runs, and .recover() converts all rejections into proper HTTP responses.
No middleware — only Filters
In Axum and Actix-web, middleware wraps route handlers as a chain. In Warp, a Filter is a composable unit that either succeeds (extracting a value and passing it downstream) or rejects (producing a Rejection that bypasses all downstream filters). Chain .and(bot_check()) to any route and the handler never executes for AI bots. The entire type chain is verified at compile time.
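As a minimal sketch of that model (the /hello route and its handler are illustrative, not part of this guide's code), each combinator either extracts a value for the next stage or rejects:
use warp::Filter;

// Every .and() must succeed for the handler to run; the first
// rejection short-circuits everything after it.
let route = warp::path("hello")                    // rejects unless the segment matches
    .and(warp::get())                              // rejects non-GET requests
    .and(warp::header::<String>("user-agent"))     // extracts the UA; rejects if absent
    .map(|ua: String| format!("Hello, {ua}"));     // runs only if all filters passed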
Protection layers
This guide stacks four layers:
- Filter-level blocking: .and(bot_check()) rejects AI bots with a 403 before any handler runs
- An X-Robots-Tag: noai, noimageai header on every response
- robots.txt directives for compliant crawlers
- A noai meta tag in HTML pages
Dependencies (Cargo.toml)
# Cargo.toml — required dependencies
[dependencies]
warp = "0.3"
tokio = { version = "1", features = ["full"] }
serde_json = "1" # optional — for JSON repliesStep 1 — Shared bot list (src/bots.rs)
Step 1 — Shared bot list (src/bots.rs)
The same zero-cost slice pattern works in every Rust framework: the bot strings are baked into the binary, and each request is checked with lowercased substring matching.
// src/bots.rs — shared AI bot list
pub const AI_BOTS: &[&str] = &[
// OpenAI
"gptbot", "chatgpt-user", "oai-searchbot",
// Anthropic
"claudebot", "claude-web",
// Common Crawl
"ccbot",
// Bytedance
"bytespider",
// Meta
"meta-externalagent",
// Perplexity
"perplexitybot",
// Google AI
"google-extended", "googleother",
// Cohere
"cohere-ai",
// Amazon
"amazonbot",
// Diffbot
"diffbot",
// AI2
"ai2bot",
// DeepSeek
"deepseekbot",
// Mistral
"mistralai-user",
// xAI
"xai-bot",
// You.com
"youbot",
// DuckDuckGo AI
"duckassistbot",
];
pub fn is_ai_bot(user_agent: &str) -> bool {
let ua = user_agent.to_lowercase();
AI_BOTS.iter().any(|bot| ua.contains(bot))
}
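A quick unit test (a sketch; the UA strings are illustrative) confirms that matching is case-insensitive and substring-based:
// src/bots.rs — optional test module for is_ai_bot
#[cfg(test)]
mod tests {
    use super::is_ai_bot;

    #[test]
    fn detects_known_bots_case_insensitively() {
        assert!(is_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"));
        assert!(is_ai_bot("CCBot/2.0 (https://commoncrawl.org/faq/)"));
        // Browsers and an empty UA pass through
        assert!(!is_ai_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"));
        assert!(!is_ai_bot(""));
    }
}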
Step 2 — Custom Rejection and bot-check filter
impl Reject for AiBotRejection is the marker that makes a type usable as a Warp rejection. The bot_check() filter uses warp::header::optional() to extract the User-Agent without requiring it — a missing UA passes through as an empty string. .untuple_one() strips the wrapper tuple so .and(bot_check()) composes cleanly without adding an extra value to the handler's argument list.
// src/filters.rs — custom Rejection and bot-check filter
use warp::reject::Reject;
use warp::Filter;
use crate::bots::is_ai_bot;
/// Custom rejection type for AI bot blocking.
/// Reject is a marker trait — no methods needed.
#[derive(Debug)]
pub struct AiBotRejection;
impl Reject for AiBotRejection {}
/// Filter that rejects AI bots with a custom Rejection.
///
/// Usage: route.and(bot_check())
/// - AI bot → warp::reject::custom(AiBotRejection) — handler never runs
/// - Legit → passes () to the next filter in the chain
pub fn bot_check() -> impl Filter<Extract = (), Error = warp::Rejection> + Clone {
warp::header::optional::<String>("user-agent")
.and_then(|ua: Option<String>| async move {
let ua_str = ua.as_deref().unwrap_or("");
if is_ai_bot(ua_str) {
Err(warp::reject::custom(AiBotRejection))
} else {
Ok(())
}
})
// untuple_one: converts Filter<Extract=((),)> to Filter<Extract=()>
// so it can be chained with .and() cleanly
.untuple_one()
}
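warp::test can exercise the filter in isolation (a sketch; it relies only on the tokio and warp dependencies already listed):
// src/filters.rs — optional test driving bot_check() directly
#[cfg(test)]
mod tests {
    use super::bot_check;

    #[tokio::test]
    async fn rejects_bots_and_passes_browsers() {
        // AI bot UA: the filter rejects, so .filter() returns Err
        let blocked = warp::test::request()
            .header("user-agent", "GPTBot/1.0")
            .filter(&bot_check())
            .await;
        assert!(blocked.is_err());

        // Browser UA: the filter extracts () and passes
        let allowed = warp::test::request()
            .header("user-agent", "Mozilla/5.0")
            .filter(&bot_check())
            .await;
        assert!(allowed.is_ok());
    }
}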
Step 3 — recover() — rejection to HTTP response
Without recover(), Warp answers unhandled custom rejections with a generic 500 Internal Server Error. The err.find::<AiBotRejection>() call downcasts the rejection to your custom type. Always include a fallback arm — without one, unrelated rejections get swallowed or answered with the wrong status.
// src/handlers.rs — recover() converts Rejections to HTTP responses
use std::convert::Infallible;
use warp::http::StatusCode;
use warp::Rejection;
use crate::filters::AiBotRejection;
/// Global rejection handler — converts all Rejections to HTTP responses.
/// Must be passed to .recover() on your top-level route.
pub async fn handle_rejection(err: Rejection) -> Result<impl warp::Reply, Infallible> {
if err.find::<AiBotRejection>().is_some() {
// AI bot — return 403 with X-Robots-Tag
return Ok(warp::reply::with_status(
warp::reply::with_header(
"Forbidden",
"x-robots-tag",
"noai, noimageai",
),
StatusCode::FORBIDDEN,
));
}
if err.is_not_found() {
return Ok(warp::reply::with_status(
warp::reply::with_header("Not Found", "x-robots-tag", "noai, noimageai"),
StatusCode::NOT_FOUND,
));
}
// Fallback — don't swallow unknown rejections
Ok(warp::reply::with_status(
warp::reply::with_header("Internal Server Error", "x-robots-tag", "noai, noimageai"),
StatusCode::INTERNAL_SERVER_ERROR,
))
}
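To check the mapping directly (a sketch), build a Rejection by hand and assert it renders as a 403:
// src/handlers.rs — optional test for the rejection-to-response mapping
#[cfg(test)]
mod tests {
    use super::handle_rejection;
    use crate::filters::AiBotRejection;
    use warp::Reply;

    #[tokio::test]
    async fn ai_bot_rejection_becomes_403() {
        let rejection = warp::reject::custom(AiBotRejection);
        let reply = handle_rejection(rejection).await.unwrap();
        let response = reply.into_response();
        assert_eq!(response.status(), 403);
        assert_eq!(response.headers()["x-robots-tag"], "noai, noimageai");
    }
}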
Step 4 — Route composition with .and(bot_check())
Chain .and(bot_check()) before the handler closure on every route you want to protect. robots.txt and /health are intentionally left unprotected — legitimate search crawlers need robots.txt access.
// src/main.rs — composing routes with bot_check filter
use warp::Filter;
mod bots;
mod filters;
mod handlers;
#[tokio::main]
async fn main() {
// robots.txt — MUST be unblocked for legitimate crawlers.
// No bot_check() here; list it before static_files so this explicit route wins.
let robots = warp::path("robots.txt")
.and(warp::get())
.map(|| {
warp::reply::with_header(
include_str!("../static/robots.txt"),
"content-type",
"text/plain; charset=utf-8",
)
});
// Health — no bot check, always accessible
let health = warp::path("health")
.and(warp::get())
.map(|| "ok");
// Protected index — bot_check() runs before the handler.
// AI bots → AiBotRejection (handler never runs).
// Legit requests → passes through to the closure.
let index = warp::path::end()
.and(warp::get())
.and(filters::bot_check()) // ← chains the filter
.map(|| "Welcome to my site");
// Protected API — same pattern, scoped to /api
let api_data = warp::path!("api" / "data")
.and(warp::get())
.and(filters::bot_check())
.map(|| warp::reply::json(&serde_json::json!({ "data": "protected" })));
// Static files (fallback) — serves ./static. Note: a rejected route falls
// through .or() to later alternatives, so don't mirror protected pages here.
let static_files = warp::fs::dir("./static");
// Combine all routes — .or() tries each in order
let routes = robots
.or(health)
.or(index)
.or(api_data)
.or(static_files)
// Map all successful responses to include X-Robots-Tag
.map(|reply| {
warp::reply::with_header(reply, "x-robots-tag", "noai, noimageai")
})
// Convert all Rejections (including AiBotRejection) to HTTP responses
.recover(handlers::handle_rejection);
println!("Listening on 0.0.0.0:8080");
warp::serve(routes).run(([0, 0, 0, 0], 8080)).await;
}
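An end-to-end check with warp::test::request().reply() drives a request through the filter chain and recover() together (a sketch mirroring the index route above):
// src/main.rs — optional integration-style test
#[cfg(test)]
mod tests {
    use warp::Filter;

    #[tokio::test]
    async fn bot_gets_403_browser_gets_200() {
        let route = warp::path::end()
            .and(warp::get())
            .and(crate::filters::bot_check())
            .map(|| "Welcome to my site")
            .recover(crate::handlers::handle_rejection);

        let blocked = warp::test::request()
            .header("user-agent", "ClaudeBot/1.0")
            .reply(&route)
            .await;
        assert_eq!(blocked.status(), 403);
        assert_eq!(blocked.headers()["x-robots-tag"], "noai, noimageai");

        let allowed = warp::test::request()
            .header("user-agent", "Mozilla/5.0")
            .reply(&route)
            .await;
        assert_eq!(allowed.status(), 200);
    }
}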
Step 5 — Global blocking note (Warp's limitation)
No single-point global middleware: unlike Axum's Router::layer() or Actix-web's App::wrap(), Warp has no way to layer a filter over an already-built router. Either chain .and(bot_check()) onto each route, or prefix a combined group with bot_check() via .and(): because the filter extracts (), it composes without changing any handler's argument list. For large route sets, factor the group into a helper function as shown below. Alternatively, put a reverse proxy (nginx, Caddy) in front of Warp and block bots there without touching the Rust code.
// Alternative: apply bot_check() once to a whole group of routes
use warp::Filter;
use crate::filters;
// bot_check() extracts (), so prefixing a combined group with .and()
// guards every route in the group without changing handler signatures.
fn protected_routes() -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
let index = warp::path::end()
.and(warp::get())
.map(|| "Home page");
let dashboard = warp::path("dashboard")
.and(warp::get())
.map(|| "Dashboard");
let api = warp::path("api")
.and(warp::path("data"))
.and(warp::get())
.map(|| warp::reply::json(&serde_json::json!({ "data": "ok" })));
// Warp has no after-the-fact layering like Axum's Router::layer() or
// Actix's Scope::wrap(); an .and() prefix achieves per-group blocking
// instead: if bot_check() rejects, no route in the group is ever tried.
filters::bot_check().and(index.or(dashboard).or(api))
}
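Wiring the group into main() then looks like this (a sketch reusing the robots and health routes from Step 4):
// In main(): unprotected routes first, the guarded group after, recover() last
let routes = robots
    .or(health)
    .or(protected_routes())
    .recover(handlers::handle_rejection);

warp::serve(routes).run(([0, 0, 0, 0], 8080)).await;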
Step 6 — robots.txt
Always register the robots.txt route without bot_check() — legitimate search crawlers need to read it. Register it before warp::fs::dir() so the explicit handler takes priority over the static file fallback.
// robots.txt in Warp — three approaches
// Option A: warp::fs::dir — serves ./static directory including robots.txt
// Place robots.txt in ./static/robots.txt
// warp::fs::dir serves any file at its path: /robots.txt → ./static/robots.txt
let static_files = warp::fs::dir("./static");
// Option B: Explicit route with inline content
const ROBOTS_TXT: &str = "User-agent: *
Allow: /
# AI training bots — blocked
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: AmazonBot
Disallow: /
User-agent: Diffbot
Disallow: /";
let robots = warp::path("robots.txt")
.and(warp::get())
.map(|| {
warp::http::Response::builder()
.header("content-type", "text/plain; charset=utf-8")
.body(ROBOTS_TXT)
.unwrap() // static name and value: this builder cannot fail
});
// Option C: Compile-time embed with include_str!()
let robots_embedded = warp::path("robots.txt")
.and(warp::get())
.map(|| {
warp::http::Response::builder()
.header("content-type", "text/plain; charset=utf-8")
.body(include_str!("../static/robots.txt"))
.unwrap() // same: static builder input cannot fail
});
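It is worth asserting that robots.txt stays reachable even for AI bot user agents, since leaving it unguarded is the whole point (a sketch using Option B's ROBOTS_TXT constant):
// Optional test: robots.txt must answer 200 even to AI bot UAs
#[tokio::test]
async fn robots_txt_not_blocked_for_bots() {
    let robots = warp::path("robots.txt")
        .and(warp::get())
        .map(|| ROBOTS_TXT); // &'static str replies default to text/plain

    let res = warp::test::request()
        .path("/robots.txt")
        .header("user-agent", "GPTBot/1.0")
        .reply(&robots)
        .await;
    assert_eq!(res.status(), 200);
}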
Step 7 — noai meta tag in HTML responses
// noai meta tag in HTML responses
use warp::Filter;
// Return HTML with noai meta tag directly from a handler
let index = warp::path::end()
.and(warp::get())
.and(filters::bot_check())
.map(|| {
warp::reply::html(r#"<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noai, noimageai">
<title>My Site</title>
</head>
<body>
<h1>Welcome</h1>
</body>
</html>"#)
});
// With Tera templates:
// In your base template, add the meta tag inside <head>.
// Pass the rendered string to warp::reply::html().
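For example, a minimal Tera sketch (assumes tera = "1" in Cargo.toml; the template glob and file name are illustrative, and error handling is simplified):
// Tera + Warp (sketch): the base template's <head> already carries
// <meta name="robots" content="noai, noimageai">
use std::sync::Arc;
use tera::{Context, Tera};
use warp::Filter;

let tera = Arc::new(Tera::new("templates/**/*.html").expect("templates"));
let page = warp::path::end()
    .and(warp::get())
    .and(filters::bot_check())
    .map(move || {
        let html = tera
            .render("index.html", &Context::new())
            .unwrap_or_else(|_| "<h1>template error</h1>".to_string());
        warp::reply::html(html)
    });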
Warp vs Axum vs Actix-web vs Rocket
| Feature | Warp | Axum | Actix-web | Rocket |
|---|---|---|---|---|
| Abstraction model | Filter combinators — .and(), .or(), .map(), .and_then() | Router + Tower layers — from_fn() middleware | App + wrap_fn closures or Transform+Service | Routes + Request Guards + Fairings (lifecycle) |
| Request blocking | warp::reject::custom(AiBotRejection) — Rejection bubbles to recover() | Return Response early without calling next.run(req) | Return HttpResponse early without calling next.call(req) | Outcome::Error((Status::Forbidden, ())) in FromRequest |
| Global blocking | Chain .and(bot_check()) per-route; recover() handles all rejections globally | Router::layer(from_fn(blocker)) — single declaration | App::wrap(middleware) — single declaration | No true global middleware — fairing override (handler still runs) |
| Blocking scope | Per-route with .and() — no scope/group middleware concept | Router::layer() (global) or route_layer() (matched) | App::wrap() (global) or Scope::wrap() (prefixed group) | Per-route guards only; no group scope |
| Add response header | .map(|r| warp::reply::with_header(r, "x-robots-tag", "noai, noimageai")) | Modify res.headers_mut() after next.run() or SetResponseHeaderLayer | Modify res.headers_mut() after next.call() | Fairing on_response: res.set_raw_header() |
| UA header access | warp::header::optional::<String>("user-agent") | req.headers().get(header::USER_AGENT) | req.headers().get("user-agent") | req.headers().get_one("User-Agent") |
| Static files / robots.txt | warp::fs::dir("./static") or explicit route | tower_http::services::ServeDir | actix_files::Files::new("/", "./static") | FileServer::from("./static") or #[get] route |
| Compile-time checks | Yes — filter types fully checked at compile time (verbose errors) | Partial — extractor types checked, some runtime checks remain | Partial — wrap_fn checked; Transform implementation partially runtime | Yes — route types and guard types checked at compile time |
Summary
- .and(bot_check()) — chain before any handler to block AI bots per-route. Handler never runs for blocked requests.
- warp::reject::custom(AiBotRejection) — the Warp-idiomatic way to short-circuit. Always pair with .recover().
- recover() — global rejection handler. Converts AiBotRejection to 403. Handles 404s and errors in one place.
- No group middleware — chain bot_check() per-route, prefix a route group with bot_check().and(...), or put nginx/Caddy in front for global blocking.
- robots.txt unblocked — always register it before bot_check() routes so legitimate crawlers can read it.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.