How to Block AI Bots in Akka HTTP
Akka HTTP is Lightbend's Scala HTTP toolkit built on Akka Streams. Its routing DSL is functional: routes are values composed with ~, and cross-cutting logic is expressed as directives. A straightforward way to build a bot-blocking directive is mapInnerRoute combined with optionalHeaderValueByName. optionalHeaderValueByName("User-Agent") extracts an Option[String], which is None when the header is absent and Some(value) when it is present. The robots.txt route must precede botBlocker in the route tree. If it sits inside the directive, AI bots get the 403 before they can read the Disallow rules. The same patterns apply to Pekko HTTP (the Apache community fork) with only package name changes.
1. Bot detection
Pure Scala singleton object, no external dependencies. Lowercase the User-Agent string once, then test each known pattern with String.contains.
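Lowercasing once is the right idea, but on the JVM String.toLowerCase() with no argument follows the default locale. The stdlib-only sketch below shows the pitfall under a Turkish locale. The object and method names are hypothetical, and the sample User-Agent is illustrative rather than any crawler's documented string.

```scala
import java.util.Locale

// Hypothetical demo object: shows why User-Agent lowercasing should pin a locale.
object LowercaseLocaleDemo {
  // Illustrative User-Agent only; not a documented crawler string.
  val sampleUa: String = "Mozilla/5.0 (compatible; ImagesiftBot)"

  // Turkish case rules map ASCII 'I' to dotless 'ı' (U+0131) when lowercasing.
  val turkish: Locale = Locale.forLanguageTag("tr")

  // Lowercase under the given locale, then do a plain substring test.
  def matches(userAgent: String, pattern: String, locale: Locale): Boolean =
    userAgent.toLowerCase(locale).contains(pattern)
}
```

Under Locale.ROOT the check finds "imagesiftbot". Under the Turkish locale it does not, because the lowercased text reads "ımagesiftbot". Passing Locale.ROOT explicitly keeps ASCII patterns matching regardless of where the server runs.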
// AiBotDetector.scala — pure Scala, no dependencies
package middleware
object AiBotDetector {
// Known AI crawler User-Agent substrings, all lowercase.
private val patterns: List[String] = List(
"gptbot",
"chatgpt-user",
"claudebot",
"anthropic-ai",
"ccbot",
"google-extended", // robots.txt token only: Google documents no separate User-Agent for it, so this entry never matches a request
"cohere-ai",
"meta-externalagent",
"bytespider",
"omgili",
"diffbot",
"imagesiftbot",
"magpie-crawler",
"amazonbot",
"dataprovider",
"netcraft",
)
/** Returns true when userAgent matches a known AI crawler.
* Safe on an empty string: "".contains(p) is false for every non-empty pattern,
* so no special case is needed.
*/
def isAiBot(userAgent: String): Boolean = {
// Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
val lower = userAgent.toLowerCase(java.util.Locale.ROOT)
patterns.exists(p => lower.contains(p))
}
}
2. Directive and server setup
mapInnerRoute builds a Directive0 from a function that receives the inner Route and returns a new Route. Inside it, optionalHeaderValueByName extracts the User-Agent as an Option[String]. On a match, complete(Forbidden, ...) short-circuits and the inner route never runs. Otherwise, mapResponseHeaders wraps the inner route and appends X-Robots-Tag to whatever response it completes with. The robots.txt route sits outside and before botBlocker.
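mapInnerRoute and ~ are easier to reason about once they are seen as plain functions. The sketch below is a simplified, dependency-free model written for illustration, not Akka HTTP's API: Request, Response, Route, path and botBlocker here are hypothetical stand-ins (a real Akka Route is RequestContext => Future[RouteResult]), None stands for a rejection, and ~ falls back to the right-hand route only when the left one rejects.

```scala
// Simplified, dependency-free model of the routing ideas in this section.
// These types are hypothetical stand-ins, NOT Akka HTTP's API.
object RouteModel {
  final case class Request(path: String, userAgent: Option[String])
  final case class Response(status: Int, headers: List[(String, String)], body: String)

  // None plays the role of a rejection: "this route does not handle the request".
  type Route = Request => Option[Response]

  // Models ~: run the left route; evaluate the right one only if the left rejects.
  implicit class RouteOps(left: Route) {
    def ~(right: Route): Route = req => left(req).orElse(right(req))
  }

  // Models path(...): complete when the path matches exactly, otherwise reject.
  def path(p: String)(body: => Response): Route =
    req => if (req.path == p) Some(body) else None

  private val robotsTag: (String, String) = "X-Robots-Tag" -> "noai, noimageai"

  // Models the mapInnerRoute-based botBlocker: take the inner route, return a new one.
  def botBlocker(isBot: String => Boolean)(inner: Route): Route = req =>
    if (isBot(req.userAgent.getOrElse("")))
      // Short-circuit: inner is never called.
      Some(Response(403, List(robotsTag), "Forbidden: AI crawlers are not permitted"))
    else
      // Pass: only completed responses get the header; rejections stay None.
      inner(req).map(r => r.copy(headers = r.headers :+ robotsTag))

  val detectsGptBot: String => Boolean =
    ua => ua.toLowerCase(java.util.Locale.ROOT).contains("gptbot")
  val robots: Route = path("/robots.txt")(Response(200, Nil, "User-agent: GPTBot\nDisallow: /"))
  val api: Route = path("/api/data")(Response(200, Nil, """{"data":"value"}"""))

  // robots.txt first and outside the blocker.
  val goodOrder: Route = robots ~ botBlocker(detectsGptBot)(api)
  // robots.txt inside the blocker: bots get 403 instead of the Disallow rules.
  val badOrder: Route = botBlocker(detectsGptBot)(robots ~ api)
}
```

Two behaviors of the real directive show up in the model too. The header mapping only touches responses the inner route actually completes, so a request that no route handles stays a rejection (Akka HTTP's rejection handler later turns it into a 404 without X-Robots-Tag). And moving robots.txt inside the blocker hands bots a 403 instead of the Disallow rules.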
// Main.scala — Akka HTTP server with AI bot blocking
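// build.sbt (akka-actor-typed is required for akka.actor.typed.ActorSystem):
// "com.typesafe.akka" %% "akka-http" % "10.5.3"
// "com.typesafe.akka" %% "akka-stream" % "2.8.5"
// "com.typesafe.akka" %% "akka-actor-typed" % "2.8.5"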
package server
import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.model.headers.RawHeader
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.{Directive0, Route}
import middleware.AiBotDetector
import scala.concurrent.ExecutionContextExecutor
import scala.io.StdIn
// ── Bot-blocking directive and routes ──────────────────────────────────────
// Top-level def/val are Scala 3 only, so the directive, the robots.txt body
// and the route tree live in an object; this also compiles on Scala 2.13.
object Routes {
  // Header sent on both blocked and passing responses.
  private val robotsTag = RawHeader("X-Robots-Tag", "noai, noimageai")

  // mapInnerRoute receives the inner Route and returns a new Route.
  // This is a convenient primitive when you need to either:
  //   • short-circuit: complete(StatusCodes.Forbidden, ...)
  //   • pass through:  mapResponseHeaders(...)(inner)
  //
  // optionalHeaderValueByName("User-Agent") extracts the UA header as
  // Option[String]: None when absent, Some(value) when present.
  // .getOrElse("") converts it to a String; no null check required.
  //
  // NOTE: complete(status, headers, body) uses the Akka HTTP overload
  // complete[T](StatusCode, immutable.Seq[HttpHeader], => T) with an implicit
  // ToEntityMarshaller[T]. For a String body the default marshaller sets
  // Content-Type: text/plain; charset=UTF-8.
  def botBlocker: Directive0 = mapInnerRoute { inner =>
    optionalHeaderValueByName("User-Agent") { uaOpt =>
      val ua = uaOpt.getOrElse("")
      if (AiBotDetector.isAiBot(ua)) {
        // Block: respond with 403 + X-Robots-Tag. Do NOT invoke inner.
        complete(
          StatusCodes.Forbidden,
          List(robotsTag),
          "Forbidden: AI crawlers are not permitted"
        )
      } else {
        // Pass: run the inner route and append X-Robots-Tag to the response
        // it completes with. Rejections pass through unchanged.
        mapResponseHeaders(_ :+ robotsTag)(inner)
      }
    }
  }

  // ── Robots.txt ────────────────────────────────────────────────────────────
  val robotsTxt: String =
    """User-agent: *
      |Allow: /
      |
      |User-agent: GPTBot
      |Disallow: /
      |
      |User-agent: ClaudeBot
      |Disallow: /
      |
      |User-agent: CCBot
      |Disallow: /
      |
      |User-agent: Google-Extended
      |Disallow: /""".stripMargin

  // ── Route tree ────────────────────────────────────────────────────────────
  //
  // CRITICAL: the robots.txt route MUST appear BEFORE botBlocker in the
  // route tree. Akka HTTP tries alternatives left-to-right with ~.
  // If /robots.txt sits inside botBlocker, AI bots hit the 403 before
  // they can read the Disallow rules, defeating the purpose.
  val route: Route =
    path("robots.txt") {                  // ← always allowed
      get { complete(robotsTxt) }
    } ~
    botBlocker {                          // ← all other routes protected
      pathSingleSlash {
        get { complete("""{"message":"ok"}""") }
      } ~
      path("api" / "data") {
        get { complete("""{"data":"value"}""") }
      }
    }
}

// ── Server startup ──────────────────────────────────────────────────────────
object Main extends App {
  implicit val system: ActorSystem[Nothing] =
    ActorSystem(Behaviors.empty, "ai-bot-blocker")
  implicit val ec: ExecutionContextExecutor = system.executionContext

  val bindingFuture = Http().newServerAt("0.0.0.0", 8080).bind(Routes.route)
  println("Server online at http://0.0.0.0:8080/ (press RETURN to stop)")
  StdIn.readLine()
  bindingFuture.flatMap(_.unbind()).onComplete(_ => system.terminate())
}
3. Scoped protection: /api/* only
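To protect only the API, wrap just the pathPrefix("api") subtree in botBlocker and use this route in place of the one from section 2. The root route stays public, and robots.txt stays outside the directive as before.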
// Scope bot blocking to /api/* only; public routes stay unprotected.
//
// pathPrefix("api") matches /api and paths beneath it, such as /api/data.
// botBlocker is applied only inside that prefix.
val route: Route =
  path("robots.txt") {
    get { complete(robotsTxt) }
  } ~
  pathSingleSlash {
    // Public: no bot blocking
    get { complete("""{"message":"ok"}""") }
  } ~
  pathPrefix("api") {
    botBlocker {                  // only /api/* is protected
      path("data") {
        get { complete("""{"data":"value"}""") }
      } ~
      path("users") {
        get { complete("""{"users":[]}""") }
      }
    }
  }
4. Pekko HTTP (Apache fork)
Pekko HTTP is the Apache-licensed community fork created after Lightbend moved Akka to the BSL in September 2022. The directive API is the same; only the package prefix and the sbt coordinates change.
// Pekko HTTP — Apache community fork of Akka HTTP (BSL-free).
// API is identical; only package names differ.
//
// Replace in build.sbt:
// "com.typesafe.akka" %% "akka-http" → "org.apache.pekko" %% "pekko-http" % "1.1.0"
// "com.typesafe.akka" %% "akka-stream" → "org.apache.pekko" %% "pekko-stream" % "1.1.0"
// "com.typesafe.akka" %% "akka-actor-typed" → "org.apache.pekko" %% "pekko-actor-typed" % "1.1.0"
//
// Replace in source imports:
// akka.actor.typed → org.apache.pekko.actor.typed
// akka.http.scaladsl → org.apache.pekko.http.scaladsl
// akka.http.scaladsl.model → org.apache.pekko.http.scaladsl.model
//
// The botBlocker directive, route DSL, and all directive combinators
// (mapInnerRoute, optionalHeaderValueByName, mapResponseHeaders) are
// word-for-word identical between Akka HTTP and Pekko HTTP.
import org.apache.pekko.actor.typed.ActorSystem
import org.apache.pekko.actor.typed.scaladsl.Behaviors
import org.apache.pekko.http.scaladsl.Http
import org.apache.pekko.http.scaladsl.model.StatusCodes
import org.apache.pekko.http.scaladsl.model.headers.RawHeader
import org.apache.pekko.http.scaladsl.server.Directives._
import org.apache.pekko.http.scaladsl.server.{Directive0, Route}
// botBlocker implementation is identical to the Akka HTTP version above.
Key points
- Use mapInnerRoute rather than flatMap: flatMap on a directive expects its callback to return a Directive, while complete(...) yields a Route. mapInnerRoute works with Routes directly, which makes it a natural fit when you need to inspect the request and either short-circuit or delegate to the inner route.
- optionalHeaderValueByName returns Option[String]: optionalHeaderValueByName("User-Agent") is a Directive1[Option[String]], yielding None when the header is absent and Some(value) when it is present. Call .getOrElse("") to get a safe empty string. The Option type makes the absent case explicit at compile time.
- robots.txt before botBlocker: Akka HTTP tries alternatives joined with ~ in order. If /robots.txt sits inside botBlocker, AI bots receive a 403 before they can read the Disallow rules, so they cannot self-exclude. Put robots.txt first, outside the directive.
- complete(StatusCode, Seq[HttpHeader], body) sets headers on the block response: this overload of complete takes a status code, a header list and a body. For a String body, Akka HTTP's default marshaller sets Content-Type: text/plain; charset=UTF-8.
- mapResponseHeaders appends X-Robots-Tag to passing responses: for legitimate visitors, mapResponseHeaders(_ :+ RawHeader("X-Robots-Tag", "noai, noimageai")) is applied to the inner route. It transforms the headers of whatever response the inner route completes with, so the header is added even if the handler set its own (the response then carries two). Rejections are not responses at that point, so a 404 produced later by the rejection handler does not get the header.
- Pekko HTTP is a drop-in replacement: the Apache fork has the same directives, types and combinators. Only the top-level package prefix changes from akka to org.apache.pekko, plus the sbt coordinates. Prefer Pekko HTTP for new projects to avoid Lightbend's BSL licensing, which applies from Akka 2.7 and Akka HTTP 10.4 onward.
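The append-versus-replace distinction in the mapResponseHeaders point can be seen with plain collections. This is a sketch with a hypothetical Header case class standing in for Akka's HttpHeader; append mirrors the article's `_ :+` function, and replace is an alternative shown for comparison, not something the article's code does.

```scala
// Hypothetical model of appending vs replacing a response header.
// Header stands in for Akka HTTP's HttpHeader; it is not the real type.
object HeaderMerge {
  final case class Header(name: String, value: String)

  val robotsTag: Header = Header("X-Robots-Tag", "noai, noimageai")

  // Same shape as `_ :+ RawHeader(...)`: always appends, even when the inner
  // route already set an X-Robots-Tag of its own.
  def append(headers: Seq[Header]): Seq[Header] = headers :+ robotsTag

  // Replace-style alternative: drop any existing X-Robots-Tag first.
  // HTTP header names are case-insensitive, hence equalsIgnoreCase.
  def replace(headers: Seq[Header]): Seq[Header] =
    headers.filterNot(_.name.equalsIgnoreCase(robotsTag.name)) :+ robotsTag
}
```

In Akka HTTP the replace variant would be a function passed to mapResponseHeaders that compares on HttpHeader's lowercaseName. Appending stays the simpler choice when the inner routes never set X-Robots-Tag themselves.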
Framework comparison — JVM HTTP middleware models
| Framework | Middleware model | Block request | Header access (when absent) |
|---|---|---|---|
| Akka HTTP | Directive0 / mapInnerRoute | complete(Forbidden, headers, body) | optionalHeaderValueByName → Option[String] (None) |
| Spring Boot | OncePerRequestFilter / @WebFilter | response.sendError(403); skip filterChain.doFilter | request.getHeader() → null |
| Vert.x Web | Route.handler() chain on the event loop | ctx.fail(403); skip ctx.next() | ctx.request().getHeader() → null |
| Play Framework | EssentialFilter / ActionBuilder | Accumulator.done(Forbidden(...)) in an EssentialFilter instead of calling next | req.headers.get() → Option[String] (None) |
Akka HTTP and Play Framework both model a missing header as None, so the type makes the absent case explicit at compile time. Spring Boot and Vert.x return null for missing headers, so always guard with a null check. Akka HTTP's directive model is highly compositional: directives are first-class values that can be combined, tested in isolation, and reused across route trees.
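When the same detector is called from one of the null-returning APIs in the table, wrapping the raw value in Option(...) once at the boundary restores the Option-based style. This is a minimal sketch: NullSafeHeader, liftHeader and shouldBlock are hypothetical names, and isBot stands in for a detector such as AiBotDetector.isAiBot.

```scala
// Hypothetical helpers for bridging null-returning header APIs
// (such as Servlet's getHeader) into Option-based code.
object NullSafeHeader {
  // Option(x) is None when x is null and Some(x) otherwise.
  def liftHeader(raw: String): Option[String] = Option(raw)

  // Blocks only when a User-Agent is present and the detector matches it.
  // isBot stands in for a detector such as AiBotDetector.isAiBot.
  def shouldBlock(rawUserAgent: String, isBot: String => Boolean): Boolean =
    liftHeader(rawUserAgent).exists(isBot)
}
```

Lifting at the boundary means the detector itself never sees null, which matches how the Akka HTTP directive calls it with uaOpt.getOrElse("").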