How to Block AI Bots on http4s (Scala): Complete 2026 Guide
http4s is a purely functional Scala HTTP library built on Cats Effect and FS2. Routes are HttpRoutes[F], a Kleisli from Request[F] to OptionT[F, Response[F]]. Middleware is a function HttpRoutes[F] => HttpRoutes[F]: wrap the routes in a new Kleisli that checks the User-Agent and, for AI bots, short-circuits without calling the inner routes.
Short-circuit = don't call inner routes
In the bot-blocking middleware, returning OptionT.some(forbiddenResponse) immediately — without calling routes(req) — means the inner HttpRoutes[F] never executes. No pattern matching, no database calls, no template rendering. For legitimate requests, call routes(req).map(_.putHeaders(...)) to add the X-Robots-Tag to every response in one place.
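The short-circuit can be seen without any http4s dependency by modelling routes as a plain function. The sketch below is an illustration only: Req, Resp, Routes, and blockBots are hypothetical stand-ins for Request[F], Response[F], HttpRoutes[F], and the middleware, with Option playing the role of OptionT[F, *]. A counter in the inner routes shows that they are not invoked for a bot request.

```scala
// Library-free model of the short-circuit. All names are hypothetical
// stand-ins for illustration, not http4s types.
final case class Req(path: String, userAgent: String)
final case class Resp(status: Int, headers: Map[String, String] = Map.empty)

// Stand-in for HttpRoutes[F]: None means "no route matched".
type Routes = Req => Option[Resp]

// Stand-in for the middleware: Routes => Routes.
def blockBots(inner: Routes): Routes = req =>
  if (req.userAgent.toLowerCase.contains("gptbot"))
    Some(Resp(403)) // inner is never applied on this branch
  else
    inner(req).map(r => r.copy(headers = r.headers + ("X-Robots-Tag" -> "noai, noimageai")))

// Inner routes count how often they are invoked.
var innerCalls = 0
val inner: Routes = req => {
  innerCalls += 1
  if (req.path == "/") Some(Resp(200)) else None
}

val guarded: Routes = blockBots(inner)

val botResult       = guarded(Req("/", "Mozilla/5.0 (compatible; GPTBot/1.0)"))
val callsAfterBot   = innerCalls // still 0: the bot request never reached inner
val humanResult     = guarded(Req("/", "Mozilla/5.0 Firefox/126.0"))
val callsAfterHuman = innerCalls // 1: only the legitimate request reached inner
```

In the real middleware the same shape holds, with OptionT.some(...) in place of Some(...) and routes(req) in place of inner(req).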
Protection layers
Step 1 — Bot list (AiBots.scala)
A plain, immutable List[String], allocated once when the AiBots object is first initialised. exists stops at the first match. Calling toLowerCase on the User-Agent before checking handles every capitalisation variant, which is why the patterns themselves are all lower case.
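To make the matching behaviour concrete, here is a minimal check against a few sample User-Agent strings. The object name BotCheck, the shortened pattern list, the isAiBotHeader helper, and the sample strings are all assumptions for illustration. The sketch also passes Locale.ROOT to toLowerCase, a defensive choice so that lower-casing does not depend on the JVM's default locale.

```scala
import java.util.Locale

object BotCheck {
  // Shortened, lower-case pattern list used for this example only.
  private val patterns: List[String] = List("gptbot", "claudebot", "ccbot")

  // Lower-case once, then stop at the first pattern that matches.
  def isAiBot(userAgent: String): Boolean = {
    val ua = userAgent.toLowerCase(Locale.ROOT)
    patterns.exists(p => ua.contains(p))
  }

  // Hypothetical helper: a missing header is treated as "not a bot".
  // That is a policy choice, not a rule.
  def isAiBotHeader(userAgent: Option[String]): Boolean =
    userAgent.exists(ua => isAiBot(ua))
}

// Sample User-Agent strings (illustrative values).
val samples: List[String] = List(
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2)",
  "CCBOT/2.0",
  "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",
)

val results: List[Boolean] = samples.map(BotCheck.isAiBot)
val missingHeader: Boolean = BotCheck.isAiBotHeader(None)
```

Substring matching is deliberately loose: any User-Agent that contains one of the tokens is treated as a bot, regardless of version numbers or surrounding text.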
// AiBots.scala — shared bot detection
package myapp
object AiBots {
private val patterns: List[String] = List(
// OpenAI
"gptbot", "chatgpt-user", "oai-searchbot",
// Anthropic
"claudebot", "claude-web",
// Common Crawl
"ccbot",
// Bytedance
"bytespider",
// Meta
"meta-externalagent",
// Perplexity
"perplexitybot",
// Google AI
"google-extended", "googleother",
// Cohere
"cohere-ai",
// Amazon
"amazonbot",
// Diffbot
"diffbot",
// AI2
"ai2bot",
// DeepSeek
"deepseekbot",
// Mistral
"mistralai-user",
// xAI
"xai-bot",
// You.com
"youbot",
// DuckDuckGo AI
"duckassistbot",
)
def isAiBot(userAgent: String): Boolean = {
val ua = userAgent.toLowerCase(java.util.Locale.ROOT) // locale-independent lower-casing
patterns.exists(ua.contains)
}
}
Step 2 — Bot-blocking middleware (Kleisli)
The middleware takes routes: HttpRoutes[F] and returns a new Kleisli. Inside, check the User-Agent using CIString — it is case-insensitive and matches regardless of how the client capitalised the header name.
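As a quick check of the lookup described here, the sketch below stores a header under a lower-case name and reads it back through CIString("User-Agent"). It assumes http4s 0.23.x (which brings in the case-insensitive library) on the classpath; the header value is a made-up sample.

```scala
import org.http4s.Headers
import org.typelevel.ci.CIString

// CIString equality ignores case.
val sameName: Boolean = CIString("User-Agent") == CIString("user-agent")

// A header stored under a lower-case name is still found by the lookup.
val headers: Headers = Headers("user-agent" -> "GPTBot/1.0")
val ua: String =
  headers
    .get(CIString("User-Agent")) // Option[NonEmptyList[Header.Raw]]
    .map(_.head.value)           // first value if the header is repeated
    .getOrElse("")
```

Only the header name is case-insensitive. The value comes back exactly as the client sent it, which is why the bot check still lower-cases it before matching.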
// BotBlockerMiddleware.scala — http4s Kleisli middleware
package myapp
import cats.Monad
import cats.data.{Kleisli, OptionT}
import cats.syntax.all.*
import org.http4s.*
import org.http4s.Status.Forbidden
import org.typelevel.ci.CIString
object BotBlockerMiddleware {
private val xRobotsTag = Header.Raw(
CIString("X-Robots-Tag"),
"noai, noimageai",
)
/** Middleware that wraps HttpRoutes[F] and blocks AI bots with 403.
*
* Type: HttpRoutes[F] => HttpRoutes[F]
* HttpRoutes[F] = Kleisli[OptionT[F, *], Request[F], Response[F]]
*
* For AI bots: returns OptionT.some(403 response) — inner routes never called.
* For legit: calls inner routes(req) and appends X-Robots-Tag header.
*/
def apply[F[_]: Monad](routes: HttpRoutes[F]): HttpRoutes[F] =
Kleisli { (req: Request[F]) =>
// CIString is case-insensitive — matches "User-Agent", "user-agent", etc.
val ua = req.headers
.get(CIString("User-Agent"))
.map(_.head.value)
.getOrElse("")
if (AiBots.isAiBot(ua)) {
// Short-circuit — inner routes never run
OptionT.some[F](
Response[F](Forbidden)
.putHeaders(
xRobotsTag,
Header.Raw(CIString("Content-Type"), "text/plain; charset=utf-8"),
)
.withEntity("Forbidden"),
)
} else {
// Pass through and add X-Robots-Tag to all legitimate responses
routes(req).map(_.putHeaders(xRobotsTag))
}
}
}
Step 3 — Route definitions and <+> composition
Place robotsRoutes and /health outside the bot-blocker. Compose with <+> (SemigroupK — first match wins) then call .orNotFound to produce the final HttpApp[F] required by the server.
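A small in-memory experiment can confirm the "first match wins" behaviour of <+> and the 404 fallback of .orNotFound without starting a server. This is a sketch assuming http4s 0.23.x and Cats Effect 3; the route sets first and second and the helper fetch are hypothetical names used only here.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import cats.syntax.all.*      // provides <+>
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.implicits.* // provides .orNotFound

// Two hypothetical route sets that both match GET /ping.
val first: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "ping" => Ok("first")
}
val second: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "ping"  => Ok("second")
  case GET -> Root / "other" => Ok("other")
}

// second is consulted only when first does not match the request.
val app: HttpApp[IO] = (first <+> second).orNotFound

// Run one request through the app in memory and return status plus body.
def fetch(path: String): (Status, String) = {
  val req = Request[IO](Method.GET, Uri.unsafeFromString(path))
  app.run(req).flatMap(r => r.as[String].map(body => (r.status, body))).unsafeRunSync()
}

val ping    = fetch("/ping")    // handled by first
val other   = fetch("/other")   // falls through to second
val missing = fetch("/missing") // no match anywhere, so .orNotFound answers
```

The same pattern works for the composed application in this step: build a Request with the User-Agent you want to test and call app.run on it.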
// AppRoutes.scala — route definitions and composition
package myapp
import cats.effect.IO
import cats.syntax.all.* // provides <+> (SemigroupK syntax)
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.syntax.all.* // provides .orNotFound
object AppRoutes {
// robots.txt — must be OUTSIDE the bot-blocker so crawlers can read it
val robotsRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
case GET -> Root / "robots.txt" =>
Ok(
"""User-agent: *
|Allow: /
|User-agent: GPTBot
|Disallow: /
|User-agent: ClaudeBot
|Disallow: /
|User-agent: CCBot
|Disallow: /
|User-agent: Bytespider
|Disallow: /
|User-agent: Google-Extended
|Disallow: /
|User-agent: PerplexityBot
|Disallow: /
|User-agent: Meta-ExternalAgent
|Disallow: /
|User-agent: AmazonBot
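|Disallow: /
|""".stripMargin,
"Content-Type" -> "text/plain; charset=utf-8", // extra arguments to Ok(...) are headers, not a bare media-type string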
)
case GET -> Root / "health" =>
Ok("ok")
}
// Protected routes — wrapped in BotBlockerMiddleware
val protectedRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
case GET -> Root =>
Ok(
"""<!DOCTYPE html>
|<html><head>
| <meta name="robots" content="noai, noimageai">
| <title>My Site</title>
|</head><body><h1>Welcome</h1></body></html>
|""".stripMargin,
"Content-Type" -> "text/html; charset=utf-8", // header passed as a (name, value) pair
)
case GET -> Root / "api" / "data" =>
Ok("""{"data": "protected"}""", "Content-Type" -> "application/json")
}
// Compose: robots (unblocked) + protected (bot-blocked)
// <+> is SemigroupK.combine — tries routes in order, first match wins
val app: HttpApp[IO] =
(robotsRoutes <+> BotBlockerMiddleware(protectedRoutes)).orNotFound
}
Step 4 — EmberServer setup (build.sbt)
EmberServer is the recommended http4s server backend — pure Scala, HTTP/2, and WebSocket support. Extend IOApp.Simple for the entry point; Cats Effect handles the runtime.
// Main.scala — EmberServer setup (http4s + Cats Effect)
package myapp
import cats.effect.{IO, IOApp}
import com.comcast.ip4s.*
import org.http4s.ember.server.EmberServerBuilder
object Main extends IOApp.Simple {
override def run: IO[Unit] =
EmberServerBuilder
.default[IO]
.withHost(ipv4"0.0.0.0")
.withPort(port"8080")
.withHttpApp(AppRoutes.app)
.build
.useForever
}
// build.sbt — required dependencies
// scalaVersion := "3.4.0"
//
// libraryDependencies ++= Seq(
// "org.http4s" %% "http4s-ember-server" % "0.23.27",
// "org.http4s" %% "http4s-ember-client" % "0.23.27",
// "org.http4s" %% "http4s-dsl" % "0.23.27",
// "org.typelevel" %% "cats-effect" % "3.5.4",
// "org.typelevel" %% "log4cats-noop" % "2.7.0",
// )
Step 5 — robots.txt via StaticFile or FileService
StaticFile.fromPath streams a file from the filesystem without loading it into memory — efficient for large files. FileService serves an entire directory. Both integrate as HttpRoutes[F] and compose with <+> like any other routes.
// Serving robots.txt from the filesystem with StaticFile
import cats.effect.IO
import fs2.io.file.Path
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.server.staticcontent.{FileService, fileService}
// Option A: StaticFile — single file from path
val robotsRoute: HttpRoutes[IO] = HttpRoutes.of[IO] {
case req @ GET -> Root / "robots.txt" =>
StaticFile
.fromPath[IO](Path("src/main/resources/robots.txt"), Some(req))
.getOrElseF(NotFound())
}
// Option B: FileService — serve an entire directory
// Serves all files under src/main/resources/public/ at their path names
val staticRoutes: HttpRoutes[IO] =
fileService[IO](FileService.Config("src/main/resources/public"))
// Option C: Inline string (compile-time embedded)
val robotsInline: HttpRoutes[IO] = HttpRoutes.of[IO] {
case GET -> Root / "robots.txt" =>
Ok(ROBOTS_TXT_CONTENT, "Content-Type" -> "text/plain; charset=utf-8")
}
private val ROBOTS_TXT_CONTENT = """User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
"""
Step 6 — Scoped protection (public routes + bot-blocked API)
Apply BotBlockerMiddleware only to the routes that need protection. Unprotected routes (health, robots.txt, home page) compose with <+> before the protected group.
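The scoping can be verified in memory before deploying. The sketch below assumes http4s 0.23.x and Cats Effect 3; blockBots is a deliberately reduced stand-in for BotBlockerMiddleware that knows a single pattern, and statusFor is a hypothetical helper.

```scala
import cats.data.{Kleisli, OptionT}
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import cats.syntax.all.*      // provides <+>
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.implicits.* // provides .orNotFound
import org.typelevel.ci.CIString

// Reduced stand-in for BotBlockerMiddleware: one pattern, no extra headers.
def blockBots(routes: HttpRoutes[IO]): HttpRoutes[IO] =
  Kleisli { (req: Request[IO]) =>
    val ua = req.headers.get(CIString("User-Agent")).map(_.head.value).getOrElse("")
    if (ua.toLowerCase.contains("gptbot")) OptionT.some[IO](Response[IO](Status.Forbidden))
    else routes(req)
  }

val publicRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "health" => Ok("ok")
}
val apiRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "api" / "data" => Ok("{}")
}

val app: HttpApp[IO] = (publicRoutes <+> blockBots(apiRoutes)).orNotFound

// Send one GET request with the given User-Agent and return the status.
def statusFor(path: String, userAgent: String): Status = {
  val req = Request[IO](Method.GET, Uri.unsafeFromString(path))
    .putHeaders("User-Agent" -> userAgent)
  app.run(req).map(_.status).unsafeRunSync()
}

val bot     = "GPTBot/1.0"
val browser = "Mozilla/5.0 Firefox/126.0"
```

With this composition, statusFor("/health", bot) is 200 and statusFor("/api/data", bot) is 403. One consequence worth knowing: the blocker answers every bot request that reaches it, so a bot asking for an unknown path such as /missing receives 403 rather than 404. For the same reason, placing the blocked group first in the <+> chain would block bots from the public routes as well.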
// Scoped middleware — protect only /api/* routes
import cats.effect.IO
import cats.syntax.all.* // provides <+> (SemigroupK syntax)
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.syntax.all.* // provides .orNotFound
// Public routes — no bot-blocker
val publicRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
case GET -> Root / "health" => Ok("ok")
case GET -> Root / "robots.txt" => Ok(ROBOTS_TXT_CONTENT, "Content-Type" -> "text/plain; charset=utf-8")
case GET -> Root => Ok("<html>...</html>", "Content-Type" -> "text/html; charset=utf-8")
}
// API routes — wrapped individually in bot-blocker
val apiRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
case GET -> Root / "api" / "data" => Ok("""{"data":"protected"}""", "Content-Type" -> "application/json")
case GET -> Root / "api" / "users" => Ok("[]", "Content-Type" -> "application/json")
}
// Compose: public (unblocked) + api (bot-blocked)
// The <+> combinator tries routes in order — first match wins.
val app: HttpApp[IO] =
(publicRoutes <+> BotBlockerMiddleware(apiRoutes)).orNotFound
http4s vs Play Framework vs Akka HTTP vs ZIO HTTP
| Feature | http4s | Play Framework | Akka HTTP | ZIO HTTP |
|---|---|---|---|---|
| Middleware type | HttpRoutes[F] => HttpRoutes[F] (Kleisli composition) | EssentialFilter: RequestHeader + raw bytes → short-circuit before body | Directive[T]: composable extractor and transformer, reject() to block | HttpMiddleware[R, E] = Http[R, E, Request, Response] => Http[R, E, Request, Response] |
| Short-circuit | OptionT.some(Response[F](Status.Forbidden)) — inner routes never called | Accumulator.done(Results.Forbidden) — body never read | reject(ValidationRejection("AI bot")) or complete(StatusCodes.Forbidden) | ZIO.succeed(Response.status(Status.Forbidden)) without calling next |
| Route composition | <+> (SemigroupK) combines routes; first match wins | Router.orElse or Action composition | Directive concatenation with ~ or path matching | Http.collect routes composed with ++ |
| Header access | req.headers.get(CIString("User-Agent")).map(_.head.value) | request.headers.get("User-Agent") | optionalHeaderValueByName("User-Agent") | request.header(Header.UserAgent).map(_.renderedValue) |
| robots.txt | StaticFile.fromPath or FileService or inline Ok() handler | Assets controller (static files) or explicit Action | getFromFile("robots.txt") directive | Http.fromFile or explicit route handler |
| Effect type | F[_]: Concurrent — tagless final, works with IO, ZIO, Monix | Future[Result] or Action[A] | Future[T] — Akka actor system | ZIO[R, E, A] — environment, error, value |
| HTTP server | EmberServerBuilder (default), BlazeServerBuilder, or JettyBuilder | Akka HTTP (default since Play 2.6; Pekko HTTP in Play 3) or Netty | Akka HTTP's own server, built on Akka Streams | ZIO HTTP built-in server (Netty-based) |
Summary
- Kleisli middleware = HttpRoutes[F] => HttpRoutes[F]: wrap the inner routes; for AI bots, return OptionT.some(403) directly.
- CIString for header names: header names are case-insensitive by spec, so always wrap header name strings in CIString(...).
- <+> composition: place robots.txt and health routes before the bot-blocker in the chain.
- .map(_.putHeaders(xRobotsTag)): adds X-Robots-Tag to all legitimate responses in one place.
- .orNotFound: converts HttpRoutes[F] (can return None) to HttpApp[F] (always returns a response), as required by the server builder.