How to Block AI Bots in Kotlin Http4k
Http4k is a Kotlin HTTP toolkit built on a single idea: everything is a function. An HttpHandler is (Request) -> Response. A Filter is (HttpHandler) -> HttpHandler — a function that wraps a handler. There is no reflection, no DI container, no annotations. Bot blocking is a Filter: the outer lambda receives next (the downstream handler), the inner lambda receives req — return Response(FORBIDDEN) to block or call next(req) to pass through. req.header("user-agent") is case-insensitive and returns String?. All Response methods (.header(), .body()) return a new immutable instance.
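The function-composition idea can be sketched without Http4k at all. The following is a minimal model, not Http4k's actual API: Req, Res, Handler, and Wrapper are hypothetical stand-ins for Request, Response, HttpHandler, and Filter. It only shows that returning a response blocks the request, while calling next passes it through.

```kotlin
// Simplified stand-ins for illustration only; these are NOT Http4k's classes.
data class Req(val path: String, val headers: Map<String, String> = emptyMap())
data class Res(val status: Int, val body: String = "")

typealias Handler = (Req) -> Res          // models HttpHandler
typealias Wrapper = (Handler) -> Handler  // models Filter

// A handler is just a function value.
val hello: Handler = { _ -> Res(200, "Hello") }

// A wrapper receives the downstream handler and returns a new handler.
val blockGptBot: Wrapper = { next ->
    { req ->
        // Plain Map lookup: case-sensitive here, unlike Http4k's req.header().
        val ua = req.headers["user-agent"].orEmpty().lowercase()
        if ("gptbot" in ua) {
            Res(403, "Forbidden") // block: next is never called
        } else {
            next(req)             // pass through to the wrapped handler
        }
    }
}

// Wrapping is ordinary function application.
val app: Handler = blockGptBot(hello)

fun main() {
    println(app(Req("/", mapOf("user-agent" to "GPTBot/1.0"))).status)  // 403
    println(app(Req("/", mapOf("user-agent" to "Mozilla/5.0"))).status) // 200
}
```

Because both types are plain function types, the wrapped app can be called directly, which is the property the later testing section relies on.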
1. Bot detection
Pure Kotlin, no dependencies. String.contains() for literal substring matching — no regex. List.any() short-circuits on first match.
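To make those two properties concrete, here is a small standalone sketch. The shortened pattern list and the matchedPattern and patternsInspected helpers are illustrative assumptions, not part of the listing that follows.

```kotlin
// Shortened, illustrative pattern list (an assumption for this sketch).
private val patterns = listOf("gptbot", "claudebot", "ccbot")

// Hypothetical helper: returns the first pattern found in the User-Agent, or null.
fun matchedPattern(ua: String): String? {
    val lower = ua.lowercase()
    return patterns.firstOrNull { lower.contains(it) }
}

// Hypothetical helper: counts how many patterns any() inspects before it stops.
fun patternsInspected(ua: String): Int {
    val lower = ua.lowercase()
    var inspected = 0
    patterns.any { inspected++; lower.contains(it) }
    return inspected
}

fun main() {
    println(matchedPattern("Mozilla/5.0 (compatible; GPTBot/1.0)")) // gptbot
    println(matchedPattern("Mozilla/5.0 (Windows NT 10.0; Win64)")) // null
    println(patternsInspected("GPTBot/1.0"))                        // 1 (stops at the first match)
    println(patternsInspected("CCBot/2.0"))                         // 3
    // Literal matching: "." is an ordinary character, not a wildcard.
    println("abc".contains("a.c"))                                  // false
    println("abc".contains(Regex("a.c")))                           // true
}
```

Returning the matched pattern instead of a Boolean can be handy for logging which crawler was blocked; the Boolean form is enough for the filter itself.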
// BotUtils.kt: AI bot detection, no external dependencies
package com.example.botblocker

private val AI_BOT_PATTERNS = listOf(
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
)

/**
 * Returns true if the User-Agent string matches a known AI crawler.
 * Uses String.contains(): a literal substring match, no regex.
 */
fun isAiBot(ua: String): Boolean {
    if (ua.isBlank()) return false
    val lower = ua.lowercase()
    // any() short-circuits on the first match
    return AI_BOT_PATTERNS.any { lower.contains(it) }
}

2. Filter: (HttpHandler) -> HttpHandler
The Filter lambda has two levels: the outer lambda receives next (called once when the filter is applied), the inner lambda is the actual per-request handler. Return a Response directly to block; call next(req) to delegate downstream.
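The split between assembly time and request time can be observed by putting a counter in each lambda. This is a minimal sketch that assumes http4k-core is on the classpath; countCalls and countingFilter are hypothetical names. The filter is applied here by invoking it directly, which works because a Filter is itself a function from HttpHandler to HttpHandler.

```kotlin
import org.http4k.core.Filter
import org.http4k.core.HttpHandler
import org.http4k.core.Method.GET
import org.http4k.core.Request
import org.http4k.core.Response
import org.http4k.core.Status.Companion.OK

// Hypothetical helper: returns (outer lambda runs, inner lambda runs) after `requests` calls.
fun countCalls(requests: Int): Pair<Int, Int> {
    var assembled = 0 // incremented in the outer lambda
    var handled = 0   // incremented in the inner lambda

    val countingFilter = Filter { next ->
        assembled++
        // Bound to a val so the returned handler is explicit.
        val inner: HttpHandler = { req ->
            handled++
            next(req)
        }
        inner
    }

    val downstream: HttpHandler = { _ -> Response(OK) }
    val app: HttpHandler = countingFilter(downstream) // outer lambda runs here, once

    repeat(requests) { app(Request(GET, "/")) }       // inner lambda runs per request
    return assembled to handled
}

fun main() {
    println(countCalls(3)) // (1, 3)
}
```

Anything expensive, such as compiling a pattern list, is therefore better placed in the outer lambda, or outside the filter altogether, than in the per-request inner lambda.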
// AiBotFilter.kt: Http4k Filter for blocking AI crawlers
package com.example.botblocker

import org.http4k.core.Filter
import org.http4k.core.Response
import org.http4k.core.Status.Companion.FORBIDDEN

/**
 * Http4k Filter: an interface that extends (HttpHandler) -> HttpHandler.
 *
 * A Filter is a function that wraps an HttpHandler:
 * - Outer lambda receives `next`, the downstream handler
 * - Inner lambda receives `req`, the incoming Request
 *
 * To BLOCK: return a Response without calling next(req)
 * To PASS: call next(req) and optionally modify the Response
 */
val aiBotFilter = Filter { next ->
    { req ->
        // Path guard: let /robots.txt through regardless of User-Agent.
        // Bots read robots.txt to discover Disallow rules.
        if (req.uri.path == "/robots.txt") {
            next(req)
        } else {
            // req.header() is case-insensitive: "user-agent" == "User-Agent"
            // Returns String?; use Elvis to provide a default empty string
            val ua = req.header("user-agent") ?: ""
            if (isAiBot(ua)) {
                // Block: return Response directly, do NOT call next(req)
                // Response is immutable: each .header() / .body() returns a new instance
                Response(FORBIDDEN)
                    .header("Content-Type", "text/plain")
                    .header("X-Robots-Tag", "noai, noimageai")
                    .body("Forbidden")
            } else {
                // Pass: call next, then add X-Robots-Tag to the downstream response
                // .header() on Response returns a new Response with the header appended
                next(req).header("X-Robots-Tag", "noai, noimageai")
            }
        }
    }
}

3. App.kt: Filter.then(routes(...))
Filter.then(HttpHandler) applies the filter in front of the router, producing a new HttpHandler. SunHttp uses the JDK's built-in HTTP server, so it needs no extra runtime dependencies. To swap to Netty or Undertow, add the artifact, change the import, and pass the matching config (for example Netty(8080)) to asServer.
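Since the backend is only the argument passed to asServer, the choice can be isolated in one place. The sketch below assumes http4k-core is on the classpath; startServer is a hypothetical helper, and the Netty line is left as a comment because it additionally needs the http4k-server-netty artifact.

```kotlin
import org.http4k.core.HttpHandler
import org.http4k.core.Response
import org.http4k.core.Status.Companion.OK
import org.http4k.server.Http4kServer
import org.http4k.server.ServerConfig
import org.http4k.server.SunHttp
import org.http4k.server.asServer

// Hypothetical helper: the server backend is just a parameter with a default.
fun startServer(app: HttpHandler, backend: ServerConfig = SunHttp(8080)): Http4kServer =
    app.asServer(backend).start()

fun main() {
    val app: HttpHandler = { _ -> Response(OK).body("ok") }

    val server = startServer(app)
    // With http4k-server-netty added and org.http4k.server.Netty imported:
    // val server = startServer(app, Netty(8080))

    server.block() // keep the JVM alive until the server is stopped
}
```

The handler passed in is unchanged between backends; only the ServerConfig value differs.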
// App.kt: Http4k application
package com.example.botblocker

import org.http4k.core.Method.GET
import org.http4k.core.Response
import org.http4k.core.Status.Companion.OK
import org.http4k.core.then
import org.http4k.routing.bind
import org.http4k.routing.routes
import org.http4k.server.SunHttp
import org.http4k.server.asServer

fun main() {
    val router = routes(
        "/robots.txt" bind GET to { _ ->
            Response(OK)
                .header("Content-Type", "text/plain")
                .body(
                    """
                    User-agent: *
                    Allow: /
                    User-agent: GPTBot
                    Disallow: /
                    User-agent: ClaudeBot
                    Disallow: /
                    User-agent: CCBot
                    Disallow: /
                    User-agent: Google-Extended
                    Disallow: /
                    """.trimIndent()
                )
        },
        "/" bind GET to { _ ->
            Response(OK)
                .header("Content-Type", "application/json")
                .body("""{"message":"Hello"}""")
        },
        "/api/data" bind GET to { _ ->
            Response(OK)
                .header("Content-Type", "application/json")
                .body("""{"data":"value"}""")
        },
    )

    // Filter.then(HttpHandler) applies the filter in front of the router.
    // aiBotFilter wraps router, so all requests pass through the filter first.
    val app = aiBotFilter.then(router)

    // SunHttp uses the JDK built-in HTTP server: zero extra dependencies.
    // Other backends: Netty, Undertow, Jetty, Apache (add the relevant artifact).
    app.asServer(SunHttp(8080)).start().block()
}

4. Chaining multiple filters
Filter.then(Filter) composes two filters into one. Filters are applied left-to-right — the first filter in the chain is the outermost wrapper and sees every request first.
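The ordering can be checked by having each filter record when it sees the request and when it sees the response. This is a minimal sketch assuming http4k-core is on the classpath; tracing and traceOrder are hypothetical helpers introduced only for the demonstration.

```kotlin
import org.http4k.core.Filter
import org.http4k.core.HttpHandler
import org.http4k.core.Method.GET
import org.http4k.core.Request
import org.http4k.core.Response
import org.http4k.core.Status.Companion.OK
import org.http4k.core.then

// Hypothetical helper: a filter that logs its name on the way in and on the way out.
fun tracing(name: String, trace: MutableList<String>) = Filter { next ->
    val inner: HttpHandler = { req ->
        trace += "$name:request"
        val res = next(req)
        trace += "$name:response"
        res
    }
    inner
}

// Builds a.then(b).then(handler), sends one request, and returns the recorded order.
fun traceOrder(): List<String> {
    val trace = mutableListOf<String>()
    val endpoint: HttpHandler = { _ ->
        trace += "handler"
        Response(OK)
    }
    val app = tracing("a", trace).then(tracing("b", trace)).then(endpoint)
    app(Request(GET, "/"))
    return trace
}

fun main() {
    println(traceOrder())
    // [a:request, b:request, handler, b:response, a:response]
}
```

One consequence for the bot filter: any filter placed after it in the chain never runs for a blocked request, so headers added by later filters will be missing from the 403 response.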
// Chaining multiple filters with Filter.then(Filter)
// Filters are applied left-to-right: first.then(second).then(router)
// means first wraps second wraps router.
import org.http4k.core.Filter
import org.http4k.core.then

val loggingFilter = Filter { next ->
    { req ->
        println(">> ${req.method} ${req.uri}")
        val res = next(req)
        println("<< ${res.status}")
        res
    }
}

val corsFilter = Filter { next ->
    { req ->
        next(req).header("Access-Control-Allow-Origin", "*")
    }
}

// Execution order: loggingFilter → aiBotFilter → corsFilter → router
val app = loggingFilter
    .then(aiBotFilter)
    .then(corsFilter)
    .then(router)

5. Scoped filter: protect /api only
Apply the filter to a sub-router rather than the top-level app. Routes outside the protected sub-router bypass the bot check entirely. Health checks and robots.txt remain unfiltered.
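One self-contained way to sketch this scoping is a prefix mount, where the filtered sub-router is bound under /api and its inner paths are relative to that prefix. The example assumes http4k-core is on the classpath; gptBotFilter is a hypothetical single-pattern stand-in for aiBotFilter, used only to keep the block short.

```kotlin
import org.http4k.core.Filter
import org.http4k.core.Method.GET
import org.http4k.core.Request
import org.http4k.core.Response
import org.http4k.core.Status.Companion.FORBIDDEN
import org.http4k.core.Status.Companion.OK
import org.http4k.core.then
import org.http4k.routing.bind
import org.http4k.routing.routes

// Stand-in for aiBotFilter, reduced to one pattern (an assumption for this sketch).
val gptBotFilter = Filter { next ->
    { req ->
        val ua = req.header("user-agent").orEmpty().lowercase()
        if ("gptbot" in ua) Response(FORBIDDEN) else next(req)
    }
}

val app = routes(
    // Outside the filtered sub-router: never sees the bot check.
    "/health" bind GET to { _ -> Response(OK).body("ok") },
    // Prefix mount: the filter wraps only the routes below /api.
    "/api" bind gptBotFilter.then(
        routes(
            "/data" bind GET to { _ -> Response(OK).body("""{"data":"value"}""") },
        )
    ),
)

fun main() {
    val bot = "Mozilla/5.0 (compatible; GPTBot/1.0)"
    println(app(Request(GET, "/health").header("User-Agent", bot)).status)
    println(app(Request(GET, "/api/data").header("User-Agent", bot)).status)
}
```

The intent is that the first request is answered normally and the second is rejected by the filter. Whether to use a prefix mount or full paths inside the sub-router is mostly a matter of taste.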
// Scoped filter: apply bot blocking only to /api routes
// Routes outside the apiRoutes block are unprotected.
import org.http4k.core.Method.GET
import org.http4k.core.Response
import org.http4k.core.Status.Companion.OK
import org.http4k.core.then
import org.http4k.routing.bind
import org.http4k.routing.routes

val apiRoutes = aiBotFilter.then(
    routes(
        "/api/data" bind GET to { _ ->
            Response(OK)
                .header("Content-Type", "application/json")
                .body("""{"data":"value"}""")
        },
        "/api/status" bind GET to { _ ->
            Response(OK).body("ok")
        },
    )
)

// Top-level router: /health bypasses the bot filter entirely
val app = routes(
    "/health" bind GET to { _ -> Response(OK).body("ok") },
    // robotsHandler is assumed to be the /robots.txt handler from App.kt, extracted into a val
    "/robots.txt" bind GET to robotsHandler,
    // Include the protected sub-router; its routes already carry the /api prefix
    apiRoutes,
)

6. Unit testing: no server required
Http4k filters and handlers are plain functions, so tests can call them directly. No mocking, no test server, no HTTP client. aiBotFilter.then(downstream) returns a regular function that you invoke with a Request; you then inspect the returned Response.
// AiBotFilterTest.kt: unit-testing the filter with no server required
// Http4k filters are plain functions, so test them directly without HTTP.
package com.example.botblocker

import org.http4k.core.Method.GET
import org.http4k.core.Request
import org.http4k.core.Response
import org.http4k.core.Status.Companion.FORBIDDEN
import org.http4k.core.Status.Companion.OK
import org.http4k.core.then
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class AiBotFilterTest {
    // A minimal downstream handler that returns 200 OK
    private val downstream = { _: Request -> Response(OK).body("Hello") }

    // Apply the filter to the downstream handler to get a testable HttpHandler
    private val handler = aiBotFilter.then(downstream)

    @Test
    fun `blocks GPTBot with 403`() {
        val req = Request(GET, "/").header("User-Agent", "Mozilla/5.0 (compatible; GPTBot/1.0)")
        val res = handler(req)
        assertEquals(FORBIDDEN, res.status)
        assertEquals("noai, noimageai", res.header("X-Robots-Tag"))
    }

    @Test
    fun `passes normal browser`() {
        val req = Request(GET, "/").header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64)")
        val res = handler(req)
        assertEquals(OK, res.status)
        assertEquals("noai, noimageai", res.header("X-Robots-Tag"))
    }

    @Test
    fun `passes robots txt regardless of User-Agent`() {
        val req = Request(GET, "/robots.txt").header("User-Agent", "GPTBot/1.0")
        val res = handler(req)
        // robots.txt guard fires before bot check, so downstream returns 200
        assertEquals(OK, res.status)
    }
}

7. build.gradle.kts
// build.gradle.kts: Http4k dependencies
plugins {
    kotlin("jvm") version "2.0.0"
    application
}

application {
    mainClass.set("com.example.botblocker.AppKt")
}

repositories {
    mavenCentral()
}

val http4kVersion = "5.32.1.0"

dependencies {
    // Core: HttpHandler, Filter, Request, Response types
    implementation("org.http4k:http4k-core:${http4kVersion}")
    // Server backend: choose one
    // SunHttp (JDK built-in server) ships inside http4k-core, so it needs no separate artifact.
    // implementation("org.http4k:http4k-server-netty:${http4kVersion}")
    // implementation("org.http4k:http4k-server-undertow:${http4kVersion}")
    // implementation("org.http4k:http4k-server-jetty:${http4kVersion}")
    testImplementation(kotlin("test"))
    testImplementation("org.junit.jupiter:junit-jupiter:5.10.2")
}

tasks.test {
    useJUnitPlatform()
}

Key points
- Filter is a lambda-in-lambda: Filter { next -> { req -> ... } }. The outer lambda receives next once at assembly time. The inner lambda is invoked per request. Both levels are required: the inner lambda is the HttpHandler that replaces the downstream handler.
- req.header() is case-insensitive: Http4k normalises header name lookups, so "user-agent" and "User-Agent" both work. It returns String?; use ?: "" to default to an empty string.
- Response is immutable: Every call to .header(), .body(), or .status() returns a new Response instance. Chain them: Response(FORBIDDEN).header(...).body(...). The original is never mutated.
- Do not call next(req) after returning a blocking Response: In Http4k this is structurally enforced, because the inner lambda returns a single Response. You either return a new Response(FORBIDDEN) or return next(req). There is no way to accidentally do both (unlike middleware with side-effecting next() calls in Express/Polka).
- Filter.then() composition is left-to-right: With a.then(b).then(c).then(handler), requests see a first, then b, then c, then handler. Responses travel back in reverse.
- Filters are fully testable without a server: aiBotFilter.then(downstream)(request) is a plain function call. No test containers, no embedded servers, no HTTP clients. Construct a Request, invoke the handler, and assert on the Response. This is one of Http4k's defining advantages over annotation-driven frameworks.
- Multiple server backends, same code: The application logic (filters, routes, handlers) is backend-agnostic. Swap SunHttp for Netty or Undertow by changing the import, the config passed to asServer, and the artifact in build.gradle.kts; filters, routes, and handlers stay untouched.
Framework comparison — Kotlin / JVM HTTP frameworks
| Framework | Middleware / Filter | Block | UA header |
|---|---|---|---|
| Http4k | Filter { next -> { req -> ... } } | Response(FORBIDDEN).header(...) | req.header("user-agent") |
| Ktor | install(plugin) { intercept(...)} | call.respond(HttpStatusCode.Forbidden); finish() | call.request.headers["User-Agent"] |
| Spring Boot | OncePerRequestFilter / HandlerInterceptor | response.sendError(403) | request.getHeader("User-Agent") |
| Javalin | app.before { ctx -> ... } | ctx.status(403).result("Forbidden"); ctx.skipRemainingHandlers() | ctx.userAgent() |
Of the four, Http4k's Filter needs the least machinery: no framework registration, no annotation processing, no coroutine context. The filter is a plain function value that can be composed, tested, and reused without any framework infrastructure. Ktor requires coroutines and plugin installation; Spring Boot requires a bean container; Javalin uses a mutable context object.