How to Block AI Bots in Javalin
Javalin is a lightweight JVM web framework for Kotlin and Java, popular for REST APIs, microservices, and developer tooling. It runs on embedded Jetty with a minimal configuration surface. Javalin's app.before() handlers run before the route handler, which makes them the idiomatic place to block AI crawlers. The Javalin-specific detail: short-circuiting in a before-handler means throwing an exception, typically ForbiddenResponse(), which Javalin catches and renders as a 403, skipping all remaining before-handlers and the route handler for that request.
1. Bot pattern helper
Define patterns in a Kotlin object so they are shared by the before-handler and any tests. Apply ua.lowercase() once before calling contains() — literal substring search, no regex engine.
// src/main/kotlin/com/example/AiBotFilter.kt
package com.example

object AiBotFilter {
    // Pattern list: all lowercase, matched against the lowercased UA
    private val AI_BOT_PATTERNS = listOf(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended",
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    )

    // contains(): literal substring match, no regex overhead
    fun isAiBot(ua: String): Boolean {
        if (ua.isBlank()) return false
        val lower = ua.lowercase()
        return AI_BOT_PATTERNS.any { lower.contains(it) }
    }
}
2. app.before(): global blocker
Register a global before-handler and an after-handler. ForbiddenResponse is a subclass of HttpResponseException; throwing it skips the remaining before-handlers and the route handler. The after-handler adds X-Robots-Tag to every response it runs for. Blocked requests also get the header set directly on ctx before throwing, so the 403 carries it either way.
// src/main/kotlin/com/example/App.kt
package com.example

import io.javalin.Javalin
import io.javalin.http.ForbiddenResponse
import io.javalin.http.staticfiles.Location

fun main() {
    val app = Javalin.create { config ->
        // Serves src/main/resources/public/* from the classpath,
        // so public/robots.txt is available at /robots.txt.
        config.staticFiles.add("/public", Location.CLASSPATH)
    }

    // ── Global AI-bot blocker ─────────────────────────────────────────────────
    // Javalin's docs describe before-handlers as matching every request,
    // static files included, so /robots.txt is exempted explicitly here.
    app.before { ctx ->
        val ua = ctx.header("User-Agent") ?: ""
        if (ctx.path() != "/robots.txt" && AiBotFilter.isAiBot(ua)) {
            ctx.header("X-Robots-Tag", "noai, noimageai")
            // ForbiddenResponse is an HttpResponseException with status 403.
            // Javalin catches it, renders the error response, and
            // skips all remaining before-handlers and the route handler.
            throw ForbiddenResponse("Forbidden")
        }
    }

    // ── Add X-Robots-Tag to responses ─────────────────────────────────────────
    app.after { ctx ->
        ctx.header("X-Robots-Tag", "noai, noimageai")
    }

    // ── Routes ────────────────────────────────────────────────────────────────
    app.get("/") { ctx -> ctx.result("Hello") }
    app.get("/api/data") { ctx -> ctx.json(mapOf("data" to "value")) }

    app.start(8080)
}
3. Route-group scoping: protect /api/* only
Use Javalin's router DSL to scope a before-handler to a path() block. Handlers defined inside the block inherit the before-handler; routes outside it are unaffected. This pattern leaves public endpoints — health checks, webhooks, the root — open while protecting API routes.
// Route-group scoped blocker: protect /api/* only
package com.example // same package as AiBotFilter

import io.javalin.Javalin
import io.javalin.apibuilder.ApiBuilder.*
import io.javalin.http.ForbiddenResponse
import io.javalin.http.staticfiles.Location

fun main() {
    val app = Javalin.create { config ->
        config.staticFiles.add("/public", Location.CLASSPATH)
        config.router.apiBuilder {
            // Public: no blocker
            get("/") { ctx -> ctx.result("Hello") }
            get("/health") { ctx -> ctx.result("ok") }

            // Protected: before-handler scoped to /api/*
            path("/api") {
                before { ctx ->
                    // Set once up front so blocked and passing responses both carry it.
                    ctx.header("X-Robots-Tag", "noai, noimageai")
                    val ua = ctx.header("User-Agent") ?: ""
                    if (AiBotFilter.isAiBot(ua)) {
                        throw ForbiddenResponse("Forbidden")
                    }
                }
                get("/data") { ctx -> ctx.json(mapOf("data" to "value")) }
                post("/submit") { ctx -> ctx.status(201) }
            }
        }
    }
    app.start(8080)
}
4. Java variant: identical pattern
The same approach in Java: toLowerCase() replaces lowercase(), and an explicit ua == null check replaces the Elvis operator. List.of() requires Java 9+, var requires Java 10+, and String.isBlank() requires Java 11+.
// Java variant: identical logic, Javalin 6.x
import io.javalin.Javalin;
import io.javalin.http.ForbiddenResponse;
import io.javalin.http.staticfiles.Location;
import java.util.List;
import java.util.Locale;

public class App {
    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot", "chatgpt-user", "claudebot", "anthropic-ai",
        "ccbot", "google-extended", "cohere-ai", "meta-externalagent",
        "bytespider", "omgili", "diffbot", "imagesiftbot",
        "magpie-crawler", "amazonbot", "dataprovider", "netcraft"
    );

    private static boolean isAiBot(String ua) {
        if (ua == null || ua.isBlank()) return false;
        // Locale.ROOT keeps lowercasing locale-independent, like Kotlin's lowercase().
        String lower = ua.toLowerCase(Locale.ROOT);
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        var app = Javalin.create(config ->
            config.staticFiles.add("/public", Location.CLASSPATH)
        );

        app.before(ctx -> {
            String ua = ctx.header("User-Agent");
            // Exempt /robots.txt so crawlers can still read the Disallow rules.
            if (!"/robots.txt".equals(ctx.path()) && isAiBot(ua)) {
                ctx.header("X-Robots-Tag", "noai, noimageai");
                throw new ForbiddenResponse("Forbidden");
            }
        });

        app.after(ctx ->
            ctx.header("X-Robots-Tag", "noai, noimageai")
        );

        app.get("/", ctx -> ctx.result("Hello"));
        app.start(8080);
    }
}
5. robots.txt
Place robots.txt in src/main/resources/public/ and configure static file serving via config.staticFiles.add("/public", Location.CLASSPATH). Javalin's documentation describes before-handlers as matching every request, including static files, so a global before-handler can also intercept /robots.txt. Exempt that path in the blocker (for example by checking ctx.path()) so that AI crawlers can always fetch robots.txt and discover they are disallowed.
# src/main/resources/public/robots.txt
# Served by Javalin's static-file handling.
# Keep /robots.txt exempt from the bot blocker so AI crawlers can always fetch it.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
Key points
- Throw to short-circuit: In a Javalin before-handler, throwing ForbiddenResponse() (or any HttpResponseException subclass) skips all remaining before-handlers and the route handler. Javalin catches the exception and renders the error response automatically. A plain early return does not stop processing; it only ends the current handler.
- Set headers before throwing: Call ctx.header("X-Robots-Tag", "noai, noimageai") before throw ForbiddenResponse(), so the header is already on the response when Javalin renders the 403.
- after() for header injection: The after-handler runs once the route handler completes and adds X-Robots-Tag to the response. Javalin's documentation describes after-handlers as running even when an earlier handler threw, so blocked requests may pass through it as well. No duplicate header results, because ctx.header() sets the value rather than appending it.
- Static files and before-handlers: Javalin's documentation describes before-handlers as matching every request, including static files configured via config.staticFiles.add(). A global blocker should therefore exempt /robots.txt explicitly so crawlers can still read it.
- Kotlin null safety: ctx.header("User-Agent") returns String? in Kotlin. Use ?: "" to default to an empty string; calling lowercase() directly on a String? does not compile.
- Route-group scope: Before-handlers registered inside a path() block only apply to routes within that block. This is cleaner than a path-string check in a global handler, because the scope is enforced by the router rather than by an if statement.
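The decision behind these points can be factored into a pure function and checked without starting a server. The following is a minimal sketch in plain Java with no Javalin dependency. The BotDecision class and its shouldBlock method are hypothetical names introduced here for illustration, the pattern list is shortened, and exempting /robots.txt is an assumption about the desired policy.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Javalin: the blocking decision as a pure function.
final class BotDecision {
    // Shortened pattern list for illustration; reuse the full list in practice.
    private static final List<String> AI_BOT_PATTERNS =
            List.of("gptbot", "claudebot", "ccbot", "google-extended");

    // Literal substring match against the lowercased User-Agent.
    // A null or blank User-Agent never matches.
    static boolean isAiBot(String ua) {
        if (ua == null || ua.isBlank()) return false;
        String lower = ua.toLowerCase(Locale.ROOT);
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }

    // Assumption: /robots.txt stays readable for everyone, including blocked crawlers.
    static boolean shouldBlock(String path, String ua) {
        if ("/robots.txt".equals(path)) return false;
        return isAiBot(ua);
    }

    public static void main(String[] args) {
        String gptBot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)";
        String browser = "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0";

        System.out.println(shouldBlock("/api/data", gptBot));   // true
        System.out.println(shouldBlock("/robots.txt", gptBot)); // false
        System.out.println(shouldBlock("/api/data", browser));  // false
        System.out.println(shouldBlock("/api/data", null));     // false
    }
}
```

In a before-handler, the same function could be called as shouldBlock(ctx.path(), ctx.header("User-Agent")), with the throw kept in the handler so the framework-specific part stays small.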
Framework comparison — JVM web frameworks
| Framework | Middleware hook | Short-circuit | UA header |
|---|---|---|---|
| Javalin | app.before { ctx } | throw ForbiddenResponse() | ctx.header("User-Agent") |
| Spring Boot | HandlerInterceptor.preHandle | response.sendError(403); return false | request.getHeader("User-Agent") |
| Micronaut | @Filter HttpServerFilter.doFilter | Flux.just(HttpResponse.status(HttpStatus.FORBIDDEN)) | request.headers.get("User-Agent") |
| Vert.x Web | router.route().handler() | ctx.response().setStatusCode(403).end() | ctx.request().getHeader("User-Agent") |
Javalin's exception-based short-circuit (throw ForbiddenResponse()) is distinct from Spring's return-false pattern and Vert.x's explicit response termination. Static-file handling also differs between frameworks. Javalin's documentation describes before-handlers as matching every request, including static files, so a global blocker needs an explicit exemption for /robots.txt. Micronaut and Vert.x likewise require explicit path guards. For Spring Boot, check whether your interceptor registration also covers the ResourceHttpRequestHandler mapping.
Dependencies
// build.gradle.kts
plugins {
    kotlin("jvm") version "2.0.0"
    application
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("io.javalin:javalin:6.4.0")
    implementation("org.slf4j:slf4j-simple:2.0.13")
    // Optional: for ctx.json() serialisation
    implementation("com.fasterxml.jackson.module:jackson-module-kotlin:2.17.1")
}

application {
    mainClass.set("com.example.AppKt")
}