How to Block AI Bots in Javalin

Javalin is a lightweight JVM web framework for Kotlin and Java, popular for REST APIs, microservices, and developer tooling. It runs on embedded Jetty with a minimal configuration surface. Javalin provides app.before() handlers that run ahead of the route handler, which makes them the idiomatic place to block AI crawlers. The Javalin-specific detail: short-circuiting in a before-handler means throwing an exception, typically ForbiddenResponse, which Javalin catches and renders as a 403, skipping the remaining before-handlers and the route handler for that request.

1. Bot pattern helper

Define the patterns in a Kotlin object so the before-handler and any tests share one list. Lowercase the User-Agent once with ua.lowercase(), then test each pattern with contains(), a literal substring search that avoids the regex engine.

// src/main/kotlin/com/example/AiBotFilter.kt
package com.example

object AiBotFilter {

    // Pattern list — all lowercase, matched against lowercased UA
    private val AI_BOT_PATTERNS = listOf(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended",
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    )

    // contains() — literal substring, no regex overhead
    fun isAiBot(ua: String): Boolean {
        if (ua.isBlank()) return false
        val lower = ua.lowercase()
        return AI_BOT_PATTERNS.any { lower.contains(it) }
    }
}
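
Because the patterns live in one place, the matching rule can be checked without starting a server. The sketch below restates the same lowercase-then-contains rule in plain Java (the language of the variant in section 4) with a shortened pattern list. The class name UaMatcherCheck and the sample User-Agent strings are illustrative assumptions, not captured traffic.

```java
import java.util.List;
import java.util.Locale;

public class UaMatcherCheck {

    // Shortened copy of the pattern list; every entry is lowercase.
    static final List<String> PATTERNS = List.of("gptbot", "claudebot", "ccbot", "bytespider");

    // Same rule as the helper: lowercase once, then literal substring search.
    static boolean isAiBot(String ua) {
        if (ua == null || ua.isBlank()) return false;
        String lower = ua.toLowerCase(Locale.ROOT);
        return PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        // Illustrative User-Agent strings (assumptions for this sketch).
        String bot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)";
        String browser = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/124.0 Safari/537.36";

        System.out.println(isAiBot(bot));         // true
        System.out.println(isAiBot(browser));     // false
        System.out.println(isAiBot(""));          // false
        System.out.println(isAiBot("CCBOT/2.0")); // true: matching ignores case
    }
}
```

Substring matching is deliberately broad: any User-Agent that merely contains one of the patterns is treated as a bot, so short or generic patterns deserve a second look before they are added.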

2. app.before() — global blocker

Register a global before-handler and an after-handler. ForbiddenResponse is a subclass of HttpResponseException; throwing it from a before-handler skips the remaining before-handlers and the route handler, and Javalin renders a 403. The after-handler adds X-Robots-Tag to responses that pass the filter. Blocked requests get the header set directly on ctx before throwing, so the 403 carries it as well.

// src/main/kotlin/com/example/App.kt
package com.example

import io.javalin.Javalin
import io.javalin.http.ForbiddenResponse
import io.javalin.http.staticfiles.Location

fun main() {
    val app = Javalin.create { config ->
        // Serves src/main/resources/public/* as static files,
        // including public/robots.txt at /robots.txt.
        config.staticFiles.add("/public", Location.CLASSPATH)
    }

    // ── Global AI-bot blocker ─────────────────────────────────────────────────
    // Javalin's documentation describes before-handlers as matched before every
    // request, static files included, so /robots.txt is exempted explicitly to
    // keep it readable by the crawlers it addresses.
    app.before { ctx ->
        val ua = ctx.header("User-Agent") ?: ""
        if (ctx.path() != "/robots.txt" && AiBotFilter.isAiBot(ua)) {
            ctx.header("X-Robots-Tag", "noai, noimageai")
            // ForbiddenResponse is HttpResponseException(403).
            // Javalin catches it, renders the error response, and
            // skips the remaining before-handlers and the route handler.
            throw ForbiddenResponse("Forbidden")
        }
    }

    // ── Add X-Robots-Tag to all passing responses ─────────────────────────────
    app.after { ctx ->
        ctx.header("X-Robots-Tag", "noai, noimageai")
    }

    // ── Routes ────────────────────────────────────────────────────────────────
    app.get("/") { ctx -> ctx.result("Hello") }
    app.get("/api/data") { ctx -> ctx.json(mapOf("data" to "value")) }

    app.start(8080)
}
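
The decision the before-handler makes can be pulled into a pure function, which keeps it testable without starting Javalin. The sketch below does that in Java and also exempts /robots.txt, on the assumption that a global before-handler may see static-file requests too (worth confirming against your Javalin version). BlockDecision, EXEMPT_PATHS, and shouldBlock are hypothetical names introduced for this illustration, and the pattern list is shortened.

```java
import java.util.List;
import java.util.Locale;

public class BlockDecision {

    // Paths that stay reachable for every client (assumption: robots.txt only).
    static final List<String> EXEMPT_PATHS = List.of("/robots.txt");

    // Shortened pattern list for the sketch; every entry is lowercase.
    static final List<String> PATTERNS = List.of("gptbot", "claudebot", "ccbot");

    // Returns true when the request should be answered with 403.
    static boolean shouldBlock(String path, String userAgent) {
        if (EXEMPT_PATHS.contains(path)) return false;
        if (userAgent == null || userAgent.isBlank()) return false;
        String lower = userAgent.toLowerCase(Locale.ROOT);
        return PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        System.out.println(shouldBlock("/api/data", "GPTBot/1.0"));   // true
        System.out.println(shouldBlock("/robots.txt", "GPTBot/1.0")); // false
        System.out.println(shouldBlock("/api/data", "curl/8.5.0"));   // false
    }
}
```

Inside a before-handler the call would take ctx.path() and ctx.header("User-Agent") as its two arguments, with the ForbiddenResponse thrown when it returns true.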

3. Route-group scoping — protect /api/* only

Use Javalin's ApiBuilder DSL (enabled through config.router.apiBuilder) to scope a before-handler to a path() block. The before-handler applies to routes under that path prefix; routes outside it are unaffected. This leaves public endpoints such as health checks, webhooks, and the root open while protecting the API routes, and because the blocker is limited to /api, requests for /robots.txt never reach it.

// Route-group scoped blocker — protect /api/* only
import io.javalin.Javalin
import io.javalin.apibuilder.ApiBuilder.*
import io.javalin.http.ForbiddenResponse
import io.javalin.http.staticfiles.Location

fun main() {
    val app = Javalin.create { config ->
        config.staticFiles.add("/public", Location.CLASSPATH)
        config.router.apiBuilder {

            // Public — no blocker
            get("/") { ctx -> ctx.result("Hello") }
            get("/health") { ctx -> ctx.result("ok") }

            // Protected — before-handler scoped to /api/*
            path("/api") {
                before { ctx ->
                    // Set the header first so blocked and passing responses both carry it.
                    ctx.header("X-Robots-Tag", "noai, noimageai")
                    val ua = ctx.header("User-Agent") ?: ""
                    if (AiBotFilter.isAiBot(ua)) {
                        throw ForbiddenResponse("Forbidden")
                    }
                }
                get("/data") { ctx -> ctx.json(mapOf("data" to "value")) }
                post("/submit") { ctx -> ctx.status(201) }
            }
        }
    }
    app.start(8080)
}

4. Java variant — identical pattern

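The same approach in Java, with String.toLowerCase() in place of ua.lowercase() and an explicit null check in place of the Elvis operator. List.of() requires Java 9+, var requires Java 10+, and String.isBlank() requires Java 11+, which is also the minimum Java version for Javalin 6.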

// Java variant: identical logic, Javalin 6.x
import io.javalin.Javalin;
import io.javalin.http.ForbiddenResponse;
import io.javalin.http.staticfiles.Location;
import java.util.List;
import java.util.Locale;

public class App {

    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot", "chatgpt-user", "claudebot", "anthropic-ai",
        "ccbot", "google-extended", "cohere-ai", "meta-externalagent",
        "bytespider", "omgili", "diffbot", "imagesiftbot",
        "magpie-crawler", "amazonbot", "dataprovider", "netcraft"
    );

    private static boolean isAiBot(String ua) {
        if (ua == null || ua.isBlank()) return false;
        // Locale.ROOT keeps lowercasing independent of the JVM's default locale.
        String lower = ua.toLowerCase(Locale.ROOT);
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        var app = Javalin.create(config ->
            config.staticFiles.add("/public", Location.CLASSPATH)
        );

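        app.before(ctx -> {
            String ua = ctx.header("User-Agent");
            // Keep robots.txt readable: before-handlers are documented as
            // matching every request, static files included.
            if (!"/robots.txt".equals(ctx.path()) && isAiBot(ua)) {
                ctx.header("X-Robots-Tag", "noai, noimageai");
                throw new ForbiddenResponse("Forbidden");
            }
        });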

        app.after(ctx ->
            ctx.header("X-Robots-Tag", "noai, noimageai")
        );

        app.get("/", ctx -> ctx.result("Hello"));
        app.start(8080);
    }
}
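
One JVM detail is worth checking in the Java variant: String.toLowerCase() with no argument uses the JVM's default locale, whereas Kotlin's lowercase() is locale-invariant. Under a Turkish locale an upper-case I lowercases to a dotless i (U+0131), so an ASCII pattern can silently stop matching. The sketch below demonstrates the difference; LocaleLowercaseCheck and the upper-case User-Agent string are hypothetical names chosen for the illustration.

```java
import java.util.Locale;

public class LocaleLowercaseCheck {

    // Lowercases with an explicit locale, then runs the literal substring test.
    static boolean matches(String ua, String pattern, Locale locale) {
        return ua.toLowerCase(locale).contains(pattern);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String ua = "IMAGESIFTBOT/1.0"; // hypothetical upper-case User-Agent

        // Turkish rules map 'I' to dotless i (U+0131), so the ASCII pattern is not found.
        System.out.println(matches(ua, "imagesiftbot", turkish));     // false
        // Locale.ROOT maps 'I' to plain 'i', so the pattern is found.
        System.out.println(matches(ua, "imagesiftbot", Locale.ROOT)); // true
    }
}
```

Passing Locale.ROOT to toLowerCase() makes the Java helper behave the same way on every machine, regardless of the server's language settings.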

5. robots.txt

Place robots.txt in src/main/resources/public/ and configure static file serving via config.staticFiles.add("/public", Location.CLASSPATH). Javalin then serves the file at /robots.txt. Note that Javalin's documentation describes before-handlers as matched before every request, static files included, so a global blocker also sees requests for /robots.txt. Exempt that path in the before-handler (for example, skip the check when ctx.path() is "/robots.txt") so AI crawlers can always fetch the file and discover they are disallowed.

# src/main/resources/public/robots.txt
# Served by Javalin's static-file handler at /robots.txt.
# Keep this path exempt from the bot blocker so AI crawlers can always fetch it.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
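
The file above lists four crawlers by hand, while the filter's pattern list is longer. If the two should stay in step, one option is to generate the file from a list of user-agent tokens. This is a minimal sketch of that idea; RobotsTxtSketch and render are hypothetical names, and which tokens each crawler actually honours is something to confirm against that crawler's own documentation.

```java
import java.util.List;

public class RobotsTxtSketch {

    // Builds a robots.txt body: allow everyone, then one Disallow group per token.
    static String render(List<String> disallowedAgents) {
        StringBuilder sb = new StringBuilder("User-agent: *\nAllow: /\n");
        for (String agent : disallowedAgents) {
            sb.append("\nUser-agent: ").append(agent).append("\nDisallow: /\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the groups of the hand-written file above.
        System.out.print(render(List.of("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")));
    }
}
```

The output could be written to src/main/resources/public/robots.txt as a build step, so the static file and the filter are driven by the same list.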

Key points

Framework comparison — JVM web frameworks

Framework | Middleware hook | Short-circuit | UA header
--- | --- | --- | ---
Javalin | app.before { ctx } | throw ForbiddenResponse() | ctx.header("User-Agent")
Spring Boot | HandlerInterceptor.preHandle | response.sendError(403); return false | request.getHeader("User-Agent")
Micronaut | @Filter on HttpServerFilter.doFilter | Flux.just(HttpResponse.status(HttpStatus.FORBIDDEN)) | request.headers.get("User-Agent")
Vert.x Web | router.route().handler() | ctx.response().setStatusCode(403).end() | ctx.request().getHeader("User-Agent")

Javalin's exception-based short-circuit (throw ForbiddenResponse()) is distinct from Spring's return-false pattern and Vert.x's explicit response termination. Static files need the same care in every framework: check whether the hook also sees static-file requests, and if it does, exempt /robots.txt with an explicit path guard. In Javalin, before-handlers are documented as matching every request, static files included, so a global blocker needs that guard.

Dependencies

// build.gradle.kts
plugins {
    kotlin("jvm") version "2.0.0"
    application
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("io.javalin:javalin:6.4.0")
    implementation("org.slf4j:slf4j-simple:2.0.13")

    // Optional — for ctx.json() serialisation
    implementation("com.fasterxml.jackson.module:jackson-module-kotlin:2.17.1")
}

application {
    mainClass.set("com.example.AppKt")
}