How to Block AI Bots in SparkJava
SparkJava is a lightweight, Sinatra-inspired Java framework built on Jetty. Bot blocking rests on three pieces. The before() filter is a lambda that runs before every route handler. request.headers("User-Agent") is a case-insensitive lookup: it wraps Jetty's HttpServletRequest.getHeader(), which the HTTP specification requires to be case-insensitive. halt(403, "Forbidden") throws a HaltException; Spark catches it and sends the response immediately, so code after halt() is unreachable. A plain return passes the request through.
1. Bot detection
Pure Java, no dependencies. Stream.anyMatch() short-circuits on first match. String.contains() for literal substring matching. Null-safe: returns false for null or empty input.
```java
// AiBotDetector.java — AI bot detection, no external dependencies
import java.util.List;

public class AiBotDetector {

    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended",
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    );

    /**
     * Returns true if the User-Agent string matches a known AI crawler.
     * String.contains() — literal substring match, no regex.
     * Null-safe: returns false for null or empty input.
     *
     * @param userAgent the raw User-Agent header value (may be null)
     * @return true if the request is from a known AI bot
     */
    public static boolean isAiBot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) return false;
        final String lower = userAgent.toLowerCase();
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }
}
```

2. Global before() filter
Register the filter before any route definitions. request.headers("User-Agent") returns null when the header is absent — the isAiBot() helper handles this. Set response headers before calling halt() because the response is committed when HaltException is thrown.
```java
// App.java — SparkJava with global before() bot-blocking filter
import static spark.Spark.*;

public class App {
    public static void main(String[] args) {
        port(8080);

        // before(filter) — runs for EVERY request before any route handler.
        // Registered before route definitions so it fires first.
        before((request, response) -> {
            // Allow robots.txt so bots can discover Disallow rules.
            if ("/robots.txt".equals(request.pathInfo())) {
                return; // plain return = pass through, do not block
            }
            // request.headers("User-Agent") — case-insensitive (wraps Jetty
            // HttpServletRequest.getHeader). Returns null when absent.
            String ua = request.headers("User-Agent");
            if (AiBotDetector.isAiBot(ua)) {
                // Set headers BEFORE halt() — response is committed on throw.
                response.header("X-Robots-Tag", "noai, noimageai");
                response.type("text/plain");
                // halt(statusCode, body) throws HaltException immediately.
                // Spark catches it and sends the response.
                // Code after halt() is UNREACHABLE — no return needed.
                halt(403, "Forbidden");
            }
            // Pass: inject X-Robots-Tag on the way through, then continue.
            response.header("X-Robots-Tag", "noai, noimageai");
            // Plain return — Spark continues to the route handler.
        });

        // robots.txt — reachable by all crawlers.
        get("/robots.txt", (request, response) -> {
            response.type("text/plain");
            return """
                User-agent: *
                Allow: /

                User-agent: GPTBot
                Disallow: /

                User-agent: ClaudeBot
                Disallow: /

                User-agent: CCBot
                Disallow: /

                User-agent: Google-Extended
                Disallow: /
                """;
        });

        get("/", (request, response) -> {
            response.type("application/json");
            return "{\"message\": \"Hello\"}";
        });

        get("/api/data", (request, response) -> {
            response.type("application/json");
            return "{\"data\": \"value\"}";
        });
    }
}
```

3. How halt() works
halt() throws a HaltException — it does not return. Spark's filter runner catches this exception, writes the status and body, and skips all remaining filters and the route handler. This means any statement after halt() in the same lambda is dead code.
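The throw-and-catch mechanics can be sketched in dependency-free Java. This is a simplified model, not Spark's actual source: the HaltDemo class, its tiny Response type, and the run() filter loop are all illustrative stand-ins.

```java
// Minimal sketch of the halt() mechanism — NOT Spark's real code.
// A HaltException carries status and body up the stack; the filter
// runner catches it and skips everything that would have run next.
import java.util.List;
import java.util.function.Consumer;

public class HaltDemo {
    static class HaltException extends RuntimeException {
        final int status;
        final String body;
        HaltException(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static class Response {
        int status = 200;
        String body = "";
    }

    // halt() just throws — nothing after the call site can execute.
    static void halt(int status, String body) {
        throw new HaltException(status, body);
    }

    // The filter loop: run each filter, catch HaltException, stop early.
    static Response run(List<Consumer<Response>> filters, Runnable route) {
        Response res = new Response();
        try {
            for (Consumer<Response> f : filters) f.accept(res);
            route.run(); // only reached if no filter halted
        } catch (HaltException e) {
            res.status = e.status; // set status from the exception
            res.body = e.body;     // write body; remaining work is skipped
        }
        return res;
    }

    public static void main(String[] args) {
        Response blocked = run(
            List.of(res -> halt(403, "Forbidden")),
            () -> { throw new AssertionError("route must not run"); });
        System.out.println(blocked.status + " " + blocked.body); // 403 Forbidden

        Response allowed = run(List.of(res -> { /* pass */ }), () -> {});
        System.out.println(allowed.status); // 200
    }
}
```

The route lambda in the blocked case would throw an AssertionError if it ever ran; it never does, because the halting filter unwinds the stack first.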
```java
// halt() internals — what happens under the hood.
// halt(int status, String body) is equivalent to:
//     throw new HaltException(status, body);
//
// Spark's filter execution loop catches HaltException and:
//   1. Sets the HTTP status code on the response.
//   2. Writes the body string to the response.
//   3. Commits the response (no further writes possible).
//   4. Skips all remaining filters and the route handler.
//
// Because halt() throws, the JVM unwinds the stack immediately.
// Any statement after halt() in the same lambda is unreachable:
before((request, response) -> {
    if (AiBotDetector.isAiBot(request.headers("User-Agent"))) {
        response.header("X-Robots-Tag", "noai, noimageai");
        halt(403, "Forbidden");
        // ← Everything below is dead code. javac cannot prove it (halt()
        //    is an ordinary method call), so it compiles, but it never runs.
        response.type("text/plain"); // NEVER executes
        System.out.println("blocked"); // NEVER executes
    }
});

// Contrast with plain return — pass through:
before((request, response) -> {
    if ("/public".equals(request.pathInfo())) {
        return; // exits the lambda, Spark continues to next filter/route
    }
    // ... bot check
});
```

4. Path-scoped before(path, filter)
before(path, filter) restricts the filter to requests whose path matches the pattern. SparkJava supports * wildcard globs — "/api/*" matches /api/data, /api/users, and any other /api/ subpath.
```java
// Path-scoped filter — protect /api/* only.
// before(path, filter) — path supports * wildcard globs.
// The global before() guard (above) remains for full-site protection;
// this shows the scoped variant independently.
before("/api/*", (request, response) -> {
    String ua = request.headers("User-Agent");
    if (AiBotDetector.isAiBot(ua)) {
        response.header("X-Robots-Tag", "noai, noimageai");
        halt(403, "Forbidden");
    }
    response.header("X-Robots-Tag", "noai, noimageai");
});

// Public routes — not covered by the /api/* filter.
get("/", (request, response) -> "public");
get("/blog", (request, response) -> "public blog");

// Protected routes — before("/api/*", ...) fires for these.
get("/api/data", (request, response) -> "protected");
get("/api/users", (request, response) -> "protected");
```

Key points
- **`request.headers()` is case-insensitive:** SparkJava delegates to Jetty's `HttpServletRequest.getHeader()`, which is case-insensitive per RFC 7230. `"User-Agent"`, `"user-agent"`, and `"USER-AGENT"` all return the same value. Returns `null` when the header is absent — handle accordingly.
- **`halt()` throws — code after it is unreachable:** `halt(statusCode, body)` is `throw new HaltException(statusCode, body)`. The JVM unwinds the call stack immediately. You do not need `return` after `halt()`; any code placed there is dead (javac compiles it, since it cannot prove an ordinary method call always throws, but it never runs).
- **Set headers before `halt()`:** response headers must be set on `response` before calling `halt()`. When `HaltException` is caught, Spark commits the response, so there is no later point at which headers could be added.
- **Plain `return` passes through:** returning from the `before()` lambda without calling `halt()` tells Spark to continue — the next filters run, then the route handler. Use `return` for the allow branch (e.g. the `/robots.txt` pass-through).
- **Register before routes:** `before()` filters fire in registration order. Register the bot-blocking filter as the first call in `main()`, before any `get()`/`post()` route definitions, to ensure it runs first.
- **Thread model — synchronous Jetty threads:** SparkJava uses a Jetty thread pool: one thread per request, no async/await, no reactive types. The `before()` lambda runs synchronously on the request thread, so no `CompletableFuture` or `Future` handling is needed.
- **SparkJava vs Javalin:** Javalin, which began as a fork of SparkJava, is the recommended modern successor. Javalin uses `app.before()` with a `Context` object, `ctx.status(403).result(...)` to block (no `halt()`), and has first-class Kotlin and async support. Migration is straightforward if you outgrow SparkJava.
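The detection helper is easy to smoke-test in isolation. Here is a minimal standalone check; it duplicates AiBotDetector's pattern list so the file compiles on its own, and the class name BotCheckDemo is illustrative.

```java
// Standalone check of the substring-matching logic from AiBotDetector.
import java.util.List;

public class BotCheckDemo {
    // Same pattern list as AiBotDetector (all lowercase).
    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot", "chatgpt-user", "claudebot", "anthropic-ai", "ccbot",
        "google-extended", "cohere-ai", "meta-externalagent", "bytespider",
        "omgili", "diffbot", "imagesiftbot", "magpie-crawler", "amazonbot",
        "dataprovider", "netcraft");

    public static boolean isAiBot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) return false;
        final String lower = userAgent.toLowerCase();
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        // GPTBot UA — matched case-insensitively via toLowerCase().
        System.out.println(isAiBot(
            "Mozilla/5.0 AppleWebKit/537.36; compatible; GPTBot/1.0")); // true
        // Ordinary browser UA — no pattern matches.
        System.out.println(isAiBot(
            "Mozilla/5.0 (Windows NT 10.0) Firefox/124.0")); // false
        // Null-safe branch.
        System.out.println(isAiBot(null)); // false
    }
}
```

Because matching is plain substring containment, a UA like "GPTBot/1.0" matches regardless of casing, while an absent header (null) simply falls through to the allow path.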
Framework comparison — Java web frameworks
| Framework | Filter registration | Block request | Header access |
|---|---|---|---|
| SparkJava | `before((req, res) -> …)` | `halt(403, "Forbidden")` (throws) | `req.headers("User-Agent")` (case-insensitive) |
| Javalin | `app.before(ctx -> …)` | `ctx.status(403).result("Forbidden")` + skip | `ctx.header("User-Agent")` (case-insensitive) |
| Spring Boot | `OncePerRequestFilter` bean | `response.sendError(403)` or `response.setStatus(403)` | `request.getHeader("User-Agent")` (case-insensitive) |
| Quarkus | `@ServerRequestFilter` or Vert.x `router.route().handler()` | `requestContext.abortWith(Response.status(403).build())` | `headers.getHeaderString("User-Agent")` (case-insensitive) |
SparkJava's halt() is unique among these frameworks — it uses a thrown exception to abort the filter chain rather than a return value or context flag. This makes the block absolute: no downstream code can accidentally run after a halt() call. Javalin (SparkJava's spiritual successor) instead mutates the Context and explicitly skips the remaining handlers, which can be easier to test. Spring Boot and Quarkus use the Servlet/JAX-RS filter model, both of which are request-scoped with explicit chain.doFilter() / abortWith() semantics.