How to Block AI Bots in Java Helidon SE
Helidon is Oracle's Java microservices framework and comes in two flavours: SE (functional routing, no DI) and MP (MicroProfile: CDI, JAX-RS, annotations). Helidon 4.x (Níma) runs on Project Loom virtual threads (Java 21+), so blocking code in handlers is safe without coroutines or reactive chains. Bot blocking in SE uses a routing-level handler registered with .any(). Registered ahead of the route-specific handlers, it runs before them. res.next() passes the request to the next handler; res.send() completes the request. Never call both: that throws IllegalStateException.
1. Bot detection
Pure Java, no dependencies. String.contains() for literal substring matching — no regex. Stream.anyMatch() short-circuits on first match.
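Before the full BotUtils listing, here is a small standalone sketch of the matching idea. The class name UaMatchSketch, the shortened pattern list, and the sample User-Agent strings are all illustrative assumptions, not part of the article's code. It also shows one pitfall of case folding: String.toLowerCase() is locale-sensitive, so the sketch takes the locale as a parameter to make the difference visible.

```java
import java.util.List;
import java.util.Locale;

/** Standalone sketch of literal, case-insensitive User-Agent matching (hypothetical class). */
public final class UaMatchSketch {

    // Three patterns borrowed from the article's list; the full list lives in BotUtils.
    private static final List<String> PATTERNS = List.of("gptbot", "claudebot", "imagesiftbot");

    private UaMatchSketch() {}

    /** Lowercases the User-Agent with the given locale, then does a literal substring search. */
    public static boolean matches(String ua, Locale locale) {
        if (ua == null || ua.isBlank()) {
            return false;
        }
        String lower = ua.toLowerCase(locale);
        // anyMatch() stops at the first pattern found in the string.
        return PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        // Illustrative sample strings, not captured from real traffic.
        String bot = "Mozilla/5.0 (compatible; GPTBot/1.0)";
        String browser = "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0";
        String upper = "IMAGESIFTBOT";

        System.out.println(matches(bot, Locale.ROOT));      // true
        System.out.println(matches(browser, Locale.ROOT));  // false
        System.out.println(matches(upper, Locale.ROOT));    // true
        // Under a Turkish locale, 'I' lowercases to dotless 'ı', so the literal match fails.
        System.out.println(matches(upper, Locale.forLanguageTag("tr")));  // false
    }
}
```

Passing Locale.ROOT keeps the result independent of the server's default locale, which is usually what you want for protocol-level strings such as header values.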
// BotUtils.java — AI bot detection, no external dependencies
package com.example.botblocker;
import java.util.List;
public final class BotUtils {
private static final List<String> AI_BOT_PATTERNS = List.of(
"gptbot",
"chatgpt-user",
"claudebot",
"anthropic-ai",
"ccbot",
"google-extended",
"cohere-ai",
"meta-externalagent",
"bytespider",
"omgili",
"diffbot",
"imagesiftbot",
"magpie-crawler",
"amazonbot",
"dataprovider",
"netcraft"
);
private BotUtils() {}
/**
* Returns true if the User-Agent string matches a known AI crawler.
* String.contains() — literal substring match, no regex.
*/
public static boolean isAiBot(String ua) {
if (ua == null || ua.isBlank()) return false;
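// Locale.ROOT keeps lowercasing independent of the JVM's default locale
// (for example, a Turkish locale maps 'I' to dotless 'ı' and breaks matching).
String lower = ua.toLowerCase(java.util.Locale.ROOT);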
// anyMatch() short-circuits on first match
return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
}
}
2. Routing filter: Handler with res.next() / res.send()
The Handler interface is void handle(ServerRequest, ServerResponse). Call res.next() to continue routing or res.send() to terminate — these are mutually exclusive. req.headers().first(HeaderNames.USER_AGENT) returns Optional<String>.
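To make the next()/send() contract concrete, the following is a toy model in plain Java. It is not Helidon code and does not reflect Helidon's internals. ChainSketch, ToyHandler, ToyResponse, and dispatch are hypothetical names invented for this sketch, and the exception is raised by the model itself to mirror the behaviour described above.

```java
import java.util.List;
import java.util.Locale;

/** Toy model of a handler chain. NOT Helidon code; every name here is hypothetical. */
public final class ChainSketch {

    /** Stand-in for a routing handler. */
    public interface ToyHandler {
        void handle(String userAgent, ToyResponse res);
    }

    /** Stand-in for a response that is either passed on with next() or completed with send(). */
    public static final class ToyResponse {
        private final List<ToyHandler> chain;
        private final String userAgent;
        private int position = 0;   // index of the next handler to run
        private String body;        // stays null until the request is completed

        ToyResponse(List<ToyHandler> chain, String userAgent) {
            this.chain = chain;
            this.userAgent = userAgent;
        }

        /** Runs the next handler, or completes with a 404 if the chain is exhausted. */
        public void next() {
            requireOpen();
            if (position < chain.size()) {
                chain.get(position++).handle(userAgent, this);
            } else {
                body = "404 Not Found";
            }
        }

        /** Completes the request; no later handler runs. */
        public void send(String content) {
            requireOpen();
            body = content;
        }

        private void requireOpen() {
            if (body != null) {
                throw new IllegalStateException("response already completed");
            }
        }
    }

    private ChainSketch() {}

    /** Feeds one request through the chain and returns the final body. */
    public static String dispatch(String userAgent, List<ToyHandler> chain) {
        ToyResponse res = new ToyResponse(chain, userAgent);
        res.next();
        return res.body;
    }

    /** A filter that blocks one pattern, followed by a route that always answers. */
    public static List<ToyHandler> demoChain() {
        ToyHandler filter = (ua, res) -> {
            if (ua.toLowerCase(Locale.ROOT).contains("gptbot")) {
                res.send("403 Forbidden");   // terminate: the route never runs
            } else {
                res.next();                  // delegate: the route produces the body
            }
        };
        ToyHandler route = (ua, res) -> res.send("200 Hello");
        return List.of(filter, route);
    }

    public static void main(String[] args) {
        System.out.println(dispatch("GPTBot/1.0", demoChain()));     // 403 Forbidden
        System.out.println(dispatch("Firefox/126.0", demoChain()));  // 200 Hello

        // A handler that calls both: the second call fails in this model.
        ToyHandler broken = (ua, res) -> {
            res.send("200 Hello");
            res.next();
        };
        try {
            dispatch("Firefox/126.0", List.of(broken));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the filter decides exactly once per request: either it completes the response or it hands the request on. That is the same discipline the real filter in this section follows with an if/else and an early return.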
// AiBotFilter.java — Helidon SE routing filter
package com.example.botblocker;
import io.helidon.http.HeaderNames;
import io.helidon.http.Status;
import io.helidon.webserver.http.Handler;
import io.helidon.webserver.http.ServerRequest;
import io.helidon.webserver.http.ServerResponse;
public class AiBotFilter implements Handler {
@Override
public void handle(ServerRequest req, ServerResponse res) {
// Path guard: pass robots.txt through — bots read it for Disallow rules.
String path = req.path().rawPath();
if ("/robots.txt".equals(path)) {
res.next(); // continue to next handler — do NOT send here
return;
}
// req.headers().first() returns Optional<String> — case-insensitive lookup.
// HeaderNames.USER_AGENT is the standard Helidon constant for "User-Agent".
String ua = req.headers()
.first(HeaderNames.USER_AGENT)
.orElse("");
if (BotUtils.isAiBot(ua)) {
// Block: set status, add headers, then send — this completes the request.
// Do NOT call res.next() after res.send() — IllegalStateException.
res.status(Status.FORBIDDEN_403)
.header("X-Robots-Tag", "noai, noimageai")
.header("Content-Type", "text/plain")
.send("Forbidden");
} else {
// Pass: add X-Robots-Tag to outgoing response headers, then delegate.
// res.next() passes to the next registered handler in the routing chain.
// Do NOT call res.send() after res.next().
res.header("X-Robots-Tag", "noai, noimageai");
res.next();
}
}
}
3. Main.java: WebServer with global .any() filter
Register .any(new AiBotFilter()) first in the routing builder — Helidon matches handlers in registration order. Route-specific handlers (.get()) are only reached when the filter calls res.next().
// Main.java — Helidon SE WebServer with routing filter (Helidon 4.x / Níma)
package com.example.botblocker;
import io.helidon.http.Status;
import io.helidon.webserver.WebServer;
import io.helidon.webserver.http.HttpRouting;
public class Main {
public static void main(String[] args) {
// Build the routing table.
// .any() fires for ALL methods and ALL paths — used as a global filter.
// .any() registered first runs first, before more-specific routes.
HttpRouting.Builder routing = HttpRouting.builder()
// Global filter — runs before any route-specific handler
.any(new AiBotFilter())
// Route handlers — only reached if AiBotFilter calls res.next()
.get("/robots.txt", (req, res) -> {
res.status(Status.OK_200)
.header("Content-Type", "text/plain")
.send("""
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
""");
})
.get("/", (req, res) -> {
res.status(Status.OK_200)
.header("Content-Type", "application/json")
.send("{\"message\":\"Hello\"}");
})
.get("/api/data", (req, res) -> {
res.status(Status.OK_200)
.header("Content-Type", "application/json")
.send("{\"data\":\"value\"}");
});
// Start the server — Helidon 4.x uses virtual threads (Java 21 / Loom).
// Each request runs on a virtual thread — blocking code in handlers is fine.
WebServer server = WebServer.builder()
.port(8080)
.routing(routing)
.build()
.start();
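System.out.println("Server started on port " + server.port());
// No explicit shutdown call is needed: Helidon 4.x registers a JVM shutdown hook
// by default (the shutdownHook server option), so the server stops when the JVM exits.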
}
}
4. Scoped filter: .register() for path prefix
.register("/api", r -> r.any(filter)...) scopes the bot filter to /api/** only. Routes outside the registered prefix bypass the filter entirely.
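// Scoped filter: protect /api routes only, using path prefix routing.
// Helidon SE supports nested routing via .register() for path-scoped rules.
// Fragment: robotsHandler, dataHandler and statusHandler stand for Handler
// instances defined elsewhere in the application.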
import io.helidon.webserver.http.HttpRouting;
HttpRouting.Builder routing = HttpRouting.builder()
// Unprotected: health check and robots.txt bypass the bot filter
.get("/health", (req, res) -> res.send("ok"))
.get("/robots.txt", robotsHandler)
// Protected: register a sub-routing scope at /api
// .any(new AiBotFilter()) runs for every request under /api
.register("/api", r -> r
.any(new AiBotFilter()) // only /api/** goes through filter
.get("/data", dataHandler)
.get("/status", statusHandler)
);
5. Helidon MP: @PreMatching ContainerRequestFilter
For Helidon MP applications (CDI + JAX-RS), use the standard JAX-RS filter. @PreMatching is required: without it, the filter only runs for requests that match a resource method, so requests to unmatched paths get the default 404 without ever reaching the filter. ctx.getHeaderString() returns null (not "") when the header is absent. This pattern is identical to Quarkus and Micronaut JAX-RS.
// Helidon MP (MicroProfile) variant — JAX-RS ContainerRequestFilter.
// Use this when your app is built on Helidon MP with CDI and JAX-RS annotations.
// Identical pattern to Quarkus, Micronaut JAX-RS, or standard JAX-RS.
package com.example.botblocker;
import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.PreMatching;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;
@Provider
@PreMatching // runs before route matching — fires for all paths including 404s
@Priority(Priorities.AUTHENTICATION - 100) // run early in the filter chain
public class AiBotMpFilter implements ContainerRequestFilter {
@Override
public void filter(ContainerRequestContext ctx) {
// Path guard: /robots.txt must be accessible to bots
String path = ctx.getUriInfo().getPath();
if ("robots.txt".equals(path) || "/robots.txt".equals(path)) {
return; // return without aborting — request continues
}
// ctx.getHeaderString() returns null when header is absent (not "")
String ua = ctx.getHeaderString("User-Agent");
if (ua == null) ua = "";
if (BotUtils.isAiBot(ua)) {
// ctx.abortWith() terminates the request — no handler runs after this.
// @PreMatching required: without it, 404 paths bypass the filter.
ctx.abortWith(
Response.status(403)
.header("X-Robots-Tag", "noai, noimageai")
.header("Content-Type", "text/plain")
.entity("Forbidden")
.build()
);
}
// No else needed — return without aborting to pass through
}
}
6. pom.xml
<!-- pom.xml — Helidon SE 4.x dependencies -->
<project>
<parent>
<groupId>io.helidon.applications</groupId>
<artifactId>helidon-se</artifactId>
<version>4.1.4</version>
<relativePath/>
</parent>
<dependencies>
<!-- Core SE web server — WebServer, HttpRouting, Handler -->
<dependency>
<groupId>io.helidon.webserver</groupId>
<artifactId>helidon-webserver</artifactId>
</dependency>
<!-- JSON-P media support; optional here, since the examples only send String bodies -->
<dependency>
<groupId>io.helidon.http.media</groupId>
<artifactId>helidon-http-media-jsonp</artifactId>
</dependency>
<!-- Optional: Health checks, metrics -->
<dependency>
<groupId>io.helidon.webserver.observe</groupId>
<artifactId>helidon-webserver-observe-health</artifactId>
<scope>runtime</scope>
</dependency>
</dependencies>
</project>
<!-- Run: mvn package && java -jar target/app.jar
Requires Java 21+ (virtual threads, Loom)
Helidon MP: use parent helidon-mp instead of helidon-se -->
Key points
- res.next() and res.send() are mutually exclusive: res.send() writes the response and completes the request. res.next() delegates to the next handler. Calling both throws IllegalStateException at runtime. Always return immediately after either call.
- Register .any(filter) before route handlers: Helidon evaluates handlers in registration order. If a route-specific .get("/", handler) is registered before .any(filter), the filter will not run for that path.
- req.headers().first() returns Optional<String>: Use .orElse("") to provide a safe default. The HeaderNames.USER_AGENT constant is in io.helidon.http and represents the canonical User-Agent header name; lookup by it is case-insensitive.
- Helidon 4.x uses virtual threads (Java 21+): Each request runs on a Project Loom virtual thread. Unlike reactive frameworks (Vert.x, WebFlux), you can call blocking code (database queries, synchronous HTTP clients) directly in handlers without callbacks or reactive operators. The bot detection function is synchronous and safe on virtual threads.
- @PreMatching is required in Helidon MP: Without @PreMatching, JAX-RS only runs the filter for requests that match a registered resource. Requests to unknown paths skip the filter, so bots probing unregistered URLs receive the default 404 instead of the 403 response. With @PreMatching, the filter runs before route resolution for all requests.
- ctx.getHeaderString() returns null, not "": In Helidon MP (JAX-RS), a missing header returns null from getHeaderString(). Always null-check or use Objects.toString(ua, "") before calling toLowerCase(); an unhandled NullPointerException here would fail the request.
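The two header-handling points above reduce to the same normalisation step: turn a missing User-Agent into an empty string before matching. A minimal plain-Java sketch follows. UaNormalizeSketch and its method names are hypothetical, and the two methods only stand in for the Optional-returning and null-returning lookup styles described above.

```java
import java.util.Objects;
import java.util.Optional;

/** Sketch: normalising a possibly-missing User-Agent before matching (hypothetical class). */
public final class UaNormalizeSketch {

    private UaNormalizeSketch() {}

    /** Optional-style source, as with a lookup that returns Optional<String>. */
    public static String fromOptional(Optional<String> header) {
        return header.orElse("");
    }

    /** Nullable-style source, as with a lookup that returns null for a missing header. */
    public static String fromNullable(String header) {
        return Objects.toString(header, "");
    }

    public static void main(String[] args) {
        System.out.println(fromOptional(Optional.empty()).isEmpty());  // true
        System.out.println(fromNullable(null).isEmpty());              // true
        System.out.println(fromNullable("GPTBot/1.0"));                // GPTBot/1.0
    }
}
```

Either way, the detection function then receives a non-null string and can treat the empty value as "not a bot".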
Framework comparison — Java microservices frameworks
| Framework | Filter mechanism | Blocking call | User-Agent lookup |
|---|---|---|---|
| Helidon SE | .any(Handler) routing filter | res.status(FORBIDDEN_403).send("Forbidden") | req.headers().first(HeaderNames.USER_AGENT) |
| Helidon MP | @PreMatching ContainerRequestFilter | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent") |
| Quarkus | @PreMatching ContainerRequestFilter | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent") |
| Dropwizard | @PreMatching ContainerRequestFilter | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent") |
Helidon SE's functional .any(Handler) approach is the most explicit: routing is code, not annotations, and the handler lifecycle (res.next() vs res.send()) is unambiguous. Helidon MP, Quarkus, and Dropwizard all use the same JAX-RS ContainerRequestFilter pattern with ctx.abortWith(). In all three, abortWith() does not throw, so the filter method keeps running after the call; return immediately afterwards if any code follows it.