How to Block AI Bots in Java Helidon SE

Helidon is Oracle's Java microservices framework and comes in two flavours: SE (functional routing, no dependency injection) and MP (MicroProfile: CDI, JAX-RS, annotations). Helidon 4.x (Níma) runs each request on a Project Loom virtual thread (Java 21+), so blocking code in handlers is safe without coroutines or reactive chains. Bot blocking in SE uses a routing-level handler registered with .any(). Because handlers are tried in registration order, it must be registered before the route-specific handlers to run first. res.next() passes the request to the next handler; res.send() completes the response. Never call both for the same request: that throws IllegalStateException.

1. Bot detection

Pure Java, no dependencies. String.contains() for literal substring matching — no regex. Stream.anyMatch() short-circuits on first match.

// BotUtils.java — AI bot detection, no external dependencies
package com.example.botblocker;

import java.util.List;

public final class BotUtils {

    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended",
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    );

    private BotUtils() {}

    /**
     * Returns true if the User-Agent string matches a known AI crawler.
     * String.contains() — literal substring match, no regex.
     */
    public static boolean isAiBot(String ua) {
        if (ua == null || ua.isBlank()) return false;
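        // Locale.ROOT keeps lower-casing independent of the JVM default locale
        // (e.g. the Turkish dotless i would otherwise break matches such as "anthropic-ai").
        String lower = ua.toLowerCase(java.util.Locale.ROOT);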
        // anyMatch() short-circuits on first match
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }
}
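
A quick check of the matcher against a few User-Agent strings can help confirm the behaviour described above. The sketch below is self-contained, so it carries a shortened copy of the pattern list; the class name BotUtilsDemo and the sample User-Agent strings are illustrative assumptions, not captured traffic.

```java
import java.util.List;
import java.util.Locale;

public class BotUtilsDemo {

    // Shortened copy of the pattern list so the example compiles on its own.
    private static final List<String> PATTERNS = List.of("gptbot", "claudebot", "ccbot");

    static boolean isAiBot(String ua) {
        if (ua == null || ua.isBlank()) return false;
        // Locale.ROOT keeps lower-casing independent of the JVM default locale.
        String lower = ua.toLowerCase(Locale.ROOT);
        return PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        // Illustrative User-Agent values (assumptions).
        String bot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)";
        String browser = "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0";

        System.out.println(isAiBot(bot));      // true: contains "gptbot" after lower-casing
        System.out.println(isAiBot(browser));  // false: no pattern matches
        System.out.println(isAiBot(null));     // false: a missing header is not treated as a bot
    }
}
```

Note that a missing or blank User-Agent passes the check, so this matcher alone does not stop clients that omit the header.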

2. Routing filter — Handler with res.next() / res.send()

The Handler interface declares one method, void handle(ServerRequest, ServerResponse). Call res.next() to continue routing or res.send() to terminate; the two are mutually exclusive. req.headers().first(HeaderNames.USER_AGENT) returns Optional<String>, which is empty when the header is absent.

// AiBotFilter.java — Helidon SE routing filter
package com.example.botblocker;

import io.helidon.http.HeaderNames;
import io.helidon.http.Status;
import io.helidon.webserver.http.Handler;
import io.helidon.webserver.http.ServerRequest;
import io.helidon.webserver.http.ServerResponse;

public class AiBotFilter implements Handler {

    @Override
    public void handle(ServerRequest req, ServerResponse res) {
        // Path guard: pass robots.txt through — bots read it for Disallow rules.
        String path = req.path().rawPath();
        if ("/robots.txt".equals(path)) {
            res.next();   // continue to next handler — do NOT send here
            return;
        }

        // req.headers().first() returns Optional<String> — case-insensitive lookup.
        // HeaderNames.USER_AGENT is the standard Helidon constant for "User-Agent".
        String ua = req.headers()
                       .first(HeaderNames.USER_AGENT)
                       .orElse("");

        if (BotUtils.isAiBot(ua)) {
            // Block: set status, add headers, then send — this completes the request.
            // Do NOT call res.next() after res.send() — IllegalStateException.
            res.status(Status.FORBIDDEN_403)
               .header("X-Robots-Tag", "noai, noimageai")
               .header("Content-Type", "text/plain")
               .send("Forbidden");
        } else {
            // Pass: add X-Robots-Tag to outgoing response headers, then delegate.
            // res.next() passes to the next registered handler in the routing chain.
            // Do NOT call res.send() after res.next().
            res.header("X-Robots-Tag", "noai, noimageai");
            res.next();
        }
    }
}
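
The filter's branching can be checked without starting a server if the decision is pulled out into a pure function. The following is a minimal sketch of that idea, not part of the Helidon API: FilterDecision, Action, and decide() are hypothetical names, and the two-entry pattern list stands in for BotUtils.

```java
import java.util.List;
import java.util.Locale;

public class FilterDecision {

    // Hypothetical outcome type: what the handler should do with the request.
    enum Action { PASS, BLOCK }

    // Stand-in for the BotUtils pattern list so the example is self-contained.
    private static final List<String> PATTERNS = List.of("gptbot", "claudebot");

    static Action decide(String path, String userAgent) {
        // robots.txt always passes so crawlers can read the Disallow rules.
        if ("/robots.txt".equals(path)) return Action.PASS;
        // A missing or blank User-Agent is not treated as a bot.
        if (userAgent == null || userAgent.isBlank()) return Action.PASS;
        String lower = userAgent.toLowerCase(Locale.ROOT);
        return PATTERNS.stream().anyMatch(lower::contains) ? Action.BLOCK : Action.PASS;
    }

    public static void main(String[] args) {
        System.out.println(decide("/robots.txt", "GPTBot/1.0"));  // PASS
        System.out.println(decide("/api/data", "GPTBot/1.0"));    // BLOCK
        System.out.println(decide("/api/data", "curl/8.5.0"));    // PASS
    }
}
```

In the handler, Action.BLOCK would map to the res.status(...).send(...) branch and Action.PASS to res.next(), which keeps exactly one of the two calls on every path.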

3. Main.java — WebServer with global .any() filter

Register .any(new AiBotFilter()) first in the routing builder — Helidon matches handlers in registration order. Route-specific handlers (.get()) are only reached when the filter calls res.next().

// Main.java — Helidon SE WebServer with routing filter (Helidon 4.x / Níma)
package com.example.botblocker;

import io.helidon.http.Status;
import io.helidon.webserver.WebServer;
import io.helidon.webserver.http.HttpRouting;

public class Main {

    public static void main(String[] args) {
        // Build the routing table.
        // .any() fires for ALL methods and ALL paths — used as a global filter.
        // .any() registered first runs first, before more-specific routes.
        HttpRouting.Builder routing = HttpRouting.builder()
            // Global filter — runs before any route-specific handler
            .any(new AiBotFilter())
            // Route handlers — only reached if AiBotFilter calls res.next()
            .get("/robots.txt", (req, res) -> {
                res.status(Status.OK_200)
                   .header("Content-Type", "text/plain")
                   .send("""
                       User-agent: *
                       Allow: /

                       User-agent: GPTBot
                       Disallow: /

                       User-agent: ClaudeBot
                       Disallow: /

                       User-agent: CCBot
                       Disallow: /

                       User-agent: Google-Extended
                       Disallow: /
                       """);
            })
            .get("/", (req, res) -> {
                res.status(Status.OK_200)
                   .header("Content-Type", "application/json")
                   .send("{\"message\":\"Hello\"}");
            })
            .get("/api/data", (req, res) -> {
                res.status(Status.OK_200)
                   .header("Content-Type", "application/json")
                   .send("{\"data\":\"value\"}");
            });

        // Start the server — Helidon 4.x uses virtual threads (Java 21 / Loom).
        // Each request runs on a virtual thread — blocking code in handlers is fine.
        WebServer server = WebServer.builder()
            .port(8080)
            .routing(routing)
            .build()
            .start();

        System.out.println("Server started on port " + server.port());
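        // No explicit shutdown call is needed here: Helidon 4.x registers a JVM
        // shutdown hook by default (WebServerConfig shutdownHook, default true).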
    }
}

4. Scoped filter — .register() for path prefix

.register("/api", r -> r.any(filter)...) scopes the bot filter to /api/** only. Routes outside the registered prefix bypass the filter entirely.

// Scoped filter — protect /api routes only using path prefix routing.
// Helidon SE supports nested routing via .register() for path-scoped rules.
// Fragment: robotsHandler, dataHandler and statusHandler are Handler instances defined elsewhere.

import io.helidon.webserver.http.HttpRouting;

HttpRouting.Builder routing = HttpRouting.builder()
    // Unprotected: health check and robots.txt bypass the bot filter
    .get("/health", (req, res) -> res.send("ok"))
    .get("/robots.txt", robotsHandler)

    // Protected: register a sub-routing scope at /api
    // .any(new AiBotFilter()) fires for all /api/* requests
    .register("/api", r -> r
        .any(new AiBotFilter())           // only /api/** goes through filter
        .get("/data", dataHandler)
        .get("/status", statusHandler)
    );

5. Helidon MP — @PreMatching ContainerRequestFilter

For Helidon MP applications (CDI + JAX-RS), use the standard JAX-RS filter. @PreMatching is required — without it, requests to paths that would 404 bypass the filter entirely. ctx.getHeaderString() returns null (not "") when absent. This pattern is identical to Quarkus and Micronaut JAX-RS.

// Helidon MP (MicroProfile) variant — JAX-RS ContainerRequestFilter.
// Use this when your app is built on Helidon MP with CDI and JAX-RS annotations.
// Identical pattern to Quarkus, Micronaut JAX-RS, or standard JAX-RS.

package com.example.botblocker;

import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.PreMatching;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;

@Provider
@PreMatching  // runs before route matching — fires for all paths including 404s
@Priority(Priorities.AUTHENTICATION - 100)  // run early in the filter chain
public class AiBotMpFilter implements ContainerRequestFilter {

    @Override
    public void filter(ContainerRequestContext ctx) {
        // Path guard: /robots.txt must be accessible to bots
        String path = ctx.getUriInfo().getPath();
        if ("robots.txt".equals(path) || "/robots.txt".equals(path)) {
            return;  // return without aborting — request continues
        }

        // ctx.getHeaderString() returns null when header is absent (not "")
        String ua = ctx.getHeaderString("User-Agent");
        if (ua == null) ua = "";

        if (BotUtils.isAiBot(ua)) {
            // ctx.abortWith() terminates the request — no handler runs after this.
            // @PreMatching required: without it, 404 paths bypass the filter.
            ctx.abortWith(
                Response.status(403)
                    .header("X-Robots-Tag", "noai, noimageai")
                    .header("Content-Type", "text/plain")
                    .entity("Forbidden")
                    .build()
            );
        }
        // No else needed — return without aborting to pass through
    }
}
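
The guard above compares against both "robots.txt" and "/robots.txt" because the string returned by getPath() may or may not carry a leading slash, depending on how the runtime resolves the path. One way to keep that check in one place is a small normalising helper. PathGuard and isRobotsTxt() are hypothetical names used only for illustration.

```java
public class PathGuard {

    // Returns true when the request path refers to the top-level robots.txt,
    // with or without a leading slash.
    static boolean isRobotsTxt(String path) {
        if (path == null) return false;
        String normalized = path.startsWith("/") ? path : "/" + path;
        return "/robots.txt".equals(normalized);
    }

    public static void main(String[] args) {
        System.out.println(isRobotsTxt("robots.txt"));      // true
        System.out.println(isRobotsTxt("/robots.txt"));     // true
        System.out.println(isRobotsTxt("/api/robots.txt")); // false
    }
}
```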

6. pom.xml

<!-- pom.xml — Helidon SE 4.x dependencies -->
<project>
  <parent>
    <groupId>io.helidon.applications</groupId>
    <artifactId>helidon-se</artifactId>
    <version>4.1.4</version>
    <relativePath/>
  </parent>

  <dependencies>
    <!-- Core SE web server — WebServer, HttpRouting, Handler -->
    <dependency>
      <groupId>io.helidon.webserver</groupId>
      <artifactId>helidon-webserver</artifactId>
    </dependency>

    <!-- JSON-P media support: needed only when handlers read or write jakarta.json types; plain String bodies work without it -->
    <dependency>
      <groupId>io.helidon.http.media</groupId>
      <artifactId>helidon-http-media-jsonp</artifactId>
    </dependency>

    <!-- Optional: Health checks, metrics -->
    <dependency>
      <groupId>io.helidon.webserver.observe</groupId>
      <artifactId>helidon-webserver-observe-health</artifactId>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>

<!-- Run: mvn package && java -jar target/app.jar (the jar name follows your artifactId or finalName)
     Requires Java 21+ (virtual threads, Loom)
     Helidon MP: use parent helidon-mp instead of helidon-se -->

Framework comparison — Java microservices frameworks

| Framework | Filter mechanism | Block | UA header |
|---|---|---|---|
| Helidon SE | .any(Handler) routing filter | res.status(FORBIDDEN_403).send("Forbidden") | req.headers().first(HeaderNames.USER_AGENT) |
| Helidon MP | @PreMatching ContainerRequestFilter | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent") |
| Quarkus | @PreMatching ContainerRequestFilter | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent") |
| Dropwizard | @PreMatching ContainerRequestFilter | ctx.abortWith(...); must return | ctx.getHeaderString("User-Agent") |

Helidon SE's functional .any(Handler) approach is the most explicit: routing is code, not annotations, and the handler lifecycle (res.next() vs res.send()) is unambiguous. Helidon MP, Quarkus, and Dropwizard all use the same JAX-RS ContainerRequestFilter pattern with ctx.abortWith(). In all three, abortWith() does not throw; it only records the response, so any code after it in filter() still runs. Return immediately after abortWith(), or make it the last statement on that branch as in the MP filter in section 5.
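
Because abortWith() returns normally instead of throwing, statements placed after it still execute. The sketch below models that with a hand-written stand-in, not the real JAX-RS classes: FakeContext, filterWithoutReturn(), and filterWithReturn() are hypothetical, and the counter stands for any side effect (logging, metrics, extra lookups) that should not run for blocked requests.

```java
public class AbortDemo {

    // Minimal stand-in for a request context; not the JAX-RS ContainerRequestContext.
    static final class FakeContext {
        Integer abortedStatus;        // null until abortWith() is called
        int sideEffects;              // counts work done after the bot check

        void abortWith(int status) {  // records the response and returns normally
            abortedStatus = status;
        }
    }

    static void filterWithoutReturn(FakeContext ctx, boolean isBot) {
        if (isBot) {
            ctx.abortWith(403);
        }
        ctx.sideEffects++;            // still runs for blocked requests
    }

    static void filterWithReturn(FakeContext ctx, boolean isBot) {
        if (isBot) {
            ctx.abortWith(403);
            return;                   // nothing below runs for blocked requests
        }
        ctx.sideEffects++;
    }

    public static void main(String[] args) {
        FakeContext a = new FakeContext();
        filterWithoutReturn(a, true);
        System.out.println(a.abortedStatus + " " + a.sideEffects); // 403 1

        FakeContext b = new FakeContext();
        filterWithReturn(b, true);
        System.out.println(b.abortedStatus + " " + b.sideEffects); // 403 0
    }
}
```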