
How to Block AI Bots with Java Servlet: Complete 2026 Guide

The Java Servlet API is the foundational HTTP layer beneath Spring Boot, Spring MVC, Jakarta EE, Tomcat, Jetty, and WildFly. Bot blocking at this level uses a Filter — a class that intercepts every request before your servlets or controllers execute. The same filter works on any servlet container, regardless of framework.

javax.servlet vs jakarta.servlet

Jakarta EE 9 (2020) renamed the core packages from javax.servlet to jakarta.servlet. Tomcat 9 and earlier use javax.servlet. Tomcat 10+ use jakarta.servlet. The API is identical — only the import path differs. Deploying a WAR built against the wrong namespace causes ClassNotFoundException at startup.

javax.servlet.* — Tomcat 9, JBoss EAP 7, Jetty 9–11
jakarta.servlet.* — Tomcat 10+, WildFly 23+, Payara 6+
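When it is unclear which namespace a given runtime provides, reflection can answer it without a test deployment. A minimal sketch — the class name `ServletNamespaceCheck` is hypothetical; on a plain JVM with no servlet API on the classpath it reports `none`:

```java
public class ServletNamespaceCheck {

    // Probe the classpath for each namespace's core Filter interface
    static String detect() {
        try {
            Class.forName("jakarta.servlet.Filter");
            return "jakarta";
        } catch (ClassNotFoundException e) { /* not present, try the old name */ }
        try {
            Class.forName("javax.servlet.Filter");
            return "javax";
        } catch (ClassNotFoundException e) {
            return "none";
        }
    }

    public static void main(String[] args) {
        System.out.println(detect());
    }
}
```

Run it with the container's libraries on the classpath (e.g. Tomcat's `lib/servlet-api.jar`) to confirm which dependency your WAR must be built against.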

Four protection layers

1. robots.txt: Place in src/main/webapp/ — servlet container serves it before any filter runs
2. noai meta tag: <meta name="robots" content="noai, noimageai" /> in your JSP base layout fragment
3. X-Robots-Tag header: response.setHeader() after chain.doFilter() in your Filter
4. Hard 403 block: AI_BOTS regex check in the Filter, then response.sendError(403) and return — no chain.doFilter()

Layer 1: robots.txt

Place robots.txt in your web application root. Servlet containers serve static files from this directory before any filter or servlet executes — no controller or configuration needed.

WAR / Maven project

In a standard Maven WAR project, the web root is src/main/webapp/. Files here are copied to the root of the WAR and served directly by the container at /robots.txt.

# src/main/webapp/robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
Disallow: /

Spring Boot (embedded Tomcat)

Spring Boot serves static files from src/main/resources/static/. Place robots.txt there — it becomes available at /robots.txt with no additional configuration.
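The robots.txt above and the filter's regex name the same bots, so they can drift apart over time. One way to avoid that is to generate the file from a single list — a minimal sketch; the class name `RobotsTxtGenerator` is hypothetical:

```java
import java.util.List;

public class RobotsTxtGenerator {

    // Single source of truth for blocked crawler names, matching the
    // robots.txt shown above
    static final List<String> AI_BOTS = List.of(
        "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "CCBot",
        "Bytespider", "Applebot-Extended", "PerplexityBot", "Diffbot",
        "cohere-ai", "FacebookBot", "omgili", "omgilibot", "Amazonbot"
    );

    static String generate() {
        StringBuilder sb = new StringBuilder("User-agent: *\nAllow: /\n\n");
        for (String bot : AI_BOTS) {
            sb.append("User-agent: ").append(bot).append('\n');
        }
        sb.append("Disallow: /\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

Running this at build time (or from a servlet mapped to /robots.txt) keeps the file and the block list in sync.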

Layer 2: noai meta tag

Add <meta name="robots" content="noai, noimageai" /> to your base JSP layout. This signals to compliant AI crawlers not to use your content for training.

Base layout fragment (JSP)

<%-- WEB-INF/layout/header.jsp --%>
<%-- On Jakarta EE 10+ (JSTL 3.0), use uri="jakarta.tags.core" instead --%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title><c:out value="${pageTitle}" default="My App" /></title>

  <%-- AI bot training opt-out. Per-page override: set request attribute "robots" --%>
  <meta name="robots"
        content="${not empty robots ? robots : 'noai, noimageai'}" />
</head>

Per-page override

Set a robots request attribute in your servlet before forwarding to the JSP. The JSTL expression reads it and falls back to noai, noimageai when not set.

// Pages that should be indexed normally (e.g. a public about page):
request.setAttribute("robots", "index, follow");
request.getRequestDispatcher("/WEB-INF/views/about.jsp")
       .forward(request, response);

Include the layout in each JSP

<%-- WEB-INF/views/index.jsp --%>
<%@ include file="/WEB-INF/layout/header.jsp" %>
<body>
  <h1>Welcome</h1>
</body>
</html>

Layers 3 & 4: Servlet Filter

A single Filter handles both the X-Robots-Tag header and the hard 403 block. Filters intercept every request before any servlet or controller executes — the correct layer for bot blocking.

AiBotFilter — Jakarta EE 9+ (jakarta.servlet)

Use this for Tomcat 10+, WildFly 23+, Payara 6+, GlassFish 6+. For Tomcat 9 and earlier, replace every jakarta.servlet import with javax.servlet.

// src/main/java/com/example/filter/AiBotFilter.java
package com.example.filter;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.FilterConfig;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.annotation.WebFilter;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.Set;
import java.util.regex.Pattern;

@WebFilter("/*")
public class AiBotFilter implements Filter {

    private static final Pattern AI_BOTS = Pattern.compile(
        "(?i)gptbot|claudebot|anthropic-ai|google-extended|ccbot" +
        "|bytespider|applebot-extended|perplexitybot|diffbot" +
        "|cohere-ai|facebookbot|omgili|omgilibot|amazonbot" +
        "|deepseekbot|mistralbot|xai-bot|ai2-bot"
    );

    // These paths must remain accessible — crawlers need robots.txt
    private static final Set<String> EXEMPT_PATHS = Set.of(
        "/robots.txt", "/sitemap.xml", "/favicon.ico"
    );

    @Override
    public void init(FilterConfig config) {}

    @Override
    public void doFilter(
        ServletRequest req, ServletResponse res, FilterChain chain
    ) throws IOException, ServletException {

        HttpServletRequest  request  = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String path = request.getRequestURI()
                             .substring(request.getContextPath().length());

        // Always allow exempt paths — never block robots.txt
        if (EXEMPT_PATHS.contains(path)) {
            chain.doFilter(req, res);
            return;
        }

        String ua = request.getHeader("User-Agent");

        // Layer 4: Hard 403 — block before any servlet runs
        if (ua != null && AI_BOTS.matcher(ua).find()) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN, "Forbidden");
            return;   // Do NOT call chain.doFilter() after blocking
        }

        // Pass legitimate requests through
        chain.doFilter(req, res);

        // Layer 3: X-Robots-Tag — applied to all served responses.
        // Note: a header set after chain.doFilter() is silently dropped if the
        // response is already committed (e.g. a servlet flushed the output);
        // if that can happen, set this header before chain.doFilter() instead.
        response.setHeader("X-Robots-Tag", "noai, noimageai");
    }

    @Override
    public void destroy() {}
}

Key points

  • @WebFilter("/*") — registers this filter for all URLs. Requires Servlet 3.0+ (Tomcat 7+). Spring Boot users also need @ServletComponentScan on the main class.
  • response.sendError(403) + return — sends the error response and halts the filter chain. Omitting return would continue processing after the error, which is incorrect.
  • EXEMPT_PATHS check comes before the UA check. Without it, robots.txt returns 403 and breaks the bot-blocking protocol.
  • X-Robots-Tag is set after chain.doFilter() — only on responses that were served normally, never on 403 bot blocks. This relies on the response still being buffered; a header set after the response is committed is silently dropped.
  • Header-name lookup via request.getHeader("User-Agent") is case-insensitive per the Servlet specification, and the (?i) flag handles casing in the header value — no manual lowercasing needed.
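The matching behavior of the pattern can be verified in isolation, since it is plain java.util.regex with no servlet dependency. A minimal sketch — the class name `AiBotMatcherDemo` is hypothetical, but the pattern is the same one used in AiBotFilter:

```java
import java.util.regex.Pattern;

public class AiBotMatcherDemo {

    // Same pattern as AiBotFilter: (?i) makes matching case-insensitive,
    // and find() matches the bot token anywhere in the User-Agent string
    static final Pattern AI_BOTS = Pattern.compile(
        "(?i)gptbot|claudebot|anthropic-ai|google-extended|ccbot" +
        "|bytespider|applebot-extended|perplexitybot|diffbot" +
        "|cohere-ai|facebookbot|omgili|omgilibot|amazonbot" +
        "|deepseekbot|mistralbot|xai-bot|ai2-bot"
    );

    static boolean isAiBot(String userAgent) {
        return userAgent != null && AI_BOTS.matcher(userAgent).find();
    }
}
```

isAiBot("Mozilla/5.0 (compatible; GPTBot/1.0)") matches because find() scans the whole string, and isAiBot(null) is safely false — the same null guard the filter applies.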

Alternative: web.xml registration

If you need explicit ordering or Servlet 2.x compatibility, register the filter in web.xml instead of using the annotation. Remove the @WebFilter annotation from the class first.

<!-- src/main/webapp/WEB-INF/web.xml -->
<web-app xmlns="https://jakarta.ee/xml/ns/jakartaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="https://jakarta.ee/xml/ns/jakartaee
           https://jakarta.ee/xml/ns/jakartaee/web-app_6_0.xsd"
         version="6.0">

  <filter>
    <filter-name>AiBotFilter</filter-name>
    <filter-class>com.example.filter.AiBotFilter</filter-class>
  </filter>

  <filter-mapping>
    <filter-name>AiBotFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

</web-app>

For Tomcat 9 / Java EE 8, use the http://xmlns.jcp.org/xml/ns/javaee namespace and version="4.0". Multiple <filter-mapping> entries execute in declaration order — place the bot filter first to block before any other filter processes the request.

Maven dependencies

The Servlet API is provided by the container at runtime — always use scope=provided so it is not bundled in your WAR.

Tomcat 10+ / Jakarta EE 9+

<dependency>
  <groupId>jakarta.servlet</groupId>
  <artifactId>jakarta.servlet-api</artifactId>
  <version>6.0.0</version>
  <scope>provided</scope>
</dependency>

Tomcat 9 / Java EE 8 and earlier

<dependency>
  <groupId>javax.servlet</groupId>
  <artifactId>javax.servlet-api</artifactId>
  <version>4.0.1</version>
  <scope>provided</scope>
</dependency>

Deployment

Build a WAR and deploy to any servlet container:

# Build
mvn clean package -DskipTests

# Deploy to Tomcat
cp target/myapp.war /opt/tomcat/webapps/ROOT.war

# Tomcat auto-deploys on file change; or restart:
/opt/tomcat/bin/shutdown.sh && /opt/tomcat/bin/startup.sh

Container support: Apache Tomcat (7–11), Eclipse Jetty (9–12), WildFly / JBoss EAP (16+), Payara (5+), GlassFish (5+), and any Jakarta EE application server. Spring Boot users: register the filter as a @Bean of type Filter — Spring Boot wraps it in a FilterRegistrationBean automatically, or add @ServletComponentScan to enable @WebFilter discovery.
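For the @Bean route in Spring Boot, a FilterRegistrationBean gives explicit control over URL patterns and ordering. A minimal sketch, assuming the AiBotFilter class shown earlier (remove its @WebFilter annotation first so it is not registered twice); the configuration class name is hypothetical:

```java
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.example.filter.AiBotFilter;

@Configuration
public class AiBotFilterConfig {

    @Bean
    public FilterRegistrationBean<AiBotFilter> aiBotFilter() {
        FilterRegistrationBean<AiBotFilter> registration =
            new FilterRegistrationBean<>(new AiBotFilter());
        registration.addUrlPatterns("/*");
        registration.setOrder(1); // low order value = runs before other filters
        return registration;
    }
}
```

setOrder(1) plays the same role as putting the filter-mapping first in web.xml: the bot check runs before any other filter touches the request.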

FAQ

What is the difference between javax.servlet and jakarta.servlet?

javax.servlet is the Java EE namespace used in Tomcat 9 and earlier. jakarta.servlet is the Jakarta EE 9+ namespace used in Tomcat 10+, WildFly 23+, and modern Jakarta EE servers. The API is identical — only the import path changed. Deploying a WAR built against javax.servlet to Tomcat 10+ causes ClassNotFoundException at runtime. Match the dependency to your container version.

Should I use @WebFilter or web.xml?

@WebFilter is simpler — annotate the class and the container discovers it automatically. It requires Servlet 3.0+ (Tomcat 7+). web.xml gives you explicit filter ordering and works on all versions including Servlet 2.x. For a single bot-blocking filter, @WebFilter is sufficient. For multiple filters where execution order matters, prefer web.xml.

Does the filter run before the container serves robots.txt?

Yes — @WebFilter("/*") intercepts all requests, including static file requests. The EXEMPT_PATHS check at the top of doFilter() is essential. Without it, /robots.txt returns 403, which prevents crawlers from reading your disallow rules and breaks the robots.txt protocol.
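The context-path stripping plus exempt-path check is a pure string operation, so it can be exercised without a container. A minimal sketch — the class name `ExemptPathCheck` is hypothetical, but the logic mirrors the top of doFilter():

```java
import java.util.Set;

public class ExemptPathCheck {

    static final Set<String> EXEMPT_PATHS = Set.of(
        "/robots.txt", "/sitemap.xml", "/favicon.ico"
    );

    // Mirrors the filter: strip the context path from the request URI,
    // then test the remaining path against the exempt set
    static boolean isExempt(String requestUri, String contextPath) {
        String path = requestUri.substring(contextPath.length());
        return EXEMPT_PATHS.contains(path);
    }
}
```

For an app deployed at context path /myapp, a request URI of /myapp/robots.txt reduces to /robots.txt and passes through; at the root context the context path is the empty string and the URI is used as-is.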

How do I add noai meta tags in a Spring MVC or Thymeleaf app?

Add the meta tag to your Thymeleaf base layout: <meta name="robots" th:content="${robots ?: 'noai, noimageai'}">. In your controller, use model.addAttribute("robots", "index, follow") for pages that should be indexed normally. The Thymeleaf expression falls back to noai, noimageai when the attribute is absent.

Is this compatible with Spring Boot?

Yes. Spring Boot embeds Tomcat/Jetty/Undertow as a servlet container. Add @ServletComponentScan to your @SpringBootApplication class to enable @WebFilter discovery. Alternatively, declare the filter as a @Bean — Spring Boot registers it via FilterRegistrationBean automatically. For the Spring-idiomatic approach using OncePerRequestFilter and Spring Security integration, see the Spring Boot guide.
