Spring Boot's static resource handler serves robots.txt with zero configuration — just drop it in src/main/resources/static/. For hard blocking, you have two clean options: a HandlerInterceptor in the MVC layer or a OncePerRequestFilter at the servlet layer — both intercept before any controller code runs.
| Method | Use when |
|---|---|
| robots.txt in src/main/resources/static/ | Always — zero config needed |
| HandlerInterceptor (MVC layer) | No Spring Security dependency |
| OncePerRequestFilter (servlet layer) | Spring Security already in project |
| noai meta tag in Thymeleaf layout | Thymeleaf template engine |
| nginx reverse proxy block | nginx in front of embedded Tomcat |
Spring Boot's auto-configured ResourceHttpRequestHandler serves everything in src/main/resources/static/ at the root URL. No @Controller, no @RequestMapping — just drop the file and it's available at /robots.txt.
```txt
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /
```
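If you would rather maintain the bot list in one place and generate the file content from it (so the same list can later feed a blocking regex or an nginx map), a minimal sketch looks like this. `RobotsTxtBuilder` and its method are illustrative names, and the list is an abbreviated subset of the full file above:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative helper: builds robots.txt content from a single bot list.
public class RobotsTxtBuilder {

    static final List<String> AI_BOTS = List.of(
            "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
            "anthropic-ai", "Google-Extended", "Bytespider", "CCBot",
            "PerplexityBot", "Diffbot");

    // One "User-agent: X / Disallow: /" group per bot, then allow everyone else.
    static String build() {
        String groups = AI_BOTS.stream()
                .map(bot -> "User-agent: " + bot + "\nDisallow: /\n")
                .collect(Collectors.joining("\n"));
        return groups + "\nUser-agent: *\nAllow: /\n";
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```

Keeping one canonical list avoids the usual drift where robots.txt, the servlet filter, and the proxy config each block a slightly different set of bots.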
If you need environment-specific rules (e.g., block all crawlers in staging), expose a controller endpoint instead of a static file:
```java
package com.example.app.web;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RobotsController {

    @Value("${spring.profiles.active:default}")
    private String activeProfile;

    private static final String AI_BOTS_DISALLOW = """
            User-agent: GPTBot
            Disallow: /
            User-agent: ChatGPT-User
            Disallow: /
            User-agent: ClaudeBot
            Disallow: /
            User-agent: anthropic-ai
            Disallow: /
            User-agent: Google-Extended
            Disallow: /
            User-agent: Bytespider
            Disallow: /
            User-agent: CCBot
            Disallow: /
            User-agent: PerplexityBot
            Disallow: /
            User-agent: Diffbot
            Disallow: /
            User-agent: *
            Allow: /
            """;

    private static final String BLOCK_ALL = """
            User-agent: *
            Disallow: /
            """;

    @GetMapping(value = "/robots.txt", produces = MediaType.TEXT_PLAIN_VALUE)
    public ResponseEntity<String> robots() {
        String body = "production".equals(activeProfile)
                ? AI_BOTS_DISALLOW
                : BLOCK_ALL;
        return ResponseEntity.ok(body);
    }
}
```

Static vs controller: if both src/main/resources/static/robots.txt and a @GetMapping("/robots.txt") controller exist, the controller takes precedence. Use one or the other.
A HandlerInterceptor fires after the DispatcherServlet receives the request but before any @Controller method runs. Return false from preHandle() to stop the chain.
```java
package com.example.app.config;

import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

import java.util.regex.Pattern;

@Component
public class AiBotBlockingInterceptor implements HandlerInterceptor {

    private static final Pattern BLOCKED_UAS = Pattern.compile(
            "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|" +
            "Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|" +
            "Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|" +
            "cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|" +
            "webzio-extended|gemini-deep-research",
            Pattern.CASE_INSENSITIVE
    );

    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) throws Exception {
        // Always allow robots.txt through
        if ("/robots.txt".equals(request.getRequestURI())) {
            return true;
        }
        String userAgent = request.getHeader("User-Agent");
        if (userAgent != null && BLOCKED_UAS.matcher(userAgent).find()) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN, "Forbidden");
            return false;
        }
        return true;
    }
}
```

Register it through a WebMvcConfigurer:

```java
package com.example.app.config;

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class WebConfig implements WebMvcConfigurer {

    private final AiBotBlockingInterceptor aiBotBlockingInterceptor;

    public WebConfig(AiBotBlockingInterceptor aiBotBlockingInterceptor) {
        this.aiBotBlockingInterceptor = aiBotBlockingInterceptor;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(aiBotBlockingInterceptor)
                .addPathPatterns("/**")
                .excludePathPatterns("/robots.txt", "/favicon.ico");
    }
}
```

A OncePerRequestFilter runs at the servlet filter level — earlier in the stack than a HandlerInterceptor, before the DispatcherServlet. Prefer it if you already have Spring Security, since it fits naturally into the SecurityFilterChain.
```java
package com.example.app.security;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;
import java.util.regex.Pattern;

public class AiBotBlockingFilter extends OncePerRequestFilter {

    private static final Pattern BLOCKED_UAS = Pattern.compile(
            "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|" +
            "Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|" +
            "Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|" +
            "cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|" +
            "webzio-extended|gemini-deep-research",
            Pattern.CASE_INSENSITIVE
    );

    @Override
    protected boolean shouldNotFilter(HttpServletRequest request) {
        // Skip filter for robots.txt — always let it through
        return "/robots.txt".equals(request.getRequestURI());
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {
        String userAgent = request.getHeader("User-Agent");
        if (userAgent != null && BLOCKED_UAS.matcher(userAgent).find()) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN, "Forbidden");
            return;
        }
        filterChain.doFilter(request, response);
    }
}
```

With Spring Security, add the filter to the SecurityFilterChain:

```java
package com.example.app.security;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.authentication.UsernamePasswordAuthenticationFilter;

@Configuration
@EnableWebSecurity
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            .addFilterBefore(
                new AiBotBlockingFilter(),
                UsernamePasswordAuthenticationFilter.class
            )
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/robots.txt", "/public/**").permitAll()
                .anyRequest().authenticated()
            );
        return http.build();
    }
}
```

Without Spring Security, register it through a FilterRegistrationBean instead:

```java
package com.example.app.config;

import com.example.app.security.AiBotBlockingFilter;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FilterConfig {

    @Bean
    public FilterRegistrationBean<AiBotBlockingFilter> aiBotBlockingFilter() {
        FilterRegistrationBean<AiBotBlockingFilter> registration =
                new FilterRegistrationBean<>();
        registration.setFilter(new AiBotBlockingFilter());
        registration.addUrlPatterns("/*");
        registration.setOrder(1); // runs first
        return registration;
    }
}
```

Add the noai meta tag to your base Thymeleaf layout template. All pages that extend the layout inherit it automatically. Use a named fragment for per-page overrides.
The base layout (referenced below as ~{layout/base}):

```html
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org"
      xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout">
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <!-- Block AI training bots on all pages -->
    <meta name="robots" content="noai, noimageai" />
    <!-- Per-page head content (title, description, etc.) -->
    <th:block layout:fragment="head-extra"></th:block>
</head>
<body>
<div layout:fragment="content"></div>
</body>
</html>
```

A page that extends the layout and overrides the directive:

```html
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org"
      xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout"
      layout:decorate="~{layout/base}">
<head>
    <!-- Override: allow AI on this specific public article -->
    <th:block layout:fragment="head-extra">
        <meta name="robots" content="index, follow" />
    </th:block>
</head>
<body>
<div layout:fragment="content">
    <h1 th:text="${article.title}">Article Title</h1>
    <!-- content -->
</div>
</body>
</html>
```

If you don't use the Layout Dialect, a shared th:fragment works too:

```html
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head th:fragment="head(title)">
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta name="robots" content="noai, noimageai" />
    <title th:text="${title}">My App</title>
</head>
</html>
<!-- In page templates: -->
<!-- <head th:replace="~{fragments/head :: head('Page Title')}"></head> -->
```

In production, Spring Boot typically runs behind nginx. Blocking AI bots at the nginx level stops requests before they reach the JVM — cheaper compute, and it also catches bots that ignore robots.txt.
```nginx
# In the http { } block of /etc/nginx/nginx.conf:
map $http_user_agent $blocked_ai_bot {
    default 0;
    "~*GPTBot" 1;
    "~*ChatGPT-User" 1;
    "~*OAI-SearchBot" 1;
    "~*ClaudeBot" 1;
    "~*anthropic-ai" 1;
    "~*Google-Extended" 1;
    "~*Bytespider" 1;
    "~*CCBot" 1;
    "~*PerplexityBot" 1;
    "~*meta-externalagent" 1;
    "~*Amazonbot" 1;
    "~*Applebot-Extended" 1;
    "~*xAI-Bot" 1;
    "~*DeepSeekBot" 1;
    "~*MistralBot" 1;
    "~*Diffbot" 1;
    "~*cohere-ai" 1;
    "~*AI2Bot" 1;
    "~*Ai2Bot-Dolma" 1;
    "~*omgili" 1;
    "~*omgilibot" 1;
    "~*webzio-extended" 1;
    "~*gemini-deep-research" 1;
}

server {
    listen 80;
    server_name myapp.com www.myapp.com;

    # Block AI bots before they hit Spring Boot
    if ($blocked_ai_bot) {
        return 403;
    }

    # noai header on all responses
    add_header X-Robots-Tag "noai, noimageai" always;

    # Proxy to embedded Tomcat
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

On Kubernetes, the same rules can be injected through the nginx ingress controller's server-snippet annotation:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spring-app-ingress
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      if ($http_user_agent ~* "GPTBot|ClaudeBot|CCBot|Bytespider|Diffbot|Google-Extended|anthropic-ai|meta-externalagent|cohere-ai|AI2Bot|DeepSeekBot|MistralBot") {
        return 403;
      }
      add_header X-Robots-Tag "noai, noimageai" always;
spec:
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spring-app-service
                port:
                  number: 8080
```

| Layer | Runs where |
|---|---|
| HandlerInterceptor | After DispatcherServlet |
| OncePerRequestFilter | Before DispatcherServlet |
| nginx map + if | Before JVM |
| K8s ingress snippet | Before pods |
| Bot | Operator |
|---|---|
| GPTBot | OpenAI |
| ChatGPT-User | OpenAI |
| OAI-SearchBot | OpenAI |
| ClaudeBot | Anthropic |
| anthropic-ai | Anthropic |
| Google-Extended | Google |
| Bytespider | ByteDance |
| CCBot | Common Crawl |
| PerplexityBot | Perplexity |
| meta-externalagent | Meta |
| Amazonbot | Amazon |
| Applebot-Extended | Apple |
| xAI-Bot | xAI |
| DeepSeekBot | DeepSeek |
| MistralBot | Mistral |
| Diffbot | Diffbot |
| cohere-ai | Cohere |
| AI2Bot | Allen Institute |
| Ai2Bot-Dolma | Allen Institute |
| YouBot | You.com |
| DuckAssistBot | DuckDuckGo |
| omgili | Webz.io |
| omgilibot | Webz.io |
| webzio-extended | Webz.io |
| gemini-deep-research | Google |
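The interceptor and filter above block these bots with a shared regex, which is easy to sanity-check in isolation against realistic User-Agent strings. A standalone sketch (the class name, sample UA strings, and the abbreviated bot subset are illustrative):

```java
import java.util.regex.Pattern;

// Standalone check of the blocking regex used by the interceptor and filter.
public class BlockedUaCheck {

    // Abbreviated subset of the full bot list for illustration.
    static final Pattern BLOCKED_UAS = Pattern.compile(
            "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|"
            + "Google-Extended|Bytespider|CCBot|PerplexityBot|Diffbot",
            Pattern.CASE_INSENSITIVE);

    static boolean isBlocked(String userAgent) {
        // find(), not matches(): real UA strings embed the bot token
        // inside a longer Mozilla/5.0-style value.
        return userAgent != null && BLOCKED_UAS.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        System.out.println(isBlocked(
                "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot"));  // true
        System.out.println(isBlocked(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"));           // false
    }
}
```

Because the pattern is matched with find() and CASE_INSENSITIVE, a lowercase "claudebot" or a token buried mid-string is still caught, which matters since bots rarely send a bare product name.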
The robots.txt file goes in src/main/resources/static/robots.txt. Spring Boot's auto-configured ResourceHttpRequestHandler serves everything in static/ at the root URL — no controller needed.
If you have Spring Security: use OncePerRequestFilter and register it in your SecurityFilterChain. If you don't have Spring Security: use HandlerInterceptor via WebMvcConfigurer.addInterceptors() — fewer dependencies.
There is no application.properties setting for user-agent blocking. You need to write a filter or interceptor, or block at the nginx/ingress layer.
Add <meta name="robots" content="noai, noimageai"> to your base layout template (Thymeleaf Layout Dialect) or a shared th:fragment. Pages that extend the layout inherit it. Override per-page with a layout:fragment block.
Use the nginx ingress controller's nginx.ingress.kubernetes.io/server-snippet annotation to inject user-agent blocking rules at the ingress layer. This blocks before traffic reaches your pods. Alternatively, implement blocking in-application via OncePerRequestFilter — works regardless of infrastructure.
The blocking also applies in development unless you add a profile check. Use @Profile("!dev") on your FilterRegistrationBean or SecurityConfig bean to disable it in the dev profile, or inject @Value("${spring.profiles.active}") and short-circuit the filter logic outside production.
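As a sketch of the @Profile approach, here is the earlier FilterRegistrationBean with the guard added (a configuration fragment, assuming the AiBotBlockingFilter from the Spring Security section):

```java
package com.example.app.config;

import com.example.app.security.AiBotBlockingFilter;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

@Configuration
public class FilterConfig {

    @Bean
    @Profile("!dev") // bean is not created when the dev profile is active
    public FilterRegistrationBean<AiBotBlockingFilter> aiBotBlockingFilter() {
        FilterRegistrationBean<AiBotBlockingFilter> registration =
                new FilterRegistrationBean<>();
        registration.setFilter(new AiBotBlockingFilter());
        registration.addUrlPatterns("/*");
        return registration;
    }
}
```

With this in place, running with --spring.profiles.active=dev skips the filter entirely; every other profile gets the blocking behavior.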