How to Block AI Bots in Scala Scalatra
Scalatra is a Sinatra-inspired Scala web framework built on the Java Servlet API, typically run on embedded Jetty. It is lightweight and direct: routes are defined as closures, and a before() filter fires before every route handler. The Scalatra-specific detail for bot blocking is that short-circuiting is done with halt(), a DSL method that throws a HaltException internally; Scalatra catches it and renders the provided status and body. Because halt() throws, no return is needed after it. Header access uses the underlying servlet API: request.getHeader() returns null when the header is absent, so wrap it in Option(...).getOrElse("") for idiomatic Scala null safety.
1. Bot detection object
A Scala singleton object with no external dependencies. String.contains() performs literal substring matching. toLowerCase is applied once before the exists check — no regex engine involved.
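One caveat worth checking: String.toLowerCase without an argument uses the JVM's default locale, and under a Turkish locale an uppercase "I" lowercases to a dotless "ı", so "ImageSiftBot" would no longer contain "imagesiftbot". The sketch below is a variant that lowercases with Locale.ROOT. The object name UaMatchDemo and its three-entry pattern list are hypothetical, chosen only for illustration.

```scala
import java.util.Locale

object UaMatchDemo {
  // Hypothetical short list for illustration; all entries are lowercase.
  private val patterns: Seq[String] = Seq("gptbot", "ccbot", "imagesiftbot")

  // Lowercase once with a fixed locale, then test each pattern as a literal substring.
  def matches(ua: String): Boolean = {
    val lower = Option(ua).getOrElse("").toLowerCase(Locale.ROOT)
    patterns.exists(p => lower.contains(p))
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
      "ImageSiftBot/1.0",
      "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0"
    )
    samples.foreach(ua => println(s"${matches(ua)}  $ua"))
  }
}
```

Whether this matters depends on the locale the server runs under; fixing the locale simply removes the dependency.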
// src/main/scala/com/example/AiBotDetector.scala
package com.example

object AiBotDetector {

  // All lowercase: matched against ua.toLowerCase
  private val patterns: Seq[String] = Seq(
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft"
  )

  // Returns false for a null or empty User-Agent.
  def isAiBot(ua: String): Boolean =
    Option(ua).filter(_.nonEmpty).exists { value =>
      val lower = value.toLowerCase
      // String.contains() is a literal substring test, no regex
      patterns.exists(lower.contains)
    }
}

2. ScalatraServlet with before() filter
Extend ScalatraServlet and add a before() block. request.getHeader() returns null when absent — use Option(...).getOrElse(""). Set response headers on the response object before calling halt() — headers must be set before the exception is thrown.
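To make the throw-and-catch mechanics concrete, here is a minimal model of an exception-based halt. It is not Scalatra's implementation: Halted, Response, and dispatch are hypothetical names, and the real HaltException carries more fields. It only illustrates why headers set before the throw survive and why nothing after halt() runs.

```scala
import scala.collection.mutable

object HaltModel {
  final case class Response(status: Int, headers: Map[String, String], body: String)

  // Hypothetical stand-in for the framework's internal exception.
  private final case class Halted(status: Int, body: String) extends RuntimeException(body)

  // Result type Nothing: the call never completes normally,
  // so no `return` is needed after it.
  def halt(status: Int, body: String): Nothing = throw Halted(status, body)

  // Runs a filter and then a route, catching the halt signal the way a dispatcher might.
  def dispatch(filter: mutable.Map[String, String] => Unit, route: () => String): Response = {
    val headers = mutable.Map.empty[String, String]
    try {
      filter(headers)
      Response(200, headers.toMap, route())
    } catch {
      // Headers written before the throw are still in the map here.
      case Halted(status, body) => Response(status, headers.toMap, body)
    }
  }
}
```

In this model a filter that sets X-Robots-Tag and then calls halt(403, "Forbidden") yields a 403 response that still carries the header, while the route function is never invoked.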
// src/main/scala/com/example/MyServlet.scala
package com.example
import org.scalatra._

class MyServlet extends ScalatraServlet {

  // ── Global before() filter ────────────────────────────────────────────────
  // Fires before every route handler in this servlet.
  before() {
    // The servlet API returns null for absent values, so wrap in Option.
    val path = Option(request.getPathInfo).getOrElse("")

    // Path guard: let robots.txt through so crawlers can read the rules.
    // A bare `return` does not compile here (the block is not a method body),
    // so the guard is expressed as a condition around the rest of the filter.
    if (path != "/robots.txt") {
      // request.getHeader() returns null when the header is absent.
      // Option(...).getOrElse("") converts null to an empty string safely.
      val ua: String = Option(request.getHeader("User-Agent")).getOrElse("")

      // Set X-Robots-Tag first so it appears on blocked and non-blocked
      // responses alike.
      response.setHeader("X-Robots-Tag", "noai, noimageai")

      if (AiBotDetector.isAiBot(ua)) {
        response.setContentType("text/plain")
        // halt() throws HaltException; Scalatra catches it and renders
        // the provided status and body. Code after halt() does not run.
        halt(403, "Forbidden")
      }
    }
  }

  // ── Routes ─────────────────────────────────────────────────────────────────
  get("/") {
    contentType = "application/json"
    """{"message": "Hello"}"""
  }

  get("/api/data") {
    contentType = "application/json"
    """{"data": "value"}"""
  }

  get("/robots.txt") {
    contentType = "text/plain"
    // Groups are separated by blank lines, as robots.txt expects.
    """User-agent: *
      |Allow: /
      |
      |User-agent: GPTBot
      |Disallow: /
      |
      |User-agent: ClaudeBot
      |Disallow: /
      |""".stripMargin
  }
}

3. ScalatraBootstrap — servlet registration
Scalatra discovers ScalatraBootstrap by convention at startup. Register servlets with context.mount(new MyServlet, "/*"). Multiple servlets can be mounted at different path prefixes.
// src/main/scala/com/example/ScalatraBootstrap.scala
// LifeCycle class — registers servlets with the container.
// Scalatra discovers this class by convention at startup.
import com.example._
import org.scalatra._
import javax.servlet.ServletContext

class ScalatraBootstrap extends LifeCycle {

  override def init(context: ServletContext): Unit = {
    // Mount the servlet at all paths
    context.mount(new MyServlet, "/*")
  }

  override def destroy(context: ServletContext): Unit = {
    // Cleanup: close DB connections, stop scheduled tasks
  }
}

4. Embedded Jetty launcher
Scalatra is typically run with embedded Jetty for standalone deployment. The ScalatraListener discovers ScalatraBootstrap and mounts all registered servlets. Build a fat JAR with sbt assembly and run directly.
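A launcher usually reads its port from the environment. A direct .toInt on the PORT value throws NumberFormatException when the variable is set but not numeric. The sketch below shows one defensive way to parse it under Scala 2.13; PortConfig and parsePort are hypothetical names, and falling back to 8080 is an assumption about the desired behaviour.

```scala
object PortConfig {
  val DefaultPort: Int = 8080

  // Reads PORT from the given environment map; falls back to DefaultPort when the
  // value is missing, not a number, or outside the valid TCP port range.
  def parsePort(env: Map[String, String]): Int =
    env.get("PORT")
      .flatMap(_.trim.toIntOption) // toIntOption requires Scala 2.13+
      .filter(p => p > 0 && p <= 65535)
      .getOrElse(DefaultPort)
}
```

In a launcher this would be called as PortConfig.parsePort(sys.env). Failing fast on a bad value is an equally defensible choice.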
// src/main/scala/com/example/JettyLauncher.scala
// Embedded Jetty server — run as a standalone application.
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.webapp.WebAppContext
import org.scalatra.servlet.ScalatraListener

object JettyLauncher extends App {
  val port = sys.env.getOrElse("PORT", "8080").toInt

  val server = new Server(port)
  val context = new WebAppContext()
  context.setContextPath("/")
  // Point to the webapp directory (contains WEB-INF/web.xml)
  context.setResourceBase("src/main/webapp")
  // Scalatra's listener discovers ScalatraBootstrap and mounts servlets
  context.addEventListener(new ScalatraListener)
  context.addServlet(classOf[org.eclipse.jetty.servlet.DefaultServlet], "/")

  server.setHandler(context)
  server.start()
  server.join()
}

5. Allow-list path guard
When multiple paths should bypass the filter, use a Set allow-list and a startsWith check for path prefixes. Cleaner than repeated || conditions in the guard.
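The allow-list decision can be isolated in a pure function, which makes it easy to test without a servlet container. This is a sketch only: PathGuard and isExempt are hypothetical names, and the paths are the ones used in this section.

```scala
object PathGuard {
  // Exact paths that bypass the bot filter.
  private val exactAllowed: Set[String] = Set("/robots.txt", "/health", "/favicon.ico")

  // Path prefixes that bypass the bot filter.
  private val prefixAllowed: Seq[String] = Seq("/public/")

  // Accepts a possibly-null path, as the servlet API may return one.
  def isExempt(rawPath: String): Boolean = {
    val path = Option(rawPath).getOrElse("")
    exactAllowed.contains(path) || prefixAllowed.exists(p => path.startsWith(p))
  }
}
```

Note that "/public" without the trailing slash and "/healthz" are not exempt under these rules, since the Set lookup is exact and the prefix includes the slash.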
// Exclude multiple paths from the bot filter using a Set allow-list
before() {
  // getPathInfo can return null, so wrap it in Option.
  val path = Option(request.getPathInfo).getOrElse("")

  // Allow list: paths that bypass bot blocking
  val allowed = Set("/robots.txt", "/health", "/favicon.ico")
  val exempt = allowed.contains(path) || path.startsWith("/public/")

  // A bare `return` does not compile inside before(), so wrap the filter
  // body in a condition instead.
  if (!exempt) {
    val ua = Option(request.getHeader("User-Agent")).getOrElse("")
    response.setHeader("X-Robots-Tag", "noai, noimageai")
    if (AiBotDetector.isAiBot(ua)) {
      halt(403, "Forbidden")
    }
  }
}

6. halt() with JSON body — API variant
For API-only servlets, pass a Map to halt(). Scalatra serialises it to JSON when JSON output is active, which requires scalatra-json plus a json4s backend (for example, mixing JacksonJsonSupport into the servlet and defining an implicit jsonFormats). The body named argument accepts any type that Scalatra knows how to render.
// halt() with a Map body: Scalatra renders it as JSON when the format is json.
// Useful for API-only servlets where clients expect JSON error responses.
before() {
  val path = Option(request.getPathInfo).getOrElse("")

  // A bare `return` does not compile inside before(); use a condition instead.
  if (path != "/robots.txt") {
    val ua = Option(request.getHeader("User-Agent")).getOrElse("")
    response.setHeader("X-Robots-Tag", "noai, noimageai")
    if (AiBotDetector.isAiBot(ua)) {
      // halt with a Map: Scalatra serialises to JSON if JSON format is active
      halt(403, body = Map("error" -> "Forbidden", "status" -> 403))
    }
  }
}

7. web.xml — static robots.txt via DefaultServlet
Configure Jetty's DefaultServlet to handle /robots.txt before Scalatra. When configured this way, the before() filter never fires for it — the path guard is a safety net for other configurations.
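The reason an exact /robots.txt mapping wins over a servlet mounted at /* is the Servlet specification's matching order: exact match first, then the longest path prefix, then extension, then the default servlet. The sketch below is a simplified model of that order, not container code. ServletMappingModel, resolve, and the servlet names are hypothetical, and context paths and other details are ignored.

```scala
object ServletMappingModel {
  // `mappings` maps url-pattern -> servlet name.
  def resolve(path: String, mappings: Map[String, String]): Option[String] = {
    // 1. Exact match ("/" is the default mapping and is handled last).
    val exact = if (path == "/") None else mappings.get(path)

    // 2. Longest matching path prefix, for patterns ending in "/*".
    val prefix = mappings.toSeq
      .collect {
        case (pattern, servlet) if pattern.endsWith("/*") &&
            (path == pattern.dropRight(2) || path.startsWith(pattern.dropRight(1))) =>
          (pattern.length, servlet)
      }
      .sortBy { case (length, _) => -length }
      .headOption
      .map { case (_, servlet) => servlet }

    // 3. Extension match, for patterns such as "*.css".
    val dot = path.lastIndexOf('.')
    val extension =
      if (dot > path.lastIndexOf('/')) mappings.get("*" + path.substring(dot)) else None

    // 4. Default servlet, mapped at "/".
    val default = mappings.get("/")

    exact.orElse(prefix).orElse(extension).orElse(default)
  }
}
```

With mappings "/*" -> "scalatra" and "/robots.txt" -> "default", a request for /robots.txt resolves to "default" and every other path resolves to "scalatra".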
<?xml version="1.0" encoding="UTF-8"?>
<!-- src/main/webapp/WEB-INF/web.xml -->
<!-- Minimal web.xml for Scalatra with embedded Jetty.
     The XML declaration must be the first thing in the file. -->
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                             http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1">

  <listener>
    <listener-class>org.scalatra.servlet.ScalatraListener</listener-class>
  </listener>

  <!-- Optional: serve static files (including robots.txt) via DefaultServlet
       before Scalatra handles the request -->
  <servlet>
    <servlet-name>default</servlet-name>
    <servlet-class>org.eclipse.jetty.servlet.DefaultServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>default</servlet-name>
    <url-pattern>/robots.txt</url-pattern>
  </servlet-mapping>

</web-app>

8. build.sbt
// build.sbt
val ScalatraVersion = "2.8.4"

lazy val root = project
  .in(file("."))
  .settings(
    name := "my-scalatra-app",
    scalaVersion := "2.13.12",
    libraryDependencies ++= Seq(
      "org.scalatra" %% "scalatra" % ScalatraVersion,
      "org.scalatra" %% "scalatra-json" % ScalatraVersion, // optional JSON support
      "ch.qos.logback" % "logback-classic" % "1.4.14" % Runtime,
      // Embedded Jetty: compile scope, because JettyLauncher starts the server itself
      "org.eclipse.jetty" % "jetty-webapp" % "9.4.53.v20231009",
      "javax.servlet" % "javax.servlet-api" % "3.1.0" % Provided
    )
    // Plugins such as sbt-assembly (fat JAR) and sbt-revolver or xsbt-web-plugin
    // (hot reload in dev) are declared in project/plugins.sbt, not here.
  )

Key points
- halt() throws, so no return is needed: halt(403, "Forbidden") throws a HaltException internally. Scalatra catches it and renders the response. Unlike Dropwizard's abortWith(), you do not need to return after halt(); execution stops at the throw site.
- Set response headers before halt(): response.setHeader() must be called before halt(). Once the exception is thrown, Scalatra takes over response rendering, and code placed after halt() never runs, so a header set there is never applied. As an alternative, pass headers directly in the halt() call using the headers argument.
- request.getHeader() returns null, so use Option(): The underlying servlet API returns null for absent headers. Wrap with Option(request.getHeader("User-Agent")).getOrElse("") for idiomatic Scala null safety.
- before() fires for all routes including 404: In Scalatra, before() fires even for unmatched paths, because Scalatra's 404 handler runs after the before filter. This means the bot filter applies to requests that would 404, which is the correct behaviour for bot blocking.
- ScalatraServlet vs ScalatraFilter: ScalatraServlet terminates the request; ScalatraFilter can pass unmatched requests to the next filter in the chain. For bot blocking, both work identically. Use ScalatraServlet for standalone APIs and ScalatraFilter when coexisting with other servlets.
- Sinatra analogy: Scalatra's halt() is equivalent to Sinatra's halt; both are throw/catch-based short-circuits. Grape's error!() is also exception-based. All three contrast with frameworks where blocking requires explicit return values (Lapis table-return, Drogon fccb(resp)).
Framework comparison — Scala web frameworks
| Framework | Hook / filter | Block call | UA header |
|---|---|---|---|
| Scalatra | before() { } | halt(403, "Forbidden") | Option(request.getHeader("User-Agent")).getOrElse("") |
| Play Framework | EssentialFilter / ActionFilter | return Future(Forbidden("...")) | request.headers.get("User-Agent") |
| http4s | HttpRoutes => HttpRoutes middleware | return Forbidden("...") (a pure value) | req.headers.get(ci"User-Agent") |
| ZIO HTTP | Middleware composition | return ZIO.succeed(Response.status(Status.Forbidden)) | req.header(Header.UserAgent) |
Scalatra is the most imperative of the Scala frameworks — mutable response object, halt() via exception, servlet API for headers. Play, http4s, and ZIO HTTP are all functional and return-value-based. Scalatra is the right choice for teams coming from Java servlets or Sinatra who want a familiar, low-ceremony API in Scala.
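For contrast with halt(), the sketch below shows the return-value style in plain Scala, with no framework involved. Request, Response, Handler, and blockAiBots are hypothetical names and do not correspond to the API of Play, http4s, or ZIO HTTP. The point is only that the blocked response is an ordinary value returned from a function.

```scala
import java.util.Locale

object ReturnValueFilter {
  final case class Request(path: String, headers: Map[String, String])
  final case class Response(status: Int, body: String)

  type Handler = Request => Response

  // Hypothetical single-pattern detector; stands in for AiBotDetector.isAiBot.
  private def isAiBot(ua: String): Boolean =
    ua.toLowerCase(Locale.ROOT).contains("gptbot")

  // Middleware as a function from handler to handler. Blocking is just
  // returning a different Response; no exception is thrown.
  // Header lookup is case-sensitive in this simplified model.
  def blockAiBots(next: Handler): Handler = { req =>
    val ua = req.headers.getOrElse("User-Agent", "")
    if (req.path != "/robots.txt" && isAiBot(ua)) Response(403, "Forbidden")
    else next(req)
  }
}
```

Wrapping a handler is then plain function application: blockAiBots(req => Response(200, "Hello")).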
Dependencies
# Run with sbt
sbt run
# Build fat JAR
sbt assembly
# Run fat JAR
java -jar target/scala-2.13/my-scalatra-app-assembly-0.1.0-SNAPSHOT.jar
# Hot reload in development
sbt "~;jetty:stop;jetty:start" # with xsbt-web-plugin
# Scalatra version support
# Scalatra 2.8.x: Scala 2.12 / 2.13, Jetty 9.x, javax.servlet
# Scalatra 3.x: adds Scala 3 support; published as separate javax.servlet and jakarta.servlet (Jakarta EE 9+) artifacts