How to Block AI Bots in Julia Genie.jl

Genie.jl is Julia's full-featured web framework, widely used for serving machine learning models, scientific APIs, and data dashboards. Because Genie is built on top of HTTP.jl, bot blocking hooks into HTTP.jl's middleware stack — a handler-wrapping pattern that intercepts every request before Genie's router, giving you the earliest possible rejection point.

1. Bot pattern module

A dedicated AiBots module keeps patterns isolated and testable. lowercase() normalises the User-Agent; occursin(pattern, ua) does plain-text substring matching. any() short-circuits on the first match.

# src/ai_bots.jl
module AiBots

const PATTERNS = [
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
]

"""
    is_ai_bot(user_agent::AbstractString) -> Bool

Returns true if the User-Agent string matches any known AI crawler pattern.
Pattern matching is case-insensitive substring search — no regex needed.
"""
function is_ai_bot(user_agent::AbstractString)
    ua = lowercase(user_agent)
    any(pattern -> occursin(pattern, ua), PATTERNS)
end

end  # module
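Before wiring the module into any middleware, it is worth sanity-checking the matcher against a few real-world User-Agent strings. The snippet below inlines a subset of PATTERNS (as DEMO_PATTERNS, a name used only here) so it runs standalone without the src/ai_bots.jl file:

```julia
# Sanity check for the pattern logic, with an inlined subset of the
# pattern list so the snippet runs on its own.
const DEMO_PATTERNS = ["gptbot", "claudebot", "ccbot", "bytespider"]

demo_is_ai_bot(ua::AbstractString) = any(p -> occursin(p, lowercase(ua)), DEMO_PATTERNS)

demo_is_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)")  # true
demo_is_ai_bot("CCBot/2.0 (https://commoncrawl.org/faq/)")                          # true
demo_is_ai_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/124.0")           # false
```

Because lowercase() is applied to the User-Agent first, the patterns themselves can stay all-lowercase regardless of how crawlers capitalise their tokens.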

2. HTTP.jl middleware function

HTTP.jl middleware is a higher-order function: handler -> req -> response. Return HTTP.Response(403, headers; body=...) to short-circuit. Call handler(req) to forward. Use HTTP.setheader(resp, ...) to add headers to the passing response without reconstructing it.

# src/middleware.jl
using HTTP
include("ai_bots.jl")

"""
HTTP.jl middleware that blocks AI crawlers and injects X-Robots-Tag.

Genie is built on HTTP.jl — pass this function to the middleware stack
via up(middleware=[bot_blocker]) or Server.startup(middleware=[bot_blocker]).
"""
function bot_blocker(handler)
    function(req::HTTP.Request)
        # Allow robots.txt regardless of User-Agent
        if req.target != "/robots.txt"
            ua = HTTP.header(req, "User-Agent", "")
            if AiBots.is_ai_bot(ua)
                return HTTP.Response(
                    403,
                    [
                        "X-Robots-Tag" => "noai, noimageai",
                        "Content-Type" => "text/plain; charset=utf-8",
                    ];
                    body = "Forbidden",
                )
            end
        end

        # Pass to inner handler
        resp = handler(req)

        # Inject X-Robots-Tag on every passing response
        HTTP.setheader(resp, "X-Robots-Tag" => "noai, noimageai")
        return resp
    end
end
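The wrapping shape can be exercised without starting a server. In this sketch, MockRequest and MockResponse are stand-in types (not part of HTTP.jl) so the snippet runs with no dependencies; the control flow mirrors bot_blocker above, including the /robots.txt carve-out:

```julia
# Stand-in types so the handler-wrapping pattern can be tested in
# isolation; MockRequest/MockResponse mimic HTTP.Request/HTTP.Response.
struct MockRequest
    target::String
    headers::Dict{String,String}
end

struct MockResponse
    status::Int
    body::String
end

mock_is_ai_bot(ua) = occursin("gptbot", lowercase(ua))

function mock_bot_blocker(handler)
    function (req::MockRequest)
        ua = get(req.headers, "User-Agent", "")
        # Same logic as bot_blocker: block bots everywhere except /robots.txt
        if req.target != "/robots.txt" && mock_is_ai_bot(ua)
            return MockResponse(403, "Forbidden")
        end
        handler(req)
    end
end

app = mock_bot_blocker(req -> MockResponse(200, "ok"))

app(MockRequest("/", Dict("User-Agent" => "GPTBot/1.0"))).status           # 403
app(MockRequest("/robots.txt", Dict("User-Agent" => "GPTBot/1.0"))).status # 200
app(MockRequest("/", Dict("User-Agent" => "Firefox/124.0"))).status        # 200
```

The same three cases (bot blocked, bot allowed on /robots.txt, human passed through) are the minimum worth covering in a real test suite against the HTTP.jl version.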

3. Register with Genie.Server.startup()

Pass the middleware function in the middleware vector. Multiple middleware functions run left-to-right — bot_blocker should come first so it fires before any authentication or logging middleware.

# app.jl — main Genie application
using Genie
include("src/middleware.jl")
include("src/routes.jl")   # your route definitions

# Start the server with bot-blocking middleware
Genie.Server.startup(
    port       = 8000,
    host       = "0.0.0.0",
    middleware = [bot_blocker],
    async      = false,       # set true for non-blocking startup in scripts
)

4. Routes (unchanged alongside middleware)

Genie's routing works without modification: handlers are registered with the route function (with a method keyword for POST, PUT, and so on), and the middleware runs transparently before any route handler is invoked.

# src/routes.jl — Genie routes (defined separately from middleware)
using Genie.Router, Genie.Renderer.Html, Genie.Renderer.Json

# Standard Genie route macros work unchanged alongside middleware
route("/") do
    html("<h1>Hello</h1>")
end

route("/api/data") do
    json(Dict("status" => "ok"))
end

# robots.txt — can also be a static file in public/robots.txt
# Genie serves public/ automatically; middleware path check handles both.
route("/robots.txt") do
    Genie.Renderer.respond(
        read("public/robots.txt", String),
        200,
        Dict("Content-Type" => "text/plain"),
    )
end

5. public/robots.txt

Genie serves the public/ directory automatically. The req.target != "/robots.txt" guard in the middleware ensures AI crawlers can still fetch robots.txt to learn they are disallowed — which is the correct and standards-compliant behaviour.

# public/robots.txt
# Genie automatically serves files from public/ before the router.
# The /robots.txt path check in bot_blocker covers both the static
# file and the route handler variant.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

6. occursin() vs regex

# Why occursin() instead of regex?
#
# occursin(pattern, haystack) is a plain substring search — O(n·m) but
# with zero allocation overhead for short patterns. For a list of 16
# fixed patterns checked once per request, this is faster and simpler
# than compiling a regex or using a Trie.
#
# Julia's any() short-circuits on the first true match, so blocked bots
# rarely cause more than 1-2 pattern comparisons.
#
# Equivalent regex approach (if needed for complex patterns):
const BOT_REGEX = Regex(join(AiBots.PATTERNS, "|"), "i")
is_ai_bot_regex(ua::AbstractString) = occursin(BOT_REGEX, ua)
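For plain substring patterns the two approaches classify identically, which is easy to verify. The snippet below inlines a small pattern subset (check_patterns, a name used only here) so it runs standalone:

```julia
# Agreement check: the case-insensitive regex alternation and the
# lowercase-substring search give the same answer on the same inputs.
check_patterns = ["gptbot", "claudebot", "ccbot"]
check_regex = Regex(join(check_patterns, "|"), "i")

check_substr(ua) = any(p -> occursin(p, lowercase(ua)), check_patterns)
check_re(ua) = occursin(check_regex, ua)

agree = all(ua -> check_substr(ua) == check_re(ua),
            ["GPTBot/1.0", "ClaudeBot/1.0", "Mozilla/5.0 Firefox/124.0"])  # true
```

The equivalence only holds while patterns contain no regex metacharacters; if a pattern ever needs one, the regex variant is the one to keep.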

7. Project.toml

# Project.toml
[deps]
Genie = "c43c736e-a2d1-11e8-161f-af95117fbd1e"
HTTP  = "cd3eb016-35fb-5094-929b-558a96fad6f3"

# Install dependencies:
#   julia --project=. -e 'using Pkg; Pkg.instantiate()'
#
# Run:
#   julia --project=. app.jl

Key points

Framework comparison — scientific / ML ecosystem

Framework        | Middleware pattern      | Short-circuit                | Pattern match
Julia Genie.jl   | handler -> req -> resp  | HTTP.Response(403, ...)      | occursin()
Python FastAPI   | BaseHTTPMiddleware      | Response(status_code=403)    | in ua_lower
Python Flask     | @app.before_request     | abort(403)                   | in ua_lower
R Plumber        | pr_filter()             | res$status <- 403; return()  | grepl()

Genie.jl follows the same handler-wrapping pattern as Python's ASGI middleware (Starlette/FastAPI) — a function that receives the next handler and returns a new handler. The key Julia-specific detail is the JIT warm-up consideration: unlike Python or Go, Julia incurs a compilation cost on first execution, making a startup warm-up call worth adding in latency-sensitive APIs.
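The warm-up itself can be as small as calling the hot path once before accepting traffic. In this sketch, warm_is_ai_bot is a stand-in for AiBots.is_ai_bot so the snippet runs on its own; the timings are illustrative, not part of Genie's API:

```julia
# Warm-up sketch: the first call pays the JIT compilation cost, so
# triggering it at startup keeps it off the first real request.
warm_is_ai_bot(ua) = any(p -> occursin(p, lowercase(ua)), ["gptbot", "claudebot"])

t_first  = @elapsed warm_is_ai_bot("GPTBot/1.0")   # includes compilation
t_second = @elapsed warm_is_ai_bot("GPTBot/1.0")   # already compiled

t_second <= t_first   # true: the warmed call is not slower
```

In a Genie app the natural place for such a call is just before Genie.Server.startup(), alongside any other precompilation or cache-priming work.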