How to Block AI Bots in Python Robyn

Robyn is a Python web framework built on a Rust runtime (via PyO3): the HTTP server and router run in Rust, outside the GIL, while handlers and middleware run as Python code. The API is Flask-like, but middleware works differently. Bot blocking uses global @app.before_request() middleware. Robyn stores all headers in lowercase, so always use request.headers.get("user-agent"), not "User-Agent". To block, return a Response; to pass, return the Request. Returning None is invalid.

1. Bot detection

Pure Python, no dependencies. any() with a generator short-circuits on first match.

# bot_utils.py — AI bot detection, no external dependencies

AI_BOT_PATTERNS = [
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
]

def is_ai_bot(ua: str) -> bool:
    """Return True if ua matches a known AI crawler pattern.

    Lowercase comparison — str.lower() + 'in' operator.
    No regex; literal substring match is sufficient and fast.
    """
    if not ua:
        return False
    lower = ua.lower()
    return any(pattern in lower for pattern in AI_BOT_PATTERNS)
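
A quick sanity check can confirm that the substring match behaves as intended before it is wired into middleware. The sketch below repeats a trimmed copy of the pattern list so that it runs on its own; the sample User-Agent strings are illustrative assumptions, not an authoritative record of what each crawler sends.

```python
# Sanity check for the substring matcher. The pattern list is a trimmed
# copy of AI_BOT_PATTERNS, and the sample User-Agent strings are made up
# for illustration.
AI_BOT_PATTERNS = ["gptbot", "claudebot", "ccbot", "bytespider"]


def is_ai_bot(ua: str) -> bool:
    """Case-insensitive literal substring match against known patterns."""
    if not ua:
        return False
    lower = ua.lower()
    return any(pattern in lower for pattern in AI_BOT_PATTERNS)


SAMPLES = {
    "Mozilla/5.0 (compatible; GPTBot/1.0)": True,
    "CCBot/2.0": True,
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0": False,
    "": False,
}

for ua, expected in SAMPLES.items():
    assert is_ai_bot(ua) is expected, ua
```

Because this is a plain substring test, any User-Agent that merely contains a pattern (for example a hypothetical `MyGPTBotMonitor/1.0`) is flagged as well. That trade-off is usually acceptable for a blocklist, but it is worth knowing about.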

2. @app.before_request() — global middleware

@app.before_request() with no path argument applies globally. The function must return a Request (pass) or a Response (block). The example also registers an @app.after_request() hook that sets X-Robots-Tag on every passing response; this hook only runs for requests that were not blocked.

# app.py — Robyn app with @before_request global middleware
from robyn import Robyn, Request, Response, Headers

from bot_utils import is_ai_bot  # detection helper from section 1

app = Robyn(__file__)

# @app.before_request() with no path argument is GLOBAL — fires for every route.
# The function receives a Request and must return either:
#   - a Request object  → pass through to the route handler
#   - a Response object → send immediately, skip the route handler
# Returning None is NOT valid in Robyn middleware — always return one of the two.
@app.before_request()
async def bot_blocker(request: Request) -> Request | Response:
    # Path guard: robots.txt must be reachable so bots can read Disallow rules.
    if request.url.path == "/robots.txt":
        return request  # pass through

    # CRITICAL: Robyn stores ALL headers in lowercase.
    # request.headers.get("User-Agent") → None  (Title-Case fails)
    # request.headers.get("user-agent") → the value or None
    ua = request.headers.get("user-agent") or ""

    if is_ai_bot(ua):
        # Block: return a Response — Robyn sends this and skips the route handler.
        return Response(
            status_code=403,
            headers=Headers({
                "content-type": "text/plain",
                "x-robots-tag": "noai, noimageai",
            }),
            description="Forbidden",
        )

    # Pass: return the Request object — Robyn continues to the route handler.
    return request


# @after_request injects X-Robots-Tag on ALL passing responses.
# This runs AFTER the route handler completes (only for requests that passed).
# Blocked requests (where before_request returned a Response) skip this hook.
@app.after_request()
async def inject_robots_tag(response: Response) -> Response:
    response.headers["x-robots-tag"] = "noai, noimageai"
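    return response


# Register the handlers from routes.py (section 3), then start the server.
from routes import register_routes

register_routes(app)

if __name__ == "__main__":
    app.start(port=8080)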
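
The pass/block decision can be exercised without starting a server by pulling it into a pure function. The sketch below is one way to do that, under stated assumptions: `FakeRequest` and `should_block` are hypothetical names introduced here, not part of Robyn, and the pattern tuple is trimmed for brevity.

```python
from dataclasses import dataclass, field

AI_BOT_PATTERNS = ("gptbot", "claudebot", "ccbot")  # trimmed for the example


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: a path plus lowercase header keys."""

    path: str
    headers: dict = field(default_factory=dict)


def should_block(request: FakeRequest) -> bool:
    """Mirror the middleware's decision: True means 'answer with 403'."""
    if request.path == "/robots.txt":
        return False  # always reachable so crawlers can read the rules
    ua = (request.headers.get("user-agent") or "").lower()
    return any(pattern in ua for pattern in AI_BOT_PATTERNS)


# A bot is blocked on normal routes but can still fetch robots.txt.
bot = {"user-agent": "GPTBot/1.0"}
assert should_block(FakeRequest("/", bot))
assert not should_block(FakeRequest("/robots.txt", bot))
```

Keeping the decision separate from the framework objects means the rule can be covered by ordinary unit tests, with the middleware reduced to translating the boolean into a Response or the original Request.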

3. Route handlers

Route handlers are only reached when the middleware returns the Request object. Robyn handlers can return a dict (auto-serialised to JSON), a str, or a Response object.

# routes.py — route handlers (only reached when middleware passes the request)
from robyn import Robyn, Request, Response, Headers

def register_routes(app: Robyn) -> None:
    @app.get("/robots.txt")
    async def robots_txt(request: Request) -> str:
        return """User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

    @app.get("/")
    async def index(request: Request) -> dict:
        return {"message": "Hello"}

    @app.get("/api/data")
    async def api_data(request: Request) -> dict:
        return {"data": "value"}
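
The robots.txt body above is written by hand, so it can drift from the list of crawlers that are actually blocked. As a sketch, the file could be generated from a list of agent names instead; `build_robots_txt` is a hypothetical helper, and the agent names passed to it are the four used in the handler above.

```python
def build_robots_txt(disallowed_agents: list[str]) -> str:
    """Render an allow-all default plus one Disallow group per agent."""
    groups = ["User-agent: *\nAllow: /"]
    for agent in disallowed_agents:
        groups.append(f"User-agent: {agent}\nDisallow: /")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


ROBOTS_TXT = build_robots_txt(["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"])
```

A handler could then return `ROBOTS_TXT` directly. Building the string once at import time avoids re-rendering it on every request.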

4. Scoped middleware — SubRouter

SubRouter groups routes under a prefix with their own middleware stack. app.include_router(api) mounts the sub-router onto the main app. Public routes on the root app are unaffected by sub-router middleware.

# Scoped middleware — protect /api routes using SubRouter.
# SubRouter lets you group routes under a prefix with their own middleware.

from robyn import Robyn, Request, Response, Headers, SubRouter

from bot_utils import is_ai_bot  # detection helper from section 1

app = Robyn(__file__)

# Public routes — no bot blocking
@app.get("/robots.txt")
async def robots_txt(request: Request) -> str:
    return "User-agent: *\nAllow: /\n"

@app.get("/")
async def index(request: Request) -> dict:
    return {"message": "Hello"}


# Protected API sub-router
api = SubRouter(__file__, prefix="/api")

@api.before_request()
async def api_bot_blocker(request: Request) -> Request | Response:
    ua = request.headers.get("user-agent") or ""
    if is_ai_bot(ua):
        return Response(
            status_code=403,
            headers=Headers({"x-robots-tag": "noai, noimageai"}),
            description="Forbidden",
        )
    return request

@api.get("/data")
async def api_data(request: Request) -> dict:
    return {"data": "value"}

# Register sub-router on the main app
app.include_router(api)
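
An alternative to a sub-router is to keep a single global middleware and decide by path prefix whether the bot check applies. The sketch below shows only that prefix test; `PROTECTED_PREFIXES` and `is_protected` are hypothetical names, and this is a different technique from SubRouter scoping rather than a description of how Robyn implements it.

```python
PROTECTED_PREFIXES = ("/api",)  # hypothetical: prefixes that need bot blocking


def is_protected(path: str, prefixes: tuple[str, ...] = PROTECTED_PREFIXES) -> bool:
    """True if path equals a prefix or sits below it as a path segment."""
    for prefix in prefixes:
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return True
    return False


assert is_protected("/api/data")
assert not is_protected("/apiary")  # shares the characters, not the segment
```

Matching on `base + "/"` rather than a bare `startswith(prefix)` keeps unrelated routes such as `/apiary` out of the protected set.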

5. Install and run

# Install and run
pip install robyn

# Run with default single worker
python app.py

# Run with multiple processes and workers (built-in — no Gunicorn needed)
python app.py --processes 2 --workers 4 --port 8080

# Development mode with auto-reload (run through the Robyn CLI)
python -m robyn app.py --dev
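
Once the server is running, a small probe can confirm that a bot User-Agent is rejected while a browser-like one passes. The sketch below uses only the standard library; `build_probe` and `probe_status` are hypothetical helper names, and the address assumes the server listens locally on port 8080.

```python
import urllib.error
import urllib.request


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that carries the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe_status(url: str, user_agent: str) -> int:
    """Send the probe and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_probe(url, user_agent), timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is still useful
```

With the app from section 2 running, `probe_status("http://127.0.0.1:8080/", "GPTBot/1.0")` should come back as 403 if the middleware is active, while a browser-like User-Agent should get the normal response.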

Key points

Framework comparison — Python web frameworks

Framework | Middleware | Block | UA access
Robyn | @app.before_request() | return Response(status_code=403, ...) | request.headers.get("user-agent") (lowercase only)
Flask | @app.before_request | return make_response("Forbidden", 403) | request.headers.get("User-Agent") (case-insensitive)
FastAPI | @app.middleware("http") | return Response(status_code=403) | request.headers.get("user-agent") (case-insensitive)
Sanic | @app.middleware("request") | return HTTPResponse("Forbidden", 403) | request.headers.get("User-Agent") (case-insensitive)

Robyn is the only framework in this table that requires exact lowercase header keys; the others look headers up case-insensitively. The pass/block convention also differs: Flask passes by returning None, whereas Robyn passes by returning the Request object. FastAPI and Sanic block the same way Robyn does, by returning a response object from the middleware.
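
To keep detection code portable across these frameworks, header access can go through one small helper that always lowercases the key. The sketch below assumes only that the headers object exposes a dict-style `get`; `get_header` is a hypothetical name, and the plain dict stands in for a store that keeps lowercase keys, as this article describes for Robyn.

```python
def get_header(headers, name: str, default: str = "") -> str:
    """Look up a header using a lowercase key, falling back to a default."""
    value = headers.get(name.lower())
    return value if value is not None else default


# A plain dict with lowercase keys stands in for a lowercase-only header store.
lowercase_store = {"user-agent": "GPTBot/1.0"}

assert lowercase_store.get("User-Agent") is None  # Title-Case misses
assert get_header(lowercase_store, "User-Agent") == "GPTBot/1.0"
```

A lowercase key also works against a case-insensitive store, so the helper is safe to use in either style of framework.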