How to Block AI Bots in Python BlackSheep

BlackSheep is a fast async Python web framework that keeps headers as bytes throughout the HTTP layer, avoiding string encoding and decoding overhead on every request. This is the main API difference from FastAPI, Starlette, and Quart, which expose headers as case-insensitive string mappings. In BlackSheep, header names are passed as bytes, lowercase by convention: request.get_first_header(b"user-agent"), not "User-Agent". In current releases the call returns the header value as bytes, or None when the header is absent. A middleware is an async callable with the signature (request, handler) -> Response, registered with app.middlewares.append(): return a Response to block, or await handler(request) to pass through.
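
The pass-through versus short-circuit contract can be tried without the framework. The following is a minimal sketch using only asyncio; FakeResponse, final_handler, and blocking_middleware are hypothetical stand-ins invented for illustration, not BlackSheep classes, and the request is modelled as a plain dict.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a framework response object."""
    status: int
    body: str


Handler = Callable[[dict], Awaitable[FakeResponse]]


async def final_handler(request: dict) -> FakeResponse:
    # Plays the role of the route handler at the end of the chain.
    return FakeResponse(200, "Hello")


async def blocking_middleware(request: dict, handler: Handler) -> FakeResponse:
    # Block: return a response without ever awaiting the handler.
    if request.get("blocked"):
        return FakeResponse(403, "Forbidden")
    # Pass through: delegate to the next handler and return its response.
    return await handler(request)


allowed = asyncio.run(blocking_middleware({"blocked": False}, final_handler))
denied = asyncio.run(blocking_middleware({"blocked": True}, final_handler))
print(allowed.status, denied.status)
```

The blocked request never reaches final_handler, which is the property the real middleware relies on.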

1. Bot detection

The detector is plain Python with no dependencies. It uses str.find() for literal substring matching and expects an already decoded string: decoding happens once in the middleware, not here.

# bot_utils.py — bot detection, no dependencies
from __future__ import annotations

# All lowercase — matched against ua.lower()
_AI_BOT_PATTERNS: tuple[str, ...] = (
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
)


def is_ai_bot(ua: str) -> bool:
    if not ua:
        return False
    lower = ua.lower()
    return any(lower.find(p) != -1 for p in _AI_BOT_PATTERNS)
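
To see the matcher's behaviour on concrete inputs, here is a small self-contained check. It repeats a trimmed copy of the pattern tuple so that it runs on its own; the sample User-Agent strings are illustrative, not captured from real traffic.

```python
# Trimmed copy of the patterns from bot_utils.py so this snippet runs standalone.
PATTERNS = ("gptbot", "claudebot", "ccbot")


def is_ai_bot(ua: str) -> bool:
    if not ua:
        return False
    lower = ua.lower()
    return any(lower.find(p) != -1 for p in PATTERNS)


# Illustrative inputs mapped to the result the matcher should give.
samples = {
    "Mozilla/5.0 (compatible; GPTBot/1.0)": True,   # mixed case still matches
    "CCBot/2.0": True,
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0": False,
    "": False,                                      # missing header
}

for ua, expected in samples.items():
    assert is_ai_bot(ua) is expected, ua
```

Because this is substring matching, any User-Agent that merely contains one of the patterns is treated as a bot as well.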

2. Function middleware: bytes header access

Register the function with app.middlewares.append(). The critical detail: get_first_header(b"user-agent") takes the header name as bytes and returns the value as bytes (or None when absent), so decode it to a string before pattern matching. Return a Response(403) to block, and do not call handler in that branch.

# app.py — BlackSheep application with middleware bot blocking
from __future__ import annotations

from blacksheep import Application, Request, Response
from blacksheep.contents import TextContent

from bot_utils import is_ai_bot

app = Application()


# ── Bot-blocking middleware ───────────────────────────────────────────────────
# Signature: async (request, handler) -> Response
# Pass through: return await handler(request)
# Block: return Response(...) WITHOUT calling handler
async def block_ai_bots(request: Request, handler) -> Response:
    # Path guard: let robots.txt through.
    if request.url.path == b"/robots.txt":
        return await handler(request)

    # Header names are bytes in BlackSheep; lowercase by convention.
    # get_first_header() returns the value as bytes, or None when absent.
    raw_ua = request.get_first_header(b"user-agent")

    # Decode to str for pattern matching; invalid bytes become U+FFFD.
    ua: str = raw_ua.decode("utf-8", errors="replace") if raw_ua else ""

    if is_ai_bot(ua):
        # Return a Response directly; do NOT call handler(request).
        # Headers are (bytes, bytes) tuples in BlackSheep.
        return Response(
            403,
            headers=[(b"X-Robots-Tag", b"noai, noimageai")],
            content=TextContent("Forbidden"),
        )

    # Pass through: call the next handler and inject X-Robots-Tag.
    response = await handler(request)
    response.add_header(b"X-Robots-Tag", b"noai, noimageai")
    return response


# Register the middleware so it wraps the request handlers.
app.middlewares.append(block_ai_bots)


# ── Routes ────────────────────────────────────────────────────────────────────
@app.router.get("/")
async def index(request: Request) -> Response:
    return Response(200, content=TextContent("Hello"))


@app.router.get("/api/data")
async def api_data(request: Request) -> Response:
    from blacksheep.contents import JSONContent
    return Response(200, content=JSONContent({"data": "value"}))


@app.router.get("/robots.txt")
async def robots_txt(request: Request) -> Response:
    content = """User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""
    return Response(200, content=TextContent(content))
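
The decode step is the one place where bytes become text, so it is worth isolating. A minimal sketch, assuming the raw header value arrives as bytes or None; decode_user_agent is a hypothetical helper name, not part of BlackSheep.

```python
from typing import Optional


def decode_user_agent(raw: Optional[bytes]) -> str:
    """Turn a raw header value into text that is safe to pattern-match."""
    if not raw:
        # Missing or empty header: treat as an empty User-Agent.
        return ""
    # errors="replace" maps undecodable bytes to U+FFFD instead of raising,
    # so a malformed header cannot crash the middleware.
    return raw.decode("utf-8", errors="replace")


print(decode_user_agent(b"GPTBot/1.0"))
print(repr(decode_user_agent(None)))
print(repr(decode_user_agent(b"bad\xffbytes")))
```

Keeping the helper separate also makes the decoding policy easy to unit-test without starting an application.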

3. Class-based middleware

A callable class allows constructor injection of dependencies (config, Redis client, allow-list). Register an instance via app.middlewares.append(). Middlewares run in the order they were appended: the first one registered is the outermost and sees the request first.

# Alternative: middleware as a callable class
# Useful when the middleware needs injected dependencies (DB, Redis, config).

from blacksheep import Request, Response
from blacksheep.contents import TextContent
from bot_utils import is_ai_bot


class AiBotBlocker:
    """Middleware class — instantiated once, called for every request."""

    async def __call__(self, request: Request, handler) -> Response:
        if request.url.path == b"/robots.txt":
            return await handler(request)

        # get_first_header() returns the value as bytes, or None when absent.
        raw_ua = request.get_first_header(b"user-agent")
        ua: str = raw_ua.decode("utf-8", errors="replace") if raw_ua else ""

        if is_ai_bot(ua):
            return Response(
                403,
                headers=[(b"X-Robots-Tag", b"noai, noimageai")],
                content=TextContent("Forbidden"),
            )

        response = await handler(request)
        response.add_header(b"X-Robots-Tag", b"noai, noimageai")
        return response


# Register with app.middlewares.append()
# Note: middlewares run in registration order; the first added sees the request first.
app.middlewares.append(AiBotBlocker())
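
The class above takes no constructor arguments, so the injection it makes possible is not shown. One way to sketch it is a small policy object that a middleware class could receive in __init__ and consult per request. BlockPolicy, its fields, and should_block are hypothetical names chosen for illustration; the sketch is framework-independent so that it runs without BlackSheep.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class BlockPolicy:
    """Hypothetical config object to inject into a middleware constructor."""
    patterns: Tuple[str, ...] = ("gptbot", "claudebot", "ccbot")
    exempt_paths: FrozenSet[bytes] = frozenset({b"/robots.txt"})
    allow_list: Tuple[str, ...] = ()  # lowercase substrings that are never blocked

    def should_block(self, path: bytes, user_agent: str) -> bool:
        # Exempt paths are always served, whatever the client is.
        if path in self.exempt_paths:
            return False
        lower = user_agent.lower()
        # The allow-list wins over the block patterns.
        if any(allowed in lower for allowed in self.allow_list):
            return False
        return any(pattern in lower for pattern in self.patterns)


policy = BlockPolicy(allow_list=("internal-gptbot-monitor",))
assert policy.should_block(b"/", "GPTBot/1.0")
assert not policy.should_block(b"/robots.txt", "GPTBot/1.0")
assert not policy.should_block(b"/", "internal-gptbot-monitor/0.1")
```

A middleware class would store the policy in __init__ and call should_block with the request path and the decoded User-Agent inside __call__.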

4. Header access patterns

Four ways to read the User-Agent header in BlackSheep. All of them take the header name as bytes and return bytes values. The ASGI specification requires servers to send header names lowercased, so a lowercase literal such as b"user-agent" always matches what is stored. Str names are not supported.

# Header access patterns in BlackSheep for User-Agent

# 1. get_first_header(): returns the first value as bytes, or None
raw = request.get_first_header(b"user-agent")
ua = raw.decode() if raw else ""

# 2. headers.get_first(): the same lookup on the Headers collection
raw = request.headers.get_first(b"user-agent")
ua = raw.decode() if raw else ""

# 3. headers property: iterate all (name, value) byte tuples
ua = ""
for name, value in request.headers:
    if name.lower() == b"user-agent":
        ua = value.decode()
        break

# 4. get_headers(): returns every value sent under that name
values = request.get_headers(b"user-agent")
ua = values[0].decode() if values else ""

# Key point: header names and values are bytes.
# Pass b"user-agent", never the str "User-Agent".
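
Why bytes names behave this way is easier to see with a toy model. The sketch below treats the headers as the list of (name, value) byte tuples that an ASGI scope carries and returns the first match. first_header is a hypothetical function, and lowercasing both sides of the comparison is an assumption of this sketch rather than a statement about BlackSheep's internals.

```python
from typing import List, Optional, Tuple

RawHeaders = List[Tuple[bytes, bytes]]


def first_header(headers: RawHeaders, name: bytes) -> Optional[bytes]:
    """Return the first value stored under name, or None when absent."""
    wanted = name.lower()
    for key, value in headers:
        if key.lower() == wanted:
            return value
    return None


# ASGI servers send header names lowercased, as in this sample data.
scope_headers: RawHeaders = [
    (b"host", b"example.com"),
    (b"user-agent", b"GPTBot/1.0"),
]

print(first_header(scope_headers, b"user-agent"))
print(first_header(scope_headers, b"accept"))
```

No string decoding happens during the lookup; the value is decoded only once, when it is needed for pattern matching.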

5. With OpenAPI and dependency injection

BlackSheep has built-in OpenAPI support and a DI container. Middleware registration is the same regardless of other features: app.middlewares.append() works alongside DI-injected handlers and OpenAPI docs.

# BlackSheep with OpenAPI docs
# Middleware registration is independent of the OpenAPI and DI configuration.

from blacksheep import Application, Response
from blacksheep.contents import TextContent
from blacksheep.server.openapi.v3 import OpenAPIHandler
from openapidocs.v3 import Info

from bot_utils import is_ai_bot

app = Application()

# OpenAPI docs (optional)
docs = OpenAPIHandler(ui_path="/docs", info=Info(title="My API", version="0.1.0"))
docs.bind_app(app)


# Middleware is registered the same way regardless of other features
async def block_ai_bots(request, handler):
    # get_first_header() returns the value as bytes, or None when absent.
    raw_ua = request.get_first_header(b"user-agent")
    ua = raw_ua.decode("utf-8", errors="replace") if raw_ua else ""
    if is_ai_bot(ua):
        return Response(403, headers=[(b"X-Robots-Tag", b"noai, noimageai")],
                        content=TextContent("Forbidden"))
    response = await handler(request)
    response.add_header(b"X-Robots-Tag", b"noai, noimageai")
    return response


app.middlewares.append(block_ai_bots)

6. ASGI deployment — Uvicorn / Hypercorn

# Run with Uvicorn (ASGI server)
# uvicorn app:app --host 0.0.0.0 --port 8080 --workers 4

# Or Gunicorn with Uvicorn workers:
# gunicorn app:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8080

# Or Hypercorn (also ASGI-compatible):
# hypercorn app:app --bind 0.0.0.0:8080 --workers 4

# BlackSheep is an ASGI application and is served by an ASGI server
# such as the ones above.

# Run programmatically:
if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app:app", host="0.0.0.0", port=8080, reload=True)

7. robots.txt

# static/robots.txt (served via Nginx upstream, or as a BlackSheep route)
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
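
The rules can be sanity-checked with the standard library's urllib.robotparser before deploying. This sketch parses the same file contents from a string; the path and the browser User-Agent are illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Every listed AI crawler should be denied on any path.
for agent in ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended"):
    assert not parser.can_fetch(agent, "/any/page")

# Clients without a dedicated group fall back to the wildcard rule.
assert parser.can_fetch("Mozilla/5.0", "/any/page")
```

robots.txt is only a request to well-behaved crawlers; the middleware is what enforces the block for clients that ignore it.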

Key points

Framework comparison — Python async frameworks

Framework | Middleware | Block | UA header
BlackSheep | app.middlewares.append(fn), fn is async (req, handler) | return Response(403) without calling handler | (req.get_first_header(b"user-agent") or b"").decode()
FastAPI | @app.middleware("http") async (req, call_next) | return Response(status_code=403) | request.headers.get("user-agent", "")
Starlette | BaseHTTPMiddleware subclass | return Response("Forbidden", status_code=403) | request.headers.get("user-agent", "")
Quart | @app.before_request async def | abort(403) or return response | request.headers.get("User-Agent", "")

BlackSheep's bytes-based header API is what sets it apart from the other frameworks in the table: it trades some API ergonomics for performance. FastAPI, Starlette, and Quart all use string-based header access with case-insensitive lookup. If you are migrating from FastAPI to BlackSheep, header access is the main thing to update; the middleware structure is similar.

Dependencies

pip install blacksheep
pip install "uvicorn[standard]"  # ASGI server; quotes keep shells such as zsh from globbing the brackets
pip install gunicorn             # optional process manager

# Optional extras
pip install "blacksheep[full]"   # includes openapi, jinja2, etc.

# Run development server
uvicorn app:app --reload --host 0.0.0.0 --port 8080

# Production
uvicorn app:app --workers 4 --host 0.0.0.0 --port 8080

# BlackSheep version: 2.x (current)
# Python 3.8+ required; 3.11+ recommended for best async performance