How to Block AI Bots in Python BlackSheep
BlackSheep is a fast async Python web framework built for performance. It keeps headers as bytes throughout the HTTP layer, which avoids string encoding and decoding overhead on every request. This is the main API difference from FastAPI, Starlette, and Quart, which expose headers as case-insensitive string mappings. In BlackSheep, header names are passed as bytes: request.get_first_header(b"user-agent"), not "User-Agent". The call returns the header value as bytes, or None when the header is absent. Middleware uses the @app.middleware decorator with an async (request, handler) -> Response signature: return a Response to block, or await handler(request) to pass through.
1. Bot detection
Plain Python, no dependencies. str.find() for literal substring matching. Called with the decoded string — decoding happens once in the middleware, not here.
# bot_utils.py — bot detection, no dependencies
from __future__ import annotations
# All lowercase — matched against ua.lower()
_AI_BOT_PATTERNS: tuple[str, ...] = (
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
)


def is_ai_bot(ua: str) -> bool:
    if not ua:
        return False
    lower = ua.lower()
    return any(lower.find(p) != -1 for p in _AI_BOT_PATTERNS)

2. @app.middleware: bytes header access
Register with @app.middleware. The critical detail: get_first_header(b"user-agent") takes the header name as bytes and returns the value as bytes, or None when the header is absent. Decode the value to a string before pattern matching. Return a Response(403) to block, and do not call handler on that path.
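Before the full middleware, the decode-then-match step can be tried on its own. The sketch below is a minimal, dependency-free illustration: ua_from_header is a hypothetical helper name, the two-entry pattern tuple is a shortened stand-in for the list in bot_utils.py, and the sample User-Agent values are made up for the demonstration.

```python
from __future__ import annotations

# Shortened stand-in for the pattern tuple in bot_utils.py (illustrative only).
PATTERNS: tuple[str, ...] = ("gptbot", "claudebot")


def ua_from_header(raw: bytes | None) -> str:
    """Decode raw User-Agent bytes to str; a missing header becomes ''."""
    if not raw:
        return ""
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


def is_ai_bot(ua: str) -> bool:
    lower = ua.lower()
    return any(p in lower for p in PATTERNS)


samples = [
    b"Mozilla/5.0 (compatible; GPTBot/1.0)",           # made-up crawler UA
    b"Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",  # made-up browser UA
    b"\xff\xfeClaudeBot",                              # invalid UTF-8 prefix
    None,                                              # header absent
]
results = [is_ai_bot(ua_from_header(s)) for s in samples]
print(results)
```

Decoding once and passing a str onward keeps the matching function free of any bytes handling, which is the split the article uses between the middleware and bot_utils.py.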
# app.py — BlackSheep application with middleware bot blocking
from __future__ import annotations
from blacksheep import Application, Request, Response
from blacksheep.contents import TextContent
from bot_utils import is_ai_bot
app = Application()
# ── Bot-blocking middleware ───────────────────────────────────────────────────
# Register with @app.middleware; it fires for every request before routing.
# Signature: async (request, handler) -> Response
# Pass through: return await handler(request)
# Block: return Response(...) WITHOUT calling handler
@app.middleware
async def block_ai_bots(request: Request, handler) -> Response:
    # Path guard: let robots.txt through.
    if request.url.path == b"/robots.txt":
        return await handler(request)

    # BlackSheep takes header names as bytes.
    # get_first_header() returns the value as bytes, or None when absent.
    raw_ua = request.get_first_header(b"user-agent")
    # Decode to str for pattern matching; errors="replace" tolerates bad bytes.
    ua: str = raw_ua.decode("utf-8", errors="replace") if raw_ua else ""

    if is_ai_bot(ua):
        # Return a Response directly; do NOT call handler(request).
        # Headers are (bytes, bytes) tuples in BlackSheep.
        return Response(
            403,
            headers=[(b"X-Robots-Tag", b"noai, noimageai")],
            content=TextContent("Forbidden"),
        )

    # Pass through: call the next handler and inject X-Robots-Tag.
    response = await handler(request)
    response.add_header(b"X-Robots-Tag", b"noai, noimageai")
    return response

# ── Routes ────────────────────────────────────────────────────────────────────
@app.router.get("/")
async def index(request: Request) -> Response:
    return Response(200, content=TextContent("Hello"))


@app.router.get("/api/data")
async def api_data(request: Request) -> Response:
    from blacksheep.contents import JSONContent

    return Response(200, content=JSONContent({"data": "value"}))


@app.router.get("/robots.txt")
async def robots_txt(request: Request) -> Response:
    content = """User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""
    return Response(200, content=TextContent(content))

3. Class-based middleware
A callable class allows constructor injection of dependencies (config, Redis client, allow-list). Register via app.middlewares.append(). Middlewares run in the order they appear in app.middlewares: the first one appended is the outermost and sees the request first.
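As one way to picture constructor injection, the sketch below keeps the block decision in a small class that receives its patterns and an allow-list of paths through __init__. BotPolicy, its parameters, and should_block are hypothetical names introduced for illustration; they are not part of BlackSheep. A middleware class could hold an instance and call should_block with request.url.path and the raw User-Agent bytes.

```python
from __future__ import annotations

from typing import Iterable, Optional


class BotPolicy:
    """Hypothetical helper: holds injected config and decides whether to block."""

    def __init__(
        self,
        patterns: Iterable[str],
        allowed_paths: Iterable[bytes] = (),
    ) -> None:
        # Lowercase once at construction so per-request work is just a scan.
        self._patterns = tuple(p.lower() for p in patterns)
        self._allowed_paths = frozenset(allowed_paths)

    def should_block(self, path: bytes, raw_ua: Optional[bytes]) -> bool:
        if path in self._allowed_paths:
            return False  # e.g. robots.txt stays reachable for every client
        if not raw_ua:
            return False  # no User-Agent header: nothing to match
        ua = raw_ua.decode("utf-8", errors="replace").lower()
        return any(p in ua for p in self._patterns)


policy = BotPolicy(patterns=["GPTBot", "CCBot"], allowed_paths=[b"/robots.txt"])
print(policy.should_block(b"/", b"Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(policy.should_block(b"/robots.txt", b"GPTBot/1.0"))
print(policy.should_block(b"/", None))
```

Keeping the decision in a plain object like this makes it testable without starting an application, and the middleware class itself shrinks to reading the request and building the response.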
# Alternative: middleware as a callable class
# Useful when the middleware needs injected dependencies (DB, Redis, config).
from blacksheep import Request, Response
from blacksheep.contents import TextContent
from bot_utils import is_ai_bot
class AiBotBlocker:
    """Middleware class: instantiated once, called for every request."""

    async def __call__(self, request: Request, handler) -> Response:
        if request.url.path == b"/robots.txt":
            return await handler(request)

        # get_first_header() returns the value as bytes, or None when absent.
        raw_ua = request.get_first_header(b"user-agent")
        ua: str = raw_ua.decode("utf-8", errors="replace") if raw_ua else ""

        if is_ai_bot(ua):
            return Response(
                403,
                headers=[(b"X-Robots-Tag", b"noai, noimageai")],
                content=TextContent("Forbidden"),
            )

        response = await handler(request)
        response.add_header(b"X-Robots-Tag", b"noai, noimageai")
        return response


# Register with app.middlewares.append() on the Application created in app.py.
# Note: middlewares run in list order; the first one appended runs first.
app.middlewares.append(AiBotBlocker())

4. Header access patterns
Four ways to read the User-Agent header in BlackSheep. All of them yield bytes values. Pass header names as bytes; lookups by name are case-insensitive, and lowercase literals such as b"user-agent" match what ASGI servers send.
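For intuition about what these accessors read from: the ASGI specification delivers request headers as a list of (name, value) byte-string pairs with lowercased names. The sketch below models a first-match lookup over such a list in plain Python. first_header and the sample scope_headers are hypothetical, and this is not BlackSheep's implementation.

```python
from __future__ import annotations

from typing import Optional, Sequence, Tuple

HeaderList = Sequence[Tuple[bytes, bytes]]


def first_header(headers: HeaderList, name: bytes) -> Optional[bytes]:
    """Return the first value whose name matches, ignoring ASCII case."""
    wanted = name.lower()
    for key, value in headers:
        if key.lower() == wanted:
            return value
    return None


# Made-up sample shaped like the "headers" entry of an ASGI HTTP scope.
scope_headers = [
    (b"host", b"example.com"),
    (b"user-agent", b"CCBot/2.0"),
    (b"accept", b"*/*"),
]

raw = first_header(scope_headers, b"user-agent")
ua = raw.decode("utf-8", errors="replace") if raw else ""
print(ua)
print(first_header(scope_headers, b"x-missing"))
```

Both the name and the value stay bytes until the single decode at the end, which mirrors the pattern used throughout this article.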
# Header access patterns in BlackSheep, all reading User-Agent

# 1. get_first_header(): returns the first value as bytes, or None
raw = request.get_first_header(b"user-agent")
ua = raw.decode() if raw else ""

# 2. try_get_single_header(): same as get_first_header for most headers
raw = request.try_get_single_header(b"user-agent")
ua = raw.decode() if raw else ""

# 3. headers property: iterate all (name, value) pairs
ua = ""  # default when the header is absent
for name, value in request.headers:
    if name.lower() == b"user-agent":
        ua = value.decode()
        break

# 4. get_headers(): returns a list of bytes values for that name
values = request.get_headers(b"user-agent")
ua = values[0].decode() if values else ""

# Key point: header names and values are bytes, and lookups by name are
# case-insensitive. Lowercase literals such as b"user-agent" are the convention
# used here; a str name such as "User-Agent" is rejected.

5. With OpenAPI and dependency injection
BlackSheep has built-in OpenAPI support and a DI container. Middleware registration is the same regardless of other features — @app.middleware works alongside DI-injected handlers and OpenAPI docs.
# BlackSheep with OpenAPI docs — dependency injection pattern
# BlackSheep has built-in DI; middleware receives the app's services.
from blacksheep import Application, Response
from blacksheep.contents import TextContent
from blacksheep.server.openapi.v3 import OpenAPIHandler
from openapidocs.v3 import Info

from bot_utils import is_ai_bot

app = Application()

# OpenAPI docs (optional)
docs = OpenAPIHandler(ui_path="/docs", info=Info(title="My API", version="0.1.0"))
docs.bind_app(app)


# Middleware is registered the same way regardless of other features
@app.middleware
async def block_ai_bots(request, handler):
    raw_ua = request.get_first_header(b"user-agent")
    ua = raw_ua.decode("utf-8", errors="replace") if raw_ua else ""
    if is_ai_bot(ua):
        return Response(
            403,
            headers=[(b"X-Robots-Tag", b"noai, noimageai")],
            content=TextContent("Forbidden"),
        )
    response = await handler(request)
    response.add_header(b"X-Robots-Tag", b"noai, noimageai")
    return response

6. ASGI deployment: Uvicorn / Hypercorn
# Run with Uvicorn (ASGI server)
# uvicorn app:app --host 0.0.0.0 --port 8080 --workers 4
# Or Gunicorn with Uvicorn workers:
# gunicorn app:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8080
# Or Hypercorn (also ASGI-compatible):
# hypercorn app:app --bind 0.0.0.0:8080 --workers 4
# BlackSheep also has a built-in dev server:
# python app.py (if app.run() is called in __main__)
# Run programmatically:
if __name__ == "__main__":
    import uvicorn

    uvicorn.run("app:app", host="0.0.0.0", port=8080, reload=True)

7. robots.txt
# static/robots.txt (served via Nginx upstream, or as a BlackSheep route)
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Key points
- Header names are bytes: BlackSheep looks headers up by bytes name. Use b"user-agent", not the str "User-Agent"; passing a str to get_first_header() raises a TypeError. The lookup is case-insensitive, so b"User-Agent" also matches, but lowercase bytes keep the code consistent with what ASGI servers send.
- get_first_header() can return None, so check before decoding: The return value is the header value as bytes, or None when the header is absent. Use errors="replace" in decode() to handle malformed UA strings without raising.
- Do not call handler on the block path: Returning a Response from the middleware sends it to the client. If the middleware awaits handler(request) before returning the block response, the route handler still runs and its response is discarded, which wastes work and can trigger side effects. Return the block response without calling the handler.
- response.add_header() for pass-through: To add X-Robots-Tag to passing responses, call response.add_header(b"X-Robots-Tag", b"noai, noimageai") on the response returned by await handler(request). Both name and value are bytes.
- Middleware order follows the list: Middleware registered with @app.middleware or appended via app.middlewares.append() runs in the order it appears in app.middlewares, so the first one registered is the outermost (first for requests, last for responses). Register the bot blocker first so that blocked requests skip the rest of the chain.
- url.path is bytes: In BlackSheep, request.url.path is a bytes object. Compare with bytes literals: request.url.path == b"/robots.txt". Comparing against a str does not raise; it evaluates to False, so the guard would silently never match.
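The bytes-versus-str point is easy to check in plain Python, independent of the framework. In the sketch below, path is a made-up stand-in for request.url.path; it shows how a bytes path behaves when compared against a str literal, a bytes literal, and a str after decoding.

```python
# Stand-in for request.url.path, which the article describes as bytes.
path = b"/robots.txt"

str_match = path == "/robots.txt"      # bytes vs str: never equal
bytes_match = path == b"/robots.txt"   # bytes vs bytes: compares content
decoded_match = path.decode("ascii") == "/robots.txt"  # decode, then compare

print(str_match, bytes_match, decoded_match)
```

Running the interpreter with the -b flag turns bytes-to-str comparisons into a BytesWarning, which can help surface a guard that would otherwise never match.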
Framework comparison — Python async frameworks
| Framework | Middleware | Block | UA header |
|---|---|---|---|
| BlackSheep | @app.middleware async (req, handler) | return Response(403) without calling handler | (req.get_first_header(b"user-agent") or b"").decode() |
| FastAPI | @app.middleware("http") async (req, call_next) | return Response(status_code=403) | request.headers.get("user-agent", "") |
| Starlette | BaseHTTPMiddleware subclass | return Response("Forbidden", status_code=403) | request.headers.get("user-agent", "") |
| Quart | @app.before_request async def | abort(403) or return response | request.headers.get("User-Agent", "") |
BlackSheep's bytes-based header API is the most distinctive among Python frameworks — it trades API ergonomics for performance. FastAPI, Starlette, and Quart all use string-based header access with case-insensitive lookup. If you're migrating from FastAPI to BlackSheep, the header access pattern is the main thing to update — the middleware structure is similar.
Dependencies
pip install blacksheep
pip install uvicorn[standard] # ASGI server
pip install gunicorn # optional process manager
# Optional extras
pip install blacksheep[full] # includes openapi, jinja2, etc.
# Run development server
uvicorn app:app --reload --host 0.0.0.0 --port 8080
# Production
uvicorn app:app --workers 4 --host 0.0.0.0 --port 8080
# BlackSheep version: 2.x (current)
# Python 3.8+ required; 3.11+ recommended for best async performance