How to Block AI Bots on Tornado (Python): Complete 2026 Guide
Tornado is one of Python's oldest async web frameworks — built years before asyncio, used by Jupyter Notebook, and still a top choice for WebSocket servers and long-polling APIs. Unlike FastAPI and Starlette, Tornado has no middleware stack. Bot blocking uses a BaseHandler class with a prepare() override that all route handlers inherit.
Tornado has no middleware — use BaseHandler
Tornado's design is class-based, not middleware-based. The idiomatic approach is to create a BaseHandler(RequestHandler) that overrides prepare(), and have every route handler in your app extend it. prepare() runs before get(), post(), or any HTTP verb handler — it is Tornado's per-request entry point, equivalent to middleware in Flask, Express, or Gin.
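A minimal sketch of that inheritance pattern (handler names here are illustrative): prepare() on the shared base class fires before the verb method of any handler that extends it.
import tornado.web

class BaseHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Runs before get()/post() on every handler extending this class
        print(f"prepare: {self.request.method} {self.request.path}")

class HomeHandler(BaseHandler):
    def get(self):
        self.write("prepare() already ran")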
Protection layers
Layer 1: robots.txt
Register /robots.txt using StaticFileHandler as the first route in your application. Tornado matches routes in order — placing it first ensures static files are served before any bot-blocking logic.
# static/robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /
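The consecutive User-agent lines share the single Disallow: / that closes the group. Before wiring it up, that grouping can be sanity-checked with Python's built-in urllib.robotparser (this sketch assumes the file lives at static/robots.txt as above):
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
with open("static/robots.txt") as f:
    rp.parse(f.read().splitlines())

print(rp.can_fetch("GPTBot", "/"))       # False: listed in the Disallow group
print(rp.can_fetch("Mozilla/5.0", "/"))  # True: falls under User-agent: *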
import tornado.web
app = tornado.web.Application([
    # Static files — registered FIRST so they bypass BaseHandler.
    # StaticFileHandler takes a directory "path" plus a regex capture
    # group for the filename it should serve.
    (r"/(robots\.txt)", tornado.web.StaticFileHandler, {"path": "static/"}),
    (r"/static/(.*)", tornado.web.StaticFileHandler, {"path": "static/"}),
    # Your route handlers — all extend BaseHandler
    (r"/", HomeHandler),
    (r"/api/data", DataHandler),
])
Layer 2: noai meta tag
Override get_template_namespace() in BaseHandler to inject a robots variable into every template automatically:
BaseHandler — inject robots into template namespace
class BaseHandler(tornado.web.RequestHandler):
    def get_template_namespace(self):
        ns = super().get_template_namespace()
        # Handlers can set self._robots before render() to override
        ns['robots'] = getattr(self, '_robots', 'noai, noimageai')
        return ns
templates/base.html (the {{ robots }} expression works in Tornado's built-in template engine; Jinja2 uses the same delimiters)
<meta name="robots" content="{{ robots }}">
Per-handler override
class PublicBlogHandler(BaseHandler):
    async def get(self):
        self._robots = 'index, follow'  # overrides default
        self.render('blog.html')
Layers 3 & 4: BaseHandler with prepare()
prepare() is called before every HTTP method handler. Override it in BaseHandler and have all your handlers extend it:
handlers/base.py
import tornado.web
AI_BOT_PATTERNS = [
    "gptbot", "chatgpt-user", "oai-searchbot",
    "claudebot", "anthropic-ai", "claude-web",
    "google-extended", "ccbot", "bytespider",
    "applebot-extended", "perplexitybot", "diffbot",
    "cohere-ai", "facebookbot", "meta-externalagent",
    "omgili", "omgilibot", "amazonbot",
    "deepseekbot", "mistralbot", "xai-bot", "ai2bot",
]

EXEMPT_PATHS = {"/robots.txt", "/sitemap.xml", "/favicon.ico"}

class BaseHandler(tornado.web.RequestHandler):
    # Set to True in a specific handler to bypass bot blocking
    ALLOW_AI_BOTS: bool = False

    def prepare(self):
        # Exempt paths always pass through
        if self.request.path in EXEMPT_PATHS:
            return
        # Per-handler opt-out
        if self.ALLOW_AI_BOTS:
            return
        ua = self.request.headers.get("User-Agent", "").lower()
        for pattern in AI_BOT_PATTERNS:
            if pattern in ua:
                # Layer 4: hard 403 block
                # send_error() writes the status and finishes the response;
                # the get()/post() method never runs after this
                self.send_error(403)
                return
        # Layer 3: set X-Robots-Tag for all legitimate requests
        self.set_header("X-Robots-Tag", "noai, noimageai")
Key points
- Blocking: self.send_error(403) writes the HTTP 403 response and finishes it. After prepare() returns, Tornado sees the request is already finished and never calls get() or post(). The return after it ends prepare() early so nothing else touches the finished response.
- Reading User-Agent: self.request.headers.get("User-Agent", "") — HTTPHeaders is case-insensitive, as the snippet after this list demonstrates. The empty-string default avoids an AttributeError on .lower() when bots omit the header entirely.
- X-Robots-Tag: self.set_header() queues the header for the response. Calling it in prepare() before get()/post() runs is safe because Tornado buffers headers until the response is flushed.
- Per-handler opt-out: Set ALLOW_AI_BOTS = True on any handler class to bypass bot blocking for that route.
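A short demonstration of that case-insensitive lookup, using tornado.httputil.HTTPHeaders directly (the header values here are made up):
from tornado.httputil import HTTPHeaders

h = HTTPHeaders({"User-Agent": "GPTBot/1.0"})
print(h.get("user-agent"))  # "GPTBot/1.0": lookup ignores case
print(h.get("User-AGENT"))  # same value
print(h.get("X-Missing"))   # None, hence the "" default before .lower()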
Route handlers extending BaseHandler
import tornado.web
import tornado.websocket

from handlers.base import BaseHandler

class HomeHandler(BaseHandler):
    async def get(self):
        # prepare() already ran — bot blocked or X-Robots-Tag set
        self.write("Hello, World!")

class ApiHandler(BaseHandler):
    async def get(self):
        self.set_header("Content-Type", "application/json")
        self.write('{"status": "ok"}')

class PublicFeedHandler(BaseHandler):
    # This route intentionally allows AI crawlers
    ALLOW_AI_BOTS = True

    async def get(self):
        self.write("Public RSS feed — AI bots welcome")

class WebSocketHandler(tornado.websocket.WebSocketHandler, BaseHandler):
    # WebSocket handlers can also extend BaseHandler for prepare() checks,
    # which run during the HTTP upgrade handshake
    def open(self):
        self.write_message("Connected")

# Application setup
app = tornado.web.Application([
    (r"/(robots\.txt)", tornado.web.StaticFileHandler, {"path": "static/"}),
    (r"/", HomeHandler),
    (r"/api/data", ApiHandler),
    (r"/feed", PublicFeedHandler),
])
Async prepare()
prepare() can be async — useful if you need to look up IP reputation or check a rate-limit store before deciding to block:
class BaseHandler(tornado.web.RequestHandler):
    async def prepare(self):
        """Async prepare — Tornado has awaited asynchronous prepare() since 3.1."""
        if self.request.path in EXEMPT_PATHS:
            return
        ua = self.request.headers.get("User-Agent", "").lower()
        for pattern in AI_BOT_PATTERNS:
            if pattern in ua:
                self.send_error(403)
                return
        # Could await a Redis rate-limit check here
        # allowed = await self.settings["redis"].get(ip_key)
        self.set_header("X-Robots-Tag", "noai, noimageai")
Tornado awaits prepare() automatically when it is a coroutine. No configuration change needed — just add async.
Comparison: Tornado vs FastAPI vs Django
Tornado — BaseHandler.prepare()
class BaseHandler(RequestHandler):
    def prepare(self):
        ua = self.request.headers.get("User-Agent", "").lower()
        if any(p in ua for p in AI_BOT_PATTERNS):
            self.send_error(403)
FastAPI / Starlette — BaseHTTPMiddleware
class AiBotBlocker(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        ua = request.headers.get("user-agent", "").lower()
        if any(p in ua for p in AI_BOT_PATTERNS):
            return Response("Forbidden", status_code=403)
        return await call_next(request)
Django — middleware __call__()
class AiBotBlockerMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        ua = request.META.get("HTTP_USER_AGENT", "").lower()
        if any(p in ua for p in AI_BOT_PATTERNS):
            return HttpResponseForbidden("Forbidden")
        return self.get_response(request)
All three patterns achieve the same result. Tornado's prepare() is the least obvious but idiomatic — it is the hook Tornado's own documentation recommends for cross-cutting request setup such as authentication.
Running Tornado
# Install
pip install tornado
# app.py
import asyncio
import tornado.web
from handlers.base import BaseHandler

class HomeHandler(BaseHandler):
    async def get(self):
        self.write("Hello!")

def make_app():
    return tornado.web.Application([
        (r"/(robots\.txt)", tornado.web.StaticFileHandler, {"path": "static/"}),
        (r"/", HomeHandler),
    ])

async def main():
    make_app().listen(8888)
    await asyncio.Event().wait()  # serve until the process is killed

if __name__ == "__main__":
    asyncio.run(main())
# Run
python app.py
# Production — run multiple processes (one per CPU core); see the sketch below
# tornado.process.fork_processes(0)  # 0 = one per CPU
Tornado runs its own event loop on top of asyncio — no ASGI server (uvicorn, gunicorn) needed. For server-level blocking before Python runs, place nginx in front and use a map $http_user_agent block. See the nginx guide.
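The fork_processes hint needs a slightly different startup sequence, since sockets must be bound before forking (and os.fork is Unix-only). A sketch of the multi-process pattern from Tornado's HTTPServer docs, reusing make_app from app.py above:
import asyncio
import tornado.httpserver
import tornado.netutil
import tornado.process

sockets = tornado.netutil.bind_sockets(8888)  # bind in the parent process...
tornado.process.fork_processes(0)             # ...then fork one child per CPU core

async def post_fork_main():
    server = tornado.httpserver.HTTPServer(make_app())
    server.add_sockets(sockets)               # each child serves the shared sockets
    await asyncio.Event().wait()

asyncio.run(post_fork_main())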
Verification
# Should return 403 (blocked AI bot)
curl -i -A "GPTBot" http://localhost:8888/

# Should return 200 (regular browser)
curl -i -A "Mozilla/5.0" http://localhost:8888/

# robots.txt must always return 200
curl -i -A "GPTBot" http://localhost:8888/robots.txt

# Check X-Robots-Tag on legitimate request
curl -si -A "Mozilla/5.0" http://localhost:8888/ | grep -i x-robots
Default Tornado port is 8888. Expected: GPTBot → 403. Mozilla/5.0 → 200 with X-Robots-Tag: noai, noimageai. robots.txt → 200 for any user agent. Note the -i (GET) rather than -I (HEAD): the handlers above implement only get(), so Tornado would answer HEAD requests with 405.
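The same checks can run as an automated test via tornado.testing.AsyncHTTPTestCase; a sketch assuming the make_app factory from app.py above:
import tornado.testing
from app import make_app  # the factory defined in app.py above

class BotBlockTest(tornado.testing.AsyncHTTPTestCase):
    def get_app(self):
        return make_app()

    def test_ai_bot_blocked(self):
        resp = self.fetch("/", headers={"User-Agent": "GPTBot"})
        self.assertEqual(resp.code, 403)

    def test_browser_allowed(self):
        resp = self.fetch("/", headers={"User-Agent": "Mozilla/5.0"})
        self.assertEqual(resp.code, 200)
        self.assertEqual(resp.headers.get("X-Robots-Tag"), "noai, noimageai")
Run it with plain python -m unittest; AsyncHTTPTestCase spins up the app on an unused port for each test.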
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.