How to Block AI Bots on Flask
Flask is intentionally minimal — no built-in robots.txt, no user agent blocking. You add what you need: a @app.route('/robots.txt') for polite opt-out, a @app.before_request hook for hard 403 blocking before any view runs, or a Werkzeug WSGI middleware for zero-framework overhead at the WSGI layer.
Quick fix — robots.txt route + before_request
Add to your app.py or main Flask module.
import re
from flask import Flask, request, make_response, abort

app = Flask(__name__)

AI_BOT_PATTERN = re.compile(
    r"GPTBot|ClaudeBot|CCBot|Bytespider|Google-Extended",
    re.IGNORECASE,
)

@app.before_request
def block_ai_bots():
    ua = request.headers.get("User-Agent", "")
    if AI_BOT_PATTERN.search(ua):
        abort(403)

@app.route("/robots.txt")
def robots_txt():
    content = """User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""
    return make_response(content, 200, {"Content-Type": "text/plain"})

All Methods
robots.txt route (Recommended)
Easy · All deployments
@app.route("/robots.txt")
A Flask route that returns a plain text Response listing all AI bots with Disallow: /. Flask has no built-in robots.txt — this route is required. Works on every deployment.
Flask serves the static/ directory at /static/ by default, not /. A route is the most reliable way to serve robots.txt at the correct path.
before_request hook — hard blocking
Easy · All deployments
@app.before_request
Flask's before_request hook runs before every view. Return a 403 response when the User-Agent matches an AI bot pattern — no view function runs, no template is rendered.
Use app.before_request for the main app, or Blueprint.before_request for blueprint-scoped blocking. Combine with robots.txt route for layered protection.
Werkzeug WSGI middleware
Easy · All deployments
app.wsgi_app = BlockAiBotsMiddleware(app.wsgi_app)
A Werkzeug WSGI middleware class that intercepts requests at the WSGI layer — before Flask creates a request context. Lowest overhead, most portable across WSGI servers.
Wrap app.wsgi_app, not app itself. This preserves Flask error handling and debug mode.
X-Robots-Tag response header
Easy · All deployments
@app.after_request
Add X-Robots-Tag: noai, noimageai to every response via Flask's after_request hook. Works for any content type — HTML, JSON, XML.
HTTP headers are read by compliant crawlers regardless of content type. Useful as a belt-and-suspenders signal alongside robots.txt.
Jinja2 template noai meta tag
Easy · HTML-rendering Flask apps
templates/base.html
Add <meta name="robots" content="noai, noimageai"> to your base Jinja2 template. All pages that extend it inherit the tag automatically.
Applicable only when Flask renders HTML. API-only Flask apps should use X-Robots-Tag header instead.
nginx — server-level block
Intermediate · nginx + gunicorn deployments
nginx server block config
Match AI bot user agents in nginx and return 403 before gunicorn and Flask receive the request. Most efficient — zero Python overhead for blocked bots.
Requires server access (VPS). Not available on Railway, Render, or Heroku without custom infrastructure.
Method 1: robots.txt Route
Flask serves files from the static/ folder at /static/filename — not at /filename. You must define a route to serve robots.txt at the path crawlers expect:
# app.py
from flask import Flask, request, make_response

app = Flask(__name__)

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "anthropic-ai", "Google-Extended",
    "Bytespider", "CCBot", "PerplexityBot",
    "meta-externalagent", "Amazonbot", "Applebot-Extended",
    "xAI-Bot", "DeepSeekBot", "MistralBot", "Diffbot",
    "cohere-ai", "AI2Bot", "Ai2Bot-Dolma", "YouBot",
    "DuckAssistBot", "omgili", "omgilibot",
    "webzio-extended", "gemini-deep-research",
]

@app.route("/robots.txt")
def robots_txt():
    lines = []
    for bot in AI_BOTS:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /", ""]
    lines.append(f"Sitemap: {request.url_root}sitemap.xml")
    response = make_response("\n".join(lines))
    response.headers["Content-Type"] = "text/plain"
    return response

The route serves robots.txt at /robots.txt regardless of any blueprint URL prefix.

Method 2: before_request Hook (Hard Blocking)
Flask's @before_request decorator runs a function before every view. Return a non-None response and Flask short-circuits — the view function never runs. Use abort(403) or return a make_response tuple:
# app.py
import re
from flask import Flask, request, abort

app = Flask(__name__)

BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai"
    r"|Google-Extended|Bytespider|CCBot|PerplexityBot"
    r"|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot"
    r"|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma"
    r"|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended"
    r"|gemini-deep-research",
    re.IGNORECASE,
)

@app.before_request
def block_ai_bots():
    ua = request.headers.get("User-Agent", "")
    if BLOCKED_UA.search(ua):
        abort(403)  # raises HTTPException — no view runs

For Blueprint-specific blocking (e.g. block AI bots from an API blueprint but allow them on a public docs blueprint):
# blueprints/api.py
from flask import Blueprint, request, abort

# BLOCKED_UA is the compiled pattern from above — import it from a shared module
api_bp = Blueprint("api", __name__, url_prefix="/api")

@api_bp.before_request
def block_ai_bots_from_api():
    ua = request.headers.get("User-Agent", "")
    if BLOCKED_UA.search(ua):
        abort(403)

Method 3: Werkzeug WSGI Middleware
For blocking at the WSGI layer — before Flask creates a request context or runs any middleware — wrap app.wsgi_app with a custom WSGI callable. This is the lowest-level approach and works with any WSGI server (gunicorn, uWSGI, waitress):
# wsgi_middleware.py
import re

BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai"
    r"|Google-Extended|Bytespider|CCBot|PerplexityBot"
    r"|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot"
    r"|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma"
    r"|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended"
    r"|gemini-deep-research",
    re.IGNORECASE,
)

class BlockAiBotsMiddleware:
    """Werkzeug WSGI middleware — intercepts before Flask processes the request."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(ua):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.wsgi_app(environ, start_response)

Apply it in app.py — wrap app.wsgi_app, not app:
# app.py
from flask import Flask
from wsgi_middleware import BlockAiBotsMiddleware

app = Flask(__name__)
app.wsgi_app = BlockAiBotsMiddleware(app.wsgi_app)  # wrap wsgi_app, not app

# ... routes below
Wrap wsgi_app, not app
Always wrap app.wsgi_app — wrapping app directly replaces the Flask object and breaks the debug server, interactive debugger, and test client. Wrapping app.wsgi_app preserves all Flask tooling while adding WSGI-layer middleware.
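The wrapping is easy to verify with Flask's test client, since the test client exercises app.wsgi_app and therefore runs the middleware. A minimal sketch with a trimmed bot pattern:

```python
import re
from flask import Flask

BLOCKED_UA = re.compile(r"GPTBot|ClaudeBot|CCBot", re.IGNORECASE)

class BlockAiBotsMiddleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # Raw User-Agent header lives in the WSGI environ
        if BLOCKED_UA.search(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.wsgi_app(environ, start_response)

app = Flask(__name__)

@app.route("/")
def index():
    return "ok"

app.wsgi_app = BlockAiBotsMiddleware(app.wsgi_app)  # wrap wsgi_app, not app

client = app.test_client()
bot = client.get("/", headers={"User-Agent": "GPTBot/1.0"})
human = client.get("/", headers={"User-Agent": "Mozilla/5.0"})
```

A bot user agent should get 403 from the middleware; a browser user agent should reach the view.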
Method 4: X-Robots-Tag Response Header
Add X-Robots-Tag: noai, noimageai to every response using Flask's @after_request hook. Works for any content type — HTML, JSON, XML, CSV:
# app.py
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_robots_header(response):
    response.headers["X-Robots-Tag"] = "noai, noimageai"
    return response

Method 5: noai Meta Tag in Base Template
For Flask apps that render HTML with Jinja2, add the meta tag to your base template. All child templates that use {% extends "base.html" %} inherit it:
{# templates/base.html #}
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{% block title %}My App{% endblock %}</title>
    {% block robots_meta %}
    <meta name="robots" content="noai, noimageai">
    {% endblock %}
  </head>
  <body>
    {% block content %}{% endblock %}
  </body>
</html>

{# templates/public_page.html — allow AI on this page only #}
{% extends "base.html" %}

{% block robots_meta %}
<meta name="robots" content="index, follow">
{% endblock %}

Method 6: nginx (Server-Level Blocking)
Production Flask deployments typically run gunicorn behind nginx. Block AI bots in nginx before gunicorn processes the request:
# /etc/nginx/sites-available/yourapp.conf
server {
    listen 80;
    server_name yourdomain.com;

    # Block AI training crawlers at the edge
    if ($http_user_agent ~* "(GPTBot|ClaudeBot|anthropic-ai|CCBot|Bytespider|Google-Extended|PerplexityBot|Diffbot|DeepSeekBot|MistralBot|cohere-ai|meta-externalagent|Amazonbot|xAI-Bot|AI2Bot|omgili|webzio-extended|gemini-deep-research|OAI-SearchBot|ChatGPT-User)") {
        return 403;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;  # gunicorn
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# Start gunicorn (production)
gunicorn app:app --workers 4 --bind 127.0.0.1:8000

# Reload nginx after config change
sudo nginx -t && sudo systemctl reload nginx
PaaS deployments (Heroku, Railway, Render)
On managed platforms without nginx access, use the Python-layer methods:
- ✓ @app.before_request — 403 response before any view
- ✓ BlockAiBotsMiddleware wrapping app.wsgi_app
- ✓ @app.route('/robots.txt') — polite opt-out for compliant crawlers
- ✓ Cloudflare in front of your PaaS domain — WAF custom rules for edge blocking
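On Heroku-style platforms the process entrypoint is typically declared in a Procfile rather than an nginx config. A minimal example, assuming your Flask object is named app in app.py (already wrapped by the middleware or using before_request):

```
web: gunicorn app:app --workers 4
```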
AI Bots to Block
25 user agents covering AI training crawlers and AI search bots. The route and middleware patterns above include all of them.
Frequently Asked Questions
How do I serve robots.txt from a Flask app?
Add a GET route for '/robots.txt' that returns a Response or make_response() with mimetype='text/plain'. Flask does not serve a robots.txt automatically — you must define the route. Alternatively, place a robots.txt file in your static/ directory and it is served automatically at /static/robots.txt, but crawlers expect it at /robots.txt so the route approach is more reliable.
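If you prefer keeping robots.txt as a file, a sketch that serves static/robots.txt at the root path crawlers expect (assumes the file exists in your static/ folder):

```python
from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route("/robots.txt")
def robots_txt():
    # Serve the file from static/ at /robots.txt instead of /static/robots.txt
    return send_from_directory(app.static_folder, "robots.txt", mimetype="text/plain")
```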
What is the before_request hook in Flask?
The before_request hook is a Flask decorator that runs a function before every request is dispatched to a view. If the function returns a Response (or tuple), Flask returns it immediately — the view function is never called. This makes it ideal for bot blocking: check request.user_agent.string, and if it matches an AI bot pattern, return a 403 response with abort(403) or return make_response('Forbidden', 403).
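A minimal sketch of that pattern, using make_response instead of abort and a single bot name for brevity:

```python
from flask import Flask, request, make_response

app = Flask(__name__)

@app.before_request
def reject_ai_bots():
    # request.user_agent.string is the raw User-Agent header value
    if "GPTBot" in request.user_agent.string:
        return make_response("Forbidden", 403)  # returned early — the view never runs

@app.route("/")
def home():
    return "hello"
```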
What is the difference between before_request and Werkzeug WSGI middleware for bot blocking?
before_request runs inside Flask — Flask has already parsed the HTTP request, created the request context, and matched the URL. Werkzeug WSGI middleware runs at the WSGI layer, before Flask does any processing. For bot blocking, the practical difference is minimal. before_request is simpler to write and maintain. Werkzeug middleware has slightly lower overhead and works even if Flask raises an error before before_request runs, but this is a rare edge case.
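The ordering is easy to demonstrate: when the middleware blocks a request, Flask's hooks never run. A sketch with a single-bot pattern:

```python
import re
from flask import Flask

BLOCKED = re.compile(r"GPTBot", re.IGNORECASE)

class Middleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        if BLOCKED.search(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.wsgi_app(environ, start_response)

app = Flask(__name__)
hook_calls = []

@app.before_request
def tracker():
    hook_calls.append(1)  # only reached if the middleware let the request through

@app.route("/")
def index():
    return "ok"

app.wsgi_app = Middleware(app.wsgi_app)
client = app.test_client()
client.get("/", headers={"User-Agent": "GPTBot/1.0"})   # blocked at WSGI layer
client.get("/", headers={"User-Agent": "Mozilla/5.0"})  # reaches Flask
```

After both requests, hook_calls records only the browser request: the bot was rejected before Flask created a request context.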
How do I add noai meta tags to every Flask HTML page?
Add the meta tag to your base Jinja2 template — typically templates/base.html or templates/layout.html. Place <meta name="robots" content="noai, noimageai"> inside the <head> block. All templates that extend this base with {% extends 'base.html' %} inherit the tag. For per-page override, define a {% block robots_meta %} in base.html that child templates can override with a different robots value.
Does Flask have a built-in robots.txt or user agent blocking feature?
No. Flask has no built-in robots.txt serving or user agent blocking. Django has DISALLOWED_USER_AGENTS (processed by CommonMiddleware). Flask is intentionally minimal — you add what you need. The three approaches are: a /robots.txt route, a before_request hook, or a Werkzeug WSGI middleware wrapper. All three can be combined for layered protection.
How do I deploy Flask with nginx and block AI bots at the server level?
Run Flask with gunicorn behind nginx as a reverse proxy. In the nginx server block, add an if block that checks $http_user_agent against AI bot names and returns 403 before gunicorn receives the request. This is the most efficient approach — no Python code runs for blocked bots. Requires VPS access (DigitalOcean, EC2, Hetzner). PaaS platforms (Heroku, Railway, Render) require Python-layer blocking instead.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.