How to Block AI Bots on Flask
Flask is intentionally minimal — no built-in robots.txt, no user agent blocking. You add what you need: a @app.route('/robots.txt') for polite opt-out, a @app.before_request hook for hard 403 blocking before any view runs, or a Werkzeug WSGI middleware for zero-framework overhead at the WSGI layer.
Quick fix — robots.txt route + before_request
Add to your app.py or main Flask module.
import re
from flask import Flask, request, make_response, abort

app = Flask(__name__)

AI_BOT_PATTERN = re.compile(
    r"GPTBot|ClaudeBot|CCBot|Bytespider|Google-Extended",
    re.IGNORECASE,
)

@app.before_request
def block_ai_bots():
    ua = request.headers.get("User-Agent", "")
    if AI_BOT_PATTERN.search(ua):
        abort(403)

@app.route("/robots.txt")
def robots_txt():
    content = """User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""
    return make_response(content, 200, {"Content-Type": "text/plain"})

All Methods
robots.txt route (Recommended)
Easy · All deployments
@app.route("/robots.txt")
A Flask route that returns a plain text Response listing all AI bots with Disallow: /. Flask has no built-in robots.txt — this route is required. Works on every deployment.
Flask serves the static/ directory at /static/ by default, not /. A route is the most reliable way to serve robots.txt at the correct path.
before_request hook — hard blocking
Easy · All deployments
@app.before_request
Flask's before_request hook runs before every view. Return a 403 response when the User-Agent matches an AI bot pattern — no view function runs, no template is rendered.
Use app.before_request for the main app, or Blueprint.before_request for blueprint-scoped blocking. Combine with robots.txt route for layered protection.
Werkzeug WSGI middleware
Easy · All deployments
app.wsgi_app = BlockAiBotsMiddleware(app.wsgi_app)
A Werkzeug WSGI middleware class that intercepts requests at the WSGI layer — before Flask creates a request context. Lowest overhead, most portable across WSGI servers.
Wrap app.wsgi_app, not app itself. This preserves Flask error handling and debug mode.
X-Robots-Tag response header
Easy · All deployments
@app.after_request
Add X-Robots-Tag: noai, noimageai to every response via Flask's after_request hook. Works for any content type — HTML, JSON, XML.
HTTP headers are read by compliant crawlers regardless of content type. Useful as a belt-and-suspenders signal alongside robots.txt.
Jinja2 template noai meta tag
Easy · HTML-rendering Flask apps
templates/base.html
Add <meta name="robots" content="noai, noimageai"> to your base Jinja2 template. All pages that extend it inherit the tag automatically.
Applicable only when Flask renders HTML. API-only Flask apps should use X-Robots-Tag header instead.
nginx — server-level block
Intermediate · nginx + gunicorn deployments
nginx server block config
Match AI bot user agents in nginx and return 403 before gunicorn and Flask receive the request. Most efficient — zero Python overhead for blocked bots.
Requires server access (VPS). Not available on Railway, Render, or Heroku without custom infrastructure.
Method 1: robots.txt Route
Flask serves files from the static/ folder at /static/filename — not at /filename. You must define a route to serve robots.txt at the path crawlers expect:
# app.py
from flask import Flask, request, make_response

app = Flask(__name__)

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "anthropic-ai", "Google-Extended",
    "Bytespider", "CCBot", "PerplexityBot",
    "meta-externalagent", "Amazonbot", "Applebot-Extended",
    "xAI-Bot", "DeepSeekBot", "MistralBot", "Diffbot",
    "cohere-ai", "AI2Bot", "Ai2Bot-Dolma", "YouBot",
    "DuckAssistBot", "omgili", "omgilibot",
    "webzio-extended", "gemini-deep-research",
]

@app.route("/robots.txt")
def robots_txt():
    lines = []
    for bot in AI_BOTS:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /", ""]
    lines.append(f"Sitemap: {request.url_root}sitemap.xml")
    response = make_response("\n".join(lines))
    response.headers["Content-Type"] = "text/plain"
    return response

The route serves robots.txt at /robots.txt regardless of any blueprint URL prefix.

Method 2: before_request Hook (Hard Blocking)
Flask's @before_request decorator runs a function before every view. Return a non-None response and Flask short-circuits — the view function never runs. Use abort(403) or return a make_response tuple:
# app.py
import re
from flask import Flask, request, abort

app = Flask(__name__)

BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai"
    r"|Google-Extended|Bytespider|CCBot|PerplexityBot"
    r"|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot"
    r"|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma"
    r"|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended"
    r"|gemini-deep-research",
    re.IGNORECASE,
)

@app.before_request
def block_ai_bots():
    ua = request.headers.get("User-Agent", "")
    if BLOCKED_UA.search(ua):
        abort(403)  # raises HTTPException — no view runs

For Blueprint-specific blocking (e.g. block AI bots from an API blueprint but allow them on a public docs blueprint):
# blueprints/api.py
from flask import Blueprint, request, abort

# BLOCKED_UA is the compiled pattern from above — import it from a shared module
api_bp = Blueprint("api", __name__, url_prefix="/api")

@api_bp.before_request
def block_ai_bots_from_api():
    ua = request.headers.get("User-Agent", "")
    if BLOCKED_UA.search(ua):
        abort(403)

Method 3: Werkzeug WSGI Middleware
For blocking at the WSGI layer — before Flask creates a request context or runs any middleware — wrap app.wsgi_app with a custom WSGI callable. This is the lowest-level approach and works with any WSGI server (gunicorn, uWSGI, waitress):
# wsgi_middleware.py
import re

BLOCKED_UA = re.compile(
    r"GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai"
    r"|Google-Extended|Bytespider|CCBot|PerplexityBot"
    r"|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot"
    r"|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma"
    r"|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended"
    r"|gemini-deep-research",
    re.IGNORECASE,
)

class BlockAiBotsMiddleware:
    """Werkzeug WSGI middleware — intercepts before Flask processes the request."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(ua):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.wsgi_app(environ, start_response)

Apply it in app.py — wrap app.wsgi_app, not app:
# app.py
from flask import Flask
from wsgi_middleware import BlockAiBotsMiddleware

app = Flask(__name__)
app.wsgi_app = BlockAiBotsMiddleware(app.wsgi_app)  # wrap wsgi_app, not app

# ... routes below
Wrap wsgi_app, not app
Always wrap app.wsgi_app — wrapping app directly replaces the Flask object and breaks the debug server, interactive debugger, and test client. Wrapping app.wsgi_app preserves all Flask tooling while adding WSGI-layer middleware.
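The wrapping is easy to verify with Flask's test client, since the test client exercises app.wsgi_app and therefore runs the middleware. A minimal sketch with a trimmed bot pattern:

```python
import re
from flask import Flask

BLOCKED_UA = re.compile(r"GPTBot|ClaudeBot|CCBot", re.IGNORECASE)

class BlockAiBotsMiddleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # Raw User-Agent header lives in the WSGI environ
        if BLOCKED_UA.search(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.wsgi_app(environ, start_response)

app = Flask(__name__)

@app.route("/")
def index():
    return "ok"

app.wsgi_app = BlockAiBotsMiddleware(app.wsgi_app)  # wrap wsgi_app, not app

client = app.test_client()
bot = client.get("/", headers={"User-Agent": "GPTBot/1.0"})
human = client.get("/", headers={"User-Agent": "Mozilla/5.0"})
```

A bot user agent should get 403 from the middleware; a browser user agent should reach the view.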
Method 4: X-Robots-Tag Response Header
Add X-Robots-Tag: noai, noimageai to every response using Flask's @after_request hook. Works for any content type — HTML, JSON, XML, CSV:
# app.py
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_robots_header(response):
    response.headers["X-Robots-Tag"] = "noai, noimageai"
    return response

Method 5: noai Meta Tag in Base Template
For Flask apps that render HTML with Jinja2, add the meta tag to your base template. All child templates that use {% extends "base.html" %} inherit it:
{# templates/base.html #}
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{% block title %}My App{% endblock %}</title>
    {% block robots_meta %}
    <meta name="robots" content="noai, noimageai">
    {% endblock %}
  </head>
  <body>
    {% block content %}{% endblock %}
  </body>
</html>

{# templates/public_page.html — allow AI on this page only #}
{% extends "base.html" %}

{% block robots_meta %}
<meta name="robots" content="index, follow">
{% endblock %}

Method 6: nginx (Server-Level Blocking)
Production Flask deployments typically run gunicorn behind nginx. Block AI bots in nginx before gunicorn processes the request:
# /etc/nginx/sites-available/yourapp.conf
server {
    listen 80;
    server_name yourdomain.com;

    # Block AI training crawlers at the edge
    if ($http_user_agent ~* "(GPTBot|ClaudeBot|anthropic-ai|CCBot|Bytespider|Google-Extended|PerplexityBot|Diffbot|DeepSeekBot|MistralBot|cohere-ai|meta-externalagent|Amazonbot|xAI-Bot|AI2Bot|omgili|webzio-extended|gemini-deep-research|OAI-SearchBot|ChatGPT-User)") {
        return 403;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;  # gunicorn
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

# Start gunicorn (production)
gunicorn app:app --workers 4 --bind 127.0.0.1:8000

# Reload nginx after config change
sudo nginx -t && sudo systemctl reload nginx
PaaS deployments (Heroku, Railway, Render)
On managed platforms without nginx access, use the Python-layer methods:
- ✓ @app.before_request — 403 response before any view
- ✓ BlockAiBotsMiddleware wrapping app.wsgi_app
- ✓ @app.route('/robots.txt') — polite opt-out for compliant crawlers
- ✓ Cloudflare in front of your PaaS domain — WAF custom rules for edge blocking
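On Heroku-style platforms the process entrypoint is typically declared in a Procfile rather than an nginx config. A minimal example, assuming your Flask object is named app in app.py (already wrapped by the middleware or using before_request):

```
web: gunicorn app:app --workers 4
```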
AI Bots to Block
25 user agents covering AI training crawlers and AI search bots. The route and middleware patterns above include all of them.
Frequently Asked Questions
How do I serve robots.txt from a Flask app?
Add a GET route for '/robots.txt' that returns a Response or make_response() with mimetype='text/plain'. Flask does not serve a robots.txt automatically — you must define the route. Alternatively, place a robots.txt file in your static/ directory and it is served automatically at /static/robots.txt, but crawlers expect it at /robots.txt so the route approach is more reliable.
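If you prefer keeping robots.txt as a file, a sketch that serves static/robots.txt at the root path crawlers expect (assumes the file exists in your static/ folder):

```python
from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route("/robots.txt")
def robots_txt():
    # Serve the file from static/ at /robots.txt instead of /static/robots.txt
    return send_from_directory(app.static_folder, "robots.txt", mimetype="text/plain")
```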
What is the before_request hook in Flask?
The before_request hook is a Flask decorator that runs a function before every request is dispatched to a view. If the function returns a Response (or tuple), Flask returns it immediately — the view function is never called. This makes it ideal for bot blocking: check request.user_agent.string, and if it matches an AI bot pattern, return a 403 response with abort(403) or return make_response('Forbidden', 403).
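A minimal sketch of that pattern, using make_response instead of abort and a single bot name for brevity:

```python
from flask import Flask, request, make_response

app = Flask(__name__)

@app.before_request
def reject_ai_bots():
    # request.user_agent.string is the raw User-Agent header value
    if "GPTBot" in request.user_agent.string:
        return make_response("Forbidden", 403)  # returned early — the view never runs

@app.route("/")
def home():
    return "hello"
```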
What is the difference between before_request and Werkzeug WSGI middleware for bot blocking?
before_request runs inside Flask — Flask has already parsed the HTTP request, created the request context, and matched the URL. Werkzeug WSGI middleware runs at the WSGI layer, before Flask does any processing. For bot blocking, the practical difference is minimal. before_request is simpler to write and maintain. Werkzeug middleware has slightly lower overhead and works even if Flask raises an error before before_request runs, but this is a rare edge case.
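The ordering is easy to demonstrate: when the middleware blocks a request, Flask's hooks never run. A sketch with a single-bot pattern:

```python
import re
from flask import Flask

BLOCKED = re.compile(r"GPTBot", re.IGNORECASE)

class Middleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        if BLOCKED.search(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.wsgi_app(environ, start_response)

app = Flask(__name__)
hook_calls = []

@app.before_request
def tracker():
    hook_calls.append(1)  # only reached if the middleware let the request through

@app.route("/")
def index():
    return "ok"

app.wsgi_app = Middleware(app.wsgi_app)
client = app.test_client()
client.get("/", headers={"User-Agent": "GPTBot/1.0"})   # blocked at WSGI layer
client.get("/", headers={"User-Agent": "Mozilla/5.0"})  # reaches Flask
```

After both requests, hook_calls records only the browser request: the bot was rejected before Flask created a request context.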
How do I add noai meta tags to every Flask HTML page?
Add the meta tag to your base Jinja2 template — typically templates/base.html or templates/layout.html. Place <meta name="robots" content="noai, noimageai"> inside the <head> block. All templates that extend this base with {% extends 'base.html' %} inherit the tag. For per-page override, define a {% block robots_meta %} in base.html that child templates can override with a different robots value.
Does Flask have a built-in robots.txt or user agent blocking feature?
No. Flask has no built-in robots.txt serving or user agent blocking. Django has DISALLOWED_USER_AGENTS (processed by CommonMiddleware). Flask is intentionally minimal — you add what you need. The three approaches are: a /robots.txt route, a before_request hook, or a Werkzeug WSGI middleware wrapper. All three can be combined for layered protection.
How do I deploy Flask with nginx and block AI bots at the server level?
Run Flask with gunicorn behind nginx as a reverse proxy. In the nginx server block, add an if block that checks $http_user_agent against AI bot names and returns 403 before gunicorn receives the request. This is the most efficient approach — no Python code runs for blocked bots. Requires VPS access (DigitalOcean, EC2, Hetzner). PaaS platforms (Heroku, Railway, Render) require Python-layer blocking instead.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.