How to Block AI Bots on PHP: Complete 2026 Guide

PHP runs roughly three-quarters of all websites whose server-side language is known, including millions of sites on shared hosting where server configuration is locked down. The fastest option that works everywhere is a $_SERVER['HTTP_USER_AGENT'] check with preg_match() at the top of your front controller. On shared hosting (cPanel, Bluehost, SiteGround), .htaccess mod_rewrite is the only server-level option; no server-admin or nginx access is required.

8 min read · Updated April 2026 · PHP 7.4+ · Apache · nginx · Shared hosting

Methods overview

Static robots.txt in document root: always; zero PHP involved
Dynamic robots.php endpoint: when you need staging vs production rules
Front controller $_SERVER check: any PHP app with an index.php entry point
.htaccess mod_rewrite (Apache): Apache / shared hosting (cPanel); the most common setup
noai meta tag in HTML layout: any PHP template / layout file
nginx + PHP-FPM block: nginx serving PHP via php-fpm

1. Static robots.txt in document root

Place robots.txt in your public document root — the same directory as index.php (typically public/, public_html/, or httpdocs/). Apache and nginx serve static files directly — PHP is never invoked for this request.

public/robots.txt (served directly, no PHP)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /
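
A misspelled User-agent token fails silently, so it is worth sanity-checking the file before deploying. A minimal shell sketch (the bot subset and local file path are illustrative):

```shell
# Write a trimmed-down robots.txt (subset of the rules above) and verify
# each blocked bot appears exactly once -- this catches copy/paste
# duplicates and misspelled User-agent lines before you deploy.
cat > robots.txt <<'EOF'
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
EOF

for bot in GPTBot ClaudeBot CCBot; do
  count=$(grep -c "^User-agent: $bot$" robots.txt)
  echo "$bot: $count"   # each prints "<bot>: 1"
done
```

On the live site, `curl -s https://yoursite.com/robots.txt` should return the deployed file byte-for-byte; if it returns a 404 or your framework's HTML, the file is outside the document root.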

2. Dynamic robots.php endpoint

When you need environment-specific rules — block everything on staging, block only AI bots in production — route /robots.txt to a PHP script. The routing happens in .htaccess or nginx config.

robots.php
<?php

$aiBotsDisallow = <<<'EOT'
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: *
Allow: /
EOT;

$blockAll = "User-agent: *
Disallow: /";

header('Content-Type: text/plain; charset=utf-8');

// Use APP_ENV environment variable or a constant defined in your config
$env = getenv('APP_ENV') ?: 'development';
echo ($env === 'production') ? $aiBotsDisallow : $blockAll;

Route /robots.txt to robots.php in .htaccess

.htaccess
RewriteEngine On

# Route /robots.txt to the PHP script (place before WordPress/framework rules)
RewriteRule ^robots\.txt$ robots.php [L]

Static vs dynamic: with the rule above, Apache rewrites /robots.txt to robots.php even when a static robots.txt file exists in the same directory; mod_rewrite does not check for an existing file unless you tell it to. Delete the static file to avoid confusion, or add RewriteCond %{REQUEST_FILENAME} !-f above the rule if you want an existing static file to take precedence.
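The environment switch in robots.php can be sketched in shell to make the behavior concrete (APP_ENV name as in the PHP example above; the emit_robots helper is illustrative):

```shell
# Same decision as robots.php: production serves the AI-bot opt-out rules,
# anything else serves a block-everything file (a safe default for staging).
emit_robots() {
  if [ "$APP_ENV" = "production" ]; then
    printf 'User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n'
  else
    printf 'User-agent: *\nDisallow: /\n'
  fi
}

APP_ENV=production
emit_robots    # serves the AI-bot opt-out rules

APP_ENV=staging
emit_robots    # serves a blanket Disallow
```

The block-everything default means a misconfigured or unset APP_ENV fails closed: staging never leaks into crawlers.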

3. Front controller $_SERVER check

Most PHP applications have a single entry point — index.php. Add a user-agent check at the very top before any output, framework bootstrap, or session start. Use http_response_code(403) and exit to stop execution immediately.

public/index.php (add at top, before anything else)
<?php

// ─── Block AI bots ────────────────────────────────────────────────────────────
// Run before framework bootstrap, session start, or any output.
// Always allow /robots.txt through so bots can read your opt-out rules.

// strtok() drops any query string, so /robots.txt?foo is still allowed through
if (strtok($_SERVER['REQUEST_URI'] ?? '/', '?') !== '/robots.txt') {
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    if ($userAgent !== '' && preg_match(
        '/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|' .
        'Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|' .
        'Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|' .
        'cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|' .
        'webzio-extended|gemini-deep-research/i',
        $userAgent
    )) {
        http_response_code(403);
        header('Content-Type: text/plain');
        exit('Forbidden');
    }
}

// ─── Your application starts here ────────────────────────────────────────────
require_once __DIR__ . '/../vendor/autoload.php';
// ... rest of bootstrap
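
You can dry-run the same alternation from the command line before deploying: grep -E with -i mirrors preg_match()'s /i modifier. The pattern below is a representative subset, and a Googlebot UA is included to confirm normal search crawlers pass through (the check helper is illustrative):

```shell
pattern='GPTBot|ChatGPT-User|ClaudeBot|CCBot|PerplexityBot|Google-Extended'

# Print the status the front controller would return for a given user agent.
check() {
  if printf '%s' "$1" | grep -qiE "$pattern"; then
    echo "403  $1"
  else
    echo "200  $1"
  fi
}

check 'Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2)'            # 403
check 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'  # 200
check 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Firefox/125.0'  # 200
```

Once deployed, the same two cases can be exercised against the live site with `curl -A '<user agent>' -o /dev/null -w '%{http_code}' https://yoursite.com/`.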

Extracted to a reusable include

For multi-entry-point applications, extract to a dedicated file and require it at the top of each:

src/block-ai-bots.php
<?php

declare(strict_types=1);

/**
 * Block known AI training and scraping bots.
 * Include at the top of every entry point.
 * Always allows /robots.txt through.
 */
function blockAiBots(): void
{
    // Strip any query string, then compare the tail; substr() keeps this
    // PHP 7.4 compatible (str_ends_with() requires PHP 8.0).
    $uri = strtok($_SERVER['REQUEST_URI'] ?? '/', '?');
    if (substr($uri, -11) === '/robots.txt') {
        return;
    }

    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    if ($ua === '') {
        return;
    }

    $pattern =
        '/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|' .
        'Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|' .
        'Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|' .
        'cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|' .
        'webzio-extended|gemini-deep-research/i';

    if (preg_match($pattern, $ua)) {
        http_response_code(403);
        header('Content-Type: text/plain');
        exit('Forbidden');
    }
}

blockAiBots();

preg_match vs strpos: preg_match() with /i (case-insensitive) is the right tool here: the bot names are plain ASCII tokens and you need case-insensitive partial matching. The overhead of one regex match against a ~100-character user-agent string is negligible. Avoid a chain of 25+ stripos() calls; a single alternation regex is cleaner and comparably fast.

4. .htaccess mod_rewrite (Apache)

On shared hosting (cPanel, Bluehost, SiteGround, DreamHost), .htaccess is the only server-level blocking option — you have no access to Apache's main config or nginx. This blocks before PHP runs.

.htaccess (add ABOVE WordPress / framework rules)
RewriteEngine On

# ─── Block AI bots — place ABOVE any WordPress or framework RewriteRules ───
# [F] = 403 Forbidden, [L] = stop processing rules

# Always allow robots.txt through
RewriteRule ^robots\.txt$ - [L]

# Block AI training and scraping bots
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChatGPT-User [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OAI-SearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} anthropic-ai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} meta-externalagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Applebot-Extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} xAI-Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DeepSeekBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MistralBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Diffbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cohere-ai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AI2Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YouBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DuckAssistBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} omgili [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webzio-extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gemini-deep-research [NC]
RewriteRule .* - [F,L]

# ─── WordPress / your framework rules below this line ───
# BEGIN WordPress
# ...

Rule order: .htaccess rules are processed top-to-bottom. Place the AI bot block before the # BEGIN WordPress block (or your framework's rewrite section). If WordPress or another tool regenerates your .htaccess, your block rules may be overwritten — check after any plugin update that touches .htaccess.

Single-line variant (shorter .htaccess)

Condense to a single regex condition if you prefer a shorter file:

.htaccess (condensed)
RewriteEngine On
RewriteRule ^robots\.txt$ - [L]
RewriteCond %{HTTP_USER_AGENT} "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|YouBot|DuckAssistBot|omgili|webzio-extended|gemini-deep-research" [NC]
RewriteRule .* - [F,L]
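
[NC] makes the condition case-insensitive, so mixed-case or lowercased bot user agents are still caught. grep -i reproduces that behavior for a quick local check (subset pattern and sample UA are illustrative):

```shell
pattern='GPTBot|ClaudeBot|CCBot|PerplexityBot'

# [NC] in Apache == case-insensitive matching; grep -i behaves the same way,
# so a lowercased UA still hits the pattern.
printf '%s' 'mozilla/5.0 (compatible; gptbot/1.2)' | grep -qiE "$pattern" \
  && echo "blocked (case-insensitive match)"
```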

5. noai meta tag in HTML layout

Add noai and noimageai meta tags to your shared layout template. Note that these directives are not part of the Robots Exclusion Protocol; they are an opt-out convention honored by some crawlers, so treat them as a signal rather than enforcement. In plain PHP the layout is typically a header file included on every page. Use a variable or constant to allow a per-page override.

templates/header.php
<?php
// $robots — set per-page before including this header.
// Default: block AI training on all pages.
$robots = $robots ?? 'noai, noimageai';
?>
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title><?= htmlspecialchars($pageTitle ?? 'My Site') ?></title>
    <meta name="robots" content="<?= htmlspecialchars($robots) ?>">
</head>
<body>

Per-page override

blog-post.php (example)
<?php
// This page allows AI indexing but not AI training
$robots = 'noimageai';
$pageTitle = 'My Blog Post';
require_once 'templates/header.php';
?>

<h1>Blog Post Title</h1>
<!-- ... -->

X-Robots-Tag response header

Send as an HTTP header for non-HTML responses (JSON APIs, XML sitemaps) or when you want to ensure delivery before the HTML is parsed:

index.php (global header)
<?php
// Send X-Robots-Tag on every response this entry point produces.
// Must run before any output — headers cannot be sent after output starts.
header('X-Robots-Tag: noai, noimageai');
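
To confirm the header actually reaches clients, inspect the response headers. A sketch against a canned response (in practice, feed it the output of `curl -sI https://yoursite.com/`):

```shell
# Canned response standing in for `curl -sI` output (illustrative).
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nX-Robots-Tag: noai, noimageai\r\n\r\n' > headers.txt

# The check you would run against real curl output:
# prints the X-Robots-Tag line if the header is being sent.
grep -i '^X-Robots-Tag:' headers.txt
```

If the line is missing on the live site, a proxy or cache in front of PHP may be stripping it, or output started before header() ran.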

6. nginx + PHP-FPM block

On servers running nginx with PHP-FPM, add a map block for user agent matching. The block runs at the nginx layer — PHP-FPM is never invoked for matched bots.

/etc/nginx/sites-available/yoursite
map $http_user_agent $block_ai_bot {
    default                 0;
    ~*GPTBot                1;
    ~*ChatGPT-User          1;
    ~*OAI-SearchBot         1;
    ~*ClaudeBot             1;
    ~*anthropic-ai          1;
    ~*Google-Extended       1;
    ~*Bytespider            1;
    ~*CCBot                 1;
    ~*PerplexityBot         1;
    ~*meta-externalagent    1;
    ~*Amazonbot             1;
    ~*Applebot-Extended     1;
    ~*xAI-Bot               1;
    ~*DeepSeekBot           1;
    ~*MistralBot            1;
    ~*Diffbot               1;
    ~*cohere-ai             1;
    ~*AI2Bot                1;
    ~*YouBot                1;
    ~*DuckAssistBot         1;
    ~*omgili                1;
    ~*webzio-extended       1;
    ~*gemini-deep-research  1;
}

server {
    listen 443 ssl;
    server_name yoursite.com;
    root /var/www/yoursite/public;
    index index.php;

    # Always serve static robots.txt
    location = /robots.txt {
        try_files $uri =404;
    }

    location / {
        if ($block_ai_bot) {
            return 403 "Forbidden";
        }
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php8.2-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        include fastcgi_params;
    }
}

Hosting comparison

Shared (cPanel, Bluehost etc.): .htaccess mod_rewrite (method 4)
VPS + Apache: .htaccess or virtual-host config (method 4)
VPS + nginx + PHP-FPM: nginx map block (method 6)
Docker container (PHP built-in server): front controller $_SERVER check (method 3)
Managed (Kinsta, WP Engine): front controller check (method 3), plus any bot-blocking tools the host provides
Platform (Heroku, Railway, Render): front controller $_SERVER check (method 3)

Frequently asked questions

How do I block AI bots on shared hosting?

Add a .htaccess mod_rewrite block above your existing rules (above # BEGIN WordPress if applicable). Use RewriteCond %{HTTP_USER_AGENT} with the bot name and [NC,OR] flags, then RewriteRule .* - [F,L] to return 403. This is the only server-level option on most shared hosts — no PHP or server admin access needed.

Should I use .htaccess or PHP to block AI bots?

.htaccess blocks before PHP runs — more efficient and applies to all file types including images. PHP front controller blocking only applies to requests that reach PHP. Use .htaccess for Apache/shared hosting; use nginx map blocks for nginx servers; fall back to PHP $_SERVER checking for platforms where you have no server config access (Docker, Heroku, Railway).

What PHP function should I use to check the user agent?

preg_match() with a case-insensitive pattern against $_SERVER['HTTP_USER_AGENT']. Use a single regex with all bot names joined by | — one preg_match() call is faster than 25 separate str_contains() calls. Always check that the user agent string is non-empty before matching.

Will blocking AI bots break my SEO?

No. Google Search (Googlebot), Bing (Bingbot), and other search engine crawlers are not in the block list. The list specifically targets AI training bots (GPTBot, CCBot, ClaudeBot) and AI search bots where you choose not to appear in AI answers. Your robots.txt explicitly allows all unblocked bots with "User-agent: * Allow: /".

My .htaccess gets overwritten by WordPress. What do I do?

WordPress and plugins like Yoast regenerate the section between # BEGIN WordPress and # END WordPress markers. Place your AI bot block ABOVE the # BEGIN WordPress line — WordPress only modifies content between its markers, not above them. After adding the rules, verify they persist after a plugin update or permalink flush.
