How to Block AI Bots in ReactPHP

ReactPHP is a non-blocking, event-loop HTTP server for PHP: one PHP process handles many concurrent requests without threads or forks. Middleware is a callable that takes (ServerRequestInterface $request, callable $next) and returns either a ResponseInterface or a promise that resolves to one. To block, return a 403 response (optionally wrapped with React\Promise\resolve()) without calling $next. To pass, call $next($request); because $next may return a plain response or a promise, wrap it as resolve($next($request))->then(fn) to inject headers on the downstream response. getHeaderLine('User-Agent') returns '' (an empty string, not null) when the header is absent, as PSR-7's MessageInterface requires. Never run blocking code (sleep, file_get_contents, PDO) inside middleware, because it stalls the entire event loop.

1. Bot detection

The detector is pure PHP with no dependencies. It uses str_contains() (PHP 8.0+) for literal substring matching. The caller passes in the result of getHeaderLine(), which is always a string, so strtolower() is safe without a null check.

<?php
// AiBotDetector.php — AI bot detection, no external dependencies

class AiBotDetector
{
    private const AI_BOT_PATTERNS = [
        'gptbot',
        'chatgpt-user',
        'claudebot',
        'anthropic-ai',
        'ccbot',
        'google-extended',
        'cohere-ai',
        'meta-externalagent',
        'bytespider',
        'omgili',
        'diffbot',
        'imagesiftbot',
        'magpie-crawler',
        'amazonbot',
        'dataprovider',
        'netcraft',
    ];

    /**
     * Returns true if the User-Agent string matches a known AI crawler.
     *
     * getHeaderLine() always returns a string ('' when absent, per PSR-7
     * MessageInterface), so str_contains() is safe to call without a null-check.
     * The value is case-folded to lowercase before comparison.
     *
     * @param string $userAgent The raw User-Agent header value (may be '')
     * @return bool
     */
    public static function detect(string $userAgent): bool
    {
        if ($userAgent === '') {
            return false;
        }
        $lower = strtolower($userAgent);
        foreach (self::AI_BOT_PATTERNS as $pattern) {
            if (str_contains($lower, $pattern)) {
                return true;
            }
        }
        return false;
    }
}
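
The loop above can also be collapsed into a single case-insensitive pattern. The following is a minimal sketch of that alternative and is not part of the original detector: buildAiBotRegex(), matchesAiBot(), and the shortened pattern list are illustrative assumptions. preg_quote() escapes each entry so names such as 'google-extended' are matched as plain text, and the i flag takes the place of the strtolower() call.

```php
<?php
// Hypothetical alternative to AiBotDetector::detect(): one case-insensitive
// regex built from literal patterns. All names here are illustrative only.

/**
 * Builds a single alternation regex from literal substrings.
 *
 * @param string[] $patterns literal substrings to look for
 */
function buildAiBotRegex(array $patterns): string
{
    if ($patterns === []) {
        // An empty alternation ('//i') would match every string.
        throw new InvalidArgumentException('At least one pattern is required.');
    }

    // preg_quote() escapes regex metacharacters; '/' is passed as the delimiter.
    $escaped = array_map(fn(string $p): string => preg_quote($p, '/'), $patterns);

    return '/' . implode('|', $escaped) . '/i';
}

function matchesAiBot(string $userAgent, string $regex): bool
{
    // An absent header arrives as '' and is never treated as a bot.
    return $userAgent !== '' && preg_match($regex, $userAgent) === 1;
}

$regex = buildAiBotRegex(['gptbot', 'claudebot', 'ccbot', 'google-extended']);

var_dump(matchesAiBot('Mozilla/5.0 (compatible; GPTBot/1.0)', $regex));          // bool(true)
var_dump(matchesAiBot('Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0', $regex)); // bool(false)
var_dump(matchesAiBot('', $regex));                                              // bool(false)
```

Building the regex once at startup and reusing it per request keeps the per-request work to a single preg_match() call. Whether this is faster than the str_contains() loop for a list this short is not something the sketch establishes.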

2. Middleware and server setup

ReactPHP runs its own HTTP server, so no Apache or Nginx is required in front of it. The HttpServer constructor accepts middleware followed by a final handler, in left-to-right order (outermost first). Install with composer require react/http react/socket.

<?php
// server.php — ReactPHP HTTP server with AI bot blocking middleware
// Install: composer require react/http react/socket

require __DIR__ . '/vendor/autoload.php';
require __DIR__ . '/AiBotDetector.php';

use React\Http\HttpServer;
use React\Http\Message\Response;
use React\Socket\SocketServer;
use Psr\Http\Message\ServerRequestInterface;
use function React\Promise\resolve;

// ── Middleware ─────────────────────────────────────────────────────────────
//
// ReactPHP middleware is any callable with the signature:
//   (ServerRequestInterface $request, callable $next): ResponseInterface|PromiseInterface
//
// - To BLOCK: return resolve(new Response(403, ...))      (do NOT call $next)
// - To PASS:  return resolve($next($request))->then(fn)   (chain to inject headers)
//
// $next() may return a plain response OR a promise, so wrap it in resolve()
// before calling then().
//
// getHeaderLine() returns '' when the header is absent (PSR-7 MessageInterface),
// so strtolower() is always safe and no null-check is needed.

$botBlockerMiddleware = function (ServerRequestInterface $request, callable $next) {
    // Always allow robots.txt so crawlers can discover Disallow rules.
    if ($request->getUri()->getPath() === '/robots.txt') {
        return $next($request);
    }

    // getHeaderLine() is case-insensitive (RFC 7230) and returns '' if absent.
    $ua = $request->getHeaderLine('User-Agent');

    if (AiBotDetector::detect($ua)) {
        // Block: return a resolved promise wrapping a 403 Response.
        // Do NOT call $next($request) — that would invoke the app handler.
        return resolve(
            new Response(
                403,
                [
                    'Content-Type'  => 'text/plain',
                    'X-Robots-Tag'  => 'noai, noimageai',
                ],
                'Forbidden'
            )
        );
    }

    // Pass: call $next, then inject X-Robots-Tag on the downstream response.
    // $next($request) may return a ResponseInterface directly (as $handler below
    // does) or a promise, so resolve() normalizes both cases before then().
    // withHeader() returns a NEW PSR-7 object; the original is unchanged (immutability).
    return resolve($next($request))->then(function (Response $response) {
        return $response->withHeader('X-Robots-Tag', 'noai, noimageai');
    });
};

// ── Application handler ───────────────────────────────────────────────────

$robotsTxt = <<<'TXT'
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
TXT;

$handler = function (ServerRequestInterface $request) use ($robotsTxt) {
    $path = $request->getUri()->getPath();

    if ($path === '/robots.txt') {
        return new Response(200, ['Content-Type' => 'text/plain'], $robotsTxt);
    }

    return new Response(200, ['Content-Type' => 'application/json'], '{"message":"ok"}');
};

// ── Server bootstrap ──────────────────────────────────────────────────────
//
// HttpServer receives middleware + handler in constructor order:
//   leftmost = outermost (first to see the request, last to see the response)
//   rightmost = innermost (closest to the application handler)
//
// Middleware are applied in left-to-right order for requests
// and right-to-left order for responses (standard onion model).

$server = new HttpServer(
    $botBlockerMiddleware,   // outermost — runs first
    $handler                 // innermost — only reached if not blocked
);

$socket = new SocketServer('0.0.0.0:8080');
$server->listen($socket);

echo "Server running on http://127.0.0.1:8080\n";

React\EventLoop\Loop::run();
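
To make the onion ordering concrete, the following sketch composes plain PHP callables in the same left-to-right fashion, without ReactPHP or promises. It only illustrates the ordering and the short-circuit on block: runStack() and the three closures are hypothetical names, requests and responses are reduced to strings, and ReactPHP's internal runner is not claimed to look like this.

```php
<?php
// Illustration of onion-style middleware ordering with plain callables.
// runStack() is a hypothetical helper, not a ReactPHP API.

/**
 * Runs $request through $stack; the last element is the final handler.
 *
 * @param callable[] $stack middleware..., handler (leftmost = outermost)
 */
function runStack(array $stack, string $request, int $position = 0): string
{
    $current = $stack[$position];

    // The final handler receives only the request.
    if ($position === count($stack) - 1) {
        return $current($request);
    }

    // Every middleware receives a $next that continues one layer inward.
    $next = fn(string $req): string => runStack($stack, $req, $position + 1);

    return $current($request, $next);
}

$log = [];

$outer = function (string $req, callable $next) use (&$log): string {
    $log[] = 'outer:request';
    $res = $next($req);
    $log[] = 'outer:response';
    return $res;
};

$blocker = function (string $req, callable $next) use (&$log): string {
    $log[] = 'blocker:request';
    if ($req === 'bot') {
        return '403'; // short-circuit: $next is never called
    }
    return $next($req);
};

$handler = function (string $req) use (&$log): string {
    $log[] = 'handler';
    return '200';
};

echo runStack([$outer, $blocker, $handler], 'human'), "\n";
echo implode(' > ', $log), "\n";

$log = [];
echo runStack([$outer, $blocker, $handler], 'bot'), "\n";
echo implode(' > ', $log), "\n";
```

For the 'human' request the log reads outer:request > blocker:request > handler > outer:response. For the 'bot' request the handler entry is missing, because the blocker returned before calling $next.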

3. Class-based invokable middleware

An invokable class is testable and injectable. ReactPHP accepts any callable — closures and __invoke classes are interchangeable.

<?php
// AiBotMiddleware.php — invokable class middleware
// Classes are preferred over closures for testability and dependency injection.

use Psr\Http\Message\ServerRequestInterface;
use React\Http\Message\Response;
use React\Promise\PromiseInterface;
use function React\Promise\resolve;

class AiBotMiddleware
{
    /**
     * ReactPHP invokable middleware.
     *
     * @param ServerRequestInterface $request
     * @param callable               $next
     * @return PromiseInterface<Response>
     */
    public function __invoke(
        ServerRequestInterface $request,
        callable $next
    ): PromiseInterface {
        if ($request->getUri()->getPath() === '/robots.txt') {
            // $next() may return a plain response; resolve() keeps the
            // declared PromiseInterface return type valid.
            return resolve($next($request));
        }

        // getHeaderLine always returns a string ('' when absent).
        if (AiBotDetector::detect($request->getHeaderLine('User-Agent'))) {
            return resolve(new Response(
                403,
                ['Content-Type' => 'text/plain', 'X-Robots-Tag' => 'noai, noimageai'],
                'Forbidden'
            ));
        }

        // PSR-7: withHeader() returns a NEW Response; chain with then().
        // resolve() accepts both a plain response and a promise from $next().
        return resolve($next($request))->then(
            fn(Response $res) => $res->withHeader('X-Robots-Tag', 'noai, noimageai')
        );
    }
}

// Usage in server.php:
// $server = new HttpServer(new AiBotMiddleware(), $handler);

4. Route-scoped middleware

ReactPHP's HttpServer has no built-in router. Scope middleware by checking $request->getUri()->getPath() inside the middleware and returning $next($request) for paths you want to skip.

<?php
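// Route-scoped middleware: protect only /api/* routes.
//
// ReactPHP HttpServer does not have a built-in router.
// Scope middleware by inspecting the path before deciding to block.
// Assumes vendor/autoload.php and AiBotDetector.php are already required,
// as in server.php.

use Psr\Http\Message\ServerRequestInterface;
use React\Http\HttpServer;
use React\Http\Message\Response;
use function React\Promise\resolve;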

$apiOnlyBotBlocker = function (ServerRequestInterface $request, callable $next) {
    $path = $request->getUri()->getPath();

    // Only apply bot blocking to /api/* paths.
    if (!str_starts_with($path, '/api/')) {
        return $next($request); // non-API paths pass through unconditionally
    }

    if (AiBotDetector::detect($request->getHeaderLine('User-Agent'))) {
        return resolve(new Response(
            403,
            ['Content-Type' => 'text/plain', 'X-Robots-Tag' => 'noai, noimageai'],
            'Forbidden'
        ));
    }

    // $next() may return a plain response or a promise; resolve() handles both.
    return resolve($next($request))->then(
        fn(Response $res) => $res->withHeader('X-Robots-Tag', 'noai, noimageai')
    );
};

// Stack: route-scoped blocker on top, then rate-limiter, then handler.
// $rateLimiterMiddleware and $handler are assumed to be defined elsewhere.
$server = new HttpServer(
    $apiOnlyBotBlocker,
    $rateLimiterMiddleware,
    $handler
);
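
One subtlety in the prefix check above: str_starts_with($path, '/api/') does not match the bare path /api. When several prefixes need protection, the check can be moved into a small helper. The following is a sketch under that assumption; isProtectedPath() and the prefix list are hypothetical and not part of ReactPHP.

```php
<?php
// Hypothetical helper for route-scoped middleware (requires PHP 8.0+).

/**
 * Returns true when $path equals a protected prefix or lies beneath it.
 *
 * @param string[] $prefixes prefixes without a trailing slash, e.g. '/api'
 */
function isProtectedPath(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        // Exact match ('/api') or a sub-path ('/api/users'); the appended
        // slash keeps '/apidocs' from matching the '/api' prefix.
        if ($path === $prefix || str_starts_with($path, $prefix . '/')) {
            return true;
        }
    }

    return false;
}

$protected = ['/api', '/feeds'];

var_dump(isProtectedPath('/api', $protected));       // bool(true)
var_dump(isProtectedPath('/api/users', $protected)); // bool(true)
var_dump(isProtectedPath('/apidocs', $protected));   // bool(false)
var_dump(isProtectedPath('/', $protected));          // bool(false)
```

Inside the middleware, the guard would then read if (!isProtectedPath($path, $protected)) { return $next($request); }, with the rest of the closure unchanged.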

Key points

Framework comparison — PHP HTTP server models

| Framework | Process model | Block request | Header access |
|---|---|---|---|
| ReactPHP | Single-process event loop | resolve(new Response(403)) | getHeaderLine(); '' if absent |
| Workerman | Multi-process + event loop | $connection->send(new Response(403)) | $request->header('user-agent') |
| Hyperf (Swoole) | Multi-process + coroutines | return (new Response())->withStatus(403) | getHeaderLine(); case-insensitive, PSR-7 |
| Laravel / Symfony (FPM) | PHP-FPM: worker pool, one request per worker at a time | return response('Forbidden', 403) | $request->header('user-agent') |

ReactPHP's event-loop model means a single stalled middleware (for example, a blocking DB call) freezes every in-flight request at once, unlike FPM, where a blocking call ties up only the worker handling that request. This is why the no-blocking constraint is non-negotiable. Bot detection from a User-Agent string is a handful of in-memory substring checks with no I/O, so it is safe to run inline.