
How to Block AI Bots on CakePHP: Complete 2026 Guide

CakePHP 4 and 5 implement PSR-15 middleware — the same process(request, handler) interface as Slim 4. The difference is how you register it: CakePHP uses Application::middleware() with a MiddlewareQueue object. Use prepend() — not add() — to run the bot blocker before CakePHP's own stack.

Always use prepend() — not add()

CakePHP runs middleware in queue order. add() appends to the end of the queue, so a bot blocker added after the built-in middleware (security headers, asset handling, routing, CSRF) runs only once those layers have already processed the request. prepend() places the bot blocker at the front of the queue, so blocked requests are rejected before any of that work happens.

Protection layers

1. robots.txt: webroot/robots.txt, served by Apache/nginx before PHP is invoked
2. noai meta tag: set a request attribute in middleware; read it in the CakePHP layout template
3. X-Robots-Tag header: PSR-7 $response->withHeader() on every pass-through response
4. Hard 403 block: return a 403 response without calling $handler->handle(), so the handler never runs

Layer 1: robots.txt

CakePHP's document root is webroot/ (not public/). Place robots.txt there — Apache and nginx serve it directly without invoking PHP:

# webroot/robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Amazonbot
User-agent: PerplexityBot
User-agent: YouBot
User-agent: Diffbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /

webroot/, not public/
Slim 4, Laravel, and Symfony use public/ as the document root. CakePHP uses webroot/. Put robots.txt in webroot/robots.txt.
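
The bot list in robots.txt and the list in the middleware can drift apart over time. One way to keep them aligned is to generate the file from a single array. The sketch below is an illustration only: buildRobotsTxt() and the $bots array are hypothetical names, not part of CakePHP, and the bot list is shortened.

```php
<?php
declare(strict_types=1);

/**
 * Build robots.txt content that allows everyone except the listed crawlers.
 * Hypothetical helper for illustration; not a CakePHP API.
 *
 * @param string[] $bots User-agent tokens to disallow site-wide.
 */
function buildRobotsTxt(array $bots): string
{
    $lines = ['User-agent: *', 'Allow: /'];

    if ($bots !== []) {
        $lines[] = ''; // blank line separates the two groups
        foreach ($bots as $bot) {
            $lines[] = 'User-agent: ' . $bot;
        }
        // One Disallow rule applies to the whole group of User-agent lines above it.
        $lines[] = 'Disallow: /';
    }

    return implode("\n", $lines) . "\n";
}

$bots = ['GPTBot', 'ClaudeBot', 'CCBot'];
$robotsTxt = buildRobotsTxt($bots);
echo $robotsTxt;
```

Inside a CakePHP application the result could be written with file_put_contents(WWW_ROOT . 'robots.txt', $robotsTxt), assuming WWW_ROOT points at webroot/ as it does in the default skeleton.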

The middleware class

Create src/Middleware/AiBotBlocker.php. CakePHP uses PSR-15 MiddlewareInterface — identical to Slim 4:

<?php
// src/Middleware/AiBotBlocker.php
declare(strict_types=1);

namespace App\Middleware;

use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Server\MiddlewareInterface;
use Psr\Http\Server\RequestHandlerInterface;
use Cake\Http\Response;

class AiBotBlocker implements MiddlewareInterface
{
    private const AI_BOTS = [
        'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
        'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
        'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
        'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
        'ai2bot', 'oai-searchbot', 'duckassistbot',
    ];

    private const EXEMPT_PATHS = [
        '/robots.txt',
        '/sitemap.xml',
        '/favicon.ico',
    ];

    public function process(
        ServerRequestInterface $request,
        RequestHandlerInterface $handler
    ): ResponseInterface {
        // Set noai meta attribute for templates
        $request = $request->withAttribute('robots', 'noai, noimageai');

        // Exempt paths bypass blocking
        $path = $request->getUri()->getPath();
        if (in_array($path, self::EXEMPT_PATHS, true)) {
            return $this->passThrough($handler, $request);
        }

        // Check User-Agent
        $ua = strtolower($request->getHeaderLine('User-Agent'));
        foreach (self::AI_BOTS as $bot) {
            if (str_contains($ua, $bot)) {
                // Block — do NOT call $handler->handle()
                return (new Response())
                    ->withStatus(403)
                    ->withStringBody('Forbidden: AI crawlers are not permitted.');
            }
        }

        return $this->passThrough($handler, $request);
    }

    private function passThrough(
        RequestHandlerInterface $handler,
        ServerRequestInterface $request
    ): ResponseInterface {
        // PSR-7 IMMUTABILITY: withHeader() returns a NEW object — must capture it
        $response = $handler->handle($request);
        return $response->withHeader('X-Robots-Tag', 'noai, noimageai');
    }
}
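
The decision inside process() depends only on the request path and the User-Agent header, so it can be pulled out into a pure function and checked without booting CakePHP. The sketch below mirrors the checks in the class above with a shortened bot list; shouldBlock() is a hypothetical name used for illustration, and the sample User-Agent strings are made up.

```php
<?php
declare(strict_types=1);

const AI_BOTS = ['gptbot', 'claudebot', 'ccbot', 'perplexitybot'];
const EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];

/**
 * Return true when the request should receive a 403.
 * Hypothetical helper; mirrors the checks in AiBotBlocker::process().
 */
function shouldBlock(string $path, string $userAgent): bool
{
    // Exempt paths are never blocked, so crawlers can still read robots.txt.
    if (in_array($path, EXEMPT_PATHS, true)) {
        return false;
    }

    // Lowercase once, then look for any lowercase token as a substring.
    $ua = strtolower($userAgent);
    foreach (AI_BOTS as $bot) {
        if (str_contains($ua, $bot)) {
            return true;
        }
    }

    return false;
}

var_dump(shouldBlock('/articles', 'Mozilla/5.0 (compatible; GPTBot/1.0)'));          // bool(true)
var_dump(shouldBlock('/robots.txt', 'Mozilla/5.0 (compatible; GPTBot/1.0)'));        // bool(false)
var_dump(shouldBlock('/articles', 'Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0')); // bool(false)
```

Keeping the rule in one pure function also makes it cheap to unit-test edge cases such as an empty User-Agent header.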

PSR-7 immutability — the #1 gotcha

PSR-7 response objects are immutable. withHeader() returns a new object — it does not modify the existing response. The header is silently discarded if you don't capture the return value:

// ❌ WRONG — withHeader() return value discarded
$response->withHeader('X-Robots-Tag', 'noai, noimageai');
return $response; // Header NOT set

// ✅ CORRECT — capture the new object
$response = $response->withHeader('X-Robots-Tag', 'noai, noimageai');
return $response;
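
The behaviour can be reproduced without any framework. The sketch below uses TinyResponse, a hypothetical stand-in class that copies itself in withHeader() the way immutable PSR-7 objects do; it is not a PSR-7 implementation and exists only to show why the return value matters.

```php
<?php
declare(strict_types=1);

/**
 * Minimal stand-in for an immutable response (NOT a PSR-7 implementation).
 */
final class TinyResponse
{
    /** @var array<string, string> */
    private array $headers = [];

    public function withHeader(string $name, string $value): self
    {
        // Copy first, then change the copy; the original stays untouched.
        $new = clone $this;
        $new->headers[$name] = $value;

        return $new;
    }

    public function hasHeader(string $name): bool
    {
        return isset($this->headers[$name]);
    }
}

$original = new TinyResponse();

// Return value discarded: $original is unchanged.
$original->withHeader('X-Robots-Tag', 'noai, noimageai');
var_dump($original->hasHeader('X-Robots-Tag')); // bool(false)

// Return value captured: the new object carries the header.
$updated = $original->withHeader('X-Robots-Tag', 'noai, noimageai');
var_dump($updated->hasHeader('X-Robots-Tag'));  // bool(true)
```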

Registration in Application.php

Register the middleware in src/Application.php using prepend() inside the middleware() method:

<?php
// src/Application.php
namespace App;

use App\Middleware\AiBotBlocker;
use Cake\Http\BaseApplication;
use Cake\Http\MiddlewareQueue;
use Cake\Routing\Middleware\AssetMiddleware;
use Cake\Routing\Middleware\RoutingMiddleware;
use Cake\Http\Middleware\CsrfProtectionMiddleware;
use Cake\Http\Middleware\HttpsEnforcerMiddleware;
use Cake\Http\Middleware\SecurityHeadersMiddleware;

class Application extends BaseApplication
{
    public function middleware(MiddlewareQueue $middlewareQueue): MiddlewareQueue
    {
        $middlewareQueue
            // ✅ PREPEND — runs BEFORE CakePHP's built-in middleware
            ->prepend(new AiBotBlocker())

            // CakePHP's standard middleware stack (runs after bot blocker)
            ->add(new SecurityHeadersMiddleware())
            ->add(new HttpsEnforcerMiddleware())
            ->add(new AssetMiddleware([
                'cacheTime' => env('ASSET_CACHE_TIME', '+1 day'),
            ]))
            ->add(new RoutingMiddleware($this))
            ->add(new CsrfProtectionMiddleware([
                'httponly' => true,
            ]));

        return $middlewareQueue;
    }
}

add() vs prepend()
  • prepend() — adds to the beginning of the queue (runs first)
  • add() — appends to the end of the queue (runs last)
  • There is also insertAt(position, middleware) for precise placement
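
To make the ordering concrete, here is a toy model of the three operations. TinyQueue is a hypothetical class written for this illustration and is not CakePHP's MiddlewareQueue; it only tracks names so the resulting order can be printed.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of an ordered middleware queue (NOT CakePHP's MiddlewareQueue).
 */
final class TinyQueue
{
    /** @var string[] */
    private array $items = [];

    public function add(string $name): self
    {
        $this->items[] = $name; // append: runs last

        return $this;
    }

    public function prepend(string $name): self
    {
        array_unshift($this->items, $name); // front: runs first

        return $this;
    }

    public function insertAt(int $index, string $name): self
    {
        array_splice($this->items, $index, 0, [$name]); // exact position

        return $this;
    }

    /** @return string[] */
    public function order(): array
    {
        return $this->items;
    }
}

$queue = (new TinyQueue())
    ->add('Routing')
    ->add('Csrf')
    ->prepend('AiBotBlocker')
    ->insertAt(1, 'Assets');

echo implode(' -> ', $queue->order()), "\n";
// AiBotBlocker -> Assets -> Routing -> Csrf
```

Even though prepend() is called third, the bot blocker ends up first in the run order, which is the property the registration code relies on.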

Layer 2: noai meta tag

The middleware sets a robots attribute on the request via withAttribute(). Read it in your CakePHP layout template:

<!-- templates/layout/default.php -->
<?php
// Access the request attribute — default to noai if not set
$robots = $this->request->getAttribute('robots', 'noai, noimageai');
?>
<!DOCTYPE html>
<html>
<head>
    <meta name="robots" content="<?= h($robots) ?>">
    <!-- rest of head -->
</head>

Override per-controller or per-action if needed:

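// In a controller action: allow indexing for public pages
public function index(): ?Response
{
    // Override the robots attribute for this page only.
    // The X-Robots-Tag header added by the middleware is not affected.
    $this->setRequest($this->getRequest()->withAttribute('robots', 'index, follow'));
    // ...

    return null; // a ?Response return type needs an explicit return
}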

Route-scoped middleware

To block AI bots only on specific routes (e.g., your API), register the middleware on a route scope in config/routes.php with registerMiddleware() and applyMiddleware(), or do the same in a plugin's routes() method:

<?php
// config/routes.php
use App\Middleware\AiBotBlocker;
use Cake\Routing\RouteBuilder;

return static function (RouteBuilder $routes): void {
    // Apply bot blocker only to /api/* routes
    $routes->scope('/api', function (RouteBuilder $builder): void {
        $builder->registerMiddleware('aiBotBlocker', new AiBotBlocker());
        $builder->applyMiddleware('aiBotBlocker');

        $builder->get('/data', ['controller' => 'Api', 'action' => 'data']);
        // ... more API routes
    });

    // Public routes — no bot blocking
    $routes->scope('/', function (RouteBuilder $builder): void {
        $builder->get('/', ['controller' => 'Pages', 'action' => 'index']);
    });
};

Route-scoped middleware requires RoutingMiddleware to be in the global queue (it is, by default). The route-scope middleware runs after routing resolves the matched scope.

CakePHP 4 vs CakePHP 5

CakePHP 4 — PHP 7.4+ / 8.x

// src/Application.php (CakePHP 4)
// Identical pattern — prepend() and PSR-15 work the same
$middlewareQueue->prepend(new AiBotBlocker());

// str_contains() requires PHP 8.0 — use strpos() for PHP 7.4:
// if (strpos($ua, $bot) !== false) { ... }

CakePHP 5 — PHP 8.1+ required

// src/Application.php (CakePHP 5)
// Same pattern — no changes needed
$middlewareQueue->prepend(new AiBotBlocker());

// PHP 8.1+ guaranteed — str_contains() is safe
// CakePHP 5 also uses named arguments — no impact on middleware

CakePHP vs Slim 4 vs Symfony vs Laravel: registration comparison

CakePHP — MiddlewareQueue::prepend()

// src/Application.php
public function middleware(MiddlewareQueue $middlewareQueue): MiddlewareQueue
{
    $middlewareQueue->prepend(new AiBotBlocker());
    // ... built-in middleware
    return $middlewareQueue;
}

Slim 4 — $app->addMiddleware()

// index.php or routes.php
$app->addMiddleware(new AiBotBlocker());
// Slim adds middleware in LIFO — last added = outermost = runs first

Symfony — EventSubscriber on KernelEvents::REQUEST

// config/services.yaml auto-registers via autoconfigure: true
// src/EventSubscriber/AiBotBlockerSubscriber.php
// KernelEvents::REQUEST at priority 9999 — highest priority runs first

Laravel: withMiddleware() in bootstrap/app.php

// bootstrap/app.php (Laravel 11)
->withMiddleware(function (Middleware $middleware) {
    $middleware->prepend(AiBotBlocker::class);
})

Testing

Use Cake\TestSuite\IntegrationTestTrait and its configRequest() method to send custom headers:

<?php
// tests/TestCase/Middleware/AiBotBlockerTest.php
namespace App\Test\TestCase\Middleware;

use Cake\TestSuite\IntegrationTestTrait;
use Cake\TestSuite\TestCase;

class AiBotBlockerTest extends TestCase
{
    use IntegrationTestTrait;

    public function testBlocksAiBot(): void
    {
        $this->configRequest([
            'headers' => ['User-Agent' => 'GPTBot/1.0'],
        ]);
        $this->get('/articles');
        $this->assertResponseCode(403);
    }

    public function testAllowsBrowser(): void
    {
        $this->configRequest([
            'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible)'],
        ]);
        $this->get('/articles');
        $this->assertResponseOk();
        $this->assertHeaderContains('X-Robots-Tag', 'noai, noimageai');
    }

    public function testRobotsTxtExempt(): void
    {
        // robots.txt served by web server — test via direct file check
        $this->assertFileExists(WWW_ROOT . 'robots.txt');
    }
}

Run with vendor/bin/phpunit tests/TestCase/Middleware/.

AI bot User-Agent strings (2026)

GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, CCBot, cohere-ai, Bytespider, Amazonbot, Applebot-Extended, PerplexityBot, YouBot, Diffbot, Google-Extended, FacebookBot, omgili, omgilibot, DeepSeekBot, MistralBot, xAI-Bot, AI2Bot

Lowercase the User-Agent with strtolower() and keep the list lowercase so that matching is case-insensitive. Use str_contains($ua, $bot) on PHP 8.0+ or strpos($ua, $bot) !== false on PHP 7.4.
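
For code that has to run on both PHP 7.4 and PHP 8, a small wrapper can hide the difference between the two functions. The sketch below is one possible approach: containsToken() and matchAiBot() are hypothetical names, the bot list is shortened, and the sample User-Agent strings are made up.

```php
<?php
declare(strict_types=1);

/**
 * Substring check that works on PHP 7.4 and 8.x.
 * Hypothetical helper for illustration.
 */
function containsToken(string $haystack, string $needle): bool
{
    if (function_exists('str_contains')) {
        return str_contains($haystack, $needle);
    }

    // PHP 7.4 fallback: strpos() returns 0 for a match at the start,
    // so compare against false with !==. An empty needle counts as a
    // match, the same as str_contains().
    return $needle === '' || strpos($haystack, $needle) !== false;
}

/**
 * Return the first matching bot token, or null if none match.
 *
 * @param string[] $bots Lowercase tokens.
 */
function matchAiBot(string $userAgent, array $bots): ?string
{
    $ua = strtolower($userAgent);
    foreach ($bots as $bot) {
        if (containsToken($ua, $bot)) {
            return $bot;
        }
    }

    return null;
}

$bots = ['gptbot', 'claudebot', 'ccbot'];
var_dump(matchAiBot('Mozilla/5.0 (compatible; ClaudeBot/1.0)', $bots));    // string(9) "claudebot"
var_dump(matchAiBot('Mozilla/5.0 (Windows NT 10.0) Chrome/124.0', $bots)); // NULL
```

Returning the matched token instead of a boolean makes it easy to log which crawler was blocked.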
