
How to Block AI Bots on Craft CMS

Craft CMS is the agency-favourite content management system — built on PHP and Yii 2, with Twig templates, a flexible field layout system, and a developer-first philosophy. Unlike WordPress, which serves a virtual robots.txt out of the box, Craft handles nothing implicitly — robots.txt and meta tag management is explicit: place a static file in web/, serve a Twig template via a URL rule, or install SEOmatic. This guide covers all four AI bot protection layers for Craft CMS 5: robots.txt, noai meta tags in Twig layouts with per-entry overrides, X-Robots-Tag headers via Apache/.htaccess and Nginx, and hard 403 blocking at the server and application level.

9 min read · Updated April 2026 · Craft CMS 5.x

1. robots.txt

Craft CMS serves requests through a PHP entry point (web/index.php), but Apache and Nginx serve static files directly from the web/ directory before PHP is invoked. This makes web/robots.txt the simplest approach — zero Craft overhead.

Option A: Static file in web/ (recommended)

Create web/robots.txt in your Craft project:

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Diffbot
Disallow: /

# Allow standard search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Apache and Nginx serve this file directly — the request never reaches Craft's PHP layer. This is the fastest option and works without any Craft configuration.
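Because the same bot list reappears in the Twig template and the server configs below, it can help to generate web/robots.txt from one canonical array. A minimal sketch — the buildRobotsTxt() helper and the script name are illustrative, not part of Craft:

```php
<?php
// build-robots.php — regenerate web/robots.txt from one canonical bot list.
// Run manually or wire it up as a Composer script.

/**
 * @param string[] $aiBots     user agents to disallow entirely
 * @param string[] $searchBots user agents to allow explicitly
 */
function buildRobotsTxt(array $aiBots, array $searchBots): string
{
    $lines = ['# Block AI training crawlers'];
    foreach ($aiBots as $bot) {
        $lines[] = "User-agent: {$bot}";
        $lines[] = 'Disallow: /';
        $lines[] = '';
    }
    $lines[] = '# Allow standard search crawlers';
    foreach ($searchBots as $bot) {
        $lines[] = "User-agent: {$bot}";
        $lines[] = 'Allow: /';
        $lines[] = '';
    }
    $lines[] = 'User-agent: *';
    $lines[] = 'Allow: /';

    return implode("\n", $lines) . "\n";
}

$aiBots = [
    'GPTBot', 'ClaudeBot', 'Claude-Web', 'anthropic-ai', 'CCBot',
    'Google-Extended', 'PerplexityBot', 'Applebot-Extended',
    'Amazonbot', 'meta-externalagent', 'Bytespider', 'Diffbot',
];

$robotsTxt = buildRobotsTxt($aiBots, ['Googlebot', 'Bingbot']);
// e.g. file_put_contents(__DIR__ . '/web/robots.txt', $robotsTxt);
```

Editing the array and re-running the script keeps every copy of the list in sync.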

Option B: Dynamic Twig template (environment-aware)

For different rules per environment (block everything on staging, granular rules in production), use a Twig template served by Craft.

Create templates/robots.txt.twig:

{# templates/robots.txt.twig #}
{%- if craft.app.config.env != 'production' %}
User-agent: *
Disallow: /
{% else %}
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /
{% endif -%}

Add a URL rule in config/routes.php:

<?php
// config/routes.php
return [
    'robots.txt' => ['template' => 'robots.txt'],
];
Content-Type for Twig robots.txt: Craft renders Twig templates as text/html by default. Set the correct content type at the top of your template:
{# templates/robots.txt.twig #}
{% header "Content-Type: text/plain; charset=utf-8" %}
{# ... rest of your robots.txt content ... #}
Multi-site Craft installations: For Craft multi-site setups, you can serve different robots.txt per site using the {% if craft.app.sites.currentSite.handle == 'primary' %} check in your Twig template. This lets staging sites block all crawlers while production serves the proper rules.

2. noai meta tags in Twig layouts

Craft uses Twig for templating. Your project typically has a base layout that all page templates extend — this is where you add the noai meta tag for site-wide coverage.

Base layout

Edit your base layout (commonly templates/_layouts/master.twig or templates/_base.html.twig):

{# templates/_layouts/master.twig #}
<!DOCTYPE html>
<html lang="{{ craft.app.language }}">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  {# AI bot protection — site-wide default #}
  <meta name="robots" content="{{ robots ?? 'noai, noimageai' }}">

  <title>
    {% block title %}{{ siteName }}{% endblock %}
  </title>

  {# ... stylesheets, scripts ... #}
</head>
<body class="{{ bodyClass ?? '' }}">
  {% block content %}{% endblock %}
</body>
</html>

Page template extending the layout

{# templates/index.twig #}
{% extends '_layouts/master' %}

{% block title %}Home | {{ siteName }}{% endblock %}

{% block content %}
  <h1>Welcome</h1>
{% endblock %}

When no robots variable is set by the template or controller, the ?? fallback in the layout applies noai, noimageai to every page.

3. Per-entry robots control

To give content editors per-entry control over AI bot indexing, add a custom field to your section's field layout.

Add a robots field

  1. Go to Settings → Fields in the Craft Control Panel
  2. Create a new Plain Text field named robotsTag
  3. Set Instructions to: “robots meta tag value. Leave blank for site default (noai, noimageai). Examples: index, follow · noindex, nofollow”
  4. Go to Settings → Sections, open your section's field layout, and add the robotsTag field

Use in templates

{# templates/_layouts/master.twig #}

{# Check entry field first, then fall back to page variable, then to site default #}
{% set robotsValue = entry.robotsTag ?? robots ?? 'noai, noimageai' %}
<meta name="robots" content="{{ robotsValue }}">

With this pattern, editors can leave the field blank (site default noai, noimageai applies), or enter index, follow for entries that should be indexed by AI search engines.

Craft field handles: Craft converts field names to camelCase handles automatically. A field named “Robots Tag” gets a handle of robotsTag. Access it in Twig as entry.robotsTag. Verify the handle in Settings → Fields if the name differs.
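The fallback chain has one subtlety worth testing: Twig's ?? falls through on null (what Craft returns for an empty Plain Text field), but not on an empty string set by a template variable. A pure-PHP sketch of the same resolution logic — resolveRobotsValue() is an illustrative helper, not a Craft API:

```php
<?php
// Mirrors the Twig fallback: entry.robotsTag ?? robots ?? 'noai, noimageai',
// additionally treating '' and whitespace-only values as "unset" for safety.
function resolveRobotsValue(
    ?string $entryField,
    ?string $pageVariable,
    string $siteDefault = 'noai, noimageai'
): string {
    foreach ([$entryField, $pageVariable] as $candidate) {
        if ($candidate !== null && trim($candidate) !== '') {
            return trim($candidate);
        }
    }
    return $siteDefault;
}
```

So resolveRobotsValue(null, null) yields the site default, while an editor-entered 'index, follow' wins over everything else — the same precedence the Twig one-liner gives you.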

4. X-Robots-Tag via server config

HTTP headers are a robust signal — crawlers can read them even when they never parse the HTML, and they apply to non-HTML responses (images, PDFs) as well. The most performant place to set them is the web server, before PHP is invoked.

Apache — .htaccess

Add to your web/.htaccess file (Craft includes a default .htaccess — add above the Craft rewrite rules):

# AI bot protection headers
<IfModule mod_headers.c>
    Header always set X-Robots-Tag "noai, noimageai"
</IfModule>

# Craft CMS default rewrite rules
RewriteEngine On
# ... existing Craft rules ...

Apache — VirtualHost block

<VirtualHost *:443>
    ServerName example.com
    DocumentRoot /var/www/mycraft/web

    <Directory /var/www/mycraft/web>
        AllowOverride All
        Require all granted
    </Directory>

    <IfModule mod_headers.c>
        Header always set X-Robots-Tag "noai, noimageai"
    </IfModule>

    # ... SSL config ...
</VirtualHost>

Nginx

server {
    listen 443 ssl;
    server_name example.com;
    root /var/www/mycraft/web;
    index index.php;

    # AI bot protection header
    add_header X-Robots-Tag "noai, noimageai" always;

    # Static files served by Nginx directly
    location ~* \.(css|js|png|jpg|ico|woff2|svg)$ {
        expires 1y;
        access_log off;
    }

    # PHP requests pass to PHP-FPM
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php8.3-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        include fastcgi_params;
    }
}
The always keyword (Nginx): Without always, Nginx only adds the header to 200 responses. With always, the header is added to all responses including 3xx redirects, 4xx errors, and 5xx — ensuring crawlers receive the signal on every response.

5. Hard 403 blocking

To reject AI crawlers before Craft processes the request, use server-level User-Agent matching. This is more performant than PHP-level blocking — the request is rejected without invoking Craft at all.

Apache — .htaccess UA blocking

# web/.htaccess — add above Craft's rewrite rules
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Allow robots.txt and sitemap.xml regardless of User-Agent
    RewriteRule ^robots\.txt$ - [L]
    RewriteRule ^sitemap\.xml$ - [L]

    # Block AI training crawlers
    RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} anthropic-ai [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Applebot-Extended [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} meta-externalagent [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Diffbot [NC]
    RewriteRule .* - [F,L]
</IfModule>

The [F] flag returns a 403 Forbidden response. The [L] flag stops processing further rewrite rules. The [NC] flag makes the match case-insensitive.

Nginx — UA blocking

# In your Nginx server block
map $http_user_agent $is_ai_bot {
    default 0;
    ~*GPTBot 1;
    ~*ClaudeBot 1;
    ~*Claude-Web 1;
    ~*anthropic-ai 1;
    ~*CCBot 1;
    ~*Google-Extended 1;
    ~*PerplexityBot 1;
    ~*Applebot-Extended 1;
    ~*Amazonbot 1;
    ~*meta-externalagent 1;
    ~*Bytespider 1;
    ~*Diffbot 1;
}

server {
    # ... server config ...

    # Allow robots.txt for all crawlers
    location = /robots.txt {
        alias /var/www/mycraft/web/robots.txt;
        access_log off;
    }

    # Block AI bots from everything else
    location / {
        if ($is_ai_bot) {
            return 403 "Forbidden";
        }
        try_files $uri $uri/ /index.php?$query_string;
    }
}

Craft module — application-level blocking

For PHP-level blocking (useful when you need access to Craft's context, such as checking if a user is logged into the Control Panel and exempting them):

Create modules/AiBotModule/Module.php:

<?php
// modules/AiBotModule/Module.php
namespace modules\AiBotModule;

use Craft;
use yii\base\Module as BaseModule;

class Module extends BaseModule
{
    public function init(): void
    {
        parent::init();

        // Skip console requests (queue, migrations); only filter
        // front-end site requests, never the Control Panel
        $request = Craft::$app->getRequest();
        if (!$request->getIsConsoleRequest() && $request->getIsSiteRequest()) {
            $this->blockAiBots();
        }
    }

    private function blockAiBots(): void
    {
        $ua = Craft::$app->getRequest()->getUserAgent() ?? '';

        $patterns = [
            'GPTBot', 'ClaudeBot', 'Claude-Web', 'anthropic-ai',
            'CCBot', 'Google-Extended', 'PerplexityBot', 'Applebot-Extended',
            'Amazonbot', 'meta-externalagent', 'Bytespider', 'Diffbot',
        ];

        $path = Craft::$app->getRequest()->getPathInfo();
        $exempt = ['robots.txt', 'sitemap.xml'];

        if (in_array($path, $exempt, true)) {
            return;
        }

        foreach ($patterns as $pattern) {
            if (stripos($ua, $pattern) !== false) {
                Craft::$app->getResponse()->setStatusCode(403);
                Craft::$app->getResponse()->data = 'Forbidden';
                Craft::$app->end();
            }
        }
    }
}

Register the module in config/app.php. (The modules\ namespace must be autoloadable — the default Craft starter project already maps "modules\\": "modules/" in composer.json's PSR-4 autoload section.)

<?php
// config/app.php
return [
    'modules' => [
        'ai-bot-module' => \modules\AiBotModule\Module::class,
    ],
    'bootstrap' => ['ai-bot-module'],
];
Server vs PHP blocking: Apache/Nginx UA blocking is faster — no PHP process is invoked. The Craft module approach is slower but has access to Craft's context (session, logged-in user, current site). For most sites, server-level blocking is the right choice. Use the module only when you need Craft context to make the blocking decision.
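Whichever layer does the blocking, the UA patterns are worth unit-testing against real User-Agent strings before deploying. A standalone sketch of the same stripos() matching the module uses — isAiBot() is an illustrative helper, not a Craft API:

```php
<?php
// Case-insensitive substring match: the same logic as the module's loop,
// Apache's [NC] flag, and Nginx's ~* map patterns.
function isAiBot(string $userAgent, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (stripos($userAgent, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

$patterns = [
    'GPTBot', 'ClaudeBot', 'Claude-Web', 'anthropic-ai',
    'CCBot', 'Google-Extended', 'PerplexityBot', 'Applebot-Extended',
    'Amazonbot', 'meta-externalagent', 'Bytespider', 'Diffbot',
];
```

Feeding it a full UA string such as "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)" returns true, while Googlebot's UA matches none of the patterns — a quick way to confirm a new pattern doesn't accidentally catch search crawlers.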

6. SEOmatic plugin

SEOmatic is the most popular SEO plugin for Craft CMS — used on thousands of agency-built sites. It manages robots.txt, meta tags, sitemaps, and structured data from the Control Panel. If your project already has SEOmatic installed, configure AI bot protection there rather than editing templates manually.

robots.txt via SEOmatic

  1. Go to SEOmatic → Plugin Settings → Robots
  2. In the robots.txt Template field, add your AI bot Disallow rules
  3. SEOmatic serves robots.txt at /robots.txt automatically

Example robots.txt template in SEOmatic:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
{% for entry in sitemapEntries %}
Sitemap: {{ entry.url }}
{% endfor %}

Meta tags via SEOmatic

  1. Go to SEOmatic → Content SEO → (select section) → General
  2. In the Additional Meta Tags field, add: <meta name="robots" content="noai, noimageai">
  3. Alternatively, go to SEOmatic → Global SEO → General to apply site-wide
SEOmatic Robots field: SEOmatic also has a per-entry SEO Robots dropdown (Index/Follow, Noindex, Nofollow, etc.). However, it doesn't natively include noai as an option — you'll need to add a custom meta tag rather than using the dropdown.

7. Deployment

Craft CMS runs on PHP 8.2+ with a MySQL or PostgreSQL database. Standard deployment is Apache or Nginx with PHP-FPM. Managed hosting platforms that support PHP work with all the configurations in this guide.

Platform support — all four layers (robots.txt, meta tags, X-Robots-Tag, hard 403) work on each platform below via the configurations in this guide:

Apache + PHP-FPM
Nginx + PHP-FPM
Craft Cloud
DigitalOcean / Linode
Ploi / Forge
Docker

Craft Cloud

Craft Cloud is Craft's managed hosting platform. It runs Nginx, so the Nginx configurations in this guide apply directly. Craft Cloud supports custom Nginx directives via the Control Panel — add your add_header and map directives there without needing server access.

Recommended stack

For a self-hosted Craft CMS site with comprehensive AI bot protection:

# Layers of protection (in order of preference)
1. web/robots.txt      — static file, zero overhead, Disallow rules
2. Nginx map + 403     — server-level UA block, no PHP invoked
3. Twig noai meta      — in _layouts/master.twig with entry field override
4. Nginx add_header    — X-Robots-Tag on all responses
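After deploying, verify the header layer from the command line (curl -sI https://example.com | grep -i x-robots-tag) or with a short PHP check. The parsing half is pure and shown here against a sample response; checkXRobotsHeader() is an illustrative helper:

```php
<?php
// Given raw response header lines (e.g. from get_headers($url)),
// return the X-Robots-Tag value, or null if the header is absent.
function checkXRobotsHeader(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Case-insensitive prefix match on the header name
        if (stripos($line, 'X-Robots-Tag:') === 0) {
            return trim(substr($line, strlen('X-Robots-Tag:')));
        }
    }
    return null;
}

// Sample headers; in practice: $lines = get_headers('https://example.com');
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
    'X-Robots-Tag: noai, noimageai',
];
```

Run it against a 404 page as well as the homepage — if the header disappears on error responses, the Nginx always keyword (or Apache's Header always) is missing.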

FAQ

How do I add robots.txt to Craft CMS?

The simplest approach is placing robots.txt in the web/ directory (your document root). Apache and Nginx serve it directly without touching Craft. For dynamic content with Twig, create templates/robots.txt.twig and add a URL rule in config/routes.php: 'robots.txt' => ['template' => 'robots.txt']. Add {% header "Content-Type: text/plain; charset=utf-8" %} at the top of the Twig template to override the default HTML content type.

How do I add noai meta tags to every Craft CMS page?

Edit your base Twig layout and add <meta name="robots" content="{{ robots ?? 'noai, noimageai' }}"> inside <head>. For per-entry control, add a Plain Text field named robotsTag to your section's field layout, then use {{ entry.robotsTag ?? robots ?? 'noai, noimageai' }} in the layout. Editors leave the field blank for the default, or enter index, follow for pages that should be indexed.

Can I use SEOmatic to block AI bots in Craft CMS?

Yes. SEOmatic manages robots.txt from the Control Panel — go to Plugin Settings → Robots → robots.txt Template and add your AI bot Disallow rules. For the noai meta tag, add it as a custom meta tag in SEOmatic's Global SEO → General settings. SEOmatic's built-in Robots dropdown doesn't include noai as an option, so use the Additional Meta Tags field.

How do I add X-Robots-Tag headers in Craft CMS?

For Apache, add Header always set X-Robots-Tag "noai, noimageai" inside an <IfModule mod_headers.c> block in web/.htaccess or your VirtualHost config. For Nginx, add add_header X-Robots-Tag "noai, noimageai" always; in your server block. The always keyword ensures the header is added to all response codes, not just 200.

How do I hard-block AI bots at the application level in Craft CMS?

The most performant option is Apache RewriteRule or Nginx map + 403 — no PHP invoked. For application-level blocking with Craft context (e.g., exempt logged-in CP users), create a Craft module in modules/, register it in config/app.php, and in init() check getRequest()->getUserAgent(), then call Craft::$app->end() with a 403 status code.

Does blocking AI bots affect Googlebot in Craft CMS?

No. Googlebot and Bingbot are different user agents from AI training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended). The UA patterns in this guide target AI-specific bots only. Include explicit Allow: / rules for Googlebot and Bingbot in your robots.txt, and verify with Google Search Console after making changes.
