How to Block AI Bots on Drupal
Drupal powers government agencies, universities, and major publishers, all prime targets for AI training crawlers. Five methods are covered: direct robots.txt editing, the robotstxt module, noai meta tags, .htaccess server blocking, and Cloudflare WAF rules.
Self-Hosted Drupal
- ✓ Edit robots.txt static file directly
- ✓ noai meta tag via Metatag module
- ✓ noai tag via html.html.twig template
- ✓ .htaccess server-level blocking
- ✓ Cloudflare WAF
Acquia / Pantheon / Platform.sh
- ✓ Edit robots.txt in Git repo root
- ✓ robotstxt module (UI-based, no SSH)
- ✓ Metatag module for noai tags
- ✓ Cloudflare WAF (any plan)
- ✗ Direct .htaccess changes (platform-managed)
Quick fix — add to your Drupal robots.txt
File is at the Drupal root (same folder as index.php). Edit directly or via the robotstxt module.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
Method 1: Edit the robots.txt File Directly
Unlike WordPress, Drupal ships with a real static robots.txt file in the document root. You can edit it directly — no plugin or module needed. This is the fastest approach for any Drupal install where you have file system or Git access.
The Drupal root is the folder containing index.php, .htaccess, and composer.json. On a typical Linux server: /var/www/html/robots.txt. On Acquia/Pantheon: the repo root.

1. Open the robots.txt file in your Drupal root (SSH, SFTP, or Git):

nano /var/www/html/robots.txt
# Or on Pantheon/Acquia: edit in your repo and commit

2. Below the existing User-agent: * block, add the AI bot rules:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml

3. Replace yourdomain.com with your actual domain and save the file. For Git-based hosts, commit and push.

4. Verify: visit https://yourdomain.com/robots.txt and confirm the new rules appear.
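Beyond eyeballing the file, you can confirm that every bot you intended to block actually has a rule. The sketch below runs that check against a small sample file written to robots-check.txt; against a live site you would first fetch the real file with curl (the domain is a placeholder).

```shell
# Sketch: confirm each intended bot has a "User-agent" rule.
# Against a live site, fetch the deployed file first:
#   curl -s https://yourdomain.com/robots.txt -o robots-check.txt
# Here a small sample file stands in for the real one.
cat > robots-check.txt <<'EOF'
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
EOF

for bot in GPTBot ClaudeBot CCBot Google-Extended; do
  if grep -q "^User-agent: $bot" robots-check.txt; then
    echo "$bot: rule present"
  else
    echo "$bot: rule MISSING"
  fi
done
```

Google-Extended deliberately has no rule in the sample, so the loop flags it as missing; run the same loop over your full bot list against the live file.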
Warning: when you run composer update drupal/core, Drupal's core robots.txt can be overwritten depending on your scaffold settings. Check your composer.json under "drupal-scaffold" → "file-mapping", and add "[web-root]/robots.txt": false to stop scaffolding from overwriting your customized file. Or use the robotstxt module (Method 2), which is immune to this issue.

Method 2: robotstxt Module (No SSH Required)
The robotstxt contrib module replaces the static robots.txt file with a database-driven version you can edit from the Drupal admin UI. Ideal for managed hosting environments where direct file editing is inconvenient, or for teams that manage content through the admin interface.
1. Install the module via Composer:

composer require drupal/robotstxt
drush en robotstxt
drush cr

2. In Drupal Admin, go to Configuration → Search and Metadata → robots.txt.

3. In the textarea, add the AI bot Disallow rules from Method 1 above. Click Save configuration.

4. Verify at https://yourdomain.com/robots.txt. The module intercepts this path before the static file is served.
Note: while the module is enabled, the static robots.txt file in the Drupal root is ignored; the module serves its database content instead. If you later uninstall the module, Drupal falls back to the static file, so keep the two in sync or delete the static file.

Method 3: noai Meta Tag (All Plans)
The robots.txt rules cover cooperative bots. The noai meta tag adds a second layer — it instructs bots that check page-level signals. There are two ways to add it to Drupal.
Option A: Metatag Module (recommended)
1. Install the Metatag module if not already installed:

composer require drupal/metatag
drush en metatag
drush cr

2. Go to Admin → Configuration → Search and Metadata → Metatag.

3. Click Global (or the relevant content type). Find the Robots field and add noai, noimageai to the existing values.

4. Click Save, then run drush cr to clear caches. Verify in the page source: search for noai.
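The result should be visible in the rendered HTML. This sketch greps a sample head fragment for the directive; on a live site you would pipe curl output into the same grep (the domain is a placeholder).

```shell
# Sketch: check rendered HTML for the noai robots directive.
# Live check would be: curl -s https://yourdomain.com/ | grep -i noai
cat > page-sample.html <<'EOF'
<head>
  <meta name="robots" content="noai, noimageai">
</head>
EOF
grep -io 'content="[^"]*noai[^"]*"' page-sample.html
# prints: content="noai, noimageai"
```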
Option B: html.html.twig Template (direct theme edit)
If you prefer not to install a module, add the tag directly to your active theme's base template:
1. Find your active theme's html.html.twig, typically at:

/web/themes/custom/YOUR_THEME/templates/layout/html.html.twig

If the file doesn't exist in your custom theme, copy it from core/modules/system/templates/html.html.twig.

2. Find the {{ head }} placeholder and add the meta tag immediately after it:

{{ head }}
<meta name="robots" content="noai, noimageai">

3. Clear Drupal's template cache: drush cr, or Admin → Reports → Flush all caches.
Method 4: .htaccess Server Blocking (Apache)
For Apache-hosted Drupal installs, you can block AI bots at the server level via .htaccess. This is harder to bypass than robots.txt — the request is terminated before PHP or Drupal processes it.
Open your Drupal .htaccess (in the Drupal root). Add this block near the top, before the RewriteEngine On line:
# Block AI training crawlers
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ClaudeBot|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Google-Extended|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|PerplexityBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (meta-externalagent|Amazonbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Applebot-Extended|xAI-Bot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DeepSeekBot|MistralBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Diffbot|cohere-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AI2Bot|DuckAssistBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (omgilibot|webzio-extended|gemini-deep-research) [NC]
RewriteRule ^ - [F,L]
</IfModule>

Add this block near the top of .htaccess, before Drupal's existing RewriteEngine On statement. Drupal's .htaccess already has a RewriteEngine directive, and having two can cause a conflict: either add your rules before Drupal's rewrite block, or merge them carefully into Drupal's existing rewrite section.

On nginx, add this inside your server {} block:

if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot|Google-Extended|Bytespider|PerplexityBot|anthropic-ai|meta-externalagent|Amazonbot|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|DuckAssistBot|omgilibot|webzio-extended|gemini-deep-research)") {
    return 403;
}

Then reload nginx: nginx -t && systemctl reload nginx

Method 5: Cloudflare WAF (All Hosting)
Works for all Drupal hosting environments — self-hosted, Acquia, Pantheon, Platform.sh. Blocks bots at Cloudflare's edge before they reach your Drupal server. Highly effective against bots that ignore robots.txt (Bytespider in particular).
1. Add your domain to Cloudflare (free plan) and update your DNS nameservers.

2. Go to Security → WAF → Custom Rules → Create rule.

3. Click Edit expression and paste:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "anthropic-ai") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot") or
(http.user_agent contains "MistralBot") or
(http.user_agent contains "xAI-Bot") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "cohere-ai") or
(http.user_agent contains "AI2Bot") or
(http.user_agent contains "DuckAssistBot") or
(http.user_agent contains "omgilibot") or
(http.user_agent contains "webzio-extended") or
(http.user_agent contains "gemini-deep-research")
4. Set the action to Block. Drupal never processes the request.
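Both the .htaccess rules and the WAF expression key on User-Agent substrings. Before deploying, you can sanity-check a UA string against the same alternation locally; this sketch uses a pattern truncated to a few bots for brevity, and matches case-insensitively like Apache's [NC] flag.

```shell
# Sketch: does a given User-Agent string hit the blocklist?
# Case-insensitive, mirroring the [NC] flag in the .htaccess rules.
pattern='GPTBot|ChatGPT-User|ClaudeBot|anthropic-ai|CCBot|Bytespider|PerplexityBot'
check_ua() {
  if echo "$1" | grep -qiE "$pattern"; then
    echo "blocked"
  else
    echo "allowed"
  fi
}
check_ua 'Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)'      # blocked
check_ua 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'  # allowed
```

Note that Googlebot passes: none of the blocklist alternatives appear in its UA string, which is the point of blocking AI trainers without touching search crawlers.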
All 25 AI Bots to Block
User agents for the robots.txt rules, .htaccess, and Cloudflare WAF:
GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, Bytespider, CCBot, PerplexityBot, meta-externalagent, Amazonbot, Applebot-Extended, xAI-Bot, DeepSeekBot, MistralBot, Diffbot, cohere-ai, AI2Bot, Ai2Bot-Dolma, YouBot, DuckAssistBot, omgili, omgilibot, webzio-extended, gemini-deep-research

Why Drupal sites are prime AI training targets
Drupal powers the White House, NASA, Tesla, BBC, Harvard, and hundreds of government agencies. Its structured content model, clean semantic HTML, and taxonomy-rich pages make it exceptionally valuable training data. Drupal's Views module produces consistently formatted list pages — ideal for scraping at scale. CCBot (which feeds GPT, Gemini, Llama, and most open-source models) and Diffbot (sold to AI companies) both actively prioritise high-authority Drupal sites. If your Drupal site has been live for more than a year, it's almost certainly in multiple training datasets already.
Will This Affect Drupal SEO?
Safe to block
- ✓ Google Search rankings unaffected
- ✓ Bing rankings unaffected
- ✓ Drupal Sitemap module unaffected
- ✓ JSON-LD structured data unaffected
- ✓ Metatag SEO fields unaffected
- ✓ Pathauto URL paths unaffected
Consider before blocking
- ⚠ OAI-SearchBot → removes from ChatGPT Search
- ⚠ PerplexityBot → removes from Perplexity citations
- ⚠ DuckAssistBot → removes from Duck.ai answers
- For government and institutional Drupal sites, full AI search visibility may be desirable. Use the per-path approach in robots.txt to block training bots while allowing AI search bots on public content.
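The per-path split mentioned above can be expressed directly in robots.txt. A minimal sketch, with illustrative paths (adjust to your site's actual private sections):

```
# Training crawlers: blocked everywhere
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# AI search bots: allowed on public content, kept out of private paths
User-agent: OAI-SearchBot
Disallow: /user/
Disallow: /admin/

User-agent: PerplexityBot
Disallow: /user/
Disallow: /admin/
```

Crawlers apply the most specific User-agent group that matches them, so the search bots follow only their own partial restrictions while the training bots see a full Disallow.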
Protecting robots.txt from Composer Scaffold
When you run composer update drupal/core, the Drupal scaffold process may overwrite your robots.txt with the default version. Prevent this by adding a file-mapping exclusion to your composer.json:
{
"extra": {
"drupal-scaffold": {
"file-mapping": {
"[web-root]/robots.txt": false
}
}
}
}

Setting the path to false tells Composer scaffold to skip that file entirely, so your custom robots.txt survives future composer update runs. Alternatively, use the robotstxt module: it is immune to scaffold changes since it doesn't use the static file.
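A quick way to check whether the exclusion is already in place is to grep composer.json for the mapping. The sketch below writes a sample composer.json so it is self-contained; point the grep at your real file instead.

```shell
# Sketch: is robots.txt already excluded from scaffolding?
# A sample composer.json stands in for your project's real one.
cat > composer-sample.json <<'EOF'
{
  "extra": {
    "drupal-scaffold": {
      "file-mapping": {
        "[web-root]/robots.txt": false
      }
    }
  }
}
EOF
if grep -q '"\[web-root\]/robots.txt": false' composer-sample.json; then
  echo "robots.txt is protected from scaffold"
else
  echo "robots.txt may be overwritten on core updates"
fi
# prints: robots.txt is protected from scaffold
```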
Frequently Asked Questions
Where is the robots.txt file in Drupal?
In the Drupal root directory — the same folder as index.php and .htaccess. On a typical self-hosted install: /var/www/html/robots.txt. On Acquia or Pantheon: the repository root. Edit it directly, or install the robotstxt contrib module for admin UI management.
How do I add a noai meta tag to every Drupal page?
Two methods: (1) Metatag module — Admin → Configuration → Search and Metadata → Metatag → Global → Robots field → add 'noai, noimageai'. (2) html.html.twig template — add <meta name="robots" content="noai, noimageai"> after the {{ head }} placeholder in your theme's layout template. The Metatag module is preferred — it survives theme changes.
Can I block AI bots without SSH or file system access?
Yes. Install the robotstxt module (drupal.org/project/robotstxt) — it replaces the static file with a UI-editable, database-stored version. For meta tags, use the Metatag module. Both are admin-only changes with no file system access needed.
Will Composer update overwrite my robots.txt?
Potentially — Drupal's scaffold process can overwrite the robots.txt file when you update core. Prevent this by adding "[web-root]/robots.txt": false to the drupal-scaffold file-mapping in your composer.json. Or use the robotstxt module, which manages robots.txt via the database and is completely unaffected by scaffold.
How do I block AI bots on Acquia or Pantheon?
Edit the robots.txt file in your Git repository root and commit the change — it deploys automatically. For meta tags, use the Metatag module (admin config, no Git commit needed). For edge blocking, proxy your domain through Cloudflare and add a WAF rule. Direct .htaccess changes for bot blocking may conflict with Acquia/Pantheon managed configs — test in a non-prod environment first.
Will blocking AI bots affect Drupal's Google Search rankings?
No. Blocking GPTBot, ClaudeBot, CCBot, Google-Extended, and other AI training bots has zero effect on Googlebot or Bingbot. Your Drupal site's Search API, sitemap, Pathauto paths, and structured data all continue working normally. Google Search rankings and Bing rankings are completely unaffected.