
How to Block AI Bots on Joomla

Joomla is one of the most widely used CMSs, and its clean, structured output makes it a frequent AI training target. Here's how to block 25 AI crawlers by editing robots.txt directly, adding noai meta tags, and using .htaccess rules.

Joomla 4.x (2021+)

  • ✓ Edit robots.txt at Joomla root (FTP/SSH)
  • ✓ Template editor in Admin → System → Site Templates
  • ✓ noai tag via template index.php
  • ✓ .htaccess server-level blocking
  • ✓ Cloudflare WAF

Joomla 3.x (legacy)

  • ✓ Edit robots.txt at Joomla root (FTP/SSH)
  • ✓ Template editor via Extensions → Templates
  • ✓ noai tag via template index.php
  • ✓ .htaccess server-level blocking
  • ✓ Cloudflare WAF

Quick fix — add to your Joomla robots.txt

File is at the Joomla root directory (same folder as index.php and configuration.php). Edit via FTP, SSH, or your hosting control panel's file manager.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Method 1: Edit robots.txt (Fastest)

Joomla stores a static robots.txt file in the site root — the same directory as index.php and configuration.php. Edit it directly with FTP/SFTP, SSH, or your hosting control panel's file manager. No extensions or admin-panel changes required.

Also check: Joomla ships a companion file called robots.txt.dist in the same folder. This is the safe factory default and is not served. Only edit robots.txt (the file without the .dist suffix). If robots.txt is missing from your install, copy robots.txt.dist to robots.txt first.

  1. Connect to your server via FTP/SFTP or open your hosting control panel file manager.

  2. Navigate to the Joomla root and open robots.txt for editing (typically /public_html/robots.txt on shared hosting, or /var/www/html/robots.txt on a VPS).

  3. Append the full AI bot block list below. If you replace the file's contents instead, keep Joomla's default Disallow lines for system directories such as /administrator/:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
  4. Replace yourdomain.com in the Sitemap line with your domain, or remove the line if your site has no sitemap (Joomla needs a sitemap extension to generate one). Save and upload. Verify at https://yourdomain.com/robots.txt.
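The verification step can also be scripted. The following is a minimal sketch, assuming Python 3 and its standard-library urllib.robotparser module; the helper name blocked_agents and the SAMPLE text are illustrative, not part of Joomla. It parses robots.txt content and reports which of the listed agents are denied access to the site root.

```python
from urllib.robotparser import RobotFileParser

# Agents to check; taken from the quick-fix list in this guide.
AI_AGENTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]

# Illustrative robots.txt content (an assumption, not your real file).
SAMPLE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://yourdomain.com/"):
    """Return the agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# CCBot and Google-Extended have no group in SAMPLE, so they fall back
# to the "User-agent: *" group and are allowed.
print(blocked_agents(SAMPLE, AI_AGENTS))
```

To check a live site instead of a string, RobotFileParser also accepts a URL and exposes a read() method that downloads the file before you call can_fetch(). Keep in mind this only confirms what the file says; it does not show whether a given crawler obeys it.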

Method 2: noai Meta Tag via Template (All Joomla Installs)

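The noai meta tag adds a second layer of protection: it asks AI crawlers, at the page level, not to use the content. It is an informal convention rather than a formal standard, so compliance is voluntary. The most reliable way to add it to every Joomla page is editing your active template's index.php file.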

Option A: Template Editor in Joomla Admin (Joomla 4.x)

  1. Log into Joomla Admin. Go to System → Site Templates.

  2. Click the name of your active template (marked with a star). Then click Open Template Editor.

  3. In the file tree on the left, click index.php.

  4. Find the </head> closing tag. Add the noai tag immediately before it:

    <meta name="robots" content="noai, noimageai">
    </head>

  5. Click Save. The template change itself takes effect immediately; if Joomla's page cache is enabled, clear it so cached pages pick up the new tag.

Option B: Extensions → Templates (Joomla 3.x)

  1. Go to Extensions → Templates → Templates. Click your active template (star icon). Click the Editor tab.

  2. Click index.php in the file list. Find </head> and add the meta tag before it (same as above). Save.

Template updates: If you update or switch your active template, the index.php file may be overwritten and your meta tag will be lost. Use a child template (supported since Joomla 4.1) or keep a record of the change so you can re-add it after updates.
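Because a template update can silently remove the tag, it may help to check rendered pages for it from time to time. Here is a minimal sketch, assuming Python 3 and its standard-library html.parser module; the class name RobotsMetaFinder and the function has_noai are hypothetical helpers, and you would supply the page HTML yourself (for example, saved from your browser's "view source").

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noai(html):
    """True if the page declares both noai and noimageai."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return {"noai", "noimageai"} <= finder.directives


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(has_noai(page))
```

The check merges directives across all robots meta tags on the page, since a Joomla template may already emit one of its own alongside the tag you add.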

Alternative: Custom HTML module in <head>

In Joomla 4+, you can create a Custom HTML module assigned to a head position without touching template code. However, this only works if your template has a module position inside <head> (most templates don't). The template index.php approach is more reliable for most installs.

Joomla Admin → Content → Site Modules → New → Custom HTML → paste <meta name="robots" content="noai, noimageai"> → assign to a head module position → set Status to Published.

Method 3: .htaccess Server Blocking (Apache)

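On Apache hosting, Joomla ships with an htaccess.txt file that you rename to .htaccess to enable it. Adding RewriteCond rules blocks AI bots at the web server level, before Joomla or PHP processes the request. This is more robust than robots.txt for bots that reportedly ignore it (like Bytespider). Note that Google-Extended is a robots.txt control token rather than a crawler with its own user-agent string, so a user-agent rule for it has no practical effect; robots.txt is what governs it.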

Open your Joomla .htaccess in the site root. Add this block at the very top, before any existing content:

# Block AI training crawlers — add BEFORE Joomla's RewriteEngine block
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ClaudeBot|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Google-Extended|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|PerplexityBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (meta-externalagent|Amazonbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Applebot-Extended|xAI-Bot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DeepSeekBot|MistralBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Diffbot|cohere-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AI2Bot|DuckAssistBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (omgilibot|webzio-extended|gemini-deep-research) [NC]
RewriteRule ^ - [F,L]
</IfModule>
Placement is critical: add your blocking rules in a separate block at the very top of the file, before Joomla's own rewrite rules, so that blocked requests are rejected before Joomla's routing rules are evaluated. Joomla's default .htaccess already contains a RewriteEngine On statement; you do not need to touch it, and repeating the directive in your separate block is harmless.
On nginx: Add the following inside your server {} block before the location / block:
if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot|Google-Extended|Bytespider|PerplexityBot|anthropic-ai|meta-externalagent|Amazonbot|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|DuckAssistBot|omgilibot|webzio-extended|gemini-deep-research)") {
    return 403;
}
Run nginx -t && systemctl reload nginx.
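Before deploying a user-agent pattern, it can be worth checking it against a few sample strings. The sketch below uses Python's re module as a stand-in for the server's regex engine, which is a reasonable approximation for simple alternations like these. The pattern is a shortened subset of the list above, and the sample user-agent strings and the helper name would_block are illustrative assumptions.

```python
import re

# Shortened subset of the pattern used in the server rules above.
# re.IGNORECASE mirrors Apache's [NC] flag and nginx's ~* operator.
AI_BOT_PATTERN = re.compile(
    r"(GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot)",
    re.IGNORECASE,
)


def would_block(user_agent):
    """True if the pattern matches anywhere in the user-agent string."""
    return AI_BOT_PATTERN.search(user_agent) is not None


# Illustrative user-agent strings (assumed for the example).
samples = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "mozilla/5.0 (compatible; claudebot/1.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]
for ua in samples:
    print(would_block(ua), ua)
```

The match is an unanchored substring search, so any user agent containing one of the names is rejected. That is the intended behaviour here, but it also means a short or generic name in the list could catch unrelated clients.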

Method 4: Cloudflare WAF (All Hosting)

Works with any Joomla hosting — shared, VPS, managed. Blocks AI bots at Cloudflare's edge before they reach your server. The most effective method against bots that ignore robots.txt directives.

  1. Add your domain to Cloudflare (free plan) and update your DNS nameservers.
  2. In your Cloudflare dashboard → Security → WAF → Custom Rules → Create rule.
  3. Click Edit expression and paste:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "anthropic-ai") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot") or
(http.user_agent contains "MistralBot") or
(http.user_agent contains "xAI-Bot") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "cohere-ai") or
(http.user_agent contains "AI2Bot") or
(http.user_agent contains "DuckAssistBot") or
(http.user_agent contains "omgilibot") or
(http.user_agent contains "webzio-extended") or
(http.user_agent contains "gemini-deep-research")

Set action to Block. Joomla never processes the request.
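If you maintain the bot list in one place, the expression can be generated rather than typed by hand. This is a minimal sketch in Python; the function name cloudflare_expression is a hypothetical helper, and it simply reproduces the "contains ... or" format shown above from a list of names.

```python
def cloudflare_expression(agents):
    """Build a WAF expression matching any of the given user-agent names."""
    clauses = []
    for agent in agents:
        # Reject characters that would break out of the quoted string.
        if '"' in agent or "\\" in agent:
            raise ValueError(f"unsafe user-agent name: {agent!r}")
        clauses.append(f'(http.user_agent contains "{agent}")')
    return " or\n".join(clauses)


print(cloudflare_expression(["GPTBot", "ClaudeBot", "CCBot"]))
```

Unlike the [NC] flag in the Apache rules, the contains operator compares case-sensitively, so the names need to match the casing the crawlers actually send.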

All 25 AI Bots to Block

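User agents for the robots.txt rules, .htaccess, and Cloudflare WAF. The robots.txt and .htaccess examples earlier in this guide include 22 of these names (Ai2Bot-Dolma, YouBot, and omgili are not in them), and the nginx and Cloudflare examples are shorter still. Add any missing names you want covered: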

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
anthropic-ai
Google-Extended
Bytespider
CCBot
PerplexityBot
meta-externalagent
Amazonbot
Applebot-Extended
xAI-Bot
DeepSeekBot
MistralBot
Diffbot
cohere-ai
AI2Bot
Ai2Bot-Dolma
YouBot
DuckAssistBot
omgili
omgilibot
webzio-extended
gemini-deep-research
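With 25 names spread across several config files, it is easy for one file to fall behind. The sketch below, in Python, compares the list above against the text of a config file and reports the names that do not appear in it. The function name missing_agents is a hypothetical helper; it matches whole tokens, so omgilibot in a file does not count as covering omgili.

```python
import re

# The 25 user agents listed above.
ALL_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "Google-Extended", "Bytespider", "CCBot", "PerplexityBot",
    "meta-externalagent", "Amazonbot", "Applebot-Extended", "xAI-Bot",
    "DeepSeekBot", "MistralBot", "Diffbot", "cohere-ai", "AI2Bot",
    "Ai2Bot-Dolma", "YouBot", "DuckAssistBot", "omgili", "omgilibot",
    "webzio-extended", "gemini-deep-research",
]


def missing_agents(config_text, agents=ALL_AGENTS):
    """Return the agents that never appear as a whole token in config_text."""
    # Tokens are runs of letters, digits, and hyphens, compared case-insensitively.
    tokens = {t.lower() for t in re.findall(r"[A-Za-z0-9-]+", config_text)}
    return [agent for agent in agents if agent.lower() not in tokens]


rule = "RewriteCond %{HTTP_USER_AGENT} (AI2Bot|omgilibot) [NC]"
print(missing_agents(rule, ["AI2Bot", "Ai2Bot-Dolma", "omgili", "omgilibot"]))
```

Whole-token matching is deliberately strict. A case-insensitive substring rule for AI2Bot would in practice also match Ai2Bot-Dolma, so treat the output as a prompt to review the file rather than as proof of a gap.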

Why Joomla sites are AI training targets

Joomla powers over 2 million active websites, including government portals across Europe, Australia, and Southeast Asia. Its structured article system, category hierarchy, and clean URL patterns make it extremely efficient for bulk crawling. Joomla's default robots.txt (when present) only blocks system paths such as /administrator/, leaving all public content open to AI training bots. The Joomla default robots.txt.dist was written before AI crawlers existed. If you haven't modified it, every AI training crawler has unrestricted access to your content.

Will This Affect Joomla SEO?

Safe to block

  • ✓ Google Search rankings unaffected
  • ✓ Bing rankings unaffected
  • ✓ Joomla's SEF URL system unaffected
  • ✓ Joomla sitemap extension unaffected
  • ✓ Article meta descriptions unaffected
  • ✓ Open Graph / social sharing unaffected

Consider before blocking

  • ⚠ OAI-SearchBot → removes from ChatGPT Search
  • ⚠ PerplexityBot → removes from Perplexity citations
  • ⚠ DuckAssistBot → removes from Duck.ai answers
  • Government and institutional Joomla publishers may want AI search visibility for public information. Consider blocking training bots (CCBot, GPTBot) while allowing AI search bots (OAI-SearchBot, PerplexityBot).
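The selective policy described in the last point can be sketched in Python as follows. The training/search split uses the examples named above; the function name build_robots_txt is a hypothetical helper, and the standard-library urllib.robotparser module is used only to confirm the generated rules behave as intended.

```python
from urllib.robotparser import RobotFileParser

# Split taken from the examples in the text above.
TRAINING_BOTS = ["CCBot", "GPTBot"]
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt(blocked):
    """Allow everyone by default, then add one Disallow group per blocked bot."""
    groups = ["User-agent: *\nAllow: /"]
    groups += [f"User-agent: {bot}\nDisallow: /" for bot in blocked]
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(TRAINING_BOTS)

# Confirm the policy: training bots denied, search bots still allowed.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in TRAINING_BOTS + SEARCH_BOTS:
    print(bot, parser.can_fetch(bot, "https://yourdomain.com/"))
```

On a real Joomla site you would append the generated groups to the existing robots.txt rather than overwrite it, so the default Disallow lines for system directories stay in place.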

Check your current Joomla robots.txt

The Joomla default robots.txt.dist contains rules like these, which restrict only system directories and leave public content unprotected from AI crawlers:

# Default Joomla! robots.txt (blocks nothing useful for AI)
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /libraries/
# ... etc.
# NO AI bot rules — all public content accessible

Visit https://yourdomain.com/robots.txt — if you see only Joomla system paths being blocked, your content is open to every AI training crawler.

Frequently Asked Questions

Where is the robots.txt file in Joomla?

In the Joomla root directory — the same folder as index.php and configuration.php. On shared hosting, typically at /public_html/robots.txt. On a VPS: /var/www/html/robots.txt. Access it via FTP/SFTP, SSH, or your hosting control panel file manager (cPanel File Manager, Plesk, etc.). Also check for robots.txt.dist in the same folder — this is the factory default but is not served.

How do I add a noai meta tag to every Joomla page?

Edit your active template's index.php. Joomla 4: Admin → System → Site Templates → your active template → Open Template Editor → index.php. Joomla 3: Extensions → Templates → Templates → your template → Editor tab → index.php. Find </head> and add <meta name="robots" content="noai, noimageai"> immediately before it. Save.

Does Joomla have a built-in robots.txt editor?

Not a dedicated one. The robots.txt is a static file you edit via FTP, SSH, or your hosting file manager. Joomla 4's template editor only manages template files (not the root robots.txt). For a within-admin approach, use your hosting control panel's file manager, or install a file manager extension like eXtplorer.

I only have cPanel access — how do I edit robots.txt?

Log into cPanel → Files → File Manager. Navigate to your Joomla root (usually public_html). Find robots.txt, right-click → Edit. Add the AI bot Disallow rules from this guide. Click Save Changes. Done — no FTP or SSH needed.

Will blocking AI bots affect Joomla's SEO?

No. Blocking GPTBot, ClaudeBot, CCBot, Google-Extended, and other AI training bots has no effect on Googlebot or Bingbot. Your Joomla site's rankings, SEF URLs, category pages, and article meta settings all continue working normally. Your XML sitemap (generated by a sitemap extension) and Open Graph tags are unaffected as well.

Why does Joomla's default robots.txt not block AI bots?

The default robots.txt.dist was written to protect Joomla system directories (/administrator/, /libraries/, etc.) from indexing. It was created before AI training crawlers existed as a category. The file has not been updated to account for GPTBot, CCBot, ClaudeBot, or any AI training bot. As of 2026, you need to manually add AI bot rules — they are not included in any Joomla default or update.
