
How to Block AI Bots on OpenLiteSpeed: Complete 2026 Guide

OpenLiteSpeed (OLS) is a high-performance open-source web server popular in shared hosting environments, cPanel/WHM setups, and as a faster drop-in replacement for Apache. It supports Apache-compatible .htaccess files — most Apache bot blocking rules work unchanged. This guide covers .htaccess rules, the WebAdmin console, mod_security, and LSCache considerations.

Bot blocking via .htaccess

OpenLiteSpeed supports Apache-compatible .htaccess files. Enable them first in the WebAdmin console if not already active:

Enable .htaccess in WebAdmin: WebAdmin Console → Virtual Hosts → [your vhost] → General → Enable .htaccess → Yes → Save → Graceful Restart.

.htaccess — RewriteRule approach (most compatible)

# .htaccess — place in your document root
RewriteEngine On

# Block AI training and scraping bots by User-Agent
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SemrushBot|MJ12bot|DataForSeoBot|magpie-crawler) [NC]
RewriteRule .* - [F,L]
Flag explanation:
  • [NC] — No Case (case-insensitive matching)
  • [OR] — combine multiple RewriteCond with OR (default is AND)
  • [F] — Forbidden (returns 403)
  • [L] — Last rule (stop processing further rules)
  • .* - — match any URL, no substitution (dash = no rewrite)
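Before deploying, you can sanity-check the blocklist pattern locally. The shell sketch below mirrors the RewriteCond regex, with grep's -i flag standing in for [NC]; the check_ua helper and sample User-Agent strings are illustrative, not part of OpenLiteSpeed:

```shell
# Mirror of the RewriteCond blocklist; grep -i approximates the [NC] flag.
PATTERN='GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot'

check_ua() {
  # Print BLOCKED or ALLOWED for a given User-Agent string.
  if printf '%s' "$1" | grep -qiE "$PATTERN"; then
    echo "BLOCKED: $1"
  else
    echo "ALLOWED: $1"
  fi
}

check_ua "Mozilla/5.0 (compatible; GPTBot/1.0)"    # BLOCKED
check_ua "Mozilla/5.0 (Windows NT 10.0) Firefox"   # ALLOWED
```

Because the match is case-insensitive, "gptbot" and "GPTBot" behave identically, just as they do under [NC].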

Alternative — mod_setenvif approach (cleaner for many bots)

# .htaccess
SetEnvIfNoCase User-Agent "GPTBot" bad_bot
SetEnvIfNoCase User-Agent "ClaudeBot" bad_bot
SetEnvIfNoCase User-Agent "anthropic-ai" bad_bot
SetEnvIfNoCase User-Agent "CCBot" bad_bot
SetEnvIfNoCase User-Agent "Google-Extended" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "Bytespider" bad_bot
SetEnvIfNoCase User-Agent "Amazonbot" bad_bot
SetEnvIfNoCase User-Agent "Diffbot" bad_bot
SetEnvIfNoCase User-Agent "FacebookBot" bad_bot
SetEnvIfNoCase User-Agent "cohere-ai" bad_bot
SetEnvIfNoCase User-Agent "PerplexityBot" bad_bot
SetEnvIfNoCase User-Agent "YouBot" bad_bot

<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
mod_setenvif availability: The SetEnvIfNoCase + <RequireAll> approach requires mod_authz_core and mod_setenvif. Both are typically available in OpenLiteSpeed, but verify in WebAdmin → Server → Modules. If unavailable, use the RewriteRule approach above.
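Maintaining a dozen near-identical SetEnvIfNoCase lines by hand invites typos. A small generator keeps the bot list in one place; this is a sketch, and how you paste or include the output into .htaccess is up to you:

```shell
# Emit one SetEnvIfNoCase line per bot; paste the output into .htaccess.
BOTS="GPTBot ClaudeBot anthropic-ai CCBot Google-Extended AhrefsBot \
Bytespider Amazonbot Diffbot FacebookBot cohere-ai PerplexityBot YouBot"

for bot in $BOTS; do
  printf 'SetEnvIfNoCase User-Agent "%s" bad_bot\n' "$bot"
done
```

Adding a new bot later is then a one-word change to the list rather than a hand-written directive.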

Server-level rules via WebAdmin

For rules that apply across all virtual hosts and fire before LSCache, add them at the server level in WebAdmin:

  1. Log in to WebAdmin Console (typically https://your-server:7080)
  2. Navigate to Server Configuration → Rewrite
  3. Set Enable Rewrite to Yes
  4. In the Rewrite Rules field, add:
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot) [NC]
RewriteRule .* - [F,L]
  5. Click Save
  6. Apply changes: Actions → Graceful Restart
Server-level vs vhost-level: Server-level rewrite rules in WebAdmin apply to all virtual hosts and fire before cache lookup. Virtual host rules (in .htaccess or the vhost config) fire after. For blocking AI bots efficiently across all sites, server-level rules are more reliable with LSCache.

X-Robots-Tag via WebAdmin or .htaccess

Option 1: WebAdmin Custom Response Headers (recommended)

  1. WebAdmin Console → Virtual Hosts → [your vhost] → General
  2. Scroll to Custom Response Headers
  3. Add: X-Robots-Tag: noai, noimageai
  4. Save → Graceful Restart

Or at the server level (applies to all vhosts): Server Configuration → General → Custom Response Headers.

Option 2: .htaccess with mod_headers

# .htaccess
Header always set X-Robots-Tag "noai, noimageai"
mod_headers requirement: The Header directive requires mod_headers. Verify it's loaded in WebAdmin → Server → Modules. If not available, use the WebAdmin Custom Response Headers GUI instead.
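After a graceful restart, confirm the header is actually being sent. The snippet below is a sketch: has_noai is a hypothetical helper name and yoursite.com a placeholder for your domain.

```shell
# Succeeds if a response-header block advertises noai in X-Robots-Tag.
has_noai() {
  printf '%s\n' "$1" | grep -qi '^x-robots-tag:.*noai'
}

# Live check (replace yoursite.com with your domain):
if has_noai "$(curl -sI https://yoursite.com 2>/dev/null)"; then
  echo "X-Robots-Tag present"
else
  echo "X-Robots-Tag missing or server unreachable"
fi
```

If the header is missing, recheck that mod_headers is loaded or fall back to the WebAdmin Custom Response Headers GUI.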

robots.txt as a static file

Place robots.txt in your document root. OpenLiteSpeed serves static files automatically — no additional configuration needed.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
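The stanza-per-bot format above is tedious to maintain by hand. A short generator can rebuild the file from a single list; this is a sketch, and the bot list, output path, and Sitemap URL are assumptions to adapt:

```shell
# Rebuild robots.txt from one bot list; writes to ./robots.txt.
BOTS="GPTBot ClaudeBot anthropic-ai CCBot Google-Extended AhrefsBot \
Bytespider Amazonbot Diffbot FacebookBot cohere-ai PerplexityBot YouBot"

{
  printf 'User-agent: *\nAllow: /\n'
  for bot in $BOTS; do
    printf '\nUser-agent: %s\nDisallow: /\n' "$bot"
  done
  printf '\nSitemap: https://example.com/sitemap.xml\n'
} > robots.txt

grep -c '^Disallow: /$' robots.txt   # one Disallow line per blocked bot
```

Keeping the list in one script makes it easy to keep robots.txt, the rewrite rules, and the SetEnvIfNoCase block in sync.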

mod_security rules

If ModSecurity is installed on your OpenLiteSpeed server (available as a module), you can add WAF rules to block AI bots. ModSecurity rules fire at the server level before .htaccess processing:

# /etc/modsecurity/modsecurity.conf or a custom rules file
# Block AI training bots by User-Agent

SecRule REQUEST_HEADERS:User-Agent "@rx (?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)" \
    "id:10001,\
    phase:1,\
    deny,\
    status:403,\
    log,\
    msg:'AI bot blocked',\
    logdata:'Matched UA: %{REQUEST_HEADERS.User-Agent}'"

Enable in WebAdmin: Server → Security → ModSecurity → Enable ModSecurity → Yes. Add your rules file to the ModSecurity Rules path.

LSCache and bot blocking

LSCache (LiteSpeed Cache) serves cached responses before most request processing — including .htaccess rules. This means blocked bots may still receive cached pages. Three strategies to handle this:

Strategy 1: Server-level rewrite rules (recommended)

Place bot-blocking rules at the server level in WebAdmin (not in .htaccess). Server-level rules fire before the cache layer.

Strategy 2: LSCache bot exclusion config

Configure LSCache so it does not serve cached responses to known bots. With the LiteSpeed Cache WordPress plugin, this is a setting in the admin UI rather than an .htaccess directive:

# LiteSpeed Cache WordPress plugin: built-in bot detection settings
# WP Admin → LiteSpeed Cache → Cache → Do Not Cache → Bot IPs/UAs

Strategy 3: ModSecurity at the server level

ModSecurity fires in phase:1 (request headers), before cache lookup. Use ModSecurity rules for the most reliable blocking regardless of cache state.

Testing: After configuring, verify blocking by simulating a bot UA: curl -A "GPTBot/1.0" https://yoursite.com should return 403. If it returns 200, a cached response is bypassing the block; move the rules to the server level.

Full .htaccess example

# .htaccess — OpenLiteSpeed / LiteSpeed Enterprise
# Place in your document root

# ── Enable rewriting ─────────────────────────────────────────────────────────
RewriteEngine On

# ── Block AI bots (RewriteRule approach) ─────────────────────────────────────
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (cohere-ai|PerplexityBot|YouBot) [NC]
RewriteRule .* - [F,L]

# ── X-Robots-Tag (requires mod_headers) ──────────────────────────────────────
Header always set X-Robots-Tag "noai, noimageai"

# ── Security headers ──────────────────────────────────────────────────────────
Header always set X-Content-Type-Options "nosniff"
Header always set X-Frame-Options "SAMEORIGIN"
Header always set Referrer-Policy "strict-origin-when-cross-origin"

# ── HTTPS redirect ────────────────────────────────────────────────────────────
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

# ── WWW redirect ──────────────────────────────────────────────────────────────
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]

# ── Static file caching ───────────────────────────────────────────────────────
<FilesMatch "\.(css|js|png|jpg|jpeg|gif|ico|woff2|svg)$">
    Header set Cache-Control "public, max-age=2592000"
</FilesMatch>

Verify and reload

# Test .htaccess is being read (check error log)
tail -f /usr/local/lsws/logs/error.log

# Graceful restart via CLI
/usr/local/lsws/bin/lswsctrl restart

# Or via WebAdmin: Actions → Graceful Restart

# Test bot blocking
curl -A "GPTBot/1.0" https://yoursite.com
# Expected: HTTP/1.1 403 Forbidden
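To sweep the whole blocklist rather than one UA at a time, a loop like this reports any bot that still gets through. It is a sketch: SITE is a placeholder for your domain and the classify helper is illustrative.

```shell
SITE="https://yoursite.com"   # placeholder: set to your domain

classify() {   # 403 means the block fired; anything else is a leak
  case "$1" in
    403) echo "blocked" ;;
    *)   echo "NOT blocked (HTTP $1)" ;;
  esac
}

for ua in GPTBot ClaudeBot CCBot Bytespider PerplexityBot; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -A "$ua/1.0" "$SITE")
  printf '%-14s %s\n' "$ua" "$(classify "$code")"
done
```

Any "NOT blocked" line usually means a cached response is being served; move the rules to the server level as described above.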

FAQ

How do I block AI bots by User-Agent on OpenLiteSpeed?

Use RewriteCond %{HTTP_USER_AGENT} with [NC] flag and RewriteRule .* - [F,L] in .htaccess. Enable .htaccess support first in WebAdmin. For pre-cache blocking, add the same rules at the server level in WebAdmin → Server Configuration → Rewrite.

Does OpenLiteSpeed support .htaccess files?

Yes — enable in WebAdmin: Virtual Hosts → [vhost] → General → Enable .htaccess → Yes. Rewrite rules, access control, and header directives are Apache-compatible. Not all Apache modules are supported — check OLS documentation for the full list.

How do I add X-Robots-Tag on OpenLiteSpeed?

Via WebAdmin: Virtual Host → General → Custom Response Headers → add X-Robots-Tag: noai, noimageai. Or via .htaccess: Header always set X-Robots-Tag "noai, noimageai" (requires mod_headers).

What is the difference between OpenLiteSpeed and LiteSpeed Enterprise?

OLS is free and open-source. LiteSpeed Enterprise is the commercial version with cPanel/WHM integration, HTTP/3, QUIC, and enterprise support. Bot blocking configuration is identical — .htaccess rules work the same in both.

Does LSCache affect bot blocking?

Yes — LSCache may serve cached responses before .htaccess rules fire. Fix: add bot-blocking rules at the server level in WebAdmin (fires before cache), or use ModSecurity (phase:1, fires before cache), or configure LSCache to not cache bot requests.

Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.