WooCommerce stores are high-value targets for AI crawlers. Your product descriptions, pricing, reviews, and structured catalog are exactly what training datasets are built from — and the WooCommerce REST API at /wp-json/wc/ serves it all as machine-readable JSON. This guide covers every blocking method from basic robots.txt to server-level .htaccess rules.
WooCommerce REST API risk: Unlike a static site, WooCommerce exposes machine-readable catalog data under /wp-json/wc/. The core v3 endpoints (/wp-json/wc/v3/products, /wp-json/wc/v3/categories) require API keys, but the Store API at /wp-json/wc/store/v1/products serves your full product catalog as structured JSON with no authentication — a perfect AI training dataset. Block /wp-json/wc/ in robots.txt and in .htaccess.
| Method | When to use |
|---|---|
| robots.txt — protect shop & product pages | Always — first thing to configure |
| Block WooCommerce REST API (/wp-json/wc/) | If you have a public product catalog |
| noai meta tag via functions.php | For product pages and shop pages |
| .htaccess server-level blocking | Apache hosting — most effective |
| Cloudflare WAF rule | Managed/shared hosting, or Kinsta/WP Engine |
WooCommerce generates several URL patterns worth protecting: /shop/, /product/, /product-category/, /cart/, /checkout/, and the REST API. Cart and checkout should always be disallowed — even for regular search engines.
In your WordPress dashboard: SEO → Tools → File Editor → Edit robots.txt. Yoast SEO (free) and Yoast WooCommerce SEO both expose this editor.
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Diffbot
Disallow: /

# Block all other AI bots from WooCommerce-specific paths
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /wp-json/wc/
```
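After saving, you can sanity-check the rules offline with Python's standard-library robots.txt parser. A minimal sketch, using a trimmed excerpt of the rules above (the product paths are hypothetical examples):

```python
from urllib.robotparser import RobotFileParser

# Trimmed excerpt of the robots.txt rules above: two groups.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /wp-json/wc/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# GPTBot matches its own group and is blocked everywhere.
print(parser.can_fetch("GPTBot", "/product/blue-widget/"))       # False
# Other crawlers fall through to the * group: shop pages allowed,
# cart/checkout/API paths disallowed.
print(parser.can_fetch("Googlebot", "/product/blue-widget/"))    # True
print(parser.can_fetch("Googlebot", "/wp-json/wc/v3/products"))  # False
```

This only verifies what compliant crawlers should do; bots that ignore robots.txt need the server-level blocks covered later.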
Connect to your server root via SFTP or your host's file manager (the same directory as wp-config.php) and edit or create robots.txt directly. If Yoast has previously written a robots.txt, your edits will be overwritten the next time Yoast regenerates the file — use Option A instead, or disable Yoast's robots.txt management first.
Yoast conflict warning: If Yoast SEO is active, it generates robots.txt dynamically on certain requests and may overwrite your physical file. Either use Yoast's editor (Option A) or go to SEO → Search Appearance → Advanced and disable "Yoast SEO manages robots.txt" before editing the physical file.
The WooCommerce REST API serves your product catalog, categories, tags, and attributes as structured JSON. The core v3 endpoints such as /wp-json/wc/v3/products require API-key authentication, but the Store API at /wp-json/wc/store/v1/products returns the same catalog data publicly, with no authentication. AI training crawlers will index these JSON responses.
```
User-agent: GPTBot
Disallow: /

# OR, to protect only the API without blocking all pages:
User-agent: GPTBot
Disallow: /wp-json/wc/
```
For a hard server-level block that stops AI bots from accessing the API regardless of robots.txt compliance (Bytespider and Diffbot often ignore robots.txt):
```apache
# Block AI bots from WooCommerce REST API
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/wp-json/wc/ [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|cohere-ai|AI2Bot|DeepSeekBot|MistralBot|Amazonbot|Applebot-Extended|xAI-Bot|OAI-SearchBot|ChatGPT-User) [NC]
RewriteRule ^ - [F,L]
</IfModule>

# BEGIN WordPress
# @see https://wordpress.org/documentation/article/htaccess/
<IfModule mod_rewrite.c>
RewriteEngine On
...
</IfModule>
# END WordPress
```

Require WordPress auth for all REST endpoints: If your theme or plugins don't need public REST API access, you can also require authentication for every /wp-json/ request. Add to functions.php:

```php
add_filter( 'rest_authentication_errors', fn( $r ) => is_null( $r ) ? new WP_Error( 'rest_forbidden', 'Forbidden', [ 'status' => 401 ] ) : $r );
```

Test thoroughly, as this can break Gutenberg and page builders.
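Apache's [NC] flag makes the RewriteCond a case-insensitive substring match against the full User-Agent header. If you want to preview which request UAs the pattern would return 403 for, here is a small Python sketch of the same match logic (the sample UA strings are illustrative, not exact copies of real bot headers):

```python
import re

# Same alternation as the RewriteCond above; re.IGNORECASE mirrors [NC].
BLOCK_PATTERN = re.compile(
    r"(GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot"
    r"|PerplexityBot|meta-externalagent|Diffbot)",
    re.IGNORECASE,
)

def is_blocked(user_agent: str) -> bool:
    """True if the .htaccess rule would 403 this User-Agent."""
    return bool(BLOCK_PATTERN.search(user_agent))

# Bot UAs embed the token somewhere inside a longer string.
print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.2)"))      # True
print(is_blocked("mozilla/5.0 (compatible; bytespider)"))      # True, case-insensitive
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # False
```

Note that Googlebot stays unblocked: the pattern contains Google-Extended, not Googlebot, so normal search indexing is unaffected.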
Add <meta name="robots" content="noai, noimageai"> to product pages and the shop page. This tells compliant AI crawlers not to train on the page content or product images.
Add to your child theme's functions.php. Uses WooCommerce's is_woocommerce() conditional to target only shop, product, and archive pages.
```php
<?php
/**
 * Add noai meta tag to WooCommerce pages
 * Targets: shop, product, product category/tag pages
 */
function openshadow_noai_on_woocommerce() {
	if ( function_exists( 'is_woocommerce' ) && is_woocommerce() ) {
		echo '<meta name="robots" content="noai, noimageai">' . PHP_EOL;
	}
}
add_action( 'wp_head', 'openshadow_noai_on_woocommerce' );
```

To apply the tag sitewide instead of only on WooCommerce pages:

```php
<?php
function openshadow_noai_global() {
	echo '<meta name="robots" content="noai, noimageai">' . PHP_EOL;
}
add_action( 'wp_head', 'openshadow_noai_global' );
```

For per-product control, use a custom meta field and opt products in explicitly:

```php
<?php
/**
 * Add noai meta tag to products unless explicitly opted in.
 * Custom product meta field: _allow_ai_training (value: '1' = allow)
 */
function openshadow_product_noai() {
	if ( ! is_singular( 'product' ) ) {
		return;
	}
	$product_id = get_the_ID();
	$allow_ai   = get_post_meta( $product_id, '_allow_ai_training', true );
	if ( '1' !== $allow_ai ) {
		echo '<meta name="robots" content="noai, noimageai">' . PHP_EOL;
	}
}
add_action( 'wp_head', 'openshadow_product_noai' );
```

Apache-based hosting (most shared hosting) lets you block AI bots at the server level before WordPress even loads. This stops bots that ignore robots.txt, like Bytespider and Diffbot. Add the rules above the # BEGIN WordPress block.
```apache
# Block AI training and scraping bots
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research) [NC]
RewriteRule ^ - [F,L]
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

To block AI bots only from shop, product, and API URLs while leaving blog content crawlable:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# Block AI bots from shop and product pages only
RewriteCond %{REQUEST_URI} ^/(shop|product|product-category|product-tag|wp-json/wc) [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|Diffbot|meta-externalagent|cohere-ai) [NC]
RewriteRule ^ - [F,L]
</IfModule>
```

Hosting compatibility: This works on Apache (most shared hosts: Bluehost, SiteGround, Hostinger, GoDaddy, DreamHost). For nginx hosts (Kinsta, Flywheel, WP Engine), use the Cloudflare WAF approach or ask your host to add nginx if ($http_user_agent) rules — nginx does not read .htaccess.
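The selective .htaccess variant fires only when both RewriteConds match: the request path must start with a WooCommerce prefix and the User-Agent must contain a listed bot token. A minimal Python sketch of that AND logic, with the prefixes and tokens copied from the rules (sample paths and UAs are illustrative):

```python
import re

# ^/(...) anchors at the start of the path, like the REQUEST_URI RewriteCond.
PATH_RE = re.compile(r"^/(shop|product|product-category|product-tag|wp-json/wc)", re.IGNORECASE)
BOT_RE = re.compile(
    r"(GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot"
    r"|Diffbot|meta-externalagent|cohere-ai)",
    re.IGNORECASE,
)

def should_block(path: str, user_agent: str) -> bool:
    # Both conditions must hold for the [F] (403 Forbidden) rule to fire.
    return bool(PATH_RE.search(path)) and bool(BOT_RE.search(user_agent))

print(should_block("/product/widget/", "GPTBot/1.0"))     # True  -> 403
print(should_block("/blog/some-post/", "GPTBot/1.0"))     # False -> crawlable
print(should_block("/product/widget/", "Googlebot/2.1"))  # False -> crawlable
```

Because "product" is a prefix match, /product-category/ and /product-tag/ are covered even without their explicit alternatives; they are kept in the pattern for readability.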
Cloudflare's WAF blocks bots before traffic reaches your WooCommerce server. This is the recommended approach for managed WordPress hosts (Kinsta, WP Engine, Flywheel, Pressable) that don't expose .htaccess or nginx config files.
Security → WAF → Custom Rules → Create rule. Set action to Block.
```
(
  http.user_agent contains "GPTBot" or
  http.user_agent contains "ChatGPT-User" or
  http.user_agent contains "OAI-SearchBot" or
  http.user_agent contains "ClaudeBot" or
  http.user_agent contains "anthropic-ai" or
  http.user_agent contains "Google-Extended" or
  http.user_agent contains "Bytespider" or
  http.user_agent contains "CCBot" or
  http.user_agent contains "PerplexityBot" or
  http.user_agent contains "meta-externalagent" or
  http.user_agent contains "Diffbot" or
  http.user_agent contains "cohere-ai" or
  http.user_agent contains "AI2Bot" or
  http.user_agent contains "DeepSeekBot" or
  http.user_agent contains "MistralBot" or
  http.user_agent contains "Amazonbot" or
  http.user_agent contains "Applebot-Extended" or
  http.user_agent contains "xAI-Bot" or
  http.user_agent contains "omgili" or
  http.user_agent contains "omgilibot" or
  http.user_agent contains "webzio-extended" or
  http.user_agent contains "gemini-deep-research"
)
```
Use this expression to block AI bots from product/shop pages and the REST API while allowing them to crawl your blog and informational content:
```
(
  (
    http.request.uri.path contains "/shop" or
    http.request.uri.path contains "/product" or
    http.request.uri.path contains "/cart" or
    http.request.uri.path contains "/checkout" or
    http.request.uri.path contains "/wp-json/wc/"
  )
  and
  (
    http.user_agent contains "GPTBot" or
    http.user_agent contains "ClaudeBot" or
    http.user_agent contains "CCBot" or
    http.user_agent contains "Bytespider" or
    http.user_agent contains "Diffbot" or
    http.user_agent contains "Google-Extended"
  )
)
```

Free plan note: Cloudflare's free plan supports 5 custom WAF rules. If you need more coverage, combine all user agents into a single rule (as shown above). Paid plans (Pro, $20/mo) support more complex rule sets and Super Bot Fight Mode, which automatically identifies and blocks known bad bots, including many AI crawlers.
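Hand-editing these long OR chains is error-prone. One way to keep a single source of truth is to generate the expression from a bot list; a sketch (the bot and path lists are abbreviated, so extend them from the table below):

```python
BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Diffbot", "Google-Extended"]
PATHS = ["/shop", "/product", "/cart", "/checkout", "/wp-json/wc/"]

def cloudflare_expression(bots, paths=None):
    """Build a Cloudflare WAF custom-rule expression blocking `bots`,
    optionally scoped to requests whose URI path contains one of `paths`."""
    ua = " or ".join(f'http.user_agent contains "{b}"' for b in bots)
    if not paths:
        return f"({ua})"
    uri = " or ".join(f'http.request.uri.path contains "{p}"' for p in paths)
    return f"(({uri}) and ({ua}))"

# Paste the printed expression into the WAF rule editor's "Edit expression" view.
print(cloudflare_expression(BOTS, PATHS))
```

Regenerating the expression when the bot list changes keeps your sitewide and selective rules in sync.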
The table below lists 25 bots that actively crawl e-commerce sites. Diffbot is particularly aggressive on WooCommerce — it's a commercial data broker that resells scraped product catalogs.
| Bot | Operator |
|---|---|
| GPTBot | OpenAI |
| ChatGPT-User | OpenAI |
| OAI-SearchBot | OpenAI |
| ClaudeBot | Anthropic |
| anthropic-ai | Anthropic |
| Google-Extended | Google |
| Bytespider | ByteDance |
| CCBot | Common Crawl |
| PerplexityBot | Perplexity |
| meta-externalagent | Meta |
| Amazonbot | Amazon |
| Applebot-Extended | Apple |
| xAI-Bot | xAI |
| DeepSeekBot | DeepSeek |
| MistralBot | Mistral |
| Diffbot | Diffbot |
| cohere-ai | Cohere |
| AI2Bot | Allen Institute |
| Ai2Bot-Dolma | Allen Institute |
| YouBot | You.com |
| DuckAssistBot | DuckDuckGo |
| omgili | Webz.io |
| omgilibot | Webz.io |
| webzio-extended | Webz.io |
| gemini-deep-research | Google |
| Method | Stops Bytespider? | Stops Diffbot? |
|---|---|---|
| robots.txt | No (ignores) | No (ignores) |
| noai meta tag | No | No |
| .htaccess | Yes | Yes |
| Cloudflare WAF | Yes | Yes |
No. Blocking AI training bots (GPTBot, CCBot, ClaudeBot) does not affect Google or Bing search rankings. These are separate crawlers. Your products will still be indexed by Googlebot and Bingbot — those are not in the block list unless you add them explicitly.
It depends. Block training bots (GPTBot, CCBot) if you want to prevent AI companies from using your product descriptions. Allow AI search bots (OAI-SearchBot, PerplexityBot) if you want your products to appear in AI shopping recommendations. You can configure both selectively in robots.txt — different rules per user-agent.
Yes. The Store API endpoint /wp-json/wc/store/v1/products returns your full product catalog as structured JSON with no authentication (the core v3 endpoints require API keys). This is high-value training data. Block /wp-json/wc/ in robots.txt and optionally in .htaccess or Cloudflare WAF.
Yoast SEO exposes a robots.txt editor at SEO → Tools → File Editor. This is the easiest way to add AI bot disallow rules without SSH access. However, Yoast may periodically overwrite the physical robots.txt — if you edit the physical file instead, disable Yoast's robots.txt management in SEO → Search Appearance → Advanced first.
Priority order: (1) /wp-json/wc/ — structured product data, highest AI training value; (2) /product/ and /shop/ — product descriptions and images; (3) /cart/ and /checkout/ — always disallow (no training value, exposes session patterns).
Kinsta, WP Engine, and Flywheel use nginx and don't expose .htaccess. Your options are: (1) Yoast SEO robots.txt editor for signals-only blocking; (2) Cloudflare WAF if your DNS proxies through Cloudflare (free plan works); (3) Ask your host's support to add nginx user-agent rules — most managed hosts will do this on request.
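If your host's support team is willing to add server-config rules, the nginx equivalent of the .htaccess block is short. A sketch you can hand to support, with an abbreviated bot list (your host will place it inside your site's server block and reload nginx):

```nginx
# Return 403 to AI crawlers; ~* makes the regex match case-insensitive.
if ($http_user_agent ~* "(GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|Diffbot)") {
    return 403;
}
```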
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.