Amazon runs three distinct crawlers — each with different purposes and different privacy implications. Most sites should only block one of them. Here's how to tell them apart and what to do.
Unlike most AI companies, which run a single crawler, Amazon operates three separate bots with distinct purposes. Getting them wrong, especially blocking Amzn-SearchBot when you're an e-commerce brand, can hurt your visibility in Amazon's AI experiences.
### Amazonbot

Amazon's primary web research and AI training crawler. Collects web content to improve Amazon's machine learning models: Alexa's language understanding, question answering, and general Amazon AI capabilities. This is the bot you're opting out of when you want to stop Amazon from training on your content.

User agent string:

```
Mozilla/5.0 (compatible; Amazonbot/0.1; +https://developer.amazon.com/amazonbot)
```

### Amzn-SearchBot

Specifically powers Amazon's search experiences: Rufus (Amazon's AI shopping assistant, used by hundreds of millions of Amazon customers) and Alexa knowledge and shopping results. Amazon explicitly states Amzn-SearchBot does not crawl for generative AI model training.
Blocking Amzn-SearchBot means your products, content, or brand won't appear in Rufus AI answers when customers ask shopping questions on Amazon.com. For most e-commerce brands and publishers that want Amazon visibility, this is the wrong bot to block.
User agent string:

```
Amzn-SearchBot
```

### Amzn-User

When a customer asks Alexa a question that needs real-time information, Amzn-User fetches live content from the web on that user's behalf, like a browser acting for a real person. Amazon states it does not crawl for AI training. This traffic represents actual user intent: someone actively asking Alexa a question.

User agent string:

```
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amzn-User/0.1) Chrome/119.0.6045.214 Safari/537.36
```

In robots.txt:

```
# Block Amazon AI training crawler only
# Allows Amzn-SearchBot (Rufus + Alexa search) and Amzn-User (live queries)
User-agent: Amazonbot
Disallow: /
```
Stops AI model training without affecting your visibility in Rufus AI answers or Alexa experiences.
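Before deploying, you can sanity-check the rules from the command line. A minimal sketch against a scratch copy (the /tmp path is a placeholder for your real robots.txt):

```shell
# Scratch copy of the training-only policy (path is a placeholder)
cat > /tmp/robots.txt <<'EOF'
# Block Amazon AI training crawler only
User-agent: Amazonbot
Disallow: /
EOF

# The Amazonbot group disallows everything:
grep -A 1 '^User-agent: Amazonbot$' /tmp/robots.txt

# No group mentions Amzn-SearchBot or Amzn-User; bots with no matching
# group (and no catch-all *) fall back to the default: fully allowed.
grep -c 'Amzn-' /tmp/robots.txt || true
```

The second check is the important one: robots.txt is allow-by-default, so simply not naming a bot is what keeps it crawling.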
```
# Block all Amazon crawlers
User-agent: Amazonbot
Disallow: /

User-agent: Amzn-SearchBot
Disallow: /

User-agent: Amzn-User
Disallow: /
```
Full Amazon block. Prevents training, Rufus/Alexa search indexing, and live query access. Appropriate if you want no Amazon AI touchpoints.
```
# Block Amazon AI training
User-agent: Amazonbot
Disallow: /

# Block other major AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Search engines: unaffected
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Amzn-SearchBot
Allow: /
```
In Next.js (App Router), the same policy can be expressed in `app/robots.ts`:

```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Block Amazon AI training only
      { userAgent: 'Amazonbot', disallow: ['/'] },
      // Allow Amazon search/Alexa experiences (Rufus, etc.)
      { userAgent: 'Amzn-SearchBot', allow: ['/'] },
      { userAgent: 'Amzn-User', allow: ['/'] },
      // Block other AI training crawlers
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'PerplexityBot', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      // Allow search engines
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

For server-level enforcement that doesn't depend on Amazon honoring robots.txt:
```nginx
# Block Amazonbot (AI training) with a hard 403
if ($http_user_agent ~* "Amazonbot") {
    return 403;
}

# Optionally block Amzn-SearchBot too
# if ($http_user_agent ~* "Amzn-SearchBot") {
#     return 403;
# }
```

This returns HTTP 403 before the request reaches your application. Combine with robots.txt for defense in depth.
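To see why this rule catches Amazonbot but leaves the other two bots alone, you can emulate nginx's case-insensitive match (`~*`) in the shell; a sketch using the article's user-agent strings (the `status_for_ua` helper is illustrative, not part of nginx):

```shell
# Emulate nginx's `if ($http_user_agent ~* "Amazonbot")`:
# a case-insensitive substring match on the User-Agent header.
status_for_ua() {
  if printf '%s' "$1" | grep -qi "Amazonbot"; then
    echo 403   # blocked at the server, as in the nginx snippet
  else
    echo 200
  fi
}

status_for_ua "Mozilla/5.0 (compatible; Amazonbot/0.1; +https://developer.amazon.com/amazonbot)"  # 403
status_for_ua "Amzn-SearchBot"                                                                    # 200
status_for_ua "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amzn-User/0.1) Chrome/119.0.6045.214 Safari/537.36"  # 200
```

Against a live deployment, the equivalent check is `curl -s -o /dev/null -w '%{http_code}' -A "Amazonbot" https://yoursite.com/`, which should print 403 once the rule is active.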
In the Cloudflare dashboard (Security → WAF → Custom Rules → Create rule), use this expression:

```
(http.user_agent contains "Amazonbot")
```

Set the action to Block. To also block Amzn-SearchBot:

```
(http.user_agent contains "Amazonbot") or (http.user_agent contains "Amzn-SearchBot")
```
Rufus is Amazon's AI shopping assistant, launched in 2024 and now available to hundreds of millions of Amazon customers. When a customer asks Rufus a question — "What's the best laptop for video editing?" or "Compare these two coffee makers" — Rufus draws on content indexed by Amzn-SearchBot, including product reviews, buying guides, and editorial content from the open web.
If you publish product reviews, buying guides, comparison content, or any content that Amazon customers might ask about — blocking Amzn-SearchBot means Rufus won't surface your site as a source. That's potential referral traffic and brand visibility you're opting out of. Blocking Amazonbot (the training crawler) carries no such cost.
For most content publishers, the right call is: block Amazonbot, allow Amzn-SearchBot. You stop contributing to Amazon's AI training datasets while keeping your content visible to Rufus users and Alexa queries.
```shell
# Check nginx access logs for Amazon bots
grep "Amazonbot" /var/log/nginx/access.log | tail -20
grep "Amzn-SearchBot" /var/log/nginx/access.log | tail -10

# Confirm Amazonbot fetched robots.txt (then stopped)
grep "Amazonbot" /var/log/nginx/access.log | grep "robots.txt"

# If server-level blocked, confirm 403s
grep "Amazonbot" /var/log/nginx/access.log | grep " 403 "

# Check published IPs against your access logs
# Amazon publishes Amazonbot IPs at:
# https://developer.amazon.com/amazonbot/live-ip-addresses/
```
Seeing Amazonbot fetch /robots.txt followed by no content requests confirms the block is working. Amazonbot caches robots.txt, so allow up to 24 hours for a change to take effect.
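The published IP list also helps you spot impostors that spoof the Amazonbot user agent. A self-contained sketch with sample data (the file names, log format, and one-IP-per-line list format are assumptions; the real published list may use CIDR ranges, which need a proper subnet check rather than exact-line matching):

```shell
# Sample access-log lines (stand-ins for /var/log/nginx/access.log)
cat > /tmp/access.log <<'EOF'
12.34.56.78 - - [01/Jan/2025] "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Amazonbot/0.1; +https://developer.amazon.com/amazonbot)"
98.76.54.32 - - [01/Jan/2025] "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Amazonbot/0.1; +https://developer.amazon.com/amazonbot)"
EOF

# Hypothetical saved copy of Amazon's published list: one exact IP per line
printf '12.34.56.78\n' > /tmp/amazonbot-ips.txt

# Collect the unique IPs that claimed to be Amazonbot
awk '/Amazonbot/ {print $1}' /tmp/access.log | sort -u > /tmp/claimed.txt

# Any claimant NOT on the published list is likely a spoofer
grep -vxF -f /tmp/amazonbot-ips.txt /tmp/claimed.txt   # prints 98.76.54.32
```

Anything this prints is traffic worth blocking at the firewall regardless of your robots.txt policy.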