How is this different from Google Analytics?

Google Analytics shows you traffic. Shadow shows you traffic, AI bot activity, what AI platforms say about your brand, AND tells you what to do about all of it. It's analytics + AI intelligence + action steps in one tool.

Do I need to install anything?

For basic monitoring (bot detection, AI perception, readiness score) — nope, just enter your URL. For full visitor analytics (clicks, behavior, sessions), add one script tag. One-click integrations for Vercel, Shopify, WordPress, and more.

Will it slow down my site?

No. The script is under 5KB and loads asynchronously, so it doesn't block page rendering or measurably affect page speed or Core Web Vitals. External monitoring has no impact at all: it watches from the outside.

What AI bots does Shadow detect?

All of them. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Amazonbot, and dozens more. The Shadow Network means new bots get identified across all users instantly.
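At its simplest, user-agent-based bot detection is a case-insensitive token match against a list of known crawler names. The sketch below is illustrative only and is not Shadow's detection logic. The function name `classify_ai_bot` and the short token list are hypothetical. Real classification would also verify source IP ranges, because user-agent strings are trivial to spoof.

```python
from typing import Optional

# Hypothetical, deliberately short token list built from the bots named above.
# Google-Extended is omitted: it is a robots.txt control token, not a
# user agent that appears on incoming requests.
AI_BOT_TOKENS = (
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "Perplexity-User",
    "Bytespider",
    "Amazonbot",
)


def classify_ai_bot(user_agent: str) -> Optional[str]:
    """Return the first matching AI bot token, or None for ordinary traffic."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        # Case-insensitive substring match, as most UA-based filters do.
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    perplexity_ua = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
    )
    print(classify_ai_bot(perplexity_ua))  # PerplexityBot
    print(classify_ai_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```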

What do you mean by "actionable steps"?

Shadow doesn't just show you graphs. It says things like: "ChatGPT has your pricing wrong — add structured data to /pricing to fix it" or "Your bounce rate on /features is 68% — here's why and what to change." Specific, do-it-today recommendations.

Can Shadow block bots?

Shadow is a telescope, not a shield. It shows you who's visiting and what AI says about you. It generates block rules and robots.txt configs you can apply — but it doesn't intercept traffic.
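A generated robots.txt block rule is just a set of plain groups. Each group has one User-agent line followed by its Disallow paths. The minimal sketch below shows how such rules can be assembled. The helper name `build_block_rules` is hypothetical and is not Shadow's API.

```python
def build_block_rules(agents: list[str], paths: tuple[str, ...] = ("/",)) -> str:
    """Build one robots.txt group per agent, disallowing each of `paths`."""
    groups = []
    for agent in agents:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        groups.append("\n".join(lines))
    # A blank line separates groups; end the file with a trailing newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_block_rules(["PerplexityBot", "perplexity-user"]), end="")
```

Run as a script, this prints a two-group file that blocks both Perplexity agents site-wide. Passing a tuple such as `("/premium/",)` scopes the rule to specific paths instead.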

Is Shadow privacy-friendly?

Yes. Shadow never collects PII. IP addresses are hashed after classification. No cookies on your visitors. All Shadow Network data is anonymized. GDPR-compliant by design.
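"Hashed after classification" means the raw IP address is used once, to decide whether the request came from a known bot network. After that, only a keyed hash is kept. The sketch below is hypothetical and is not Shadow's actual code. The secret key and the bot network list are placeholders, and the network is a reserved documentation range. It uses HMAC-SHA256 rather than a plain hash, because an unkeyed hash of an IPv4 address can be reversed by hashing all 2^32 possible addresses.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical placeholder secret; load a random value from config in practice.
SECRET_KEY = b"replace-with-a-random-secret"

# Hypothetical "known bot" range: 192.0.2.0/24 is a documentation prefix,
# not a real crawler network.
KNOWN_BOT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def classify_and_hash(ip: str) -> tuple[bool, str]:
    """Classify the raw IP first, then return only (is_bot, keyed hash)."""
    addr = ipaddress.ip_address(ip)
    # Classification step: the only place the raw address is used.
    is_bot = any(addr in net for net in KNOWN_BOT_NETWORKS)
    # Keyed hash over the packed address bytes; the raw IP is not returned,
    # so it never has to be stored.
    digest = hmac.new(SECRET_KEY, addr.packed, hashlib.sha256).hexdigest()
    return is_bot, digest
```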

Does Perplexity respect robots.txt?

Perplexity states that it respects robots.txt. However, in 2024, Wired and other publications documented cases where Perplexity appeared to scrape paywalled and disallowed content. Perplexity attributed some of this to a third-party crawler it was using, not PerplexityBot itself, and says it has since improved compliance. For sites that want certainty, server-level IP blocking or Cloudflare WAF rules provide stronger guarantees than robots.txt alone.

What is the difference between PerplexityBot and perplexity-user?

PerplexityBot is the background indexing crawler that Perplexity runs to build its knowledge base. perplexity-user is the real-time crawler used when a Perplexity user runs a search that triggers a live page fetch. Blocking PerplexityBot stops background indexing. Blocking perplexity-user stops real-time fetches during searches. Most publishers who want full protection should block both.

If I block PerplexityBot, will my site stop appearing in Perplexity answers?

Yes. Blocking PerplexityBot significantly reduces your chances of appearing as a citation in Perplexity answers, because those answers are built from crawled and indexed content: if you block the crawler, your pages won't be in the index. For publishers who rely on referral traffic from AI search, this is a meaningful tradeoff to weigh. Perplexity's Publisher Program offers an alternative: opt in to share content and receive a revenue share instead.

Does blocking PerplexityBot affect Perplexity's ability to show my site in search results?

Yes. Unlike traditional search engines that index your site to direct traffic to you, Perplexity summarises content and often answers the user's question directly with a citation. Blocking their crawler means your content won't be summarised. Some publishers see blocking as protecting their content; others view Perplexity citation traffic as a new distribution channel. The right choice depends on your site's monetisation model.

Perplexity AI · Compliance Controversy · AI Search

How to Block PerplexityBot: Stop Perplexity AI from Scraping Your Site

Perplexity was at the centre of a 2024 crawler controversy after publishers documented it summarising paywalled content. Here's how to block PerplexityBot, and the real tradeoff for publishers who want AI search visibility.

Updated March 2026

The robots.txt Controversy (2024)

In mid-2024, Wired, Forbes, and other publications documented cases where Perplexity appeared to summarise content from paywalled and robots.txt-disallowed pages. Perplexity attributed some incidents to a third-party crawler it was using (not PerplexityBot itself). Perplexity says its compliance has improved since then, but for publishers who want certainty, server-level blocking remains the safest option.

Perplexity Runs Two Separate Agents

Blocking one does not block the other. Full protection requires blocking both:

PerplexityBot

Background indexing crawler. Systematically crawls the web to build Perplexity's knowledge base and search index. Runs continuously, not triggered by users.

perplexity-user

Real-time search crawler. Fetches pages live when a user's query triggers a fresh page read. Similar to ChatGPT-User in that it's request-triggered, not autonomous.

How to Block PerplexityBot in robots.txt

Add both agents to your robots.txt for full coverage:

robots.txt: Block both Perplexity agents

User-agent: PerplexityBot
Disallow: /

User-agent: perplexity-user
Disallow: /
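Before deploying the rules, you can sanity-check them locally with Python's standard-library `urllib.robotparser`. This is a minimal sketch. The rules are inlined as a string rather than fetched from your site. `RobotFileParser` implements basic robots.txt matching, so treat it as a quick check, not a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: perplexity-user
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines; no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("PerplexityBot", "Perplexity-User", "GPTBot"):
    # Agent matching is case-insensitive, so "Perplexity-User" hits the
    # perplexity-user group.
    print(agent, parser.can_fetch(agent, "/pricing"))
```

The first two agents print `False`. GPTBot prints `True`, because no group names it and the file has no `User-agent: *` group.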

For server-level blocking (stronger guarantee than robots.txt), add to nginx:

nginx.conf: Server-level block

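# Place inside the relevant server { } block.
# ~* is a case-insensitive regex match on the User-Agent header.
if ($http_user_agent ~* "PerplexityBot|perplexity-user") {
    return 403;  # refuse the request before any content is served
}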
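The nginx rule uses `~*`, so the match is case-insensitive and unanchored: it fires anywhere in the header, regardless of capitalisation. The Python sketch below mirrors that regular expression with `re.IGNORECASE`, so you can test user-agent strings offline before reloading nginx. It is an approximation, since nginx uses PCRE and Python uses `re`, but the two behave the same for a plain alternation like this. The second sample string is illustrative, not a captured header.

```python
import re

# Same alternation as the nginx rule; re.IGNORECASE plays the role of ~*.
PERPLEXITY_UA = re.compile(r"PerplexityBot|perplexity-user", re.IGNORECASE)


def should_block(user_agent: str) -> bool:
    """Return True when the nginx rule above would answer 403."""
    # search() is unanchored, like nginx's regex match.
    return PERPLEXITY_UA.search(user_agent) is not None


samples = [
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
    "Mozilla/5.0 (compatible; Perplexity-User/1.0)",  # illustrative string
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) Safari/605.1.15",
]
for ua in samples:
    print(should_block(ua), ua)
```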

Cloudflare WAF option

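In Cloudflare, go to Security → WAF → Custom Rules → Create Rule. Match requests whose user agent contains either token, for example with the expression (lower(http.user_agent) contains "perplexitybot") or (lower(http.user_agent) contains "perplexity-user"), and set the action to Block. Wrapping the field in lower() matters because the contains operator is case-sensitive. The rule fires before the request reaches your origin.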

The Visibility Tradeoff

Unlike traditional search engines, Perplexity often answers questions directly, summarising your content on its own page rather than sending the user to yours. This creates an unusual tradeoff:

Why block

• Protects paywalled content from AI summarisation
• Prevents traffic cannibalisation (Perplexity answers the question, so users don't click through)
• Addresses compliance concerns around data use
• Past robots.txt incidents have lowered trust

Why allow

• Citation links drive referral traffic to your site
• Presence in AI search results grows your brand reach
• Perplexity's Publisher Program offers revenue share
• Blocking may cost you visibility if AI search becomes a primary discovery channel

Publisher Program: Perplexity offers an opt-in publisher arrangement in which participating publishers get attributed citations and a revenue share. This is an alternative to blocking: you allow crawling in exchange for traffic and compensation.

Frequently Asked Questions

Does Perplexity now reliably respect robots.txt?

Perplexity states that PerplexityBot respects robots.txt Disallow directives. The 2024 incidents were primarily attributed to a third-party crawler it was using, not PerplexityBot itself, and Perplexity says compliance has improved since. Still, if you want certainty, server-level blocking (nginx, Cloudflare) is more reliable than robots.txt alone.

What user agent strings does PerplexityBot use?

The primary user agent is: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot). The real-time agent is perplexity-user. In robots.txt, use "PerplexityBot" and "perplexity-user" as the tokens; robots.txt user-agent matching is case-insensitive, so capitalisation doesn't matter there.

Will blocking PerplexityBot remove my site from Perplexity search entirely?

Largely, yes. Blocking PerplexityBot stops Perplexity from crawling your pages, so they should drop out of Perplexity's answers and search results. Previously indexed content may remain briefly but will expire from the index as Perplexity's crawl cycle refreshes. To also stop live page fetches during user searches, block perplexity-user as well.

Can I block PerplexityBot for paywalled content only?

Yes. Use path-specific Disallow directives: Disallow: /premium/ or Disallow: /members/ — this lets Perplexity index your free pages while blocking access to subscriber-only content.
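Here is what a paywall-only rule looks like, checked with Python's standard-library `urllib.robotparser`. The `/premium/` and `/members/` paths are the examples from the answer above; substitute your own subscriber-only directories. Both agents share one group, which robots.txt permits.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: PerplexityBot
User-agent: perplexity-user
Disallow: /premium/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/blog/launch-post", "/premium/annual-report", "/members/"):
    # Free pages stay crawlable; subscriber paths are disallowed.
    print(path, parser.can_fetch("PerplexityBot", path))
```

Disallow rules are prefix matches. `/premium/` blocks everything under that directory but not a bare `/premium` URL, so add that path separately if your site serves one.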


Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.