Guides · AI Search Ranking · Updated March 2026 · 8 min read

How AI Search Engines Decide What to Surface

What Perplexity, ChatGPT Search, Google AI Overviews, and Claude look for when choosing which pages to cite — and exactly how to make sure yours is one of them.

Tags: Perplexity · ChatGPT Search · Google AI Overviews · GEO · AI citation

How AI Search Works (vs Traditional SEO)

Traditional search returns a ranked list of links. You click, you land on a page. AI search — Perplexity, ChatGPT Search, Google AI Overviews — returns a synthesised answer with cited sources. The user often never visits your page at all.

This changes everything about what "visibility" means. In traditional SEO, ranking #7 still sends you clicks. In AI search, only cited sources get attributed — and citation doesn't always correlate with traditional ranking.

Traditional SEO vs AI Search (GEO)

  • Goal — Traditional SEO: rank in the blue-links list. AI search: get cited in the AI answer.
  • Signal weight — Traditional SEO: backlinks, keyword density. AI search: content clarity, structure, trustworthiness.
  • Crawl frequency — Traditional SEO: periodic (days/weeks). AI search: real-time (Perplexity) or training-based.
  • Zero-click impact — Traditional SEO: featured snippets only. AI search: every AI answer is zero-click by default.
  • Blocking mistake — Traditional SEO: Googlebot block = no rank. AI search: blocking search bots = invisible in AI answers.
  • Schema markup — Traditional SEO: helpful for rich results. AI search: critical for AI to understand your content type.

The emerging term for AI search optimisation is GEO (Generative Engine Optimisation), sometimes called AEO (Answer Engine Optimisation). It is not a replacement for SEO — it is a layer on top. Sites with strong technical SEO foundations perform better in AI search too. But there are specific AI-only signals that traditional SEO completely ignores.

The 4 Platforms: How Each Works

Each AI search platform has a different architecture. Understanding the difference helps you prioritise what to fix first.

🔍 Perplexity AI — real-time web search + synthesis (crawler: PerplexityBot)

Perplexity crawls the live web before answering every query. It uses PerplexityBot (and sometimes anthropic-ai via Claude models) to fetch pages in real-time, then synthesises across multiple sources. This means your page must be accessible right now — not just indexed historically.

What it prioritises:

  • PerplexityBot not blocked in robots.txt — non-negotiable
  • Fast page load (slow pages are skipped under time pressure)
  • Clean, parseable HTML — minimal JavaScript rendering required
  • Specific, citable facts and statistics
  • Clear heading hierarchy (H1 → H2 → H3)
  • Authoritative domain signals (age, backlinks still matter)
💬 ChatGPT Search — real-time search + trained knowledge hybrid (crawler: OAI-SearchBot)

ChatGPT Search uses OAI-SearchBot for real-time web retrieval and ChatGPT-User for browsing on behalf of users. It blends live search results with its pre-trained knowledge. This means even if your page isn't crawled in real-time, your brand can still appear — but citation sources are pulled from live results.

What it prioritises:

  • OAI-SearchBot allowed in robots.txt (separate from GPTBot)
  • llms.txt for brand/content context between sessions
  • High-quality backlink profile (uses Bing index signals)
  • Structured data helps parse content type and entities
  • Clear "About" and "Who wrote this" signals for trust
🔷 Google AI Overviews — index-based synthesis, Gemini-powered (crawler: Google-Extended)

Google AI Overviews (formerly SGE) are powered by Gemini and draw from Google's existing search index. You cannot be in AI Overviews if you're not indexed by Google first. Google-Extended is the specific crawler used for Gemini training and AI features — blocking it via robots.txt opts your site out of AI Overviews entirely.

What it prioritises:

  • Google-Extended not blocked in robots.txt — if blocked, you're out of AI Overviews
  • E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness)
  • Structured data (JSON-LD) for content type recognition
  • Core Web Vitals — Google AI Overviews inherit traditional ranking signals
  • Concise, direct answers near the top of the page
  • FAQPage and HowTo schema especially well-represented in AI Overviews
🟠 Claude (Anthropic) — training-based knowledge + web tools (crawlers: ClaudeBot / anthropic-ai)

Claude's base knowledge comes from training data (crawled with ClaudeBot). Claude also has web access tools. Unlike Perplexity, Claude doesn't cite sources in every response by default — but when used in agentic workflows or via web tools, it follows similar signals to Perplexity. anthropic-ai is the newer crawler name used for AI feature indexing.

What it prioritises:

  • llms.txt is directly supported — Anthropic's tools explicitly read it
  • High-quality, citable factual content in the training corpus
  • ClaudeBot / anthropic-ai allowed if you want future training inclusion
  • Clean text structure — Claude is highly sensitive to content clarity

The 7 Ranking Signals That Actually Matter

Across all four platforms, these are the signals with the highest leverage. Ranked by impact:

1. AI Search Bots Not Blocked — Critical

This is the one. If PerplexityBot, OAI-SearchBot, or Google-Extended is blocked in your robots.txt — intentionally or accidentally — you are invisible in that platform's answers. Full stop.

The most common mistake: adding a wildcard block (User-agent: * / Disallow: /) that sweeps up search bots alongside training bots. Check your robots.txt carefully. Blocking GPTBot is fine. Blocking PerplexityBot is not.
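As a sketch, a robots.txt that opts out of training crawlers while leaving the search bots named above untouched might look like this (adapt the bot list to your own policy):

```text
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Explicitly allow AI search crawlers
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default: everyone else may crawl
User-agent: *
Allow: /
```

Because robots.txt matching works per user-agent group, the specific Allow groups shield the search bots even if you later tighten the wildcard rule.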

2. Structured Data (JSON-LD) — High

JSON-LD schema tells AI models what type of content they're reading. An Article schema says "this is an editorial piece by an author." A FAQPage schema says "these are questions and answers — cite them." A Product schema says "here are specs and pricing."

Without schema, AI models have to infer content type from raw text — and they often get it wrong, which means your page gets cited in the wrong context or not at all.

Priority schemas for AI search: Article, FAQPage, HowTo, Organization, Product, LocalBusiness.
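As an illustration, a minimal FAQPage block (the question and answer text here are placeholders) looks like this in JSON-LD:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is Generative Engine Optimisation?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "GEO is the practice of optimising content to be cited in AI-generated search answers."
    }
  }]
}
</script>
```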

3. llms.txt File — High

llms.txt is a markdown file at your domain root that tells AI assistants exactly what your site is about, what content is valuable, and which pages to prioritise. It's like a site brief you write specifically for AI models.

Anthropic, Perplexity, and several AI agents explicitly read llms.txt during context-building. Sites with a well-written llms.txt get better contextual framing in AI answers — your brand identity is more consistent across AI responses.
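llms.txt is still an emerging convention, but the proposed format is plain markdown at the domain root. A minimal sketch (the site name, description, and URLs are placeholders):

```markdown
# Example Co
> Developer tools for monitoring AI crawler traffic.

## Key pages
- [Product overview](https://example.com/product): What the tool does and who it's for
- [Pricing](https://example.com/pricing): Plans and feature comparison
- [Guides](https://example.com/guides): How-to content on AI search visibility
```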

4. Content Clarity and Answer Structure — High

AI search models extract the most citable, confident-sounding claims from your content. Pages that bury their answer in 500 words of preamble are less likely to be cited than pages that put the direct answer first.

Effective structure for AI citation:

  • Lead with the direct answer (first 100 words)
  • Use clear H2/H3 headings that mirror likely search queries
  • Include specific, verifiable facts and data points
  • FAQ sections are extremely powerful — they match question-intent queries 1:1
  • Short, punchy paragraphs over walls of text
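Put together, a page skeleton following these rules might look like this (the topic and headings are purely illustrative):

```markdown
# How long does DNS propagation take?

DNS changes typically propagate within 1–48 hours. <!-- direct answer in the first 100 words -->

## Why does propagation time vary?
Short paragraph with specific, verifiable numbers.

## Can you speed it up?
Short paragraph.

## FAQ
### Does lowering TTL help?
One-paragraph answer that maps 1:1 to the question.
```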

5. Sitemap.xml — Medium

A sitemap tells all crawlers — including AI search bots — which pages exist and when they were last updated. Without a sitemap, important deep pages may never be discovered or may be discovered stale.

Most modern frameworks auto-generate sitemaps (Next.js has built-in sitemap support; WordPress has Yoast/RankMath). If you don't have one, this is the fastest fix in the list.
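If you do need to write one by hand, a minimal sitemap.xml follows the standard sitemaps.org format (URLs and dates here are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/guides/ai-search</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
</urlset>
```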

6. Meta Description and Open Graph — Medium

Meta descriptions aren't ranking signals in traditional SEO — but in AI search, they're content signals. AI models read your meta description as a compressed summary of the page. A vague or missing description makes the model work harder to infer what the page is about.

Open Graph tags (og:title, og:description) provide a second layer of content framing. They're parsed by social crawlers, AI summary tools, and link previewers. A well-written og:description can influence how AI tools describe your page in answers.
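As a sketch, the relevant head tags look like this (the content values are illustrative):

```html
<head>
  <title>How AI Search Engines Decide What to Surface</title>
  <meta name="description" content="What Perplexity, ChatGPT Search, and Google AI Overviews look for when choosing which pages to cite — and how to make yours one of them.">
  <meta property="og:title" content="How AI Search Engines Decide What to Surface">
  <meta property="og:description" content="The seven signals AI search engines weigh when picking citation sources.">
</head>
```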

7. Domain Authority and Trust Signals — Medium

AI search isn't immune to authority. Perplexity and ChatGPT Search both weight high-authority domains more heavily when there are multiple sources making the same claim. A claim on a site with 10k backlinks beats the same claim on a new domain, all else equal.

However, authority is less dominant in AI search than in traditional SEO. A new site with excellent content structure, schema, llms.txt, and a clean robots.txt can outpunch established sites that haven't adapted to AI search signals.

AI Search Optimisation Checklist

Work through this in order. Items near the top have the highest leverage.

  1. PerplexityBot, OAI-SearchBot, Google-Extended not blocked in robots.txt — Critical
  2. llms.txt file present at domain root with accurate site description — High
  3. sitemap.xml present and submitted to Google Search Console — High
  4. JSON-LD schema on all key pages (Article, FAQPage, Organization) — High
  5. FAQPage JSON-LD on any page with Q&A content — High
  6. Meta description present and ≥50 characters on all key pages — Medium
  7. og:title and og:description present on all pages — Medium
  8. No noindex or noai in X-Robots-Tag or meta robots (unless intentional) — Medium
  9. Direct answer placed in first 100 words on key landing pages — Medium
  10. H2/H3 headings written as natural questions where applicable — Medium
  11. Author and "About" signals on article/blog content — Low
  12. Core Web Vitals passing (LCP <2.5s, CLS <0.1) — Low
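The critical first item is easy to verify programmatically. A minimal sketch using Python's standard-library robots.txt parser (bot names are from the checklist above; the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# AI search bots that must not be blocked (from the checklist above)
SEARCH_BOTS = ["PerplexityBot", "OAI-SearchBot", "Google-Extended"]

def check_robots(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot_name: allowed} for each AI search bot against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in SEARCH_BOTS}

# A wildcard Disallow sweeps up search bots unless they get their own group:
bad = "User-agent: *\nDisallow: /\n"
good = bad + "\nUser-agent: PerplexityBot\nAllow: /\n"

print(check_robots(bad))   # all three blocked by the wildcard rule
print(check_robots(good))  # PerplexityBot allowed; the other two still blocked
```

Running this against your live /robots.txt (fetched with any HTTP client) makes the "invisible in AI answers" failure mode visible before an AI crawler ever hits it.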

Check your score automatically

The AI Search Visibility Checker scores your site against 7 of these checks in under 10 seconds.


FAQ

Q: How does Perplexity decide which pages to cite?

Perplexity crawls the live web before answering every query. It prioritises pages that are accessible to its bot (not blocked in robots.txt), load quickly, have clear structured content, contain specific factual claims, and have a coherent heading hierarchy. Authority signals (domain age, backlinks) are a secondary factor.

Q: Does having an llms.txt file help with AI search ranking?

It helps with citation framing more than raw ranking. llms.txt gives AI assistants a curated description of your site and its most important pages — leading to more accurate, consistent brand representation in AI answers. Anthropic's tools and several AI agents explicitly read it.

Q: What is the difference between traditional SEO and GEO (Generative Engine Optimisation)?

Traditional SEO targets blue-link rankings. GEO targets citations inside AI-generated answers. GEO rewards content clarity, schema markup, and AI bot access over keyword density and link quantity. The two are complementary — strong technical SEO helps GEO — but GEO requires specific additional signals.

Q: Will blocking AI training bots hurt my AI search ranking?

No. Training bots (GPTBot, CCBot, Bytespider) and search bots (PerplexityBot, OAI-SearchBot, Google-Extended) are completely separate. You can block training crawlers freely. The mistake is accidentally blocking search bots at the same time — which happens when people use overly broad wildcard rules.

Q: How do I check if my site is set up for AI search?

Use the Open Shadow AI Search Visibility Checker (/tools/ai-visibility). It scores 7 key signals in under 10 seconds and gives you specific, prioritised fixes.

Q: How long does it take for changes to affect AI search visibility?

Perplexity is near-real-time — changes can reflect within hours or days. ChatGPT Search re-crawls on its own schedule (typically days to weeks). Google AI Overviews reflect changes when Google re-indexes your page, which for established sites is typically days. Training-based knowledge (Claude base model) only updates with new training runs — typically months.