Guides · AI Search Ranking · Updated March 2026 · 8 min read

How AI Search Engines Decide What to Surface

What Perplexity, ChatGPT Search, Google AI Overviews, and Claude look for when choosing which pages to cite — and exactly how to make sure yours is one of them.

Tags: Perplexity · ChatGPT Search · Google AI Overviews · GEO · AI citation

How AI Search Works (vs Traditional SEO)

Traditional search returns a ranked list of links. You click, you land on a page. AI search — Perplexity, ChatGPT Search, Google AI Overviews — returns a synthesised answer with cited sources. The user often never visits your page at all.

This changes everything about what "visibility" means. In traditional SEO, ranking #7 still sends you clicks. In AI search, only cited sources get attributed — and citation doesn't always correlate with traditional ranking.

Traditional SEO vs AI Search (GEO)

  • Goal — Traditional SEO: rank in the blue-links list. AI search: get cited in the AI answer.
  • Signal weight — Traditional SEO: backlinks, keyword density. AI search: content clarity, structure, trustworthiness.
  • Crawl frequency — Traditional SEO: periodic (days/weeks). AI search: real-time (Perplexity) or training-based.
  • Zero-click impact — Traditional SEO: featured snippets only. AI search: every AI answer is zero-click by default.
  • Blocking mistake — Traditional SEO: Googlebot block = no rank. AI search: blocking search bots = invisible in AI answers.
  • Schema markup — Traditional SEO: helpful for rich results. AI search: critical for AI to understand your content type.

The emerging term for AI search optimisation is GEO (Generative Engine Optimisation), sometimes called AEO (Answer Engine Optimisation). It is not a replacement for SEO — it is a layer on top. Sites with strong technical SEO foundations perform better in AI search too. But there are specific AI-only signals that traditional SEO completely ignores.

The 4 Platforms: How Each Works

Each AI search platform has a different architecture. Understanding the difference helps you prioritise what to fix first.

🔍 Perplexity AI — real-time web search + synthesis (crawler: PerplexityBot)

Perplexity crawls the live web before answering every query. It uses PerplexityBot (and sometimes anthropic-ai via Claude models) to fetch pages in real-time, then synthesises across multiple sources. This means your page must be accessible right now — not just indexed historically.

What it prioritises:

  • PerplexityBot not blocked in robots.txt — non-negotiable
  • Fast page load (slow pages are skipped under time pressure)
  • Clean, parseable HTML — minimal JavaScript rendering required
  • Specific, citable facts and statistics
  • Clear heading hierarchy (H1 → H2 → H3)
  • Authoritative domain signals (age, backlinks still matter)
💬 ChatGPT Search — real-time search + trained knowledge hybrid (crawler: OAI-SearchBot)

ChatGPT Search uses OAI-SearchBot for real-time web retrieval and ChatGPT-User for browsing on behalf of users. It blends live search results with its pre-trained knowledge. This means even if your page isn't crawled in real-time, your brand can still appear — but citation sources are pulled from live results.

What it prioritises:

  • OAI-SearchBot allowed in robots.txt (separate from GPTBot)
  • llms.txt for brand/content context between sessions
  • High-quality backlink profile (uses Bing index signals)
  • Structured data helps parse content type and entities
  • Clear "About" and "Who wrote this" signals for trust
🔷 Google AI Overviews — index-based synthesis, Gemini-powered (crawler: Google-Extended)

Google AI Overviews (formerly SGE) are powered by Gemini and draw from Google's existing search index. You cannot be in AI Overviews if you're not indexed by Google first. Google-Extended is the specific crawler used for Gemini training and AI features — blocking it via robots.txt opts your site out of AI Overviews entirely.

What it prioritises:

  • Google-Extended not blocked in robots.txt — if blocked, you're out of AI Overviews
  • E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness)
  • Structured data (JSON-LD) for content type recognition
  • Core Web Vitals — Google AI Overviews inherit traditional ranking signals
  • Concise, direct answers near the top of the page
  • FAQPage and HowTo schema especially well-represented in AI Overviews
🟠 Claude (Anthropic) — training-based knowledge + web tools (crawlers: ClaudeBot / anthropic-ai)

Claude's base knowledge comes from training data (crawled with ClaudeBot). Claude also has web access tools. Unlike Perplexity, Claude doesn't cite sources in every response by default — but when used in agentic workflows or via web tools, it follows similar signals to Perplexity. anthropic-ai is the newer crawler name used for AI feature indexing.

What it prioritises:

  • llms.txt is directly supported — Anthropic's tools explicitly read it
  • High-quality, citable factual content in the training corpus
  • ClaudeBot / anthropic-ai allowed if you want future training inclusion
  • Clean text structure — Claude is highly sensitive to content clarity

The 7 Ranking Signals That Actually Matter

Across all four platforms, these are the signals with the highest leverage. Ranked by impact:

1. AI Search Bots Not Blocked — Critical

This is the one. If PerplexityBot, OAI-SearchBot, or Google-Extended is blocked in your robots.txt — intentionally or accidentally — you are invisible in that platform's answers. Full stop.

The most common mistake: adding a wildcard block (User-agent: * / Disallow: /) that sweeps up search bots alongside training bots. Check your robots.txt carefully. Blocking GPTBot is fine. Blocking PerplexityBot is not.
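As a sketch, a robots.txt that opts out of training crawlers while leaving the search bots named above untouched might look like this (adapt the bot list to your own policy):

```text
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Explicitly allow AI search crawlers
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default: everyone else may crawl
User-agent: *
Allow: /
```

Because robots.txt matching works per user-agent group, the specific Allow groups shield the search bots even if you later tighten the wildcard rule.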

2. Structured Data (JSON-LD) — High

JSON-LD schema tells AI models what type of content they're reading. An Article schema says "this is an editorial piece by an author." A FAQPage schema says "these are questions and answers — cite them." A Product schema says "here are specs and pricing."

Without schema, AI models have to infer content type from raw text — and they often get it wrong, which means your page gets cited in the wrong context or not at all.

Priority schemas for AI search: Article, FAQPage, HowTo, Organization, Product, LocalBusiness.
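As an illustration, a minimal FAQPage block (the question and answer text here are placeholders) looks like this in JSON-LD:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is Generative Engine Optimisation?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "GEO is the practice of optimising content to be cited in AI-generated search answers."
    }
  }]
}
</script>
```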

3. llms.txt File — High

llms.txt is a markdown file at your domain root that tells AI assistants exactly what your site is about, what content is valuable, and which pages to prioritise. It's like a site brief you write specifically for AI models.

Anthropic, Perplexity, and several AI agents explicitly read llms.txt during context-building. Sites with a well-written llms.txt get better contextual framing in AI answers — your brand identity is more consistent across AI responses.
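llms.txt is still an emerging convention, but the proposed format is plain markdown at the domain root. A minimal sketch (the site name, description, and URLs are placeholders):

```markdown
# Example Co
> Developer tools for monitoring AI crawler traffic.

## Key pages
- [Product overview](https://example.com/product): What the tool does and who it's for
- [Pricing](https://example.com/pricing): Plans and feature comparison
- [Guides](https://example.com/guides): How-to content on AI search visibility
```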

4. Content Clarity and Answer Structure — High

AI search models extract the most citable, confident-sounding claims from your content. Pages that bury their answer in 500 words of preamble are less likely to be cited than pages that put the direct answer first.

Effective structure for AI citation:

  • Lead with the direct answer (first 100 words)
  • Use clear H2/H3 headings that mirror likely search queries
  • Include specific, verifiable facts and data points
  • FAQ sections are extremely powerful — they match question-intent queries 1:1
  • Short, punchy paragraphs over walls of text
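Put together, a page skeleton following these rules might look like this (the topic and headings are purely illustrative):

```markdown
# How long does DNS propagation take?

DNS changes typically propagate within 1–48 hours. <!-- direct answer in the first 100 words -->

## Why does propagation time vary?
Short paragraph with specific, verifiable numbers.

## Can you speed it up?
Short paragraph.

## FAQ
### Does lowering TTL help?
One-paragraph answer that maps 1:1 to the question.
```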

5. Sitemap.xml — Medium

A sitemap tells all crawlers — including AI search bots — which pages exist and when they were last updated. Without a sitemap, important deep pages may never be discovered or may be discovered stale.

Most modern frameworks auto-generate sitemaps (Next.js has built-in sitemap support; WordPress has Yoast/RankMath). If you don't have one, this is the fastest fix in the list.
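If you do need to write one by hand, a minimal sitemap.xml follows the standard sitemaps.org format (URLs and dates here are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/guides/ai-search</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
</urlset>
```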

6. Meta Description and Open Graph — Medium

Meta descriptions aren't ranking signals in traditional SEO — but in AI search, they're content signals. AI models read your meta description as a compressed summary of the page. A vague or missing description makes the model work harder to infer what the page is about.

Open Graph tags (og:title, og:description) provide a second layer of content framing. They're parsed by social crawlers, AI summary tools, and link previewers. A well-written og:description can influence how AI tools describe your page in answers.
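As a sketch, the relevant head tags look like this (the content values are illustrative):

```html
<head>
  <title>How AI Search Engines Decide What to Surface</title>
  <meta name="description" content="What Perplexity, ChatGPT Search, and Google AI Overviews look for when choosing which pages to cite — and how to make yours one of them.">
  <meta property="og:title" content="How AI Search Engines Decide What to Surface">
  <meta property="og:description" content="The seven signals AI search engines weigh when picking citation sources.">
</head>
```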

7. Domain Authority and Trust Signals — Medium

AI search isn't immune to authority. Perplexity and ChatGPT Search both weight high-authority domains more heavily when there are multiple sources making the same claim. A claim on a site with 10k backlinks beats the same claim on a new domain, all else equal.

However, authority is less dominant in AI search than in traditional SEO. A new site with excellent content structure, schema, llms.txt, and a clean robots.txt can outpunch established sites that haven't adapted to AI search signals.

AI Search Optimisation Checklist

Work through this in order. Items near the top have the highest leverage.

  1. PerplexityBot, OAI-SearchBot, Google-Extended not blocked in robots.txt — Critical
  2. llms.txt file present at domain root with accurate site description — High
  3. sitemap.xml present and submitted to Google Search Console — High
  4. JSON-LD schema on all key pages (Article, FAQPage, Organization) — High
  5. FAQPage JSON-LD on any page with Q&A content — High
  6. Meta description present and ≥50 characters on all key pages — Medium
  7. og:title and og:description present on all pages — Medium
  8. No noindex or noai in X-Robots-Tag or meta robots (unless intentional) — Medium
  9. Direct answer placed in first 100 words on key landing pages — Medium
  10. H2/H3 headings written as natural questions where applicable — Medium
  11. Author and "About" signals on article/blog content — Low
  12. Core Web Vitals passing (LCP <2.5s, CLS <0.1) — Low
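The critical first item is easy to verify programmatically. A minimal sketch using Python's standard-library robots.txt parser (bot names are from the checklist above; the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# AI search bots that must not be blocked (from the checklist above)
SEARCH_BOTS = ["PerplexityBot", "OAI-SearchBot", "Google-Extended"]

def check_robots(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot_name: allowed} for each AI search bot against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in SEARCH_BOTS}

# A wildcard Disallow sweeps up search bots unless they get their own group:
bad = "User-agent: *\nDisallow: /\n"
good = bad + "\nUser-agent: PerplexityBot\nAllow: /\n"

print(check_robots(bad))   # all three blocked by the wildcard rule
print(check_robots(good))  # PerplexityBot allowed; the other two still blocked
```

Running this against your live /robots.txt (fetched with any HTTP client) makes the "invisible in AI answers" failure mode visible before an AI crawler ever hits it.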

Check your score automatically

The AI Search Visibility Checker scores your site against 7 of these checks in under 10 seconds.


FAQ

Q: How does Perplexity decide which pages to cite?

Perplexity crawls the live web before answering every query. It prioritises pages that are accessible to its bot (not blocked in robots.txt), load quickly, have clear structured content, contain specific factual claims, and have a coherent heading hierarchy. Authority signals (domain age, backlinks) are a secondary factor.

Q: Does having an llms.txt file help with AI search ranking?

It helps with citation framing more than raw ranking. llms.txt gives AI assistants a curated description of your site and its most important pages — leading to more accurate, consistent brand representation in AI answers. Anthropic's tools and several AI agents explicitly read it.

Q: What is the difference between traditional SEO and GEO (Generative Engine Optimisation)?

Traditional SEO targets blue-link rankings. GEO targets citations inside AI-generated answers. GEO rewards content clarity, schema markup, and AI bot access over keyword density and link quantity. The two are complementary — strong technical SEO helps GEO — but GEO requires specific additional signals.

Q: Will blocking AI training bots hurt my AI search ranking?

No. Training bots (GPTBot, CCBot, Bytespider) and search bots (PerplexityBot, OAI-SearchBot, Google-Extended) are completely separate. You can block training crawlers freely. The mistake is accidentally blocking search bots at the same time — which happens when people use overly broad wildcard rules.

Q: How do I check if my site is set up for AI search?

Use the Open Shadow AI Search Visibility Checker (/tools/ai-visibility). It scores 7 key signals in under 10 seconds and gives you specific, prioritised fixes.

Q: How long does it take for changes to affect AI search visibility?

Perplexity is near-real-time — changes can reflect within hours or days. ChatGPT Search re-crawls on its own schedule (typically days to weeks). Google AI Overviews reflect changes when Google re-indexes your page, which for established sites is typically days. Training-based knowledge (Claude base model) only updates with new training runs — typically months.