
How to Block Google-Extended

Google-Extended is the user agent token Google uses to control AI training for Gemini (formerly Bard) and Vertex AI. Here's how to opt out without touching your Search rankings.

✓ Respects robots.txt
Unlike Bytespider, Google-Extended reliably honors Disallow directives
✓ No SEO impact
Completely separate from Googlebot — blocking it won't hurt rankings
✓ 2-line fix
The robots.txt block takes 60 seconds to implement

What is Google-Extended?

Google-Extended is a standalone user agent token that Google introduced in September 2023 to let publishers control whether their content is used to train its AI products: Gemini (formerly Bard) and Vertex AI. It is distinct from Googlebot, which crawls for Search indexing and ranking.

Before Google-Extended existed, Google had no clean separation between Search crawling and AI training. The introduction of this separate token was a direct response to publisher pressure — giving websites a way to opt out of AI training without sacrificing Search visibility.

Google-Extended is used to power the knowledge base that makes Gemini's responses more accurate, up-to-date, and factually grounded. If you are a news publisher, creative content creator, or any site where your unique writing represents commercial value, you may have legitimate reasons to opt out.

Google-Extended vs. Googlebot: Key Differences

                       Google-Extended                          Googlebot
Purpose                AI model training (Gemini, Vertex AI)    Search indexing and ranking
Affects SEO?           No                                       Yes — directly
User agent token       Google-Extended                          Googlebot
Respects robots.txt?   Yes ✓                                    Yes ✓
Safe to block?         Yes — no SEO consequence                 Only if you want to disappear from Google
Introduced             September 2023                           1996

Option 1: Block via robots.txt (Recommended)

The robots.txt block is the standard, Google-endorsed method for opting out of Gemini AI training. Add these two lines to your robots.txt file:

Block entire site from Google-Extended (Recommended)
robots.txt
User-agent: Google-Extended
Disallow: /

Place at the top of your robots.txt or after your existing Googlebot rules.
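For example, a robots.txt that already carries Search rules might end up looking like this (the Googlebot group shown is purely illustrative):

```
# Existing Search crawling rules (illustrative)
User-agent: Googlebot
Disallow: /admin/

# Opt out of Gemini / Vertex AI training (does not affect Search)
User-agent: Google-Extended
Disallow: /
```

Each group is matched independently, so adding the Google-Extended group leaves your Googlebot rules untouched.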

Block specific paths only
robots.txt
# Block Google-Extended from premium/paywalled content
User-agent: Google-Extended
Disallow: /articles/
Disallow: /premium/
Disallow: /blog/

# Allow Googlebot to index everything normally
User-agent: Googlebot
Allow: /
Block Google-Extended alongside other AI crawlers
robots.txt
# Block AI training crawlers — preserves Search indexing
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all standard search bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Option 2: Per-Page noai Meta Tag

For granular control — blocking specific pages from AI training without modifying robots.txt — some publishers add the noai and noimageai meta tags. Note that these are publisher-community conventions, not directives Google has documented support for.

HTML <head>
<!-- Block AI training on this page (text + images) -->
<meta name="robots" content="noai, noimageai">

<!-- Or target Google-Extended specifically (support for this form is unverified) -->
<meta name="google-extended" content="noindex">

⚠️ Important caveat

The noai meta tag is a proposed convention with mixed adoption, and Google's documentation describes robots.txt as the mechanism for controlling Google-Extended. The robots.txt block above is the reliable, officially supported method; treat the meta tags as optional belt-and-suspenders coverage at best.

Option 3: Next.js / Vercel Config

For Next.js apps, generate robots.txt programmatically via the App Router:

app/robots.ts
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        // Block Google-Extended from AI training
        userAgent: 'Google-Extended',
        disallow: ['/'],
      },
      {
        // Allow Googlebot to index normally
        userAgent: 'Googlebot',
        allow: ['/'],
      },
      {
        // Block other AI training crawlers
        userAgent: ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Bytespider'],
        disallow: ['/'],
      },
      {
        // Allow all other well-behaved bots
        userAgent: '*',
        allow: ['/'],
      },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
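With this config, the robots.txt Next.js serves at /robots.txt should look roughly like the following (exact casing and ordering may vary by Next.js version):

```
User-Agent: Google-Extended
Disallow: /

User-Agent: Googlebot
Allow: /

User-Agent: GPTBot
User-Agent: ClaudeBot
User-Agent: PerplexityBot
User-Agent: Bytespider
Disallow: /

User-Agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```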

Verify Your Block is Working

After updating robots.txt, verify the block is correctly configured using these methods:

Step 1 — Check your live robots.txt

Visit your robots.txt directly and confirm the Google-Extended rules appear:

https://yoursite.com/robots.txt
Step 2 — Google Search Console robots.txt report

Google Search Console's standalone robots.txt Tester has been retired; use the robots.txt report instead to confirm that Google has fetched your latest robots.txt and that it parses without errors:

Search Console → Settings → robots.txt
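You can also simulate a Google-Extended fetch offline with Python's standard-library robots.txt parser. This sketch feeds it the two-line block from Option 1 plus an explicit Googlebot group; the URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The two-line block from Option 1, plus an explicit Googlebot group
rules = [
    "User-agent: Google-Extended",
    "Disallow: /",
    "",
    "User-agent: Googlebot",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Google-Extended is blocked everywhere...
print(parser.can_fetch("Google-Extended", "https://yoursite.com/articles/example"))  # False

# ...while Googlebot, and therefore Search indexing, is unaffected
print(parser.can_fetch("Googlebot", "https://yoursite.com/articles/example"))  # True
```

The same check works against your live file: download it and pass its lines to parse(), or point RobotFileParser at your robots.txt URL with set_url() and read().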
Step 3 — Know what your server logs will show

Per Google's crawler documentation, Google-Extended is a robots.txt control token rather than a separate crawler: it does not send its own user agent string, and fetching is done by Google's existing user agents (such as Googlebot). Grepping access logs for it will therefore come up empty even when the token is in effect:

# Apache / nginx access log
grep "Google-Extended" /var/log/nginx/access.log

# An empty result is expected: Google-Extended appears only in
# robots.txt rules, never as a request user agent string
Step 4 — Use Open Shadow's robots.txt checker

Run your site through Open Shadow's robot checker to confirm Google-Extended is blocked alongside other AI bots:

→ Check your robots.txt now

Should You Block Google-Extended?

✓ Block it if you are:

  • A news publisher or journalist — your original reporting has direct commercial value
  • A content creator whose writing is your product
  • Running a paywalled site — Gemini shouldn't answer questions your subscribers pay to access
  • Concerned about AI-generated competition cannibalizing your traffic
  • An academic or research institution with IP concerns

Consider allowing if you are:

  • A business whose goal is brand awareness — Gemini citations can drive discovery
  • Running documentation or open-source projects — AI training amplifies your reach
  • Operating a content marketing funnel where AI mentions bring leads
  • Wanting to appear in Google AI Overviews for commercial queries

Frequently Asked Questions

Does blocking Google-Extended affect my Google Search rankings?
No. Google-Extended is completely separate from Googlebot, which handles Search indexing. Blocking Google-Extended with robots.txt does not affect your Google Search rankings, crawl frequency, or indexing in any way.
What does Google-Extended actually control?
Google-Extended controls whether content Google fetches from your site may be used to improve Gemini and Vertex AI: feeding training datasets and helping ground AI responses. It has no bearing on advertising crawlers. Blocking it means your content won't be used to train or improve Google's AI models going forward.
Does Google-Extended respect robots.txt?
Yes. Unlike some AI crawlers (notably Bytespider), Google-Extended reliably honors robots.txt Disallow directives. Google has publicly committed to respecting this. A simple robots.txt block is sufficient for most publishers.
Does blocking Google-Extended remove my existing content from Gemini?
No. Blocking Google-Extended prevents future crawling and training, but does not remove content already incorporated into existing models. There is currently no mechanism to retroactively remove content from trained AI models. The block is forward-looking only.
Should I block Google-Extended if I want to appear in AI Overviews?
AI Overviews (AI-generated summaries in Search) are powered by Googlebot, not Google-Extended. Blocking Google-Extended should not prevent your content from appearing in AI Overviews. However, Google's guidance on this continues to evolve — monitor their official documentation for changes.

Related Guides & Tools

See What's Crawling Your Site

Run a free AI visibility check to see which bots have access to your content — and what they can see.