
How to Block Google-Extended

Google-Extended is the user agent token Google uses to control AI training for Gemini (formerly Bard) and Vertex AI. Here's how to opt out without touching your Search rankings.

✓ Respects robots.txt
Unlike Bytespider, Google-Extended reliably honors Disallow directives
✓ No SEO impact
Completely separate from Googlebot — blocking it won't hurt rankings
✓ 2-line fix
The robots.txt block takes 60 seconds to implement

What is Google-Extended?

Google-Extended is a standalone user agent token that Google introduced in September 2023 to let publishers control whether their content is used to train its AI products: Gemini (formerly Bard) and Vertex AI. It is distinct from Googlebot, which crawls for Search indexing and ranking.

Before Google-Extended existed, Google had no clean separation between Search crawling and AI training. The introduction of this separate token was a direct response to publisher pressure — giving websites a way to opt out of AI training without sacrificing Search visibility.

Google-Extended is used to power the knowledge base that makes Gemini's responses more accurate, up-to-date, and factually grounded. If you are a news publisher, creative content creator, or any site where your unique writing represents commercial value, you may have legitimate reasons to opt out.

Google-Extended vs. Googlebot: Key Differences

                       Google-Extended                          Googlebot
Purpose                AI model training (Gemini, Vertex AI)    Search indexing and ranking
Affects SEO?           No                                       Yes — directly
User agent token       Google-Extended                          Googlebot
Respects robots.txt?   Yes ✓                                    Yes ✓
Safe to block?         Yes — no SEO consequence                 Only if you want to disappear from Google
Introduced             September 2023                           1996

Option 1: Block via robots.txt (Recommended)

The robots.txt block is the standard, Google-endorsed method for opting out of Gemini AI training. Add these two lines to your robots.txt file:

Block entire site from Google-Extended (Recommended)
robots.txt
User-agent: Google-Extended
Disallow: /

Place at the top of your robots.txt or after your existing Googlebot rules.
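For example, a robots.txt that already carries Search rules might end up looking like this (the Googlebot group shown is purely illustrative):

```
# Existing Search crawling rules (illustrative)
User-agent: Googlebot
Disallow: /admin/

# Opt out of Gemini / Vertex AI training (does not affect Search)
User-agent: Google-Extended
Disallow: /
```

Each group is matched independently, so adding the Google-Extended group leaves your Googlebot rules untouched.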

Block specific paths only
robots.txt
# Block Google-Extended from premium/paywalled content
User-agent: Google-Extended
Disallow: /articles/
Disallow: /premium/
Disallow: /blog/

# Allow Googlebot to index everything normally
User-agent: Googlebot
Allow: /
Block Google-Extended alongside other AI crawlers
robots.txt
# Block AI training crawlers — preserves Search indexing
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all standard search bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Option 2: Per-Page noai Meta Tag

For granular control — blocking specific pages from AI training without modifying robots.txt — some publishers add the noai and noimageai meta tags. Note that these are publisher-community conventions, not directives Google has documented support for.

HTML <head>
<!-- Block AI training on this page (text + images) -->
<meta name="robots" content="noai, noimageai">

<!-- Or target Google-Extended specifically (support for this form is unverified) -->
<meta name="google-extended" content="noindex">

⚠️ Important caveat

The noai meta tag is a proposed convention with mixed adoption, and Google's documentation describes robots.txt as the mechanism for controlling Google-Extended. The robots.txt block above is the reliable, officially supported method; treat the meta tags as optional belt-and-suspenders coverage at best.

Option 3: Next.js / Vercel Config

For Next.js apps, generate robots.txt programmatically via the App Router:

app/robots.ts
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        // Block Google-Extended from AI training
        userAgent: 'Google-Extended',
        disallow: ['/'],
      },
      {
        // Allow Googlebot to index normally
        userAgent: 'Googlebot',
        allow: ['/'],
      },
      {
        // Block other AI training crawlers
        userAgent: ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Bytespider'],
        disallow: ['/'],
      },
      {
        // Allow all other well-behaved bots
        userAgent: '*',
        allow: ['/'],
      },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
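With this config, the robots.txt Next.js serves at /robots.txt should look roughly like the following (exact casing and ordering may vary by Next.js version):

```
User-Agent: Google-Extended
Disallow: /

User-Agent: Googlebot
Allow: /

User-Agent: GPTBot
User-Agent: ClaudeBot
User-Agent: PerplexityBot
User-Agent: Bytespider
Disallow: /

User-Agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```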

Verify Your Block is Working

After updating robots.txt, verify the block is correctly configured using these methods:

Step 1 — Check your live robots.txt

Visit your robots.txt directly and confirm the Google-Extended rules appear:

https://yoursite.com/robots.txt
Step 2 — Google Search Console robots.txt report

Google Search Console's standalone robots.txt Tester has been retired; use the robots.txt report instead to confirm that Google has fetched your latest robots.txt and that it parses without errors:

Search Console → Settings → robots.txt
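You can also simulate a Google-Extended fetch offline with Python's standard-library robots.txt parser. This sketch feeds it the two-line block from Option 1 plus an explicit Googlebot group; the URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The two-line block from Option 1, plus an explicit Googlebot group
rules = [
    "User-agent: Google-Extended",
    "Disallow: /",
    "",
    "User-agent: Googlebot",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Google-Extended is blocked everywhere...
print(parser.can_fetch("Google-Extended", "https://yoursite.com/articles/example"))  # False

# ...while Googlebot, and therefore Search indexing, is unaffected
print(parser.can_fetch("Googlebot", "https://yoursite.com/articles/example"))  # True
```

The same check works against your live file: download it and pass its lines to parse(), or point RobotFileParser at your robots.txt URL with set_url() and read().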
Step 3 — Know what your server logs will show

Per Google's crawler documentation, Google-Extended is a robots.txt control token rather than a separate crawler: it does not send its own user agent string, and fetching is done by Google's existing user agents (such as Googlebot). Grepping access logs for it will therefore come up empty even when the token is in effect:

# Apache / nginx access log
grep "Google-Extended" /var/log/nginx/access.log

# An empty result is expected: Google-Extended appears only in
# robots.txt rules, never as a request user agent string
Step 4 — Use Open Shadow's robots.txt checker

Run your site through Open Shadow's robot checker to confirm Google-Extended is blocked alongside other AI bots:

→ Check your robots.txt now

Should You Block Google-Extended?

✓ Block it if you are:

  • A news publisher or journalist — your original reporting has direct commercial value
  • A content creator whose writing is your product
  • Running a paywalled site — Gemini shouldn't answer questions your subscribers pay to access
  • Concerned about AI-generated competition cannibalizing your traffic
  • An academic or research institution with IP concerns

Consider allowing if you are:

  • A business whose goal is brand awareness — Gemini citations can drive discovery
  • Running documentation or open-source projects — AI training amplifies your reach
  • Operating a content marketing funnel where AI mentions bring leads
  • Wanting to appear in Google AI Overviews for commercial queries

Frequently Asked Questions

Does blocking Google-Extended affect my Google Search rankings?
No. Google-Extended is completely separate from Googlebot, which handles Search indexing. Blocking Google-Extended with robots.txt does not affect your Google Search rankings, crawl frequency, or indexing in any way.
What does Google-Extended actually control?
Google-Extended controls whether content Google fetches from your site may be used to improve Gemini and Vertex AI: feeding training datasets and helping ground AI responses. It has no bearing on advertising crawlers. Blocking it means your content won't be used to train or improve Google's AI models going forward.
Does Google-Extended respect robots.txt?
Yes. Unlike some AI crawlers (notably Bytespider), Google-Extended reliably honors robots.txt Disallow directives. Google has publicly committed to respecting this. A simple robots.txt block is sufficient for most publishers.
Does blocking Google-Extended remove my existing content from Gemini?
No. Blocking Google-Extended prevents future crawling and training, but does not remove content already incorporated into existing models. There is currently no mechanism to retroactively remove content from trained AI models. The block is forward-looking only.
Should I block Google-Extended if I want to appear in AI Overviews?
AI Overviews (AI-generated summaries in Search) are powered by Googlebot, not Google-Extended. Blocking Google-Extended should not prevent your content from appearing in AI Overviews. However, Google's guidance on this continues to evolve — monitor their official documentation for changes.

Related Guides & Tools

See What's Crawling Your Site

Run a free AI visibility check to see which bots have access to your content — and what they can see.