Meta · Respects robots.txt · AI Training

How to Block meta-externalagent: Stop Meta AI Training on Your Site

meta-externalagent is Meta's dedicated AI training crawler for Llama and other models. It's completely separate from facebookexternalhit (the link preview bot). Here's how to block training without breaking your Facebook link previews.

Updated March 2026

meta-externalagent ≠ facebookexternalhit

This is the most common confusion with Meta's crawlers. They are completely different bots with different purposes:

facebookexternalhit: Link preview crawler. Fetches pages when URLs are shared on Facebook, Instagram, Messenger, or WhatsApp to generate the title, description, and thumbnail.
meta-externalagent: AI training crawler. Crawls the web to collect training data for Llama and other Meta AI models. Separate purpose, separate user agent.

What Does meta-externalagent Do?

meta-externalagent is Meta's dedicated web crawler for AI training data collection. It crawls websites to gather content that feeds into Meta's AI training pipeline — primarily for the Llama family of open-source language models and Meta's internal AI research.

Meta releases the Llama models with downloadable weights under its own community license, which is why they are commonly described as open-source: the trained models are available for others to use, fine-tune, and deploy. As a result, your content, once used for training, potentially influences not just Meta's products but also thousands of downstream applications built on Llama.

The user agent string is: meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

How to Block meta-externalagent

Add this to your robots.txt to block AI training while keeping link previews working:

robots.txt: Block AI training, keep link previews
# Block Meta AI training
User-agent: meta-externalagent
Disallow: /

# Keep Facebook/Instagram link previews working
# (do NOT block facebookexternalhit unless intended)
# User-agent: facebookexternalhit
# Disallow: /
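
One way to sanity-check these rules before deploying them is to feed the file to a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser. The is_allowed helper and the example.com URL are illustrative assumptions, and the parser's matching is not guaranteed to be identical to how Meta's crawlers interpret the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the snippet above (commented-out lines omitted).
ROBOTS_TXT = """\
# Block Meta AI training
User-agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper for illustration; it parses the text locally
    instead of downloading /robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/post"  # placeholder URL
    for agent in ("meta-externalagent", "facebookexternalhit"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

With the rules above, the training crawler should be reported as disallowed while facebookexternalhit stays allowed, because no rule group names it.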

For server-level enforcement:

nginx: Block by user agent
# Place inside a server { } or location { } block
if ($http_user_agent ~* "meta-externalagent") {
    return 403;
}
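
If requests reach an application server rather than stopping at nginx, roughly the same check can be made in application code. The following is a minimal sketch written as WSGI middleware in Python. The names should_block and block_training_crawler are hypothetical, and the case-insensitive substring match is only meant to mirror the nginx rule above. Keep in mind that user-agent strings can be spoofed, so this filters only clients that identify themselves honestly.

```python
from typing import Optional

# Tokens to reject; facebookexternalhit is deliberately not listed
# so that link previews keep working.
BLOCKED_AGENT_TOKENS = ("meta-externalagent",)


def should_block(user_agent: Optional[str]) -> bool:
    """Case-insensitive substring match against the blocked tokens."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def block_training_crawler(app):
    """Wrap a WSGI app and answer 403 for blocked user agents."""

    def middleware(environ, start_response):
        if should_block(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Any WSGI application can be wrapped with block_training_crawler(app). Other frameworks have their own middleware hooks, but the check itself stays the same.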

Don't accidentally block link previews

If you block facebookexternalhit, URLs shared on Facebook, Instagram, Messenger, and WhatsApp will show as plain text with no title, image, or description. Only block meta-externalagent unless you specifically want to kill link previews.

The Llama Open-Source Problem

What sets meta-externalagent apart from most training crawlers is that the resulting models are openly released. When OpenAI trains GPT models on data collected by GPTBot, the model stays proprietary, so you know who holds it. When Meta trains Llama, the model weights are released publicly.

This means your content, once absorbed into Llama training data, potentially influences:

Meta's own products: Meta AI in Facebook, Instagram, WhatsApp, and Ray-Ban Meta smart glasses all use Llama models.
Thousands of startups: Llama is among the most widely used openly released LLMs. Thousands of companies build products on top of it, including chatbots, coding assistants, and content generators.
Enterprise deployments: Major companies deploy Llama internally for document analysis, customer support, and internal tools. Your content may power corporate AI you've never heard of.
Research and fine-tuning: Researchers fine-tune Llama for specialized tasks. Your content could end up in medical AI, legal AI, or other domain-specific applications.

What Blocking Does (and Doesn't) Do

What it stops
  • Meta from crawling your site for AI training data
  • New content from entering future Llama training sets
  • meta-externalagent from accessing your pages
What it doesn't stop
  • Content already used in Llama 2/3 training
  • Facebook/Instagram link previews (facebookexternalhit)
  • Meta accessing your content via Common Crawl or Diffbot
  • Google or Bing rankings (unaffected)

Frequently Asked Questions

Does Meta respect robots.txt for meta-externalagent?

Yes. Meta has stated that meta-externalagent respects robots.txt. Unlike Bytespider (ByteDance's crawler, which has been documented ignoring robots.txt), meta-externalagent is generally considered compliant. A robots.txt block is sufficient for most publishers.
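
Publishers who prefer to check compliance on their own site can look at which paths the crawler actually requests after the block goes live. The sketch below is a rough illustration in Python that assumes access logs in the common "combined" format. The crawler_paths function, the regular expression, and the sample log lines are all assumptions made for the example, not real traffic.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request, status, referrer, and user-agent fields of a
# combined-format access log line (assumed format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def crawler_paths(lines: Iterable[str], token: str = "meta-externalagent") -> Counter:
    """Count requested paths for log lines whose user agent contains `token`."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and token in match.group("ua").lower():
            hits[match.group("path")] += 1
    return hits


# Fabricated sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Mar/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"',
    '203.0.113.8 - - [01/Mar/2026:10:00:05 +0000] "GET /articles/post HTTP/1.1" 200 5120 "-" '
    '"facebookexternalhit/1.1"',
    '198.51.100.2 - - [01/Mar/2026:10:00:09 +0000] "GET / HTTP/1.1" 200 8000 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, count in crawler_paths(SAMPLE_LOG).most_common():
        print(count, path)
```

After a Disallow: / rule is in place, a compliant crawler would be expected to request /robots.txt and little else, so repeated hits on other paths would be worth a closer look.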

Will blocking meta-externalagent affect my Facebook link previews?

No. Facebook link previews are generated by facebookexternalhit, which is a completely separate bot. Blocking meta-externalagent only stops AI training crawls. Your Open Graph previews on Facebook, Instagram, Messenger, and WhatsApp will continue working normally.

Does blocking meta-externalagent also block Meta AI from browsing my site?

meta-externalagent is primarily a training crawler. Meta AI's live browsing (when users ask Meta AI to read a URL) may use a different user agent. The exact user agent for Meta AI's live browsing is not fully documented. Blocking meta-externalagent focuses on training data collection.

Should I also block Diffbot to stop Llama training?

Yes, if your goal is comprehensive Llama training opt-out. Meta has sourced training data through Diffbot (a third-party data broker). Blocking both meta-externalagent and Diffbot closes two known input vectors for Llama training data.
