How is this different from Google Analytics?

Google Analytics shows you traffic. Shadow shows you traffic, AI bot activity, what AI platforms say about your brand, AND tells you what to do about all of it. It's analytics + AI intelligence + action steps in one tool.

Do I need to install anything?

For basic monitoring (bot detection, AI perception, readiness score) — nope, just enter your URL. For full visitor analytics (clicks, behavior, sessions), add one script tag. One-click integrations for Vercel, Shopify, WordPress, and more.

Will it slow down my site?

No. The script is under 5KB and loads async. Zero impact on page speed or Core Web Vitals. External monitoring has literally no impact — it watches from the outside.

What AI bots does Shadow detect?

All of them. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Amazonbot, and dozens more. The Shadow Network means new bots get identified across all users instantly.

What do you mean by "actionable steps"?

Shadow doesn't just show you graphs. It says things like: "ChatGPT has your pricing wrong — add structured data to /pricing to fix it" or "Your bounce rate on /features is 68% — here's why and what to change." Specific, do-it-today recommendations.

Can Shadow block bots?

Shadow is a telescope, not a shield. It shows you who's visiting and what AI says about you. It generates block rules and robots.txt configs you can apply — but it doesn't intercept traffic.

Is Shadow privacy-friendly?

Yes. Shadow never collects PII. IP addresses are hashed after classification. No cookies are set on your visitors. All Shadow Network data is anonymized. GDPR compliant by design.

Which Mistral models does MistralBot train?

MistralBot feeds training data for Mistral's frontier model families, including Mistral Large, Mistral Small, and the open-weight Mixtral models. It also feeds Le Chat, Mistral's consumer AI assistant. As Mistral releases new model generations, web crawl data from MistralBot forms part of the training pipeline for each new release.

Does GDPR give me extra rights against MistralBot?

Potentially. Mistral AI is a French company operating under GDPR, which may give EU-based publishers Article 21 objection rights against automated processing for AI training purposes. While this area of law is still evolving, Mistral's EU incorporation means it faces stricter obligations than US-based AI companies. For practical purposes, robots.txt is still the fastest and most reliable opt-out mechanism.

CCBot is said to feed Mistral — is blocking MistralBot enough?

No, not entirely. Mistral (like many AI labs) uses a combination of data sources including Common Crawl datasets (collected by CCBot) and direct web crawling via MistralBot. CCBot is a separate crawler operated by the Common Crawl Foundation. To fully block your content from Mistral's training pipeline, you should block both MistralBot and CCBot in robots.txt.


AI Training · Mistral AI (France)

How to Block MistralBot

MistralBot is the training crawler of Mistral AI, the French lab behind Mistral Large, Mixtral, and Le Chat. Active since early 2024, it crawls web content to train Europe's most prominent AI model family.

✓ Respects robots.txt

Mistral reliably honors Disallow directives — robots.txt block is sufficient

EU / GDPR subject

As a French company, Mistral faces stricter data obligations than US-based AI labs

Block CCBot too

Mistral also trains on Common Crawl data — block CCBot for full coverage

What Does MistralBot Collect?

MistralBot crawls publicly available web content to build training datasets for Mistral AI's model families — including Mistral Large (their flagship proprietary model), Mistral Small, and the open-weight Mixtral series. It also feeds Le Chat, Mistral's consumer AI assistant.

Mistral AI is the standout European AI lab — founded in Paris in 2023 and backed by Andreessen Horowitz, Nvidia, and others. Its models are widely used in enterprise AI applications and embedded into platforms like Slack, Microsoft Azure, and Google Cloud. MistralBot's crawl activity has grown in step with each new model release.

Like most AI labs, Mistral draws from two data channels: its own direct web crawling via MistralBot, and licensed/open datasets including Common Crawl (collected by CCBot). Blocking MistralBot stops the direct crawl pipeline; blocking CCBot cuts the Common Crawl supply that feeds Mistral and 50+ other AI models simultaneously.

MistralBot user agent

Mozilla/5.0 (compatible; MistralBot/1.0; +https://mistral.ai/bot)

In robots.txt, use the token MistralBot — single user agent, no alternate tokens to worry about.
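As a minimal sketch of what matching on that token looks like in application code, the helper below checks a User-Agent header for the MistralBot token. The function name isMistralBot is hypothetical and not part of any Mistral or Shadow API; it assumes the crawler keeps identifying itself with the string shown above. A User-Agent header can be spoofed, so treat a match as a hint rather than proof.

```typescript
// Hypothetical helper, for illustration only.
// Assumption: the crawler sends the "MistralBot" token in its User-Agent header.
function isMistralBot(userAgent: string | null | undefined): boolean {
  if (!userAgent) return false;
  // Case-insensitive substring match on the product token.
  return userAgent.toLowerCase().indexOf("mistralbot") !== -1;
}

const sample = "Mozilla/5.0 (compatible; MistralBot/1.0; +https://mistral.ai/bot)";
console.log(isMistralBot(sample));
console.log(isMistralBot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
```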

Option 1: Block via robots.txt (Recommended)

Block entire site (Recommended)

robots.txt

User-agent: MistralBot
Disallow: /

One rule — MistralBot uses a single user agent token with no known alternates.

Block MistralBot + CCBot for full Mistral coverage (Recommended for publishers)

robots.txt

# Block Mistral's direct crawler
User-agent: MistralBot
Disallow: /

# Block CCBot — Common Crawl feeds Mistral, GPT, Llama, Gemini, and 50+ others
User-agent: CCBot
Disallow: /

CCBot is the single highest-leverage AI training opt-out — one block affects 50+ models.

Block specific paths only

robots.txt

# Protect premium/original content
User-agent: MistralBot
Disallow: /articles/
Disallow: /research/
Disallow: /premium/
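To see which URLs a path-based rule set actually covers, here is a rough sketch of a robots.txt evaluator. It is a simplified reading of the format, not Mistral's implementation: it assumes plain prefix matching, lets the longest matching rule win (Allow wins ties), falls back to the * group, and ignores wildcards. The names parseRobots and isBlocked are hypothetical.

```typescript
type Rule = { allow: boolean; path: string };

// Parse robots.txt into rules keyed by lowercased user-agent token.
function parseRobots(text: string): { [agent: string]: Rule[] } {
  const groups: { [agent: string]: Rule[] } = Object.create(null);
  let current: string[] = [];  // agents named by the group being read
  let readingAgents = false;   // true while consecutive User-agent lines continue

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      if (!readingAgents) current = []; // a new group starts here
      readingAgents = true;
      const agent = value.toLowerCase();
      current.push(agent);
      if (!groups[agent]) groups[agent] = [];
    } else if (field === "allow" || field === "disallow") {
      readingAgents = false;
      if (value === "") continue; // an empty value matches nothing
      for (const agent of current) {
        groups[agent].push({ allow: field === "allow", path: value });
      }
    }
  }
  return groups;
}

// True if the given path is disallowed for the given user-agent token.
function isBlocked(robotsTxt: string, agentToken: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  const rules = groups[agentToken.toLowerCase()] || groups["*"] || [];
  let best: Rule | null = null;
  for (const rule of rules) {
    if (path.indexOf(rule.path) !== 0) continue; // prefix match only
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best !== null && !best.allow;
}

const robotsTxt = [
  "# Protect premium/original content",
  "User-agent: MistralBot",
  "Disallow: /articles/",
  "Disallow: /research/",
  "Disallow: /premium/",
].join("\n");

console.log(isBlocked(robotsTxt, "MistralBot", "/articles/launch")); // true
console.log(isBlocked(robotsTxt, "MistralBot", "/pricing"));         // false
console.log(isBlocked(robotsTxt, "GPTBot", "/articles/launch"));     // false
```

With the rules above, only paths under /articles/, /research/, and /premium/ are closed to MistralBot; everything else, and every other crawler, is unaffected.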

Block all major AI training crawlers at once

robots.txt

# Block all major AI training crawlers
User-agent: MistralBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Search engines — unaffected
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Comprehensive training opt-out. No effect on search rankings.
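If you would rather maintain the crawler list in one place than edit the file by hand, a small generator is one option. The sketch below is illustrative only: buildRobotsTxt and AI_TRAINING_AGENTS are hypothetical names, and the token list is simply copied from the file above.

```typescript
// Tokens copied from the robots.txt above; extend the list as needed.
const AI_TRAINING_AGENTS: string[] = [
  "MistralBot", "CCBot", "GPTBot", "ClaudeBot", "anthropic-ai",
  "Google-Extended", "PerplexityBot", "Bytespider",
  "meta-externalagent", "xAI-Bot", "Applebot-Extended",
];

// Build robots.txt text: one "Disallow: /" group per blocked agent,
// followed by one "Allow: /" group per explicitly allowed agent.
function buildRobotsTxt(blocked: string[], allowed: string[] = []): string {
  const groups: string[] = [];
  for (const agent of blocked) groups.push(`User-agent: ${agent}\nDisallow: /`);
  for (const agent of allowed) groups.push(`User-agent: ${agent}\nAllow: /`);
  return groups.join("\n\n") + "\n";
}

console.log(buildRobotsTxt(AI_TRAINING_AGENTS, ["Googlebot", "Bingbot"]));
```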

Option 2: Next.js App Router

app/robots.ts

import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'MistralBot', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'PerplexityBot', disallow: ['/'] },
      { userAgent: 'xAI-Bot', disallow: ['/'] },
      { userAgent: 'Bytespider', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}

Option 3: nginx — Hard 403 Block

Mistral reliably respects robots.txt, so a server-level block is optional. Use it if you want hard enforcement, want to eliminate crawler load from your logs, or prefer not to rely on Mistral's compliance.

nginx.conf

# In your server {} block
if ($http_user_agent ~* "MistralBot") {
    return 403;
}

Option 4: Cloudflare WAF Rule

Cloudflare WAF → Custom Rules → Expression

(http.user_agent contains "MistralBot")

Set the action to Block. Blocks at the edge — zero load on your server.

To create the rule, go to Cloudflare Dashboard → Security → WAF → Custom Rules → Create rule.

The EU / GDPR Angle

Mistral AI is a French company, which makes it subject to GDPR and the EU AI Act in a way that US-based AI companies are not. This matters for publishers in a few ways:

Article 21 objection rights

EU-based publishers may have grounds to object to automated processing of their content for AI training under GDPR Article 21. This legal avenue is still being tested, but Mistral's EU incorporation means it faces real legal exposure — more so than a US company operating from outside the EU's jurisdiction.

EU AI Act training data requirements

Under the EU AI Act, whose obligations for general-purpose AI models began applying in August 2025 with most remaining provisions following in 2026, model providers must document their training data sources and honor copyright opt-outs. Mistral, as an EU-incorporated company, has compliance obligations here that incentivize it to honor opt-out requests.

Practical upshot

For most publishers, robots.txt is still the fastest and most reliable opt-out. The GDPR/EU AI Act angle provides additional leverage if you want to send a formal opt-out request beyond robots.txt — contact Mistral at legal@mistral.ai with a description of your content and opt-out request.

Verify Your Block

bash

# Check nginx access logs for MistralBot
grep "MistralBot" /var/log/nginx/access.log | tail -20

# Confirm it's fetching robots.txt
grep "MistralBot" /var/log/nginx/access.log | grep "robots.txt"

# If server-level blocked — confirm 403s
grep "MistralBot" /var/log/nginx/access.log | grep " 403 "

Seeing MistralBot fetch /robots.txt and then stop making content requests means the block is working correctly.
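The same check can be scripted. The sketch below summarizes MistralBot requests from access log lines, assuming nginx's default "combined" log format; summarizeMistralBot is a hypothetical name, and the sample lines are made up for illustration (the IP addresses come from documentation-reserved ranges).

```typescript
type CrawlSummary = {
  total: number;
  robotsTxtFetches: number;
  byStatus: { [status: string]: number };
};

// Captures method, path, and status from a combined-format line:
// ... "GET /path HTTP/1.1" 200 ...
const REQUEST_RE = /"([A-Z]+) (\S+) [^"]*" (\d{3}) /;

function summarizeMistralBot(logLines: string[]): CrawlSummary {
  const summary: CrawlSummary = { total: 0, robotsTxtFetches: 0, byStatus: {} };
  for (const line of logLines) {
    // Unlike the grep commands above, this match is case-insensitive.
    if (line.toLowerCase().indexOf("mistralbot") === -1) continue;
    const m = REQUEST_RE.exec(line);
    if (!m) continue; // skip lines that are not in the assumed format
    const path = m[2];
    const status = m[3];
    summary.total += 1;
    if (path.split("?")[0] === "/robots.txt") summary.robotsTxtFetches += 1;
    summary.byStatus[status] = (summary.byStatus[status] || 0) + 1;
  }
  return summary;
}

// Made-up sample lines, not real traffic.
const ua = '"Mozilla/5.0 (compatible; MistralBot/1.0; +https://mistral.ai/bot)"';
const lines = [
  `203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" ${ua}`,
  `203.0.113.7 - - [01/Jan/2025:00:00:05 +0000] "GET /articles/a HTTP/1.1" 403 153 "-" ${ua}`,
  `198.51.100.2 - - [01/Jan/2025:00:00:09 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"`,
];
console.log(summarizeMistralBot(lines));
```

A healthy robots.txt block shows robotsTxtFetches above zero and few or no other requests; a server-level block shows the remaining requests under status 403.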

Frequently Asked Questions

Does MistralBot respect robots.txt?

Yes. Mistral AI has committed to honoring robots.txt Disallow directives. As a European company subject to GDPR and the EU AI Act, Mistral has legal obligations that reinforce this — beyond just reputational incentives. A robots.txt block is sufficient for most publishers.

Which AI models does blocking MistralBot protect against?

Blocking MistralBot stops direct crawls for Mistral's model family: Mistral Large, Mistral Small, Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, and Le Chat. To also block Common Crawl data that feeds Mistral, add CCBot to your robots.txt as well.

Is blocking MistralBot enough, or do I need to block CCBot too?

For full coverage against Mistral's training pipeline, block both. MistralBot is Mistral's direct crawler; CCBot collects data for Common Crawl, which Mistral (and many other AI labs) use as a training data source. CCBot is the single highest-leverage opt-out you can make: one Disallow blocks 50+ AI models simultaneously.

Will blocking MistralBot affect my search rankings?

No. MistralBot is a training crawler only. Mistral does not operate a public web search product that indexes your site for public queries. Blocking it has zero effect on your Google, Bing, or any other search ranking.

Does Mistral have a content removal form?

Mistral does not currently operate a widely documented public removal request form. For formal opt-out or removal requests beyond robots.txt, contact legal@mistral.ai. Given Mistral's EU legal obligations, formal written requests carry more weight than with some US-based AI companies.

What is the difference between MistralBot and Mixtral?

MistralBot is Mistral AI's web crawler — it collects training data. Mixtral is the name of Mistral's open-weight mixture-of-experts model family (Mixtral 8x7B, 8x22B, etc.). They're related only in that MistralBot feeds data to the systems that train Mixtral and other Mistral models.