MistralBot is the training crawler of Mistral AI, the French lab behind Mistral Large, Mixtral, and Le Chat. Active since early 2024, it crawls web content to train Europe's most prominent AI model family.
MistralBot crawls publicly available web content to build training datasets for Mistral AI's model families — including Mistral Large (their flagship proprietary model), Mistral Small, and the open-weight Mixtral series. It also feeds Le Chat, Mistral's consumer AI assistant.
Mistral AI is the standout European AI lab — founded in Paris in 2023 and backed by Andreessen Horowitz, Nvidia, and others. Its models are widely used in enterprise AI applications and embedded into platforms like Slack, Microsoft Azure, and Google Cloud. MistralBot's crawl activity has grown in step with each new model release.
Like most AI labs, Mistral draws from two data channels: its own direct web crawling via MistralBot, and licensed/open datasets including Common Crawl (collected by CCBot). Blocking MistralBot stops the direct crawl pipeline; blocking CCBot cuts the Common Crawl supply that feeds Mistral and 50+ other AI models simultaneously.
Mozilla/5.0 (compatible; MistralBot/1.0; +https://mistral.ai/bot)
In robots.txt, use the token MistralBot. It is a single user agent, with no alternate tokens to worry about.
robots.txt (Recommended)
User-agent: MistralBot
Disallow: /
One rule — MistralBot uses a single user agent token with no known alternates.
# Block Mistral's direct crawler
User-agent: MistralBot
Disallow: /

# Block CCBot: Common Crawl feeds Mistral, GPT, Llama, Gemini, and 50+ others
User-agent: CCBot
Disallow: /
CCBot is the single highest-leverage AI training opt-out — one block affects 50+ models.
# Protect premium/original content
User-agent: MistralBot
Disallow: /articles/
Disallow: /research/
Disallow: /premium/
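To sanity-check which URLs a set of path rules like the one above would cover, a small prefix matcher can help. This is a minimal sketch: `isDisallowed` is a hypothetical helper, and it handles only plain path prefixes, not the `*` and `$` wildcards or the Allow/Disallow precedence that full robots.txt parsers implement.

```typescript
// Hypothetical helper for illustration: plain prefix matching only.
// Real robots.txt matching also supports "*" and "$" and resolves
// Allow/Disallow conflicts, none of which is modeled here.
function isDisallowed(path: string, disallowPrefixes: string[]): boolean {
  return disallowPrefixes.some(
    (prefix) => prefix !== '' && path.indexOf(prefix) === 0,
  );
}

// The three prefixes from the selective-blocking example.
const premiumRules: string[] = ['/articles/', '/research/', '/premium/'];

const articleBlocked = isDisallowed('/articles/my-post', premiumRules);
const aboutBlocked = isDisallowed('/about', premiumRules);
// A rule ending in "/" does not cover the bare path without the slash.
const bareArticlesBlocked = isDisallowed('/articles', premiumRules);
```

Note the last case: with a rule of `/articles/`, a request for `/articles` (no trailing slash) is not covered by this prefix check, so an index page served at that path would need its own rule.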
# Block all major AI training crawlers
User-agent: MistralBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Search engines (unaffected)
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
Comprehensive training opt-out. No effect on search rankings.
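A long block list like this is easier to maintain when the tokens live in one array and the file is generated from it. The sketch below assumes nothing beyond plain TypeScript; `buildRobotsTxt` and `AI_TRAINING_TOKENS` are illustrative names, not part of any library, and the token list simply mirrors the one above.

```typescript
// Illustrative names only: this is not an API from any framework.
const AI_TRAINING_TOKENS: string[] = [
  'MistralBot', 'CCBot', 'GPTBot', 'ClaudeBot', 'anthropic-ai',
  'Google-Extended', 'PerplexityBot', 'Bytespider',
  'meta-externalagent', 'xAI-Bot', 'Applebot-Extended',
];

// Builds robots.txt text: one "Disallow: /" group per blocked token,
// followed by one "Allow: /" group per explicitly allowed token.
function buildRobotsTxt(
  blocked: string[],
  allowed: string[] = ['Googlebot', 'Bingbot'],
): string {
  const blockedGroups = blocked.map(
    (token) => 'User-agent: ' + token + '\nDisallow: /',
  );
  const allowedGroups = allowed.map(
    (token) => 'User-agent: ' + token + '\nAllow: /',
  );
  // Groups are separated by a blank line; the file ends with a newline.
  return blockedGroups.concat(allowedGroups).join('\n\n') + '\n';
}

const robotsTxt: string = buildRobotsTxt(AI_TRAINING_TOKENS);
```

Writing `robotsTxt` to the web root at build time is left out here, since how that happens depends on the hosting setup.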
import type { MetadataRoute } from 'next';

// app/robots.ts: Next.js serves the returned rules as /robots.txt
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // AI training crawlers: blocked site-wide
      { userAgent: 'MistralBot', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'PerplexityBot', disallow: ['/'] },
      { userAgent: 'xAI-Bot', disallow: ['/'] },
      { userAgent: 'Bytespider', disallow: ['/'] },
      // Search engines and all other crawlers: allowed
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}

Mistral reliably respects robots.txt, so a server-level block is optional. Use it if you want hard enforcement, want to cut crawler load on your server, or prefer not to rely on Mistral's compliance.
# In your server {} block
if ($http_user_agent ~* "MistralBot") {
    return 403;
}

(http.user_agent contains "MistralBot")
Set the action to Block. Blocks at the edge — zero load on your server.
Cloudflare Dashboard → Security → WAF → Custom Rules → Create rule
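Where neither nginx nor Cloudflare sits in front of the application, the same user-agent check can be done in application code. The following is a rough sketch under that assumption: `isMistralBot` and `statusFor` are hypothetical helpers, and wiring them into a specific framework's middleware is not shown.

```typescript
// Hypothetical helpers: they mirror the case-insensitive substring match
// used by the nginx rule (~* "MistralBot").
function isMistralBot(userAgent: string | null | undefined): boolean {
  if (!userAgent) {
    return false; // a missing header is treated as "not MistralBot"
  }
  return /MistralBot/i.test(userAgent);
}

// Decides the HTTP status to send: 403 for MistralBot, 200 otherwise.
function statusFor(userAgent: string | null | undefined): number {
  return isMistralBot(userAgent) ? 403 : 200;
}

const mistralStatus = statusFor(
  'Mozilla/5.0 (compatible; MistralBot/1.0; +https://mistral.ai/bot)',
);
const browserStatus = statusFor('Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0');
```

Like any user-agent match, this only stops clients that identify themselves honestly; it is the same trade-off the nginx and Cloudflare rules make.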
Mistral AI is a French company, which makes it subject to GDPR and the EU AI Act in a way that US-based AI companies are not. This matters for publishers in a few ways:
EU-based publishers may have grounds to object to automated processing of their content for AI training under GDPR Article 21, at least where that content contains personal data. This legal avenue is still being tested, but Mistral's EU incorporation means it faces real legal exposure, more so than a US company operating from outside the EU's jurisdiction.
Under the EU AI Act (in force since August 2024, with obligations phasing in through 2026 and beyond), providers of general-purpose AI models must document their training data and put in place a policy to respect copyright opt-outs. Mistral, as an EU-incorporated company, has compliance obligations here that incentivize it to honor opt-out requests.
For most publishers, robots.txt is still the fastest and most reliable opt-out. The GDPR/EU AI Act angle provides additional leverage if you want to send a formal opt-out request beyond robots.txt — contact Mistral at legal@mistral.ai with a description of your content and opt-out request.
# Check nginx access logs for MistralBot
grep "MistralBot" /var/log/nginx/access.log | tail -20

# Confirm it's fetching robots.txt
grep "MistralBot" /var/log/nginx/access.log | grep "robots.txt"

# If server-level blocked, confirm 403s
grep "MistralBot" /var/log/nginx/access.log | grep " 403 "
Seeing MistralBot fetch /robots.txt and then stop making content requests means the block is working correctly.
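The same check can be scripted. The sketch below assumes log lines in nginx's default combined format and takes them as an array of strings; `summarizeMistralBot` and the `CrawlSummary` shape are hypothetical names, and reading the log file itself is left out.

```typescript
interface CrawlSummary {
  robotsFetches: number;   // requests for /robots.txt
  blocked: number;         // non-robots requests answered with 403
  contentRequests: number; // non-robots requests with any other status
}

// Hypothetical helper: counts MistralBot requests by category.
// Assumes the combined log format: ... "METHOD /path PROTO" STATUS BYTES ...
function summarizeMistralBot(logLines: string[]): CrawlSummary {
  const summary: CrawlSummary = { robotsFetches: 0, blocked: 0, contentRequests: 0 };
  const requestPattern = /"[A-Z]+ (\S+) [^"]*" (\d{3}) /;

  for (const line of logLines) {
    if (line.indexOf('MistralBot') === -1) continue; // other clients
    const match = requestPattern.exec(line);
    if (!match) continue; // line is not in the expected format

    const path = match[1].split('?')[0]; // ignore any query string
    const status = match[2];

    if (path === '/robots.txt') {
      summary.robotsFetches += 1;
    } else if (status === '403') {
      summary.blocked += 1;
    } else {
      summary.contentRequests += 1;
    }
  }
  return summary;
}
```

A healthy robots.txt-only block shows `robotsFetches` above zero and `contentRequests` at zero; with a server-level block in place, any remaining requests should land in `blocked`.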