How to Block cohere-ai: Cohere's Undocumented Web Crawler
cohere-ai crawls your site without any official documentation explaining what it collects or why. It's operated by Cohere — the enterprise AI lab behind Command R. Only ~13% of major websites block it.
Updated March 2026
Why "Undocumented" Matters
Most major AI companies publish documentation explaining their crawlers. OpenAI documents GPTBot, Anthropic documents ClaudeBot, Google documents Google-Extended. Cohere has published no official documentation for cohere-ai — no help page, no blog post, no developer docs explaining what it does or how it uses collected data.
This lack of transparency means publishers must infer the crawler's purpose from Cohere's business model and observed behavior. When in doubt, blocking is the conservative choice.
What We Know About cohere-ai
Cohere is a Canadian-American AI company founded in 2019, focused on enterprise AI. Its products include Command R and Command R+ (retrieval-augmented generation models), the Aya multilingual model family, and Embed (embedding models for semantic search). Cohere's customers are primarily enterprises — banks, healthcare companies, and tech firms.
The cohere-ai crawler has been identified through server log analysis by security researchers and bot tracking services. Based on Cohere's business (building language and embedding models), the crawler likely serves one or both of two purposes: collecting training data for Cohere's models, and retrieving pages live for Cohere's AI products.
The user agent string is: Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)
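To check whether this crawler has visited your own site, you can scan your access logs for the cohere-ai token. The following is a minimal Python sketch, not an official tool. It assumes the common "combined" access log format, in which the request line is the first quoted field and the user agent is the last; the function name count_cohere_hits and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Matches every double-quoted field in a combined-format log line.
QUOTED = re.compile(r'"([^"]*)"')


def count_cohere_hits(log_lines):
    """Count requests per path made by a user agent containing 'cohere-ai'.

    Assumption: combined log format, i.e. the quoted fields are
    request line, referer, user agent (in that order).
    """
    hits = Counter()
    for line in log_lines:
        fields = QUOTED.findall(line)
        if len(fields) < 3:
            continue  # not a combined-format line; skip it
        request, user_agent = fields[0], fields[-1]
        if "cohere-ai" not in user_agent.lower():
            continue
        parts = request.split()
        path = parts[1] if len(parts) >= 2 else "(unparsed)"
        hits[path] += 1
    return hits


# Hypothetical sample lines for illustration only.
sample = [
    '203.0.113.7 - - [01/Mar/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; cohere-ai/1.0; '
    '+http://www.cohere.ai/bot.html)"',
    '198.51.100.2 - - [01/Mar/2026:10:00:05 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(dict(count_cohere_hits(sample)))
```

Matching only on the user agent field, rather than the whole line, avoids false positives from paths or referrers that happen to contain the same text.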
How to Block cohere-ai
Add this to your robots.txt:
User-agent: cohere-ai
Disallow: /
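Before deploying, you can check that the rule does what you expect. This sketch uses Python's standard urllib.robotparser to evaluate the rule locally. The helper name is_allowed and the example.com URL are placeholders, and the result shows only how a compliant parser would read the rule, not how cohere-ai itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from this guide, one directive per line.
ROBOTS_TXT = """\
User-agent: cohere-ai
Disallow: /
"""


def is_allowed(robots_txt, user_agent_token, url):
    """Return True if the given robots.txt text lets the token fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent_token, url)


# Pass the bare product token, not the full "Mozilla/5.0 (...)" string:
# the standard-library parser matches on the token before the first "/".
print(is_allowed(ROBOTS_TXT, "cohere-ai", "https://example.com/article"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/article"))
```

The first call should report that cohere-ai is disallowed, while the second shows that crawlers not named in the rule are unaffected.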
Because cohere-ai is undocumented, consider adding server-level enforcement:
Nginx:
if ($http_user_agent ~* "cohere-ai") {
    return 403;
}
WAF or firewall rule:
Field: User Agent
Operator: contains
Value: cohere-ai
Action: Block
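If your site runs behind a Python application rather than a configurable web server or firewall, the same case-insensitive "contains cohere-ai" check can be applied in the application. This is a minimal sketch assuming a WSGI stack; block_cohere, hello_app, and status_for are hypothetical names used only for illustration. Like the rules above, it relies on the crawler sending its real user agent.

```python
def block_cohere(app):
    """Wrap a WSGI app and answer 403 when the User-Agent contains 'cohere-ai'."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "cohere-ai" in user_agent.lower():
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Any other client is passed through to the wrapped app unchanged.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def status_for(app, user_agent):
    """Call a WSGI app with a fake request and return the status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    app({"HTTP_USER_AGENT": user_agent}, start_response)
    return captured["status"]


protected = block_cohere(hello_app)
bot_ua = "Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)"
print(status_for(protected, bot_ua))
print(status_for(protected, "Mozilla/5.0 (X11; Linux x86_64)"))
```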
Why Only 13% of Sites Block cohere-ai
The low blocking rate isn't because cohere-ai is safe — it's because most publishers don't know it exists.
What Blocking Does (and Doesn't) Do
Blocking stops:
- Cohere from crawling your content going forward
- New content from entering Cohere's training pipeline
- Live retrieval of your pages for Cohere's AI products
Blocking does not affect:
- Content Cohere has already crawled
- Other AI crawlers (GPTBot, ClaudeBot, etc.)
- Cohere accessing your content via Common Crawl or data brokers
- Google or Bing rankings (unaffected)
Frequently Asked Questions
Does cohere-ai respect robots.txt?
Based on available evidence, it appears to. Cohere is a venture-backed company with major enterprise customers (including banks and healthcare firms) that expect compliance. However, because the crawler is undocumented, this cannot be officially confirmed. For enforcement that does not depend on the crawler's cooperation, add server-level blocking.
Is Cohere different from OpenAI and Anthropic?
Yes. Cohere is primarily B2B — it sells AI infrastructure to enterprises for internal use cases (document search, summarization, customer support). It doesn't have a major consumer product like ChatGPT or Claude. This enterprise focus means your content may end up powering internal corporate AI tools rather than a public chatbot.
Does blocking cohere-ai affect my SEO?
No. cohere-ai has no relationship with Google, Bing, or any search engine. Blocking it has zero effect on your search rankings or visibility.
Should I block cohere-ai if I already block GPTBot and ClaudeBot?
If your policy is to block AI training crawlers, then yes. cohere-ai likely serves a similar training purpose. The lack of documentation makes it a higher-risk crawler to leave unblocked — you don't know exactly what it's doing with your content.