About
The Allen Institute for AI's research crawler, used to build Dolma, one of the largest open datasets for training language models. Unlike most commercial crawlers, Ai2Bot collects data that is publicly released for research, and that data underpins some of the most widely used open-source LLM training corpora.
Purpose
Open research dataset collection for AI model training
User Agent String
Mozilla/5.0 (compatible; Ai2Bot/1.0; +https://allenai.org/crawler)
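To spot Ai2Bot in server logs or request middleware, you can look for the `Ai2Bot` product token in the User-Agent header. The following is a minimal Python sketch; `is_ai2bot` is a hypothetical helper name, not part of any Ai2 tooling. Any client can send this header, so a match only tells you what the client claims to be.

```python
import re

# Hypothetical helper: matches the "Ai2Bot" token as a whole word, case-insensitively.
AI2BOT_PATTERN = re.compile(r"\bAi2Bot\b", re.IGNORECASE)


def is_ai2bot(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be Ai2Bot."""
    return AI2BOT_PATTERN.search(user_agent) is not None


ua = "Mozilla/5.0 (compatible; Ai2Bot/1.0; +https://allenai.org/crawler)"
print(is_ai2bot(ua))                                 # True
print(is_ai2bot("Mozilla/5.0 (X11; Linux x86_64)"))  # False
```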
How to Control in robots.txt
🚫 Block Ai2Bot
User-agent: Ai2Bot
Disallow: /
✅ Allow Ai2Bot
User-agent: Ai2Bot
Allow: /
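Before deploying either rule, you can check how a robots.txt file is interpreted using Python's standard-library `urllib.robotparser`. This is a sketch: `can_crawl` is a hypothetical helper, and the parser's matching is simpler than some crawlers' (it applies the first matching rule rather than the longest match). It also only tells you what the file says; robots.txt compliance is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

BLOCK_AI2BOT = """\
User-agent: Ai2Bot
Disallow: /
"""

ALLOW_AI2BOT = """\
User-agent: Ai2Bot
Allow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: may `agent` fetch `url` under these robots.txt rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the product token ("Ai2Bot"), not the full User-Agent string:
    # RobotFileParser only compares the part before the first "/".
    return parser.can_fetch(agent, url)


print(can_crawl(BLOCK_AI2BOT, "Ai2Bot", "https://example.com/blog/post"))  # False
print(can_crawl(ALLOW_AI2BOT, "Ai2Bot", "https://example.com/blog/post"))  # True
print(can_crawl(BLOCK_AI2BOT, "Googlebot", "https://example.com/"))        # True
```

The last call shows that the block rule is scoped to Ai2Bot: other crawlers with no matching group (and no `User-agent: *` group) remain allowed.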