About
Ai2Bot is the Allen Institute for AI's research crawler, used to build Dolma, one of the largest open datasets for training language models. Unlike data gathered by commercial crawlers, the data Ai2Bot collects is publicly released for research, and it underpins some of the most widely used open-source LLM training corpora.
Purpose
Open research dataset collection for AI model training
User Agent String
Mozilla/5.0 (compatible; Ai2Bot/1.0; +https://allenai.org/crawler)
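Servers recognize Ai2Bot by the User-Agent header above. The sketch below assumes you already have the raw header value and checks it for the "Ai2Bot" token. The `is_ai2bot` helper is hypothetical, and matching case-insensitively is a defensive assumption in case the token's capitalization varies. A User-Agent header can be spoofed, so a match is a hint, not proof that the request really comes from Ai2.

```python
import re

# Assumption: match the "Ai2Bot" token as a whole word, ignoring case,
# in case capitalization such as "AI2Bot" also appears.
AI2BOT_PATTERN = re.compile(r"\bai2bot\b", re.IGNORECASE)


def is_ai2bot(user_agent: str) -> bool:
    """Return True if the User-Agent header value appears to identify Ai2Bot."""
    return bool(AI2BOT_PATTERN.search(user_agent or ""))


ua = "Mozilla/5.0 (compatible; Ai2Bot/1.0; +https://allenai.org/crawler)"
print(is_ai2bot(ua))  # True
print(is_ai2bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))  # False
```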
How to Control Ai2Bot in robots.txt
🚫 Block Ai2Bot
User-agent: Ai2Bot
Disallow: /
✅ Allow Ai2Bot
User-agent: Ai2Bot
Allow: /
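To sanity-check either rule before deploying it, Python's standard-library `urllib.robotparser` can evaluate a robots.txt body offline. This is a sketch: `ai2bot_can_fetch` is a hypothetical helper, and Python's parser only approximates how Ai2Bot itself interprets the file. Note that the agent name passed to `can_fetch` is the product token "Ai2Bot", not the full User-Agent string, because the parser compares only the part of the name before the first "/".

```python
from urllib.robotparser import RobotFileParser


def ai2bot_can_fetch(robots_txt: str, url: str) -> bool:
    """Parse a robots.txt body and report whether the Ai2Bot token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch("Ai2Bot", url)


block = "User-agent: Ai2Bot\nDisallow: /\n"
allow = "User-agent: Ai2Bot\nAllow: /\n"

print(ai2bot_can_fetch(block, "https://example.com/page"))  # False
print(ai2bot_can_fetch(allow, "https://example.com/page"))  # True
```

Because these groups name only Ai2Bot, other crawlers fall through to whatever other rules the file contains. robots.txt is also advisory: it tells a compliant crawler what to skip but does not enforce anything on the server.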
Complete Guide: How to Block Ai2Bot
Covers server-level blocking, nginx configs, Cloudflare rules, Next.js middleware, and more.
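Server-level blocking enforces the decision on your side instead of relying on the crawler to honor robots.txt. As a stack-neutral illustration (not the guide's own method), here is a minimal sketch assuming a Python WSGI application: the hypothetical `BlockAi2BotMiddleware` returns 403 Forbidden when the User-Agent contains the Ai2Bot token and passes every other request through. `hello_app` is a hypothetical stand-in for your real application.

```python
import re
from wsgiref.util import setup_testing_defaults

# Same case-insensitive token match as a plain User-Agent check (an assumption).
AI2BOT_PATTERN = re.compile(r"\bai2bot\b", re.IGNORECASE)


class BlockAi2BotMiddleware:
    """Answer 403 to requests whose User-Agent names Ai2Bot; pass the rest through."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if AI2BOT_PATTERN.search(user_agent):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real application."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = BlockAi2BotMiddleware(hello_app)


def fetch_status(user_agent: str) -> str:
    """Call the wrapped app in-process and return the HTTP status line it sets."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    b"".join(app(environ, start_response))
    return captured["status"]


print(fetch_status("Mozilla/5.0 (compatible; Ai2Bot/1.0; +https://allenai.org/crawler)"))
print(fetch_status("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

To serve it for real, any WSGI server works, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` for local testing. Since the check relies only on the User-Agent header, a client that spoofs or omits it will not be caught.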