About
HuggingFaceBot is Hugging Face's web crawler, used to build and refresh AI training datasets hosted on the Hugging Face Hub. Hugging Face is a leading AI model and dataset repository, hosting over 500,000 models. Content scraped by HuggingFaceBot can appear in publicly accessible datasets, which any organisation can download and use to train models. This makes it one of the broadest vectors for unattributed AI training use.
Purpose
AI dataset collection for Hugging Face Hub
User Agent String
Mozilla/5.0 (compatible; HuggingFaceBot/1.0; +https://huggingface.co)
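To spot this crawler in your own server logs, you can match on the HuggingFaceBot token in the User-Agent header. The following is a minimal sketch in Python, not an official detection method: the function name is_huggingfacebot is hypothetical, and it assumes the crawler identifies itself honestly with the string shown above. User-Agent headers can be spoofed, so treat a match as a hint rather than proof.

```python
# Hypothetical helper: checks whether a User-Agent header contains the
# "HuggingFaceBot" token from the user agent string documented above.
# Assumption: the crawler sends that token; the header can be spoofed.
BOT_TOKEN = "huggingfacebot"


def is_huggingfacebot(user_agent: str) -> bool:
    """Return True if the User-Agent string contains the bot token."""
    # Case-insensitive substring match, so version changes (e.g. /1.0 to
    # /2.0) do not break detection.
    return BOT_TOKEN in user_agent.lower()


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; HuggingFaceBot/1.0; +https://huggingface.co)"
    print(is_huggingfacebot(sample))
```

Matching on the token rather than the full string keeps the check working if the version number or URL in the user agent changes.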
How to Control in robots.txt
🚫 Block HuggingFaceBot
User-agent: HuggingFaceBot
Disallow: /
✅ Allow HuggingFaceBot
User-agent: HuggingFaceBot
Allow: /
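Before deploying either rule set, it can help to confirm how a standards-following parser reads it. The sketch below uses Python's standard-library urllib.robotparser to evaluate the block and allow rules for HuggingFaceBot. The helper name allowed and the example.com URL are illustrative assumptions, and the result only shows how a compliant parser interprets the rules; it does not prove that any particular crawler obeys them.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this page, written one directive per line.
BLOCK_RULES = "User-agent: HuggingFaceBot\nDisallow: /\n"
ALLOW_RULES = "User-agent: HuggingFaceBot\nAllow: /\n"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and check one URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/page"  # placeholder URL for illustration
    print("block rules:", allowed(BLOCK_RULES, "HuggingFaceBot", url))
    print("allow rules:", allowed(ALLOW_RULES, "HuggingFaceBot", url))
    # A crawler with no matching group is allowed by default.
    print("other bot under block rules:", allowed(BLOCK_RULES, "OtherBot", url))
```

Note that each directive must sit on its own line in the real robots.txt file; a parser will not split "User-agent" and "Disallow" written on a single line.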