The Blind Machine 2 years ago https://foundation.mozilla.org/en/research/library/generative-ai-training-data/common-crawl/