Note on 7th April 2025

If you're a startup running your own crawlers to gather data for whatever purpose, you should try really hard not to make the world a worse place by driving up costs for the sites you are scraping.

There's really no excuse for crawling Wikipedia (the Wikimedia Foundation reports that "65% of our most expensive traffic comes from bots") when they offer a comprehensive collection of bulk download options.
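Here's a minimal sketch of the alternative. It assumes you've already downloaded one of the compressed XML dumps, for example `enwiki-latest-pages-articles.xml.bz2` from dumps.wikimedia.org. Rather than fetching pages one HTTP request at a time, it streams that file through Python's standard-library `bz2` and `xml.etree.ElementTree.iterparse`. The function name `iter_pages` is my own invention, and the sketch only extracts each page's title and wikitext.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: io.BufferedIOBase) -> Iterator[tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a bz2-compressed
    MediaWiki XML dump, decompressing and parsing incrementally."""
    with bz2.open(stream, "rb") as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if _local(elem.tag) != "page":
                continue
            title, text = "", ""
            # A <page> holds <title> and a <revision> containing <text>.
            for child in elem.iter():
                name = _local(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            # Drop this page's subtree so parsed text doesn't pile up.
            elem.clear()
```

To use it, open the downloaded file in binary mode and loop over `iter_pages(f)`. Each `<page>` subtree is cleared once it has been yielded, so the parser doesn't accumulate the full text of the dump in memory. It also never sends a single request to Wikipedia's servers.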

Do better!

Posted 7th April 2025 at 7:06 pm

Simon Willison’s Weblog
