How the Wayback Machine works
31st August 2002
How the Wayback Machine Works is a must-read for anyone geeky enough to be interested in cheap clustered databases on a huge scale. The interview includes some fascinating details on the cost-effectiveness of Linux clusters:
What’s amazing to me is the fact that the hardware is free. For doing things even in the hundreds of terabytes, it costs in the hundreds of thousands of dollars. When you talk to most people in IT departments, they spend a couple hundred thousand dollars just on a CPU, much less a terabyte of disk storage. You buy from EMC a terabyte for maybe $300,000. That’s just the storage for 1 TB. We can buy 100 TBs with 250 CPUs to work on it, all on a high-speed switch with redundancy built in.
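To get a feel for the gap described in that quote, here is a rough back-of-the-envelope sketch. The EMC price per terabyte and the 100 TB cluster size come from the quote. The cluster's total cost is my own assumption: the interview only says "hundreds of thousands of dollars", so I've picked $300,000 as one illustrative point in that range. Treat the resulting ratio as an order-of-magnitude estimate, not a figure from the interview.

```python
# Back-of-the-envelope comparison of storage cost per terabyte.
# Values marked "assumed" are illustrative and NOT from the interview.

EMC_COST_PER_TB = 300_000   # quoted: "a terabyte for maybe $300,000"
CLUSTER_TB = 100            # quoted: "100 TBs with 250 CPUs"
CLUSTER_COST = 300_000      # assumed: one point within "hundreds of thousands of dollars"


def cost_per_tb(total_cost, terabytes):
    """Return the cost in dollars of a single terabyte."""
    return total_cost / terabytes


cluster_per_tb = cost_per_tb(CLUSTER_COST, CLUSTER_TB)
ratio = EMC_COST_PER_TB / cluster_per_tb

print(f"Cluster: ${cluster_per_tb:,.0f} per TB")
print(f"EMC:     ${EMC_COST_PER_TB:,} per TB (roughly {ratio:.0f}x more)")
```

Changing `CLUSTER_COST` to anything else in the "hundreds of thousands" range shifts the ratio, but it stays around two orders of magnitude.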