Any source available to download sample data (in 10+ GB) for testing?
15th October 2012
My answer to Any source available to download sample data (in 10+ GB) for testing? on Quora
Wikipedia has some pretty interesting dumps, in both XML and SQL format: http://meta.wikimedia.org/wiki/I...
It’s pretty easy to generate 10GB of random data for testing, though, and that may be the better option because you can more closely approximate the kind of data your application will actually be dealing with. There’s a neat Ruby module for doing this called Faker (itself a port of the Perl module of the same name): http://faker.rubyforge.org/ and here’s a Python port of the Ruby one: https://github.com/threadsafelab...
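To give a rough idea of what generating your own data could look like, here is a minimal sketch that uses only the Python standard library rather than Faker itself. The name lists, the column layout and the `fake_row` and `write_fake_csv` helpers are all made up for illustration; swap in whatever fields resemble your real data.

```python
import csv
import random
import string

# Hypothetical sample values; replace with anything closer to your real data.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Garcia", "Nguyen", "Patel", "Kim"]


def fake_row(rng):
    """Build one fake record: first name, last name, email, free-text comment."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = f"{first}.{last}{rng.randint(1, 9999)}@example.com".lower()
    comment = "".join(
        rng.choices(string.ascii_lowercase + " ", k=rng.randint(20, 200))
    )
    return [first, last, email, comment]


def write_fake_csv(out, target_bytes, seed=None):
    """Write fake CSV rows to the text stream `out` until at least
    `target_bytes` characters have been written.

    Every generated character is ASCII, so characters and bytes match as
    long as the stream does no newline translation (open real files with
    newline="").
    Returns (number_of_data_rows, characters_written).
    """
    rng = random.Random(seed)
    writer = csv.writer(out)
    # writerow() returns whatever the stream's write() returns, which for
    # text streams is the number of characters written.
    written = writer.writerow(["first_name", "last_name", "email", "comment"])
    rows = 0
    while written < target_bytes:
        written += writer.writerow(fake_row(rng))
        rows += 1
    return rows, written
```

For a full-size file you would call something like `write_fake_csv(open("sample.csv", "w", newline=""), 10 * 1024**3)`. Expect that to take a while in pure Python, so it is worth trying a much smaller target first.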