Simon Willison’s Weblog

Blogmarks tagged csv in Nov

Filters: Type: blogmark × Month: Nov × csv ×

Big Data Workflow with Pandas and SQLite (via) Handy tutorial on dealing with larger data (in this case a 3.9GB CSV file) by incrementally loading it into pandas and writing it out to SQLite. # 28th November 2017, 11:02 pm

Added TSV example to the README · simonw/csvs-to-sqlite@957d4f5. Thanks to a pull request from Jani Monoses, csvs-to-sqlite can now handle TSV (or any other separator) as well as regular CSVs. # 26th November 2017, 7:02 am

harelba/q (via) q is a neat command-line utility that lets you run SQL queries directly against CSV and TSV files. Internally it works by firing up an in-memory SQLite database, and as of the latest release (1.7.1) you can use the new --save-db-to-disk option to save that in-memory database to disk. # 25th November 2017, 5:49 pm

csvs-to-sqlite: Refactoring columns into separate lookup tables. I just shipped a new version of csvs-to-sqlite with the ability to extract specified columns into a separate SQLite lookup table by passing additional command-line arguments. # 17th November 2017, 6:41 am

simonw/csvs-to-sqlite. I built a simple tool for bulk converting multiple CSV files into a SQLite database. # 13th November 2017, 6:49 am

No PDFs! The Sunlight Foundation point out that PDFs are a terrible way of implementing “more transparent government” due to their general lack of structure. At the Guardian (and I’m sure at other newspapers) we waste an absurd amount of time manually extracting data from PDF files and turning it in to something more useful. Even CSV is significantly more useful for many types of information. # 1st November 2009, 12:04 pm