Simon Willison’s Weblog

10 items tagged “csv”

Big Data Workflow with Pandas and SQLite (via) Handy tutorial on dealing with larger data (in this case a 3.9GB CSV file) by incrementally loading it into pandas and writing it out to SQLite. # 28th November 2017, 11:02 pm

Added TSV example to the README · simonw/[email protected]. Thanks to a pull request from Jani Monoses, csvs-to-sqlite can now handle TSV (or any other separator) as well as regular CSVs. # 26th November 2017, 7:02 am

New in Datasette: filters, foreign keys and search

I’ve released Datasette 0.13 with a number of exciting new features (Datasette previously).

[... 1143 words]

harelba/q (via) q is a neat command-line utility that lets you run SQL queries directly against CSV and TSV files. Internally it works by firing up an in-memory SQLite database, and as of the latest release (1.7.1) you can use the new --save-db-to-disk option to save that in-memory database to disk. # 25th November 2017, 5:49 pm

csvs-to-sqlite: Refactoring columns into separate lookup tables. I just shipped a new version of csvs-to-sqlite with the ability to extract specified columns into a separate SQLite lookup table by passing additional command-line arguments. # 17th November 2017, 6:41 am

simonw/csvs-to-sqlite. I built a simple tool for bulk converting multiple CSV files into a SQLite database. # 13th November 2017, 6:49 am

The Absurdly Underestimated Dangers of CSV Injection. This is horrifying. A plain old CSV file intended for import into Excel can embed formulas (a value prefixed with an equals symbol) which can execute system commands—with a big honking security prompt that most people will likely ignore. Even worse: they can embed IMPORTXML() functions that can silently leak data from the rest of the sheet to an external URL—and those will work against Google Sheets as well as Excel. # 10th October 2017, 4:13 am

No PDFs! The Sunlight Foundation point out that PDFs are a terrible way of implementing “more transparent government” due to their general lack of structure. At the Guardian (and I’m sure at other newspapers) we waste an absurd amount of time manually extracting data from PDF files and turning it in to something more useful. Even CSV is significantly more useful for many types of information. # 1st November 2009, 12:04 pm

UnicodeDictWriter—write unicode strings out to Excel compatible CSV files using Python. Stuart Langridge and I spent quite a while this morning battling with Excel. The magic combination for storing unicode text in a CSV file such that Excel correctly reads it is UTF-16, a byte order mark and tab delimiters rather than commas. # 20th August 2008, 12:19 pm

Guide to the Dabble DB Plugin API (via) This is really nice—Dabble POSTs your plugin script a bunch of CSV values, your script returns CSV for the derived fields. Doesn’t seem to state which flavour of CSV though. # 9th January 2007, 11:37 am