Simon Willison’s Weblog

98 items tagged “projects”

Weeknotes: The Squirrel Census, Genome SQL query

This week was mostly about incremental improvements. And squirrels.

[... 911 words]

Weeknotes: PG&E outages, and Open Source works!

My big focus this week was the PG&E outages project. I’m really pleased with how this turned out: the San Francisco Chronicle used data from it for their excellent PG&E outage interactive (mixing in data on wind conditions) and it earned a bunch of interest on Twitter and some discussion on Hacker News.

[... 452 words]

Tracking PG&E outages by scraping to a git repo

PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.

[... 833 words]

SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the  crime scene report, but you somehow lost it. You vaguely remember that the crime  was a murder that occurred sometime on ​Jan.15, 2018 and that it took place in SQL  City. Start by retrieving the corresponding crime scene report from the police  department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser. # 7th October 2019, 11:37 pm

Weeknotes: Dogsheep

Having figured out my Stanford schedule, this week I started getting back into the habit of writing some code.

[... 1367 words]

twitter-to-sqlite 0.6, with track and follow. I shipped a new release of my twitter-to-sqlite command-line tool this evening. It now includes experimental features for subscribing to the Twitter streaming API: you can track keywords or follow users and matching Tweets will be written to a SQLite database in real-time as they come in through the API. Since Datasette supports mutable databases now you can run Datasette against the database and run queries against the tweets as they are inserted into the tables. # 6th October 2019, 4:54 am

Weeknotes: Design thinking for journalists, genome-to-sqlite, datasette-atom

I haven’t had much time for code this week: we’ve had a full five day workshop at JSK with Tran Ha (a JSK alumni) learning how to apply Design Thinking to our fellowship projects and generally to challenges facing journalism.

[... 869 words]

genome-to-sqlite. I just found out 23andMe let you export your genome as a zipped TSV file, so I wrote a little Python command-line tool to import it into a SQLite database. # 19th September 2019, 3:58 pm

Weeknotes: ONA19, twitter-to-sqlite, datasette-rure

I’ve decided to start writing weeknotes for the duration of my JSK fellowship. Here goes!

[... 919 words]

sqlite-utils 1.11. Amjith Ramanujam contributed an excellent new feature to sqlite-utils, which I’ve now released as part of version 1.11. Previously you could enable SQLite full-text-search on a table using the .enable_fts() method (or the “sqlite-utils enable-fts” CLI command) but it wouldn’t reflect future changes to the table—you had to use populate_fts() any time you inserted new records. Thanks to Amjith you can now pass create_triggers=True (or --create-triggers) to cause sqlite-utils to automatically add triggers that keeps the FTS index up-to-date any time a row is inserted, updated or deleted from the table. # 3rd September 2019, 1:05 am

Working with many-to-many relationships in sqlite-utils (via) I just released sqlite-utils 1.9 with syntactic sugar support for creating many-to-many relationships for records stored in SQLite databases. # 4th August 2019, 3:57 am

Single sign-on against GitHub using ASGI middleware

I released Datasette 0.29 last weekend, the first version of Datasette to be built on top of ASGI (discussed previously in Porting Datasette to ASGI, and Turtles all the way down).

[... 1612 words]

datasette-cors (via) My other Datasette ASGI plugin: this one wraps my asgi-cors project and lets you configure CORS access from a list of domains (or a set of domain wildcards) so you can make JavaScript calls to a Datasette instance from a specific set of other hosts. # 8th July 2019, 4:30 am

datasette-auth-github (via) My first big ASGI plugin for Datasette: datasette-auth-github adds the ability to require users to authenticate against the GitHub OAuth API. You can whitelist specific users, or you can restrict access to members of specific GitHub organizations or teams. While it’s structured as a Datasette plugin it also includes ASGI middleware which can be applied to any ASGI application. # 8th July 2019, 4:28 am

Datasette 0.29 (via) I shipped Datasette 0.29! • ASGI all the way down! Plus a new asgi_wrapper plugin hook letting plugins do all kinds of powerful new things • New mechanism for secret plugin configuration options • Facet by date • ?_through= for joins through m2m tables. Much more. # 8th July 2019, 4:26 am

db-to-sqlite 1.0 release. I’ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I’ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable. # 1st July 2019, 1:35 am

Porting Datasette to ASGI, and Turtles all the way down

This evening I finally closed a Datasette issue that I opened more than 13 months ago: #272: Port Datasette to ASGI. A few notes on why this is such an important step for the project.

[... 1082 words]

json-flatten. A little Python library I wrote that attempts to flatten a JSON object into a set of key/value pairs suitable for transmitting in a query string or using to construct an HTML form. I first wrote this back in 2015 as a Gist—I’ve reconstructed the Gist commit history in a new repository and shipped it to PyPI. # 22nd June 2019, 4:51 am

Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database. # 14th June 2019, 12:45 am

paginate-json (via) I released a fun tiny utility: paginate-json, which knows how to paginate through JSON APIs that use the HTTP Link header for pagination. I built it so I could pull data from the GitHub API and pipe it directly into SQLite via sqlite-utils. # 12th June 2019, 3:22 pm

datasette-render-binary (via) Yet another tiny Datasette plugin. This one attempts to render binary data in a slightly more readable fashion—it shows ASCII characters as they are, and shows all other data as monospace octets. Useful as a tool for exploring new unfamiliar databases as it makes it easier to spot if a binary column may contain a decipherable binary format. # 9th June 2019, 4:22 pm

datasette-bplist (via) It turns out an OS X laptop is positively crammed with SQLite databases, and many of them contain values that are data structures encoded using Apple’s binary plist format. datasette-bplist is my new plugin to help explore those files: it provides a display hook for rendering their contents, and a custom bplist_to_json() SQL function which can be used to extract and query information that is embedded in those values. The README includes tips on how to pull interesting EXIF data out of the SQLite database that sits behind Apple Photos. # 9th June 2019, 1:26 am

datasette-jq (via) I released another tiny Datasette plugin: datasette-jq registers a single custom SQL function, jq(), which lets you execute the jq expression language against a JSON column (or literal value) to filter and transform the JSON data. The README includes a link to a live demo—it’s a neat way to play with the jq micro-language. # 30th May 2019, 1:52 am

sqlite-utils 1.0. I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn’t quite fit (using alter=True in the Python API or the --alter option to the “sqlite-utils insert” command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I’ve marked as a 1.0 release in a very long time—I’ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change. # 25th May 2019, 1:20 am

Datasette 0.28—and why master should always be releasable

It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.

[... 1326 words]

asgi-cors (via) I’ve been trying out the new ASGI 3.0 spec and I just released my first piece of ASGI middleware: asgi-cors, which lets you wrap an ASGI application with Access-Control-Allow-Origin CORS headers (either “*” or dynamic headers based on an origin whitelist). # 7th May 2019, 12:12 am

Hello world for ASGI running on Glitch (via) I’m continuing to experiment with Python 3 running on Glitch. This evening on my walk home from work I built this “hello world” demo on my phone, partly to see if Glitch was a workable mobile development environment—it passed with flying colours! The demo is a simple hello world implemented using the new ASGI 3.0 specification, running on the daphne reference server. Click the “via” link for my accompanying thread on Twitter, which includes a short screencast (also recorded on my phone) showing Glitch in action. # 26th April 2019, 5:06 am

Running Datasette on Glitch

The worst part of any software project is setting up a development environment. It’s by far the biggest barrier for anyone trying to get started learning to code. I’ve been a developer for more than twenty years and I still feel the pain any time I want to do something new.

[... 998 words]

csv-diff 0.3.1 (via) I released a minor update to my csv-diff CLI tool today which does a better job of displaying a human-readable representation of rows that have been added or removed from a file—previously they were represented as an ugly JSON dump. My script monitoring changes to the official list of trees in San Francisco has been running for a month now and has captured 23 commits! # 7th April 2019, 8:03 pm

Generating a commit log for San Francisco’s official list of trees

San Francisco has a neat open data portal (as do an increasingly large number of cities these days). For a few years my favourite file on there has been Street Tree List, a list of all 190,000 trees in the city maintained by the Department of Public Works.

[... 1051 words]