Simon Willison’s Weblog

Subscribe
Atom feed for projects

487 posts tagged “projects”

Posts about projects I have worked on.

2019

db-to-sqlite 1.0 release. I’ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I’ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable.

# 1st July 2019, 1:35 am / projects, datasette, sqlite, mysql, postgresql, heroku

Porting Datasette to ASGI, and Turtles all the way down

Visit Porting Datasette to ASGI, and Turtles all the way down

This evening I finally closed a Datasette issue that I opened more than 13 months ago: #272: Port Datasette to ASGI. A few notes on why this is such an important step for the project.

[... 1,082 words]

json-flatten. A little Python library I wrote that attempts to flatten a JSON object into a set of key/value pairs suitable for transmitting in a query string or using to construct an HTML form. I first wrote this back in 2015 as a Gist—I’ve reconstructed the Gist commit history in a new repository and shipped it to PyPI.

# 22nd June 2019, 4:51 am / projects, json, python

Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database.

# 14th June 2019, 12:45 am / kml, projects, sqlite, sqlite-utils, xml

paginate-json (via) I released a fun tiny utility: paginate-json, which knows how to paginate through JSON APIs that use the HTTP Link header for pagination. I built it so I could pull data from the GitHub API and pipe it directly into SQLite via sqlite-utils.

# 12th June 2019, 3:22 pm / projects, webapis, json, sqlite-utils

datasette-render-binary (via) Yet another tiny Datasette plugin. This one attempts to render binary data in a slightly more readable fashion—it shows ASCII characters as they are, and shows all other data as monospace octets. Useful as a tool for exploring new unfamiliar databases as it makes it easier to spot if a binary column may contain a decipherable binary format.

# 9th June 2019, 4:22 pm / projects, datasette

datasette-bplist (via) It turns out an OS X laptop is positively crammed with SQLite databases, and many of them contain values that are data structures encoded using Apple’s binary plist format. datasette-bplist is my new plugin to help explore those files: it provides a display hook for rendering their contents, and a custom bplist_to_json() SQL function which can be used to extract and query information that is embedded in those values. The README includes tips on how to pull interesting EXIF data out of the SQLite database that sits behind Apple Photos.

# 9th June 2019, 1:26 am / osx, projects, datasette

datasette-jq (via) I released another tiny Datasette plugin: datasette-jq registers a single custom SQL function, jq(), which lets you execute the jq expression language against a JSON column (or literal value) to filter and transform the JSON data. The README includes a link to a live demo—it’s a neat way to play with the jq micro-language.

# 30th May 2019, 1:52 am / projects, datasette, jq

sqlite-utils 1.0. I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn’t quite fit (using alter=True in the Python API or the --alter option to the “sqlite-utils insert” command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I’ve marked as a 1.0 release in a very long time—I’ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change.

# 25th May 2019, 1:20 am / projects, versioning, sqlite, sqlite-utils, semver

Datasette 0.28—and why master should always be releasable

It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.

[... 1,326 words]

asgi-cors (via) I’ve been trying out the new ASGI 3.0 spec and I just released my first piece of ASGI middleware: asgi-cors, which lets you wrap an ASGI application with Access-Control-Allow-Origin CORS headers (either “*” or dynamic headers based on an origin whitelist).

# 7th May 2019, 12:12 am / projects, asgi, security, cors

Hello world for ASGI running on Glitch (via) I’m continuing to experiment with Python 3 running on Glitch. This evening on my walk home from work I built this “hello world” demo on my phone, partly to see if Glitch was a workable mobile development environment—it passed with flying colours! The demo is a simple hello world implemented using the new ASGI 3.0 specification, running on the daphne reference server. Click the “via” link for my accompanying thread on Twitter, which includes a short screencast (also recorded on my phone) showing Glitch in action.

# 26th April 2019, 5:06 am / glitch, asgi, projects

Running Datasette on Glitch

The worst part of any software project is setting up a development environment. It’s by far the biggest barrier for anyone trying to get started learning to code. I’ve been a developer for more than twenty years and I still feel the pain any time I want to do something new.

[... 998 words]

csv-diff 0.3.1 (via) I released a minor update to my csv-diff CLI tool today which does a better job of displaying a human-readable representation of rows that have been added or removed from a file—previously they were represented as an ugly JSON dump. My script monitoring changes to the official list of trees in San Francisco has been running for a month now and has captured 23 commits!

# 7th April 2019, 8:03 pm / projects, csv, diff

Generating a commit log for San Francisco’s official list of trees

Visit Generating a commit log for San Francisco's official list of trees

San Francisco has a neat open data portal (as do an increasingly large number of cities these days). For a few years my favourite file on there has been Street Tree List, a list of all 190,000 trees in the city maintained by the Department of Public Works.

[... 1,051 words]

I commissioned an oil painting of Barbra Streisand’s cloned dogs

Visit I commissioned an oil painting of Barbra Streisand’s cloned dogs

Last year, Barbra Streisand cloned her dog, Sammie.

[... 517 words]

sqlite-utils: a Python library and CLI tool for building SQLite databases

sqlite-utils is a combination Python library and command-line tool I’ve been building over the past six months which aims to make creating new SQLite databases as quick and easy as possible.

[... 1,237 words]

db-to-sqlite (via) I just released version 0.2 of a tiny CLI utility I’ve been working on. It builds on top of SQLAlchemy and lets you connect to any SQLAlchemy-supported database and convert the data from it to a local SQLite database file. The new --all option will mirror all available tables (including foreign key relationships), or you can use --sql to save the results of custom SQL queries.

# 8th February 2019, 6:08 am / projects, sqlite

Datasette 0.27 (via) The latest release of Datasette introduces an option to output tables and SQL query results as newline-delimited JSON—plus a new “datasette plugins” command for listing available plugins.

# 1st February 2019, 4:39 am / projects, json, datasette

Exploring search relevance algorithms with SQLite

SQLite isn’t just a fast, high quality embedded database: it also incorporates a powerful full-text search engine in the form of the FTS4 and FTS5 extensions. You’ve probably used these a bunch of times already: many iOS, Android and desktop applications use SQLite under-the-hood and use it to implement their built-in search.

[... 1,390 words]

2018

Building smaller Python Docker images

Changes are afoot at Zeit Now, my preferred hosting provider for the past year (see previous posts). They have announced Now 2.0, an intriguing new approach to providing auto-scaling immutable deployments. It’s built on top of lambdas, and comes with a whole host of new constraints: code needs to fit into a 5MB bundle for example (though it looks like this restriction will soon be relaxed a littleupdate November 19th you can now bump this up to 50MB).

[... 1,872 words]

The interesting ideas in Datasette

Datasette (previously) is my open source tool for exploring and publishing structured data. There are a lot of ideas embedded in Datasette. I realized that I haven’t put many of them into writing.

[... 2,857 words]

Analyzing US Election Russian Facebook Ads

Two interesting data sources have emerged in the past few weeks concerning the Russian impact on the 2016 US elections.

[... 922 words]

datasette-vega (via) I wrote a visualization plugin for Datasette that uses the excellent Vega “visualization grammar” library to provide bar, line and scatter charts configurable against any Datasette table or SQL query.

# 29th June 2018, 3 pm / datasette, visualization, plugins, projects

Datasette 0.23: CSV, SpatiaLite and more (via) The big new feature in 0.23 is CSV export: any Datasette table or query can now be exported as CSV, including the option to get all matching rows in one giant CSV file taking advantage of Python 3 async and Datasette’s efficient keyset pagination. Also in this release: improved support for SpatiaLite and various JSON API improvements including the ability to expand foreign key labels in JSON and CSV responses.

# 18th June 2018, 3:34 pm / projects, csv, datasette

Datasette Facets

Datasette 0.22 is out with the most significant new feature I’ve added since the initial release: faceted browse.

[... 1,189 words]

Datasette 0.21: New _shape=, new _size=, search within columns. Nothing earth-shattering here but it’s accumulated enough small improvements that it warranted a new release. You can now send ?_shape=array to get back a plain JSON array of results, ?_size=XXX|max to get back a specific number of rows from a table view and ?_search_COLUMN=text to run full-text search against a specific column.

# 5th May 2018, 11:25 pm / datasette, projects

Make Near Me (via) The natural evolution of owlsnearme.com—Make Near Me uses the Zeit Now API to allow anyone to deploy their own version of Owls Near Me for any species! I announced this on stage at Zeit Day SF 2018 as part of my talk on Datasette and Datasette Publish.

# 28th April 2018, 9:28 pm / zeit-now, projects, owlsnearyou

Exploring the UK Register of Members Interests with SQL and Datasette

Ever wondered which UK Members of Parliament get gifted the most helicopter rides? How about which MPs have been given Christmas hampers by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.

[... 1,167 words]

JSON Escape Text. I built a tiny tool for turning text into an escaped JSON string—I needed it to help create descriptions and canned SQL queries for adding to Datasette’s metadata.json files.

# 25th April 2018, 4:13 am / projects, json, datasette