Simon Willison’s Weblog

40 items tagged “datasette”

Datasette 0.23: CSV, SpatiaLite and more (via) The big new feature in 0.23 is CSV export: any Datasette table or query can now be exported as CSV, including the option to get all matching rows in one giant CSV file taking advantage of Python 3 async and Datasette’s efficient keyset pagination. Also in this release: improved support for SpatiaLite and various JSON API improvements including the ability to expand foreign key labels in JSON and CSV responses. # 18th June 2018, 3:34 pm

SpatiaLite — Datasette documentation. Datasette’s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I’m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go—as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it. # 30th May 2018, 4:34 am

Datasette Facets

Datasette 0.22 is out with the most significant new feature I’ve added since the initial release: faceted browse.

[... 1189 words]

sqlitebiter. SImilar to my csvs-to-sqlite tool, but sqlitebiter handles “CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets”. Most interestingly, it works against HTML pages—run “sqlitebiter -v url ’https://en.wikipedia.org/wiki/Comparison_of_firewalls’” and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there. # 17th May 2018, 10:40 pm

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter. # 12th May 2018, 12:09 pm

Datasette: The Metropolitan Museum of Art (via) The Metropolitan Museum of Art publish a CSV file on GitHub with details of 464,360 items from their collection. I turned it into a searchable Datasette instance. # 9th May 2018, 6:38 pm

Notes from my appearance on the Changelog podcast

After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25 minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground!

[... 536 words]

Datasette 0.21: New _shape=, new _size=, search within columns. Nothing earth-shattering here but it’s accumulated enough small improvements that it warranted a new release. You can now send ?_shape=array to get back a plain JSON array of results, ?_size=XXX|max to get back a specific number of rows from a table view and ?_search_COLUMN=text to run full-text search against a specific column. # 5th May 2018, 11:25 pm

Datasette—a talk at Zeit Day SF 2018 (via) Slides from the talk I gave today about Datasette and Datasette Publish at the Zeit Day SF conference. # 28th April 2018, 9:31 pm

Exploring the UK Register of Members Interests with SQL and Datasette

Ever wondered which UK Members of Parliament get gifted the most helicopter rides? How about which MPs have been given Christmas hampers by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.

[... 1167 words]

JSON Escape Text. I built a tiny tool for turning text into an escaped JSON string—I needed it to help create descriptions and canned SQL queries for adding to Datasette’s metadata.json files. # 25th April 2018, 4:13 am

Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data (via) Tony Hirst wrote a tutorial on using datasette-cluster-map to analyze food hygiene ratings data from the FSA # 20th April 2018, 8:50 pm

Datasette plugins, and building a clustered map visualization

Datasette now supports plugins!

[... 751 words]

Datasette 0.19: Plugins Documentation (via) I’ve released the first preview of Datasette’s new plugin support, which uses the pluggy package originally developed for py.test. So far the only two plugin hooks are for SQLite connection creation (allowing custom SQL functions to be registered) and Jinja2 template environment initialization (for custom template tags), but this release is mainly about exercising the plugin registration mechanism and starting to gather feedback. Lots more to come. # 17th April 2018, 3:59 am

Datasette 0.18: units (via) This release features the first Datasette feature that was entirely designed and implemented by someone else (yay open source)—Russ Garrett wanted unit support (Hz, ft etc) for his Wireless Telegraphy Register project. It’s a really neat implementation: you can tell Datasette what units are in use for a particular database column and it will display the correct SI symbols on the page. Specifying units also enables unit-aware filtering: if Datasette knows that a column is measured in meters you can now query it for all rows that are less than 50 feet for example. # 14th April 2018, 3:56 pm

Wireless Telegraphy Register (via) Russ Garrett used Datasette to build a browsable interface to the UK’s register of business radio licenses, using data from Ofcom. # 12th April 2018, 4:08 pm

Datasette 0.15: sort by column (via) I’ve released the latest version of Datasette to PyPI. The key new feature is the ability to sort tables by column, using clickable column headers or directly via the new _sort= and _sort_desc= querystring parameters. # 9th April 2018, 5:25 pm

Baltimore Sun Public Salary Records (via) The Baltimore Sun have published an interactive search engine for public salaries of Maryland state employees, and it’s powered by Datasette! Since data journalism is one of my key use-cases for Datasette I’m incredibly excited to see this in the wild. They’ve also published the underlying source code (see the via link) which is a really nice example of how to use Datasette’s custom templates and canned query functionality. # 28th March 2018, 5:12 pm

Cloud-first: Rapid webapp deployment using containers (via) The Research Software Engineering group at ICL have written a tutorial on deploying web apps as Docker containers using Azure and they use Datasette as the example application. # 28th March 2018, 3:50 pm

Observable notebook: San Francisco trees from Datasette. I used an Observable notebook to rebuild my San Francisco tree search demo against a Datasette API of a CSV of trees published by the SF Department of Public Works. The map updates live as you type a query, and every cell can be toggled to view the underlying source code. # 1st February 2018, 12:37 am

Datasette Demo (video) from the SF Python Meetup

I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:

[... 63 words]

Analyzing my Twitter followers with Datasette

I decided to do some ad-hoc analsis of my social network on Twitter this afternoon… and since everything is more fun if you bundle it up into a SQLite database and publish it to the internet I performed the analysis using Datasette.

[... 1314 words]

How to turn a list of JSON objects into a Datasette. ramadis on GitHub cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared them on GitHub as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette. # 20th January 2018, 1:07 am

Datasette Publish: a web app for publishing CSV files as an online database

I’ve just released Datasette Publish, a web tool for turning one or more CSV files into an online database with a JSON API.

[... 863 words]

How to compile and run the SQLite JSON1 extension on OS X. Thanks, Stack Overflow! I’ve been battling this one for a while—it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python’s sqlite3 module (or with Datasette’s --load-extension= option) to gain access to the full suite of SQLite JSON functions—json(), json_extract() etc. # 10th January 2018, 9:01 pm

csvkit. “A suite of command-line tools for converting to and working with CSV”—includes a huge range of utilities for things like converting Excel and JSON to CSV, grepping, sorting and extracting a subset of columns, combining multiple CSV files together and exporting CSV to a relational database. Worth reading through the tutorial which shows how the different commands can be piped together. # 8th January 2018, 9:03 pm

Himalayan Database: From Visual FoxPro GUI to JSON API with Datasette (via) The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepalese Himalaya, originally compiled by journalist Elizabeth Hawley over several decades. The database is published as a Visual FoxPro database—here Raffaele Messuti‏ provides step-by-step instructions for extracting the data from the published archive, converting them to CSV using dbfcsv and then converting the CSVs to SQLite using csvs-to-sqlite so you can browse them using Datasette. # 8th January 2018, 7:26 pm

Building a location to time zone API with SpatiaLite, OpenStreetMap and Datasette

Given a latitude and longitude, how can we tell what time zone that point lies within? Here’s how I built a simple JSON API to answer that question, using a combination of data from OpenStreetMap, the SpatiaLite extension for SQLite and my Datasette API tool.

[... 2152 words]

VICE News Police Shootings in Datasette (via) VICE News collected data on both fatal and nonfatal police shootings from the 50 largest local police departments in the United States. They released the data under a CC BY-NC-SA 4.0 license so I’ve converted it to SQLite and loaded it into Datasette. # 11th December 2017, 10:47 pm

Datasette 0.14: customization edition. I just released the latest version of Datasette with a strong theme of customization: Datasette now supports custom templates and CSS styling hooks, and the metadata format has been expanded to allow per-database and per-table source/license/description information. You can also now define named canned queries which will be packaged up with your data. # 10th December 2017, 1:55 am