Simon Willison’s Weblog

Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf

I’ve been preparing for the NICAR 2020 Data Journalism conference this week which has lead me into a flurry of activity across a plethora of different projects and plugins.

datasette-ics

NICAR publish their schedule as a CSV file. I couldn’t resist loading it into a Datasette on Glitch, which inspired me to put together a plugin I’ve been wanting for ages: datasette-ics, a register_output_renderer() plugin that can produce a subscribable iCalendar file from an arbitrary SQL query.

It’s based on datasette-atom and works in a similar way: you construct a query that outputs a required set of columns (event_name and event_dtstart as a minimum), then add the .ics extension to get back an iCalendar file.

You can optionally also include event_dtend, event_duration, event_description, event_uid and most importantly event_tz, which can contain a timezone string. Figuring out how to handle timezones was the fiddliest part of the project.

If you’re going to NICAR, subscribe to https://nicar-2020.glitch.me/data/calendar.ics in a calendar application to get the full 261 item schedule.

If you just want to see what the iCalendar feed looks like, add ?_plain=1 to preview it with a text/plain content type: https://nicar-2020.glitch.me/data/calendar.ics?_plain=1—and here’s the SQL query that powers it.

datasette-upload-csvs

My work on Datasette Cloud is inspiring all kinds of interesting work on plugins. I released datasette-upload-csvs a while ago, but now that Datasette has official write support I’ve been upgrading the plugin to hopefully achieve its full potential.

In particular, I’ve been improving its usability. CSV files can be big—and if you’re uploading 100MB of CSV it’s not particularly reassuring if your browser just sits for a few minutes spinning on the status bar.

So I added two progress bars to the plugins. The first is a client-side progress bar that shows you the progress of the initial file upload. I used the XMLHttpRequest pattern (and the drag-and-drop recipe) from Joseph Zimmerman’s useful article How To Make A Drag-and-Drop File Uploader With Vanilla JavaScriptfetch() doesn’t reliably report upload progres just yet.

I’m using Starlette and asyncio so uploading large files doesn’t tie up server resources in the same way that it would if I was using processes and threads.

The second progress bar relates to server-side processing of the file: churning through 100,000 rows of CSV data and inserting them into SQLite can take a while, and I wanted users to be able to see what was going on.

Here’s an animation screenshot of how the interface looks now:

Uploading a CSV

Implementing this was trickier. In the end I took advantage of the new dedicaed write thread made available by datasette.execute_write_fn()—since that thread has exclusive access to write to the database, I create a SQLite table called _csv_progress_ and write a new record to it every 10 rows. I use the number of bytes in the CSV file as the total and track how far through that file Python’s CSV parser has got using file.tell().

It seems to work really well. The full server-side code is here—the progress bar itself then polls Datasette’s JSON API for the record in the _csv_progress_ table.

datasette-configure-fts

SQLite ships with a decent implementation of full-text search. Datasette knows how to tell if a table has been configured for full-text search and adds a search box to the table page, documented here.

datasette-configure-fts is a new plugin that provides an interface for configuring search against existing SQLite tables. Under the hood it uses the sqlite-utils full-text search methods to configure the table and set up triggers to keep the index updated as data in the table changes.

It’s pretty simple, but it means that users of Datasette Cloud can upload a potentially enormous CSV file and then click to set specific columns as searchable. It’s a fun example of the kind of things that can be built with Datasette`s new write capabilities.

asgi-csrf

CSRF is one of my favourite web application security vulnerabilties—I first wrote about it on this blog back in 2005!

I was surprised to see that the Starlette/ASGI ecosystem doesn’t yet have much in the way of CSRF prevention. The best option I could find to use the WTForms library with Starlette.

I don’t need a full forms library for my purposes (at least not yet) but I needed CSRF protection for datasete-configure-fts, so I’ve started working on a small ASGI middleware library called asgi-csrf.

It’s modelled on a subset of Django’s robust CSRF prevention. The README warns people NOT to trust it yet—there are still some OWASP recommendations that it needs to apply (issue here) and I’m not yet ready to declare it robust and secure. It’s a start though, and feels like exactly the kind of problem that ASGI middleware is meant to address.

This is Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf by Simon Willison, posted on 4th March 2020.

Tagged , , , , , , , , , ,

Next: datasette-search-all: a new plugin for searching multiple Datasette tables at once

Previous: Weeknotes: Datasette Writes