Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf
I’ve been preparing for the NICAR 2020 Data Journalism conference this week which has lead me into a flurry of activity across a plethora of different projects and plugins.
NICAR publish their schedule as a CSV file. I couldn’t resist loading it into a Datasette on Glitch, which inspired me to put together a plugin I’ve been wanting for ages: datasette-ics, a register_output_renderer() plugin that can produce a subscribable iCalendar file from an arbitrary SQL query.
It’s based on datasette-atom and works in a similar way: you construct a query that outputs a required set of columns (
event_dtstart as a minimum), then add the
.ics extension to get back an iCalendar file.
You can optionally also include
event_uid and most importantly
event_tz, which can contain a timezone string. Figuring out how to handle timezones was the fiddliest part of the project.
If you’re going to NICAR, subscribe to https://nicar-2020.glitch.me/data/calendar.ics in a calendar application to get the full 261 item schedule.
If you just want to see what the iCalendar feed looks like, add
?_plain=1 to preview it with a
text/plain content type: https://nicar-2020.glitch.me/data/calendar.ics?_plain=1—and here’s the SQL query that powers it.
My work on Datasette Cloud is inspiring all kinds of interesting work on plugins. I released datasette-upload-csvs a while ago, but now that Datasette has official write support I’ve been upgrading the plugin to hopefully achieve its full potential.
In particular, I’ve been improving its usability. CSV files can be big—and if you’re uploading 100MB of CSV it’s not particularly reassuring if your browser just sits for a few minutes spinning on the status bar.
So I added two progress bars to the plugins. The first is a client-side progress bar that shows you the progress of the initial file upload. I used the
fetch() doesn’t reliably report upload progres just yet.
I’m using Starlette and
asyncio so uploading large files doesn’t tie up server resources in the same way that it would if I was using processes and threads.
The second progress bar relates to server-side processing of the file: churning through 100,000 rows of CSV data and inserting them into SQLite can take a while, and I wanted users to be able to see what was going on.
Here’s an animation screenshot of how the interface looks now:
Implementing this was trickier. In the end I took advantage of the new dedicaed write thread made available by
datasette.execute_write_fn()—since that thread has exclusive access to write to the database, I create a SQLite table called
_csv_progress_ and write a new record to it every 10 rows. I use the number of bytes in the CSV file as the total and track how far through that file Python’s CSV parser has got using
It seems to work really well. The full server-side code is here—the progress bar itself then polls Datasette’s JSON API for the record in the
SQLite ships with a decent implementation of full-text search. Datasette knows how to tell if a table has been configured for full-text search and adds a search box to the table page, documented here.
datasette-configure-fts is a new plugin that provides an interface for configuring search against existing SQLite tables. Under the hood it uses the sqlite-utils full-text search methods to configure the table and set up triggers to keep the index updated as data in the table changes.
It’s pretty simple, but it means that users of Datasette Cloud can upload a potentially enormous CSV file and then click to set specific columns as searchable. It’s a fun example of the kind of things that can be built with Datasette`s new write capabilities.
CSRF is one of my favourite web application security vulnerabilties—I first wrote about it on this blog back in 2005!
I was surprised to see that the Starlette/ASGI ecosystem doesn’t yet have much in the way of CSRF prevention. The best option I could find to use the WTForms library with Starlette.
I don’t need a full forms library for my purposes (at least not yet) but I needed CSRF protection for
datasete-configure-fts, so I’ve started working on a small ASGI middleware library called asgi-csrf.
It’s modelled on a subset of Django’s robust CSRF prevention. The README warns people NOT to trust it yet—there are still some OWASP recommendations that it needs to apply (issue here) and I’m not yet ready to declare it robust and secure. It’s a start though, and feels like exactly the kind of problem that ASGI middleware is meant to address.
More recent articles
- ChatGPT should include inline tips - 30th May 2023
- Lawyer cites fake cases invented by ChatGPT, judge is not amused - 27th May 2023
- llm, ttok and strip-tags - CLI tools for working with ChatGPT and other LLMs - 18th May 2023
- Delimiters won't save you from prompt injection - 11th May 2023
- Weeknotes: sqlite-utils 3.31, download-esm, Python in a sandbox - 10th May 2023
- Leaked Google document: "We Have No Moat, And Neither Does OpenAI" - 4th May 2023
- Midjourney 5.1 - 4th May 2023
- Prompt injection explained, with video, slides, and a transcript - 2nd May 2023
- download-esm: a tool for downloading ECMAScript modules - 2nd May 2023
- Let's be bear or bunny - 1st May 2023