Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf
4th March 2020
I’ve been preparing for the NICAR 2020 Data Journalism conference this week which has lead me into a flurry of activity across a plethora of different projects and plugins.
NICAR publish their schedule as a CSV file. I couldn’t resist loading it into a Datasette on Glitch, which inspired me to put together a plugin I’ve been wanting for ages: datasette-ics, a register_output_renderer() plugin that can produce a subscribable iCalendar file from an arbitrary SQL query.
It’s based on datasette-atom and works in a similar way: you construct a query that outputs a required set of columns (
event_dtstart as a minimum), then add the
.ics extension to get back an iCalendar file.
You can optionally also include
event_uid and most importantly
event_tz, which can contain a timezone string. Figuring out how to handle timezones was the fiddliest part of the project.
If you’re going to NICAR, subscribe to https://nicar-2020.glitch.me/data/calendar.ics in a calendar application to get the full 261 item schedule.
If you just want to see what the iCalendar feed looks like, add
?_plain=1 to preview it with a
text/plain content type: https://nicar-2020.glitch.me/data/calendar.ics?_plain=1—and here’s the SQL query that powers it.
My work on Datasette Cloud is inspiring all kinds of interesting work on plugins. I released datasette-upload-csvs a while ago, but now that Datasette has official write support I’ve been upgrading the plugin to hopefully achieve its full potential.
In particular, I’ve been improving its usability. CSV files can be big—and if you’re uploading 100MB of CSV it’s not particularly reassuring if your browser just sits for a few minutes spinning on the status bar.
So I added two progress bars to the plugins. The first is a client-side progress bar that shows you the progress of the initial file upload. I used the
fetch() doesn’t reliably report upload progres just yet.
I’m using Starlette and
asyncio so uploading large files doesn’t tie up server resources in the same way that it would if I was using processes and threads.
The second progress bar relates to server-side processing of the file: churning through 100,000 rows of CSV data and inserting them into SQLite can take a while, and I wanted users to be able to see what was going on.
Here’s an animation screenshot of how the interface looks now:
Implementing this was trickier. In the end I took advantage of the new dedicaed write thread made available by
datasette.execute_write_fn()—since that thread has exclusive access to write to the database, I create a SQLite table called
_csv_progress_ and write a new record to it every 10 rows. I use the number of bytes in the CSV file as the total and track how far through that file Python’s CSV parser has got using
SQLite ships with a decent implementation of full-text search. Datasette knows how to tell if a table has been configured for full-text search and adds a search box to the table page, documented here.
datasette-configure-fts is a new plugin that provides an interface for configuring search against existing SQLite tables. Under the hood it uses the sqlite-utils full-text search methods to configure the table and set up triggers to keep the index updated as data in the table changes.
It’s pretty simple, but it means that users of Datasette Cloud can upload a potentially enormous CSV file and then click to set specific columns as searchable. It’s a fun example of the kind of things that can be built with Datasette`s new write capabilities.
I was surprised to see that the Starlette/ASGI ecosystem doesn’t yet have much in the way of CSRF prevention. The best option I could find to use the WTForms library with Starlette.
I don’t need a full forms library for my purposes (at least not yet) but I needed CSRF protection for
datasete-configure-fts, so I’ve started working on a small ASGI middleware library called asgi-csrf.
It’s modelled on a subset of Django’s robust CSRF prevention. The README warns people NOT to trust it yet—there are still some OWASP recommendations that it needs to apply (issue here) and I’m not yet ready to declare it robust and secure. It’s a start though, and feels like exactly the kind of problem that ASGI middleware is meant to address.
More recent articles
- Weeknotes: Getting ready for NICAR - 27th February 2024
- The killer app of Gemini Pro 1.5 is video - 21st February 2024
- Weeknotes: a Datasette release, an LLM release and a bunch of new plugins - 9th February 2024
- LLM 0.13: The annotated release notes - 26th January 2024
- Weeknotes: datasette-test, datasette-build, PSF board retreat - 21st January 2024
- Talking about Open Source LLMs on Oxide and Friends - 17th January 2024
- Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions - 16th January 2024
- What I should have said about the term Artificial Intelligence - 9th January 2024
- Weeknotes: Page caching and custom templates for Datasette Cloud - 7th January 2024