Weeknotes: SBA Covid-19 PPP loans, Datasette talks, Datasette plugin upgrades
This week I’ve mainly been exploring Small Business Administration Covid-19 loans data, pitching some talks and upgrading some plugins for compatibility with Datasette 0.44+.
SBA PPP Covid-19 loan data
On Monday the Small Business Administration and the Treasury Department released detailed loan-level data for loans made under the Paycheck Protection Program as part of their Covid-19 response.
They released the data as a zip file full of CSVs on their Box account (the first time I’ve seen Box used for this kind of government data release).
The most interesting file in there was foia_150k_plus.csv, a file containing 661,218 loans over $150,000. So I loaded it into Datasette and published it at https://sba-loans-covid-19.datasettes.com/loans_150k_plus/foia_150k_plus
I made one modification to the data: on the suggestion of David Cramer I imported a list of NAICS code descriptions from the US Census and set up the NAICSCode column as a foreign key to that table.
Here’s a custom query showing the NAICS codes with the most loans over $150,000. Offices of Dentists come in 8th place with 10,627 loans!
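To give a feel for that kind of query, here's a minimal sketch using Python's built-in sqlite3 module and a tiny in-memory database. Only the foia_150k_plus table and NAICSCode column names come from the real data; the naics lookup table, its code and description columns, the BusinessName column and the sample rows are assumptions made up for illustration, and the SQL is not necessarily the query I actually ran.

```python
import sqlite3

# Toy in-memory database. The "naics" table and its column names are
# assumptions for this sketch, as are the sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE naics (code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE foia_150k_plus (
    BusinessName TEXT,
    NAICSCode TEXT REFERENCES naics(code)
);
""")
conn.executemany("INSERT INTO naics VALUES (?, ?)", [
    ("621210", "Offices of Dentists"),
    ("722511", "Full-Service Restaurants"),
])
conn.executemany("INSERT INTO foia_150k_plus VALUES (?, ?)", [
    ("Example Dental A", "621210"),
    ("Example Diner", "722511"),
    ("Example Dental B", "621210"),
])


def loans_by_naics(conn):
    """Count loans per NAICS code, joined to the description, most common first."""
    return conn.execute("""
        SELECT naics.description, COUNT(*) AS loan_count
        FROM foia_150k_plus
        JOIN naics ON foia_150k_plus.NAICSCode = naics.code
        GROUP BY naics.code
        ORDER BY loan_count DESC
    """).fetchall()


rows = loans_by_naics(conn)
for description, loan_count in rows:
    print(description, loan_count)
```

Because NAICSCode is a foreign key, Datasette can also show the description next to each code when browsing the table, without writing any SQL.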
My Twitter thread has more commentary on things I found exploring the data, and my sba-loans-covid-19-datasette GitHub repo describes the exact steps I went through to create the Datasette instance (using csvs-to-sqlite and sqlite-utils).
Pitching some talks
I haven’t done any public speaking in a while, and the pandemic means I’m not going to be giving any in-person talks for the foreseeable future... so I spent some time pitching talks to remote events.
I’ll be speaking at Django London on July 14th and I have a few other submissions in the pipeline.
I’m also attending (virtually) the SRCCON journalism conference next week. They asked me to put together a short video introduction to Datasette, which I’ve embedded below. I’ll be hanging out and talking to anyone who’s interested in learning more about the project, or who can help me figure out what direction to take it next.
SRCCON 2020: Datasette from OpenNews Source on Vimeo.
Datasette 0.44 broke some of my existing plugins due to a change in how it handles ASGI lifespan events. I’ve upgraded the following this week:
- datasette-configure-fts 1.0—a plugin for configuring which columns in a table are enabled for full-text search.
- datasette-edit-tables 0.2a—tools for renaming tables and adding columns. This isn’t particularly useful yet but I’m excited about its potential.
- datasette-media 0.3—a plugin for serving media from disk based on paths served out of the SQLite database.
- datasette-search-all 0.3—a plugin providing a mechanism for searching all FTS-enabled tables at once, discussed here previously.
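For anyone unfamiliar with ASGI lifespan events, here's a minimal sketch of the generic protocol: the server opens a "lifespan" scope and sends startup and shutdown messages, and the application acknowledges each one. This illustrates the ASGI specification in general, not the specific change in Datasette 0.44 or the fixes made to these plugins. The app and run_lifespan names are hypothetical.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application that only handles the lifespan scope."""
    if scope["type"] == "lifespan":
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                # Startup work (opening connections and so on) would go here.
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                # Cleanup would go here.
                await send({"type": "lifespan.shutdown.complete"})
                return
    # Handling of "http" scopes is omitted from this sketch.


async def run_lifespan(app):
    """Play the role of the server: send startup then shutdown, record replies."""
    incoming = asyncio.Queue()
    sent = []

    async def send(message):
        sent.append(message["type"])

    await incoming.put({"type": "lifespan.startup"})
    await incoming.put({"type": "lifespan.shutdown"})
    await app({"type": "lifespan"}, incoming.get, send)
    return sent


sent_types = asyncio.run(run_lifespan(app))
print(sent_types)
```

Code that wraps an ASGI application sees these lifespan messages as well as HTTP requests, so it has to either handle them or pass them through untouched.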
sqlite-utils 2.11 is the first release of sqlite-utils that was entirely written by someone else! Thomas Sibley added a new --truncate option for emptying a table (safely within a transaction) before populating it and made an improvement to how transactions work generally.
Thomas inspired me to start thinking more carefully about how transactions should work with the library.
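The value of emptying and repopulating a table inside a single transaction is that a failed import leaves the old rows in place. Here's a minimal sketch of that idea using Python's built-in sqlite3 module. It is not how sqlite-utils implements --truncate; the loans table and the replace_rows function are hypothetical names for this example.

```python
import sqlite3


def replace_rows(conn, rows):
    """Empty the (hypothetical) loans table and repopulate it atomically."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if it raises an exception.
    with conn:
        conn.execute("DELETE FROM loans")
        conn.executemany("INSERT INTO loans (id, amount) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO loans VALUES (1, 150000)")
conn.commit()

# A bad batch: the second row violates NOT NULL, so the DELETE is rolled back too.
try:
    replace_rows(conn, [(2, 200000), (3, None)])
except sqlite3.IntegrityError:
    pass
after_failure = conn.execute("SELECT id, amount FROM loans").fetchall()

# A good batch replaces the table contents.
replace_rows(conn, [(2, 200000), (3, 300000)])
after_success = conn.execute("SELECT id, amount FROM loans ORDER BY id").fetchall()

print(after_failure)
print(after_success)
```

After the failed batch the original row is still there; only the successful call swaps in the new data.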