Weeknotes: evernote-to-sqlite, Datasette Weekly, scrapers, csv-diff, sqlite-utils
16th October 2020
This week I built evernote-to-sqlite (see Building an Evernote to SQLite exporter), launched the Datasette Weekly newsletter, worked on some scrapers and pushed out some small improvements to several other projects.
The Datasette Weekly newsletter
After procrastinating on it for several months I finally launched the new Datasette Weekly newsletter!
My plan is to put this out once a week with a combination of news from the Datasette/Dogsheep/sqlite-utils ecosystem of tools, plus tips and tricks for using them to solve data problems.
I’m using Substack because people I trust use it for their newsletters and I decided that picking an option and launching was more important than spending even more time procrastinating on picking the best possible newsletter platform. So far it seems fit for purpose, and it provides an export option should I decide to move to something else.
I’ve been writing some scraper code to help out with a student journalism project at Stanford. I ended up using Selenium Python running in a Jupyter Notebook.
Historically I’ve avoided Selenium because of how weird and complex it was to use. I’ve now completely changed my mind: these days it’s a really solid option for browser automation driven by Python thanks to geckodriver, which I recently learned can be installed using Homebrew.
I’m trying to get more of my tools past the 1.0 mark, mainly to indicate to potential users that I won’t be breaking backwards compatibility without bumping them to 2.0.
The version 1.0 release notes are as follows:
- --show-unchanged option for outputting the unchanged values of rows that had at least one change. #9
- Fix for bug with column names that contained a
- Fix for error when no --key provided (thanks, @MainHanzo). #3
- CSV delimiter sniffer now detects ; delimited files. #6
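To make the ideas in these release notes concrete, here is a minimal sketch using only the Python standard library. It is not how csv-diff is implemented: `load_keyed` and `changed_rows` are hypothetical helpers, and restricting the sniffer to comma, semicolon and tab is an assumption made for this example. It sniffs the delimiter, indexes rows by a key column, and optionally reports the unchanged values of rows that had at least one change.

```python
import csv
import io


def load_keyed(text, key):
    """Parse CSV text, sniffing its delimiter, and index the rows by a key column."""
    # Only consider a few candidate delimiters (an assumption for this sketch).
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;\t")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return {row[key]: row for row in reader}


def changed_rows(old, new, show_unchanged=False):
    """Report, for each key present in both inputs, the columns whose values differ."""
    result = []
    for key in sorted(old.keys() & new.keys()):
        before, after = old[key], new[key]
        changes = {
            col: [before[col], after[col]]
            for col in before
            if col in after and before[col] != after[col]
        }
        if not changes:
            continue  # rows with no changes are never reported
        entry = {"key": key, "changes": changes}
        if show_unchanged:
            # Include the values that stayed the same, for context.
            entry["unchanged"] = {
                col: value
                for col, value in before.items()
                if col in after and col not in changes
            }
        result.append(entry)
    return result


old = load_keyed("id;name;age\n1;Cleo;4\n2;Pancakes;2\n", "id")
new = load_keyed("id;name;age\n1;Cleo;5\n2;Pancakes;2\n", "id")
print(changed_rows(old, new, show_unchanged=True))
```

Rows that were added or removed are ignored here to keep the sketch short.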
sqlite-utils 2.22 adds some minor features: an --encoding option for processing TSV and CSV files in encodings other than UTF-8, and more support for loading SQLite extension modules.
Full release notes:
- --encoding option for processing CSV and TSV files that use a non-utf-8 encoding, for both the
- --load-extension option is now available to many more commands. (#137)
- --load-extension=spatialite can be used to load SpatiaLite from common installation locations, if it is available. (#136)
- Tests now also run against Python 3.9. (#184)
- pk=["id"] now has the same effect as passing
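To show why an encoding option matters for CSV imports, here is a small sketch using only Python's standard library rather than sqlite-utils itself. The `insert_csv` helper, the table name and the sample data are all hypothetical. The bytes are decoded with an explicit encoding before the rows are inserted into SQLite.

```python
import csv
import io
import sqlite3


def insert_csv(conn, table, raw_bytes, encoding="utf-8"):
    """Decode CSV bytes with the given encoding and insert the rows into a table."""
    text = raw_bytes.decode(encoding)
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames
    # Quote identifiers; assumes column names contain no double quotes.
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        ([row[c] for c in columns] for row in reader),
    )
    conn.commit()


# A Latin-1 encoded file: these bytes are not valid UTF-8.
data = "id,name\n1,Zoë\n".encode("latin-1")
conn = sqlite3.connect(":memory:")
insert_csv(conn, "people", data, encoding="latin-1")
print(conn.execute("SELECT name FROM people").fetchall())
```

Decoding the same bytes as UTF-8 raises a UnicodeDecodeError, which is the kind of failure an explicit encoding setting avoids.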
No new release yet, but I’ve landed some small new features to the
Inspired by the GitHub and WordPress APIs, Datasette’s JSON API now supports Link: HTTP header pagination (#1014).
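Here is a rough sketch of how a client might consume this style of pagination. It assumes the header carries entries of the form `<url>; rel="next"`, as the GitHub API does. `parse_link_header` and `iter_pages` are hypothetical helpers, and the `fetch` callable is a stand-in for whatever HTTP client you use, so the example runs without a network connection.

```python
import re


def parse_link_header(value):
    """Return a {rel: url} dict for the links in a Link header value.

    Simplified: assumes each link has a single rel value.
    """
    links = {}
    # Each link looks like <url>; param=value; ... and links are comma separated.
    for url, params in re.findall(r"<([^>]+)>([^<]*)", value):
        match = re.search(r'rel="?([^";,]+)"?', params)
        if match:
            links[match.group(1)] = url
    return links


def iter_pages(first_url, fetch):
    """Yield each page body, following rel="next" links until none remain.

    fetch(url) must return a (body, link_header_or_None) tuple.
    """
    url = first_url
    while url:
        body, link = fetch(url)
        yield body
        url = parse_link_header(link or "").get("next")


# Fake responses standing in for real HTTP requests.
responses = {
    "/db/table.json": (["row 1"], '</db/table.json?_next=1>; rel="next"'),
    "/db/table.json?_next=1": (["row 2"], None),
}
for page in iter_pages("/db/table.json", lambda url: responses[url]):
    print(page)
```

The advantage of header-based pagination is that the client does not need to know anything about the shape of the JSON body in order to find the next page.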
This is part of my ongoing effort to redesign the default JSON format ready for Datasette 1.0. I started a new plugin called datasette-json-preview to let me iterate on that format independent of Datasette itself.
I also added an “x” button for clearing filters to the table page (#1016).