Weeknotes: evernote-to-sqlite, Datasette Weekly, scrapers, csv-diff, sqlite-utils
This week I built evernote-to-sqlite (see Building an Evernote to SQLite exporter), launched the Datasette Weekly newsletter, worked on some scrapers and pushed out some small improvements to several other projects.
The Datasette Weekly newsletter
After procrastinating on it for several months I finally launched the new Datasette Weekly newsletter!
My plan is to put this out once a week with a combination of news from the Datasette/Dogsheep/sqlite-utils ecosystem of tools, plus tips and tricks for using them to solve data problems.
You can read the first edition here, which covers Datasette 0.50, git scraping, sqlite-utils extract and features datasette-graphql as the plugin of the week.
I’m using Substack because people I trust use it for their newsletters and I decided that picking an option and launching was more important than spending even more time procrastinating on picking the best possible newsletter platform. So far it seems fit for purpose, and it provides an export option should I decide to move to something else.
I’ve been writing some scraper code to help out with a student journalism project at Stanford. I ended up using Selenium Python running in a Jupyter Notebook.
Historically I’ve avoided Selenium due to how weird and complex it has been to use in the past. I’ve now completely changed my mind: these days it’s a really solid option for browser automation driven by Python thanks to geckodriver, which I recently learned can be installed using Homebrew.
I’m trying to get more of my tools past the 1.0 mark, mainly to indicate to potential users that I won’t be breaking backwards compatibility without bumping them to 2.0.
I built csv-diff for my San Francisco Trees project last year. It produces human-readable diffs for CSV files.
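To illustrate the general idea of a keyed CSV diff, here is a minimal sketch using only the Python standard library. It is not csv-diff's actual implementation: the function names `load_rows` and `diff_csv` are hypothetical, and it assumes both files share the same columns and that the key column is unique per row.

```python
import csv
import io


def load_rows(text, key):
    """Parse CSV text into a dict mapping each row's key value to the row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_csv(previous_text, current_text, key):
    """Compare two CSV documents, matching rows on the `key` column.

    Assumes both documents have the same columns (an illustrative
    simplification, not how csv-diff itself is written).
    """
    previous = load_rows(previous_text, key)
    current = load_rows(current_text, key)

    # Rows whose key appears on only one side
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]

    # Rows present on both sides: record (old, new) for each changed column
    changed = {}
    for k, old_row in previous.items():
        new_row = current.get(k)
        if new_row is not None and new_row != old_row:
            changed[k] = {
                col: (old_row[col], new_row.get(col))
                for col in old_row
                if old_row[col] != new_row.get(col)
            }
    return {"added": added, "removed": removed, "changed": changed}
```

Calling `diff_csv(old_text, new_text, "id")` returns the added rows, the removed rows, and a per-key mapping of changed columns to their old and new values, which is enough to print a human-readable summary.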
The version 1.0 release notes are as follows:
- --show-unchanged option for outputting the unchanged values of rows that had at least one change. #9
- Fix for bug with column names that contained a
- Fix for error when no --key provided. Thanks, @MainHanzo. #3
- CSV delimiter sniffer now detects ; delimited files. #6
sqlite-utils 2.22 adds some minor features: an --encoding option for processing TSV and CSV files in encodings other than UTF-8, and more support for loading SQLite extension modules.
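As a rough illustration of why an encoding option matters, here is a minimal sketch of loading a non-UTF-8 CSV file into SQLite using only the standard library. This is not how sqlite-utils implements it: `insert_csv` is a hypothetical helper, and it assumes the first row holds column names that contain no double quotes.

```python
import csv
import io
import sqlite3


def insert_csv(conn, table, binary_file, encoding="utf-8"):
    """Insert rows from a CSV file opened in binary mode into `table`.

    The bytes are decoded with the given encoding before parsing, so a
    latin-1 file is not misread as UTF-8.
    """
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    reader = csv.reader(text)
    headers = next(reader)

    # Untyped columns named after the CSV header row
    columns = ", ".join(f'"{h}"' for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
```

A caller would open the file with `open(path, "rb")` and pass the encoding explicitly, for example `insert_csv(conn, "places", fp, encoding="latin-1")`.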
Full release notes:
- --encoding option for processing CSV and TSV files that use a non-UTF-8 encoding, for both the
- --load-extension option is now available to many more commands. (#137)
- --load-extension=spatialite can be used to load SpatiaLite from common installation locations, if it is available. (#136)
- Tests now also run against Python 3.9. (#184)
- pk=["id"] now has the same effect as passing
No new release yet, but I’ve landed some small new features to the
Inspired by the GitHub and WordPress APIs, Datasette’s JSON API now supports Link: HTTP header pagination (#1014).
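A client can consume this style of pagination by reading the rel="next" URL from each response's Link header until there isn't one. The sketch below uses only the standard library; `next_url` and `fetch_all_pages` are hypothetical names, and the parser is deliberately simplified (it expects the `<url>; rel="next"` shape and ignores other link parameters).

```python
import json
import re
import urllib.request

# Matches entries such as: <https://example.com/page2>; rel="next"
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def next_url(link_header):
    """Return the rel="next" URL from a Link header value, or None."""
    if not link_header:
        return None
    for url, rel in LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def fetch_all_pages(url):
    """Yield the parsed JSON body of each page, following rel="next" links."""
    while url:
        with urllib.request.urlopen(url) as response:
            yield json.load(response)
            # Stop when the server no longer advertises a next page
            url = next_url(response.headers.get("Link"))
```

Keeping the header parsing in its own function means it can be checked without making any network requests.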
This is part of my ongoing effort to redesign the default JSON format ready for Datasette 1.0. I started a new plugin called datasette-json-preview to let me iterate on that format independent of Datasette itself.
Jacob Fenton suggested an “Edit SQL” button on canned queries. That’s a great idea, so I built it—this issue comment links to some demos, e.g. this one here.
I added an “x” button for clearing filters to the table page (#1016) demonstrated by this GIF:
TIL this week
Releases this week
- sqlite-utils 2.22—2020-10-16
- csv-diff 1.0—2020-10-16
- swarm-to-sqlite 0.3.2—2020-10-12
- evernote-to-sqlite 0.2—2020-10-12
- evernote-to-sqlite 0.1—2020-10-11
- xml-analyser 1.0—2020-10-11
- datasette-json-preview 0.1—2020-10-11