2022 in projects and blogging
31st December 2022
In lieu of my regular weeknotes (I took two weeks off for the holidays) here’s a look back at 2022, mainly in terms of projects and things I’ve written about.
January
- What’s new in sqlite-utils 3.20 and 3.21: --lines, --text, --convert—some neat new features for sqlite-utils that expanded its capabilities in terms of log processing.
- Datasette 0.60: The annotated release notes—improvements to Datasette’s internal APIs for writing to the database, plus a new plugin hook for custom table filters.
February
- Help scraping: track changes to CLI tools by recording their --help using Git—a new twist on git scraping, which I used to try and spot when AWS released the Lambda Function URLs I had been waiting for.
- Google Drive to SQLite—a project for creating a SQLite database of metadata for files in your Google Drive.
- Tutorial: Exploring a database with Datasette—the first official Datasette tutorial.
- Tutorial: Learn SQL with Datasette—tutorial number two.
March
- shot-scraper: automated screenshots for documentation, built on Playwright. This was a really fun new project which I expanded in all sorts of directions as the year went on.
- Instantly create a GitHub repository to take screenshots of a web page—using shot-scraper and GitHub Action template repositories.
April
- Pillar Point Stewards, pypi-to-sqlite, improvements to shot-scraper and appreciating datasette-dashboards—Pillar Point Stewards was a project I built with Natalie Downe to manage the stewardship program she was running for our local reef.
- Building a Covid sewage Twitter bot (and other weeknotes) describes how I built @covidsewage to publish the latest Covid sewage numbers for the San Francisco Bay Area.
- Automatically opening issues when tracked file content changes was even more GitHub Actions + Git scraping hackery, this time to alert me when a file I was tracking in another GitHub repository changed by opening a new issue.
May
- Datasette Lite: a server-side Python web application running in a browser. This may have been my most impactful project of the year: I got Datasette (a Python server-side web application) working entirely in the browser, using WebAssembly and Pyodide!
- A Datasette tutorial written by GPT-3 was the first of my public explorations of GPT-3. I spent much of the rest of the year exploring its capabilities further.
June
- How to use the GPT-3 language model—I realized that most people I knew hadn’t even tried GPT-3 yet, so I wrote a tutorial to help them get started.
- Weeknotes: Datasette Cloud ready to preview—I continued to make progress on Datasette Cloud throughout the year, but it’s still not quite at a place where I’m happy to throw it open to a wider audience.
- A tiny web app to create images from OpenStreetMap maps—another shot-scraper inspired project, which gives me the ability to easily create map images with custom markers.
- Joining CSV files in your browser using Datasette Lite—a massive upgrade to Datasette Lite in terms of its usability for real-world problems.
- First impressions of DALL-E, generating images from text—I got access to the DALL-E preview.
- s3-ocr: Extract text from PDF files stored in an S3 bucket—I originally built this for the San Francisco Microscopical Society, but I’m confident it can be useful for all sorts of other archival projects.
July
- Using GPT-3 to explain how code works—more fun with GPT-3.
- sqlite-comprehend: run AWS entity extraction against content in a SQLite database was another Microscopical Society project.
- Tutorial: Cleaning data with sqlite-utils and Datasette—adapted from my talk at HYTRADBOI.
August
- Plugin support for Datasette Lite is really fun: you can now add ?plugin=datasette-mp3-audio to the URL to install plugins.
- Analyzing ScotRail audio announcements with Datasette—from prototype to production which used that new feature in Datasette Lite!
- Building a searchable archive for the San Francisco Microscopical Society was the inspiration for several other projects I built this year.
- Stable Diffusion is a really big deal. I was definitely right about this! Generative AI got very, very big over the rest of the year.
September
- Exploring the training data behind Stable Diffusion was the collaboration I did with Andy Baio to explore the LAION dataset used to create Stable Diffusion. This has played a meaningful role in the ongoing conversation about the ethics of generative AI.
- Prompt injection attacks against GPT-3 ended up as a series of posts about a security vulnerability in many applications built on top of GPT-3 that has so far proven stubborn in the face of potential fixes.
- TIL: Deploying Python web apps as AWS Lambda functions—I’d been wanting to figure this out for years!
- Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model
- A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions describes my winning entry in the Bellingcat Hackathon, taking advantage of OpenAI’s Whisper speech to text model.
October
- Software engineering practices described a set of less obvious software engineering practices I think are well worth adopting in 2022.
- Measuring traffic during the Half Moon Bay Pumpkin Festival was another fun Git scraping hack.
- Weeknotes: DjangoCon, SQLite in Django, datasette-gunicorn talked about some of my work relating to DjangoCon 2022.
- The Perfect Commit took one of the key ideas from my DjangoCon talk and expanded it into a standalone essay.
November
- Designing a write API for Datasette set the scene for my work towards Datasette 1.0.
- Tracking Mastodon user numbers over time with a bucket of tricks was more Git scraping plus a new way of taking advantage of SQLite and WebAssembly.
- Coping strategies for the serial project hoarder is the detailed write-up of my talk from DjangoCon 2022.
December
- Datasette’s new JSON write API: The first alpha of Datasette 1.0—the first alpha! It’s finally starting to happen.
- AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code describes my attempts to learn Rust using the just-released ChatGPT, plus Copilot. I got to day 15 before the time commitment for the increasingly difficult puzzles became too much.
- Over-engineering Secret Santa with Python cryptography and Datasette was a fun quick project which gave me a chance to explore Python’s cryptography library.
- Datasette 1.0a2: Upserts and finely grained permissions—the last alpha release of the year.