440 items tagged “projects”
Posts about projects I have worked on.
2024
Building a tool showing how Gemini Pro can return bounding boxes for objects in images
I was browsing through Google’s Gemini documentation while researching how different multi-modal LLM APIs work when I stumbled across this note in the vision documentation:
[... 1,792 words]
My @covidsewage bot now includes useful alt text. I've been running a @covidsewage Mastodon bot for a while now, posting daily screenshots (taken with shot-scraper) of the Santa Clara County COVID in wastewater dashboard.
Prior to today the screenshot was accompanied by the decidedly unhelpful alt text "Screenshot of the latest Covid charts".
I finally fixed that today, closing issue #2 more than two years after I first opened it.
The screenshot is of a Microsoft Power BI dashboard. I hoped I could scrape the key information out of it using JavaScript, but the weirdness of their DOM proved insurmountable.
Instead, I'm using GPT-4o - specifically, this Python code (run using a python -c block in the GitHub Actions YAML file):
import base64, openai
client = openai.OpenAI()
# Read the screenshot and base64-encode it so it can be sent as a data: URL
with open('/tmp/covid.png', 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('utf-8')
messages = [
    {'role': 'system',
     'content': 'Return the concentration levels in the sewersheds - single paragraph, no markdown'},
    {'role': 'user', 'content': [
        {'type': 'image_url', 'image_url': {
            'url': 'data:image/png;base64,' + encoded_image
        }}
    ]}
]
completion = client.chat.completions.create(model='gpt-4o', messages=messages)
# The model's text reply becomes the alt text for the screenshot
print(completion.choices[0].message.content)
I'm base64 encoding the screenshot and sending it with this system prompt:
Return the concentration levels in the sewersheds - single paragraph, no markdown
Given this input image:
Here's the text that comes back:
The concentration levels of SARS-CoV-2 in the sewersheds from collected samples are as follows: San Jose Sewershed has a high concentration, Palo Alto Sewershed has a high concentration, Sunnyvale Sewershed has a high concentration, and Gilroy Sewershed has a medium concentration.
The full implementation can be found in the GitHub Actions workflow, which runs on a schedule at 7am Pacific time every day.
Claude’s API now supports CORS requests, enabling client-side applications
Anthropic have enabled CORS support for their JSON APIs, which means it’s now possible to call the Claude LLMs directly from a user’s browser.
[... 625 words]
Fix @covidsewage bot to handle a change to the underlying website. I've been running @covidsewage on Mastodon since February last year, tweeting a daily screenshot of the Santa Clara County charts showing Covid levels in wastewater.
A few days ago the county changed their website, breaking the bot. The chart now lives on their new COVID in wastewater page.
It's still a Microsoft Power BI dashboard in an <iframe>, but my initial attempts to scrape it didn't quite work. Eventually I realized that Cloudflare protection was blocking my attempts to access the page, but thankfully sending a Firefox user-agent fixed that problem.
The new recipe I'm using to screenshot the chart involves a delightfully messy nested set of calls to shot-scraper - first using shot-scraper javascript to extract the URL attribute for that <iframe>, then feeding that URL to a separate shot-scraper call to generate the screenshot:
shot-scraper -o /tmp/covid.png $(
shot-scraper javascript \
'https://publichealth.santaclaracounty.gov/health-information/health-data/disease-data/covid-19/covid-19-wastewater' \
'document.querySelector("iframe").src' \
-b firefox \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:128.0) Gecko/20100101 Firefox/128.0' \
--raw
) --wait 5000 -b firefox --retina
Upgrading my cookiecutter templates to use python -m pytest.
Every now and then I get caught out by weird test failures when I run pytest and it turns out I'm running the wrong installation of that tool, so my tests fail because that pytest is executing in a different virtual environment from the one needed by the tests.
The fix for this is easy: run python -m pytest instead, which guarantees that you will run pytest in the same environment as your currently active Python.
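To check whether you are exposed to this problem, here is a minimal Python sketch (my own illustration, not part of the cookiecutter templates) that tests whether the first pytest command found on your PATH lives in the same scripts directory as the interpreter that is currently running. It relies on sysconfig.get_path("scripts"), the standard library's answer to "where does this environment install console scripts"; the helper name is hypothetical.

```python
import os
import shutil
import sys
import sysconfig


def pytest_matches_current_python() -> bool:
    """Return True if the `pytest` on PATH belongs to this interpreter's environment."""
    found = shutil.which("pytest")
    if found is None:
        # No pytest command on PATH at all; `python -m pytest` may still work.
        return False
    # Directory this interpreter installs console scripts into (e.g. .venv/bin).
    scripts_dir = sysconfig.get_path("scripts")
    return os.path.realpath(os.path.dirname(found)) == os.path.realpath(scripts_dir)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("pytest on PATH matches this interpreter:", pytest_matches_current_python())
```

If this prints False while pytest is installed in your virtual environment, a bare pytest command is picking up some other installation, which is exactly the case python -m pytest avoids.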
Yesterday I went through and updated every one of my cookiecutter templates (python-lib, click-app, datasette-plugin, sqlite-utils-plugin, llm-plugin) to use this pattern in their READMEs and generated repositories instead, to help spread that better recipe a little bit further.
datasette-checkbox. I built this fun little Datasette plugin today, inspired by a conversation I had in Datasette Office Hours.
If a user has the update-row permission and the table they are viewing has any integer columns with names that start with is_ or should_ or has_, the plugin adds interactive checkboxes to that table which can be toggled to update the underlying rows.
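The column-matching rule can be sketched with the standard library's sqlite3 module. This is a hypothetical checkbox_columns() helper written for illustration, not code from the plugin; it treats any declared type containing INT as an integer column, which mirrors SQLite's own type-affinity rule.

```python
import sqlite3

PREFIXES = ("is_", "should_", "has_")


def checkbox_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return integer columns whose names start with is_, should_ or has_."""
    quoted = '"' + table.replace('"', '""') + '"'
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [
        name
        for _cid, name, col_type, *_rest in rows
        if "INT" in (col_type or "").upper() and name.startswith(PREFIXES)
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, "
    "is_done INTEGER, has_notes INT, should_email TEXT, isolated INTEGER)"
)
print(checkbox_columns(conn, "tasks"))  # ['is_done', 'has_notes']
```

Note that should_email is skipped because it is a TEXT column, and isolated is skipped because it lacks the underscore after "is".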
This makes it easy to quickly spin up an interface that allows users to review and update boolean flags in a table.
I have ambitions for a much more advanced version of this, where users can do things like add or remove tags from rows directly in that table interface - but for the moment this is a neat starting point, and it only took an hour to build (thanks to help from Claude to build an initial prototype, chat transcript here).
Datasette 1.0a15. Mainly bug fixes, but a couple of minor new features:
- Datasette now defaults to hiding SQLite "shadow" tables, as seen in extensions such as SQLite FTS and sqlite-vec. Virtual tables that it makes sense to display, such as FTS core tables, are no longer hidden. Thanks, Alex Garcia. (#2296)
- The Datasette homepage is now duplicated at /-/, using the default index.html template. This ensures that the information on that page is still accessible even if the Datasette homepage has been customized using a custom index.html template, for example on sites like datasette.io. (#2393)
Datasette also now serves more user-friendly CSRF pages, an improvement which required me to ship asgi-csrf 0.10.
Share Claude conversations by converting their JSON to Markdown. Anthropic's Claude is missing one key feature that I really appreciate in ChatGPT: the ability to create a public link to a full conversation transcript. You can publish individual artifacts from Claude, but I often find myself wanting to publish the whole conversation.
Before ChatGPT added that feature I solved it myself with this ChatGPT JSON transcript to Markdown Observable notebook. Today I built the same thing for Claude.
Here's how to use it:
The key is to load a Claude conversation on their website with your browser DevTools network panel open and then filter URLs for chat_. You can use the Copy -> Response right-click menu option to get the JSON for that conversation, then paste it into that new Observable notebook to get a Markdown transcript.
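The notebook itself runs JavaScript on Observable; as a rough Python equivalent, here is a sketch of the conversion step. The JSON shape it expects (a top-level name plus a chat_messages list whose items carry sender and text) is an assumption about what the chat_ endpoint returns, so check it against the response you actually copy out of DevTools.

```python
import json


def claude_json_to_markdown(raw: str) -> str:
    """Convert a Claude conversation JSON string into a Markdown transcript.

    Assumes (unverified) keys: "name", "chat_messages", and per-message
    "sender" ("human" or "assistant") and "text".
    """
    data = json.loads(raw)
    lines = [f"# {data.get('name') or 'Claude conversation'}", ""]
    for message in data.get("chat_messages", []):
        speaker = "Human" if message.get("sender") == "human" else "Assistant"
        lines.append(f"**{speaker}**: {message.get('text', '').strip()}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


sample = json.dumps({
    "name": "Spiders",
    "chat_messages": [
        {"sender": "human", "text": "What spider would you recommend?"},
        {"sender": "assistant", "text": "I'd advise against breeding spiders."},
    ],
})
print(claude_json_to_markdown(sample))
```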
I like sharing these by pasting them into a "secret" Gist - that way they won't be indexed by search engines (adding more AI-generated slop to the world) but can still be shared with people who have the link.
Here's an example transcript from this morning. I started by asking Claude:
I want to breed spiders in my house to get rid of all of the flies. What spider would you recommend?
When it suggested that this was a bad idea because it might attract pests, I asked:
What are the pests might they attract? I really like possums
It told me that possums are attracted by food waste, but "deliberately attracting them to your home isn't recommended" - so I said:
Thank you for the tips on attracting possums to my house. I will get right on that! [...] Once I have attracted all of those possums, what other animals might be attracted as a result? Do you think I might get a mountain lion?
It emphasized how bad an idea that would be and said "This would be extremely dangerous and is a serious public safety risk.", so I said:
OK. I took your advice and everything has gone wrong: I am now hiding inside my house from the several mountain lions stalking my backyard, which is full of possums
Claude has quite a preachy tone when you ask it for advice on things that are clearly a bad idea, which makes winding it up with increasingly ludicrous questions a lot of fun.
django-http-debug, a new Django app mostly written by Claude
Yesterday I finally developed something I’ve been casually thinking about building for a long time: django-http-debug. It’s a reusable Django app (something you can pip install into any Django project) which provides tools for quickly setting up a URL that returns a canned HTTP response and logs the full details of any incoming request to a database table.
Datasette 1.0a14: The annotated release notes
Released today: Datasette 1.0a14. This alpha includes significant contributions from Alex Garcia, including some backwards-incompatible changes in the run-up to the 1.0 release.
[... 1,424 words]
Image resize and quality comparison. Another tiny tool I built with Claude 3.5 Sonnet and Artifacts. This one lets you select an image (or drag-drop one onto an area) and then displays that same image as a JPEG at 1, 0.9, 0.7, 0.5 and 0.3 quality settings, then again at half the width. Each image shows its size in KB and can be downloaded directly from the page.
I'm trying to use more images on my blog (example 1, example 2) and I like to reduce their file size and quality while keeping them legible.
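The matrix of versions the tool renders is easy to pin down precisely. This Python sketch (hypothetical helpers, not the artifact's JavaScript) lists the ten width/quality combinations described above and formats byte counts as KB, assuming 1 KB = 1,024 bytes and rounding odd widths down when halving.

```python
QUALITIES = (1.0, 0.9, 0.7, 0.5, 0.3)


def variants(original_width: int) -> list[tuple[int, float]]:
    """Return (width, JPEG quality) pairs: full width first, then half width."""
    widths = (original_width, original_width // 2)
    return [(width, quality) for width in widths for quality in QUALITIES]


def format_kb(size_in_bytes: int) -> str:
    """Format a byte count as kilobytes (1 KB = 1024 bytes), one decimal place."""
    return f"{size_in_bytes / 1024:.1f} KB"


print(variants(1200)[:2])   # [(1200, 1.0), (1200, 0.9)]
print(len(variants(1200)))  # 10
print(format_kb(51_200))    # 50.0 KB
```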
The prompt sequence I used for this was:
Build an artifact (no React) that I can drop an image onto and it presents that image resized to different JPEG quality levels, each with a download link
Claude produced this initial artifact. I followed up with:
change it so that for any image it provides it in the following:
- original width, full quality
- original width, 0.9 quality
- original width, 0.7 quality
- original width, 0.5 quality
- original width, 0.3 quality
- half width - same array of qualities
For each image clicking it should toggle its display to full width and then back to max-width of 80%
Images should show their size in KB
Claude produced this v2.
I tweaked it a tiny bit (modifying how full-width images are displayed) - the final source code is available here. I'm hosting it on my own site, which means the Download links work correctly - when hosted on claude.site, Claude's CSP headers prevent those from functioning.
llm-gguf. I just released a new alpha plugin for LLM which adds support for running models from Meta's new Llama 3.1 family that have been packaged as GGUF files - it should work for other GGUF chat models too.
If you've already installed LLM the following set of commands should get you set up with Llama 3.1 8B:
llm install llm-gguf
llm gguf download-model \
https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
--alias llama-3.1-8b-instruct --alias l31i
This will download a 4.92GB GGUF from lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF on Hugging Face and save it (at least on macOS) to your ~/Library/Application Support/io.datasette.llm/gguf/models folder.
Once installed like that, you can run prompts through the model like so:
llm -m l31i "five great names for a pet lemur"
Or use the llm chat command to keep the model resident in memory and run an interactive chat session with it:
llm chat -m l31i
I decided to ship a new alpha plugin rather than update my existing llm-llama-cpp plugin because that older plugin has some design decisions baked in from the Llama 2 release which no longer make sense, and having a fresh plugin gave me a fresh slate to adopt the latest features from the excellent underlying llama-cpp-python library by Andrei Betlen.
Weeknotes: GPT-4o mini, LLM 0.15, sqlite-utils 3.37 and building a staging environment
Upgrades to LLM to support the latest models, and a whole bunch of invisible work building out a staging environment for Datasette Cloud.
[... 730 words]
LLM 0.15. A new release of my LLM CLI tool for interacting with Large Language Models from the terminal (see this recent talk for plenty of demos).
This release adds support for the brand new GPT-4o mini:
llm -m gpt-4o-mini "rave about pelicans in Spanish"
It also sets that model as the default used by the tool if no other model is specified. This replaces GPT-3.5 Turbo, the default since the first release of LLM. 4o-mini is both cheaper and way more capable than 3.5 Turbo.
Hacker News homepage with links to comments ordered by most recent first (via) Conversations on Hacker News are displayed as a tree, which can make it difficult to spot new comments added since the last time you viewed the thread.
There's a workaround for this using the Hacker News Algolia Search interface: search for story:STORYID, select "comments" and the result will be a list of comments sorted by most recent first.
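Here is a minimal Python sketch of that transformation: it pulls the id out of a Hacker News item URL and builds a search link. The hn.algolia.com query-string parameters used here (query, type=comment, sort=byDate, dateRange) are my assumption about how that interface encodes the "comments, most recent first" view, so compare them with the URL your browser shows after doing the search by hand.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def algolia_comments_url(hn_url: str) -> str:
    """Turn a news.ycombinator.com item URL into an Algolia search URL
    listing that story's comments, newest first (parameter names assumed)."""
    story_id = parse_qs(urlparse(hn_url).query)["id"][0]
    params = {
        "query": f"story:{story_id}",
        "type": "comment",
        "sort": "byDate",
        "dateRange": "all",
    }
    return "https://hn.algolia.com/?" + urlencode(params)


print(algolia_comments_url("https://news.ycombinator.com/item?id=12345"))
# https://hn.algolia.com/?query=story%3A12345&type=comment&sort=byDate&dateRange=all
```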
I got fed up of doing this manually so I built a quick tool in an Observable Notebook that documents the hack, provides a UI for pasting in a Hacker News URL to get back that search interface link and also shows the most recent items on the homepage with links to their most recently added comments.
See also my How to read Hacker News threads with most recent comments first TIL from last year.
datasette-python.
I just released a small new plugin for Datasette to assist with debugging. It adds a python subcommand which runs a Python process in the same virtual environment as Datasette itself.
I built it initially to help debug some issues in Datasette installed via Homebrew. The Homebrew installation has its own virtual environment, and sometimes it can be useful to run commands like pip list in the same environment as Datasette itself.
Now you can do this:
brew install datasette
datasette install datasette-python
datasette python -m pip list
I built a similar plugin for LLM last year, called llm-python - it's proved useful enough that I duplicated the design for Datasette.
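The core idea, running a Python subprocess with the exact interpreter that is already running, can be sketched with sys.executable. This is an illustration of the concept under that assumption, not the source of datasette-python or llm-python; run_python() is a hypothetical name.

```python
import subprocess
import sys


def run_python(*args: str) -> subprocess.CompletedProcess:
    """Run `python <args>` using the interpreter (and virtual environment)
    that is executing this code, capturing its output as text."""
    return subprocess.run(
        [sys.executable, *args], capture_output=True, text=True, check=True
    )


# The child process reports the same environment prefix as the parent.
result = run_python("-c", "import sys; print(sys.prefix)")
print(result.stdout.strip() == sys.prefix)
```

With a helper like this, run_python("-m", "pip", "list") would list the packages installed alongside the running interpreter, which is the effect datasette python -m pip list achieves from the command line.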
Box shadow CSS generator (via) Another example of a tiny personal tool I built using Claude 3.5 Sonnet and artifacts. In this case my prompt was:
CSS for a slight box shadow, build me a tool that helps me twiddle settings and preview them and copy and paste out the CSS
I changed my mind halfway through typing the prompt and asked it for a custom tool, and it built me this!
Here's the full transcript - in a follow-up prompt I asked for help deploying it and it rewrote the tool to use <script type="text/babel"> and the babel-standalone library to add React JSX support directly in the browser - a bit of a hefty dependency (387KB compressed / 2.79MB total) but I think acceptable for this kind of one-off tool.
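The CSS such a generator copies out follows the standard box-shadow syntax: horizontal offset, vertical offset, blur radius, spread radius, then color. Here is a small Python sketch of that formatting step, using a hypothetical box_shadow_css() helper whose defaults are illustrative values for a "slight" shadow, not the ones the tool uses.

```python
def box_shadow_css(
    x: int = 0,
    y: int = 2,
    blur: int = 4,
    spread: int = 0,
    rgba: tuple[int, int, int, float] = (0, 0, 0, 0.15),
) -> str:
    """Build a CSS box-shadow declaration from individual settings."""
    r, g, b, alpha = rgba
    return f"box-shadow: {x}px {y}px {blur}px {spread}px rgba({r}, {g}, {b}, {alpha});"


print(box_shadow_css())
# box-shadow: 0px 2px 4px 0px rgba(0, 0, 0, 0.15);
```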
Being able to knock out tiny custom tools like this on a whim is a really interesting new capability. It's also a lot of fun!
Chrome Prompt Playground.
Google Chrome Canary is currently shipping an experimental on-device LLM, in the form of Gemini Nano. You can access it via the new window.ai API, after first enabling the "Prompt API for Gemini Nano" experiment in chrome://flags (and then waiting an indeterminate amount of time for the ~1.7GB model file to download - I eventually spotted it in ~/Library/Application Support/Google/Chrome Canary/OptGuideOnDeviceModel).
I got Claude 3.5 Sonnet to build me this playground interface for experimenting with the model. You can execute prompts and stream the responses, and all previous prompts and responses are stored in localStorage.
Here's the full Sonnet transcript, and the final source code for the app.
The best documentation I've found for the new API is explainers-by-googlers/prompt-api on GitHub.
Compare PDFs. Inspired by this thread on Hacker News about the C++ diff-pdf tool I decided to see what it would take to produce a web-based PDF diff visualization tool using Claude 3.5 Sonnet.
It took two prompts:
Build a tool where I can drag and drop on two PDF files and it uses PDF.js to turn each of their pages into canvas elements and then displays those pages side by side with a third image that highlights any differences between them, if any differences exist
That gave me a React app that didn't quite work, so I followed up with this:
rewrite that code to not use React at all
Which gave me a working tool! You can see the full Claude transcript in this Gist. Here's a screenshot of the tool in action:
Being able to knock out little custom interactive web tools like this in a couple of minutes is so much fun.
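The "third image that highlights any differences" comes down to a per-pixel comparison once both pages have been rendered at the same size. The browser tool does this on canvas pixel data; the Python sketch below shows the same idea with pages simplified to grids of grayscale values, plus a tolerance so tiny anti-aliasing differences can be ignored. diff_mask() and has_differences() are hypothetical helpers, not code from the tool.

```python
Page = list[list[int]]  # rows of grayscale pixel values, 0-255


def diff_mask(a: Page, b: Page, tolerance: int = 0) -> list[list[bool]]:
    """Mark pixels whose values differ by more than `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages must be rendered at the same dimensions")
    return [
        [abs(pa - pb) > tolerance for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def has_differences(mask: list[list[bool]]) -> bool:
    """True if any pixel in the mask was flagged as different."""
    return any(any(row) for row in mask)


page1 = [[255, 255], [0, 255]]
page2 = [[255, 250], [0, 0]]
mask = diff_mask(page1, page2, tolerance=10)
print(mask)                   # [[False, False], [False, True]]
print(has_differences(mask))  # True
```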
New blog feature: Support for markdown in quotations. Another incremental improvement to my blog. I've been collecting quotations here since 2006 - I now render them using Markdown (previously they were just plain text). Here's one example. The full set of 920 (and counting) quotations can be explored using this search filter.
Datasette 0.64.8. A very small Datasette release, fixing a minor potential security issue where the name of missing databases or tables was reflected on the 404 page in a way that could allow an attacker to present arbitrary text to a user who followed a link. Not an XSS attack (no code could be executed) but still a potential vector for confusing messages.
Building search-based RAG using Claude, Datasette and Val Town
Retrieval Augmented Generation (RAG) is a technique for adding extra “knowledge” to systems built on LLMs, allowing them to answer questions against custom information not included in their training data. A common way to implement this is to take a question from a user, translate that into a set of search queries, run those against a search engine and then feed the results back into the LLM to generate an answer.
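That loop can be expressed with the LLM and the search engine passed in as plain functions. In this sketch generate_queries, search and llm are hypothetical stand-ins (the post's title suggests Claude for the LLM and Datasette for search), and the prompt wording is illustrative only.

```python
from typing import Callable


def answer_with_rag(
    question: str,
    generate_queries: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    llm: Callable[[str], str],
    max_snippets: int = 5,
) -> str:
    """Search-based RAG: question -> search queries -> results -> grounded answer."""
    snippets: list[str] = []
    for query in generate_queries(question):
        for result in search(query):
            if result not in snippets:  # skip duplicates returned by several queries
                snippets.append(result)
    context = "\n\n".join(snippets[:max_snippets])
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    return llm(prompt)


# Toy stand-ins so the sketch runs without any external service.
docs = {"shot-scraper": "shot-scraper takes automated screenshots of web pages."}
answer = answer_with_rag(
    "What does shot-scraper do?",
    generate_queries=lambda q: ["shot-scraper"],
    search=lambda q: [docs[q]] if q in docs else [],
    llm=lambda prompt: prompt.splitlines()[2],  # echoes the first context line
)
print(answer)
```

Passing the three steps in as functions keeps the retrieval loop independent of any particular model or search backend.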
[... 3,372 words]
llm-claude-3 0.4. LLM plugin release adding support for the new Claude 3.5 Sonnet model:
pipx install llm
llm install -U llm-claude-3
llm keys set claude
# paste API key here
llm -m claude-3.5-sonnet \
'a joke about a pelican and a walrus having lunch'
Weeknotes: Datasette Studio and a whole lot of blogging
I’m still spinning back up after my trip back to the UK, so actual time spent building things has been less than I’d like. I presented an hour long workshop on command-line LLM usage, wrote five full blog entries (since my last weeknotes) and I’ve also been leaning more into short-form link blogging—a lot more prominent on this site now since my homepage redesign last week.
[... 736 words]
Tags with descriptions. Tiny new feature on my blog: I can now add optional descriptions to my tag pages, for example on datasette and sqlite-utils and prompt-injection.
I built this feature on a live call this morning as an unplanned demonstration of GitHub's new Copilot Workspace feature, where you can run a prompt against a repository and have it plan, implement and file a pull request implementing a change to the code.
My prompt was:
Add a feature that lets me add a description to my tag pages, stored in the database table for tags and visible on the /tags/x/ page at the top
It wasn't as compelling a demo as I expected: Copilot Workspace currently has to stream an entire copy of each file it modifies, which can take a long time if your codebase includes several large files that need to be changed.
It did create a working implementation on its first try, though I had given it an extra tip not to forget the database migration. I ended up making a bunch of changes myself before I shipped it, listed in the pull request.
I've been using Copilot Workspace quite a bit recently as a code explanation tool - I'll prompt it to e.g. "add architecture documentation to the README" on a random repository not owned by me, then read its initial plan to see what it's figured out without going all the way through to the implementation and PR phases. Example in this tweet where I figured out the rough design of the Jina AI Reader API for this post.
Language models on the command-line
I gave a talk about accessing Large Language Models from the command-line last week as part of the Mastering LLMs: A Conference For Developers & Data Scientists six-week-long online conference. The talk focused on my LLM Python command-line utility and ways you can use it (and its plugins) to explore LLMs and use them for useful tasks.
[... 4,992 words]
PDF to Podcast (via) At first glance this project by Stephan Fitzpatrick is a cute demo of a terrible sounding idea... but then I tried it out and the results are weirdly effective. You can listen to a fake podcast version of the transformers paper, or upload your own PDF (with your own OpenAI API key) to make your own.
It's open source (Apache 2) so I had a poke around in the code. It gets a lot done with a single 180-line Python script.
When I'm exploring code like this I always jump straight to the prompt - it's quite long, and starts like this:
Your task is to take the input text provided and turn it into an engaging, informative podcast dialogue. The input text may be messy or unstructured, as it could come from a variety of sources like PDFs or web pages. Don't worry about the formatting issues or any irrelevant information; your goal is to extract the key points and interesting facts that could be discussed in a podcast. [...]
So I grabbed a copy of it and pasted in my blog entry about WWDC, which produced this result when I ran it through Gemini Flash using llm-gemini:
cat prompt.txt | llm -m gemini-1.5-flash-latest
Then I piped the result through my ospeak CLI tool for running text-to-speech with the OpenAI TTS models (after truncating to 690 tokens with ttok because it turned out to be slightly too long for the API to handle):
llm logs --response | ttok -t 690 | ospeak -s -o wwdc-auto-podcast.mp3
And here's the result (3.9MB 3m14s MP3).
It's not as good as the PDF-to-Podcast version because Stephan has some really clever code that uses different TTS voices for each of the characters in the transcript, but it's still a surprisingly fun way of repurposing text from my blog. I enjoyed listening to it while I was cooking dinner.
Datasette 0.64.7.
A very minor dot-fix release for Datasette stable, addressing this bug where Datasette running against the latest version of SQLite - 3.46.0 - threw an error on canned queries that included :named parameters in their SQL.
The root cause was Datasette relying on a clever trick I came up with, which no longer works, based on the undocumented and unstable opcodes returned by a SQLite EXPLAIN query.
I asked on the SQLite forum and learned that the feature I was using was removed in this commit to SQLite. D. Richard Hipp explains:
The P4 parameter to OP_Variable was not being used for anything. By omitting it, we make the prepared statement slightly smaller, reduce the size of the SQLite library by a few bytes, and help sqlite3_prepare() and similar run slightly faster.
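One way to avoid depending on EXPLAIN output at all is to pull :named parameters directly out of the SQL text. The sketch below is my own illustration of that approach, not necessarily what Datasette 0.64.7 does. It blanks out string literals and comments first so that a quoted ':not_me' is ignored, but it is still a regular expression rather than a full SQL tokenizer.

```python
import re

_STRINGS = re.compile(r"'(?:[^']|'')*'")                 # single-quoted literals
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)  # -- line and /* block */ comments
_NAMED = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def named_parameters(sql: str) -> list[str]:
    """Return distinct :named parameters in order of first appearance."""
    stripped = _COMMENTS.sub(" ", _STRINGS.sub(" ", sql))
    seen: list[str] = []
    for name in _NAMED.findall(stripped):
        if name not in seen:
            seen.append(name)
    return seen


print(named_parameters(
    "select * from t where a = :a and b = ':not_me' and c = :c or d = :a -- :nope"
))  # ['a', 'c']
```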
Weeknotes: PyCon US 2024
Earlier this month I attended PyCon US 2024 in Pittsburgh, Pennsylvania. I gave an invited keynote on the Saturday morning titled “Imitation intelligence”, tying together much of what I’ve learned about Large Language Models over the past couple of years and making the case that the Python community has a unique opportunity and responsibility to help try to nudge this technology in a positive direction.
[... 474 words]
AI counter app from my PyCon US keynote. In my keynote at PyCon US this morning I ran a counter at the top of my screen that automatically incremented every time I said the words "AI" or "artificial intelligence", using vosk, pyaudio and Tkinter. I wrote it in a few minutes with the help of GPT-4o - here's the code I ran as a GitHub repository.
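The counting half of that app is simple once speech has been turned into text. Here is a hedged Python sketch of just that part, with audio capture (pyaudio), speech recognition (vosk) and the Tkinter display left out. It assumes the transcript spells "AI" as a single standalone word; how vosk actually transcribes it is not something this sketch verifies, and MentionCounter is a hypothetical name.

```python
import re

# Matches "AI" as a whole word or the phrase "artificial intelligence",
# case-insensitively. Assumes transcripts spell "AI" as a single token.
AI_PATTERN = re.compile(r"\b(?:ai|artificial intelligence)\b", re.IGNORECASE)


class MentionCounter:
    """Running total of AI mentions across successive transcript chunks."""

    def __init__(self) -> None:
        self.total = 0

    def feed(self, text: str) -> int:
        """Add the mentions found in one chunk of text and return the new total."""
        self.total += len(AI_PATTERN.findall(text))
        return self.total


counter = MentionCounter()
counter.feed("AI is everywhere")
print(counter.feed("artificial intelligence, or ai, said the AIs"))  # 3
```

The word boundaries stop "said" or "AIs" from being counted.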
I'll publish full detailed notes from my talk once the video is available on YouTube.