46 items tagged “github-actions”
GitHub's Actions tool for repository automation.
2024
Generating Descriptive Weather Reports with LLMs. Drew Breunig produces the first example I've seen in the wild of the new LLM attachments Python API. Drew's Downtown San Francisco Weather Vibes project combines output from a JSON weather API with the latest image from a webcam pointed at downtown San Francisco to produce a weather report "with a style somewhere between Jack Kerouac and J. Peterman".
Here's the Python code that constructs and executes the prompt. The code runs in GitHub Actions.
UV with GitHub Actions to run an RSS to README project.
Jeff Triplett demonstrates a very neat pattern for using uv to run Python scripts with their dependencies inside of GitHub Actions. First, add uv to the workflow using the setup-uv action:
- uses: astral-sh/setup-uv@v3
  with:
    enable-cache: true
    cache-dependency-glob: "*.py"
This enables the caching feature, which stores uv's own cache of downloads from PyPI between runs. The cache-dependency-glob key ensures that this cache will be invalidated if any .py file in the repository is updated.
Now you can run Python scripts using steps that look like this:
- run: uv run fetch-rss.py
If that Python script begins with some dependency definitions (PEP 723) they will be automatically installed by uv run on the first run and reused from the cache in the future. From the start of fetch-rss.py:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "feedparser",
#   "typer",
# ]
# ///
uv will download the required Python version and cache that as well.
New improved commit messages for scrape-hacker-news-by-domain. My simonw/scrape-hacker-news-by-domain repo has a very specific purpose. Once an hour it scrapes the Hacker News /from?site=simonwillison.net page (and the equivalent for datasette.io) using my shot-scraper tool and stashes the parsed links, scores and comment counts in JSON files in that repo.
It does this mainly so I can subscribe to GitHub's Atom feed of the commit log - visit simonw/scrape-hacker-news-by-domain/commits/main and add .atom to the URL to get that.
NetNewsWire will inform me within about an hour if any of my content has made it to Hacker News, and the repo will track the score and comment count for me over time. I wrote more about how this works in Scraping web pages from the command line with shot-scraper back in March 2022.
Prior to the latest improvement, the commit messages themselves were pretty uninformative. The message had the date, and to actually see which Hacker News post it was referring to, I had to click through to the commit and look at the diff.
I built my csv-diff tool a while back to help address this problem: it can produce a slightly more human-readable version of a diff between two CSV or JSON files, ideally suited for including in a commit message attached to a git scraping repo like this one.
I got that working, but there was still room for improvement. I recently learned that any Hacker News thread has an undocumented URL at /latest?id=x which displays the most recently added comments at the top.
I wanted that in my commit messages, so I could quickly click a link to see the most recent comments on a thread.
So... I added one more feature to csv-diff: a new --extra option lets you specify a Python format string to be used to add extra fields to the displayed difference.
My GitHub Actions workflow now runs this command:
csv-diff simonwillison-net.json simonwillison-net-new.json \
  --key id --format json \
  --extra latest 'https://news.ycombinator.com/latest?id={id}' \
  >> /tmp/commit.txt
This generates the diff between the two versions, using the id property in the JSON to tie records together. It adds a latest field linking to that URL.
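The idea behind a format-string extra can be sketched in a few lines of Python. This is an illustration only, not csv-diff's own code: the added_rows_with_extras function and the sample rows are hypothetical, and it handles only newly added rows rather than the changed and removed rows a full diff would also report.

```python
def added_rows_with_extras(old, new, key, extras):
    """Return rows present in `new` but not in `old`, with extra fields added.

    `extras` maps a new field name to a Python format string; each format
    string is filled in from the row's own values.
    """
    old_keys = {row[key] for row in old}
    result = []
    for row in new:
        if row[key] not in old_keys:
            enriched = dict(row)  # copy so the input rows are left untouched
            for name, template in extras.items():
                enriched[name] = template.format(**row)
            result.append(enriched)
    return result


if __name__ == "__main__":
    # Made-up sample data for illustration.
    old_rows = [{"id": 1, "title": "First post"}]
    new_rows = [{"id": 1, "title": "First post"}, {"id": 2, "title": "Second post"}]
    extras = {"latest": "https://news.ycombinator.com/latest?id={id}"}
    for row in added_rows_with_extras(old_rows, new_rows, "id", extras):
        print(row)
```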
The commits now look like this:
GitHub Actions: Faster Python runs with cached virtual environments (via) Adam Johnson shares his improved pattern for caching Python environments in GitHub Actions.
I've been using the pattern where you add cache: pip to the actions/setup-python block, but it has two disadvantages: if the tests fail the cache won't be saved at the end, and it still spends time installing the packages despite not needing to download them fresh since the wheels are in the cache.
Adam's pattern works differently: he caches the entire .venv/ folder between runs, avoiding the overhead of installing all of those packages. He also wraps the block that installs the packages between explicit actions/cache/restore and actions/cache/save steps to avoid the case where failed tests skip the cache persistence.
qrank (via) Interesting and very niche project by Colin Dellow.
Wikidata has pages for huge numbers of concepts, people, places and things.
One of the many pieces of data they publish is QRank—“ranking Wikidata entities by aggregating page views on Wikipedia, Wikispecies, Wikibooks, Wikiquote, and other Wikimedia projects”. Every item gets a score and these scores can be used to answer questions like “which island nations get the most interest across Wikipedia”—potentially useful for things like deciding which labels to display on a highly compressed map of the world.
QRank is published as a gzipped CSV file.
Colin’s hikeratlas/qrank GitHub repository runs weekly, fetches the latest qrank.csv.gz file and loads it into a SQLite database using SQLite’s “.import” mechanism. Then it publishes the resulting SQLite database as an asset attached to the “latest” GitHub release on that repo—currently a 307MB file.
The database itself has just a single table mapping the Wikidata ID (a primary key integer) to the latest QRank—another integer. You’d need your own set of data with Wikidata IDs to join against this to do anything useful.
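As a rough sketch of that kind of join, the following uses Python's sqlite3 module with an in-memory database. The table names, column names and numbers here are all made up for illustration; the real schema of the published file may differ, so check it before adapting this.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a qrank-style table and our own data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical stand-in for the published table: Wikidata ID -> rank
        CREATE TABLE qrank (id INTEGER PRIMARY KEY, qrank INTEGER);
        -- Our own data, keyed by the same numeric Wikidata ID
        CREATE TABLE places (wikidata_id INTEGER PRIMARY KEY, name TEXT);
        """
    )
    # Made-up IDs and scores, not real QRank values.
    conn.executemany("INSERT INTO qrank VALUES (?, ?)", [(1, 500), (2, 900), (3, 100)])
    conn.executemany(
        "INSERT INTO places VALUES (?, ?)", [(1, "Place A"), (2, "Place B")]
    )
    return conn


def places_by_rank(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Join our places against the rank table, highest rank first."""
    return conn.execute(
        """
        SELECT places.name, qrank.qrank
        FROM places
        JOIN qrank ON qrank.id = places.wikidata_id
        ORDER BY qrank.qrank DESC
        """
    ).fetchall()


if __name__ == "__main__":
    for name, rank in places_by_rank(build_demo_db()):
        print(name, rank)
```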
I’d never thought of using GitHub Releases for this kind of thing. I think it’s a really interesting pattern.
GitHub Actions: Introducing the new M1 macOS runner available to open source! Set “runs-on: macos-14” to run a GitHub Actions workflow on an ARM M1 runner with 7GB of RAM. I have been looking forward to this for ages: it should make it much easier to build releases of both Electron apps and Python binary wheels for Apple Silicon.
Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions
I use cookiecutter to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.
[... 686 words]
2022
Tracking Mastodon user numbers over time with a bucket of tricks
Mastodon is definitely having a moment. User growth is skyrocketing as more and more people migrate over from Twitter.
[... 1,534 words]
Leveraging ’shot-scraper’ and creating image diffs. Üllar Seerme has a neat recipe for using shot-scraper and ImageMagick to create differential animations showing how a scraped web page has visually changed.
How to create a Python package in 2022 (via) Fantastic tutorial on modern Python packaging by Rodrigo Girão Serrão. I’ve been meaning to figure out Poetry for a while now and this gave me exactly the information I needed to start figuring it out. Great coverage of GitHub Actions, Tox and pre-commit as well.
Automating screenshots for the Datasette documentation using shot-scraper
I released shot-scraper back in March as a tool for keeping screenshots in documentation up-to-date.
[... 1,810 words]
Software engineering practices
Gergely Orosz started a Twitter conversation asking about recommended “software engineering practices” for development teams.
[... 1,557 words]
A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions
I released a new project this weekend, built during the Bellingcat Hackathon (I came second!) It’s called Action Transcription and it’s a tool for capturing captions and transcripts from online videos.
[... 1,362 words]
upptime (via) “Open-source uptime monitor and status page, powered entirely by GitHub Actions, Issues, and Pages.” This is a very creative (ab)use of GitHub Actions: it runs a scheduled action to check the availability of sites that you specify, records the results in a YAML file (with the commit history tracking them over time) and can automatically open a GitHub issue for you if it detects a new incident.
simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool.
Supercharging GitHub Actions with Job Summaries (via) GitHub Actions workflows can now generate a rendered Markdown summary of, well, anything that you can think to generate as part of the workflow execution. I particularly like the way this is designed: they provide a filename in a $GITHUB_STEP_SUMMARY environment variable which you can then append data to from each of your steps.
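A minimal sketch of appending to that file from a Python step might look like this. The append_to_step_summary helper name is an assumption for illustration; the only part taken from GitHub Actions is the GITHUB_STEP_SUMMARY environment variable, which holds the path of the file to append to.

```python
import os


def append_to_step_summary(markdown: str) -> bool:
    """Append a chunk of Markdown to the job summary file, if one is configured.

    Returns False when GITHUB_STEP_SUMMARY is not set, for example when the
    script is run locally rather than inside a workflow.
    """
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    # Open in append mode so earlier output from this step is preserved.
    with open(path, "a", encoding="utf-8") as summary:
        summary.write(markdown + "\n")
    return True


if __name__ == "__main__":
    append_to_step_summary("## Scrape results")
    append_to_step_summary("- 3 new links found")
```

Opening the file in append mode matters: each call adds to the step's summary instead of overwriting what earlier calls wrote.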
Automatically opening issues when tracked file content changes
I figured out a GitHub Actions pattern to keep track of a file published somewhere on the internet and automatically open a new repository issue any time the contents of that file changes.
[... 1,211 words]
Building a Covid sewage Twitter bot (and other weeknotes)
I built a new Twitter bot today: @covidsewage. It tweets a daily screenshot of the latest Covid sewage monitoring data published by Santa Clara county.
[... 1,079 words]
Instantly create a GitHub repository to take screenshots of a web page
I just released shot-scraper-template, a GitHub repository template that helps you start taking automated screenshots of a web page by filling out a form.
[... 1,177 words]
Scraping web pages from the command line with shot-scraper
I’ve added a powerful new capability to my shot-scraper command line browser automation tool: you can now use it to load a web page in a headless browser, execute JavaScript to extract information and return that information back to the terminal as JSON.
[... 1,277 words]
@newshomepages (via) Ben Welsh used my shot-scraper tool and GitHub Actions to launch a Twitter bot which tweets screenshots of newspaper homepages on a scheduled basis. Ben says: “The tech is so easy, I was able to pull it off in a couple hours at zero cost. A decade ago I ran a similar project using the cloud resources of the day. [...] It costs thousands of dollars and the screenshots were of much lower quality. Incredible progress!”
shot-scraper: automated screenshots for documentation, built on Playwright
shot-scraper is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool—hence the name—which I picked as a complement to my git scraping and help scraping techniques.
[... 1,802 words]
Help scraping: track changes to CLI tools by recording their --help using Git
I’ve been experimenting with a new variant of Git scraping this week which I’m calling Help scraping. The key idea is to track changes made to CLI tools over time by recording the output of their --help commands in a Git repository.
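The recording half of that idea can be sketched with Python's subprocess module. This is an assumption-laden illustration rather than the workflow from the post: the record_help function and the output file name are hypothetical, and committing the resulting file to Git is left to a later step.

```python
import subprocess
import sys
from pathlib import Path


def record_help(command: list[str], output_path: Path) -> str:
    """Run `command --help` and save whatever it prints to output_path."""
    result = subprocess.run(
        command + ["--help"], capture_output=True, text=True, check=False
    )
    # Most tools print help to stdout; fall back to stderr for those that don't.
    text = result.stdout or result.stderr
    output_path.write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Uses the running Python interpreter as a stand-in for any CLI tool.
    record_help([sys.executable], Path("python-help.txt"))
```

Running this on a schedule and committing the output file means any change to the tool's help text shows up as a diff in the commit history.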
2021
wheel.yml for Pyjion using cibuildwheel (via) cibuildwheel, maintained by the Python Packaging Authority, builds and tests Python wheels across multiple platforms. I hadn’t realized quite how minimal a configuration using their GitHub Actions action was until I looked at how Pyjion was using it.
How to build, test and publish an open source Python library
At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.
[... 2,055 words]
Finding and reporting an asyncio bug in Python 3.10
I found a bug in Python 3.10 today! Some notes on how I found it and my process for handling it once I figured out what was going on.
[... 1,789 words]
Datasette Desktop - a macOS desktop application for Datasette
I just released version 0.1.0 of the new Datasette macOS desktop application, the first version that end-users can easily install. I would very much appreciate your help testing it out!
[... 1,761 words]
Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions
GitHub repository templates were introduced a couple of years ago to provide a mechanism for creating a brand new GitHub repository starting with an initial set of files.
[... 1,413 words]
Django SQL Dashboard 1.0
Earlier this week I released Django SQL Dashboard 1.0. I also just released 1.0.1, with a bug fix for PostgreSQL 10 contributed by Ryan Cheley.
[... 629 words]
PAGNIs: Probably Are Gonna Need Its
Luke Page has a great post up with his list of YAGNI exceptions.
[... 1,289 words]