106 items tagged “github”
[... 2661 words]
Mastodon is definitely having a moment. User growth is skyrocketing as more and more people migrate over from Twitter.[... 1534 words]
For the last few years I’ve been trying to center my work around creating what I consider to be the Perfect Commit. This is a single commit that contains all of the following:[... 2011 words]
Open every CSV file in a GitHub repository in Datasette Lite (via) I built an Observable notebook that accepts a GitHub repository as input, scans it for CSV files and generates a link to open all of those CSV files in Datasette Lite. # 1st September 2022, 7:24 pm
sethmlarson/pypi-data (via) Seth Michael Larson uses GitHub releases to publish a ~325MB (gzipped to ~95MB) SQLite database on a roughly monthly basis that contains records of 370,000+ PyPI packages plus their OpenSSF score card metrics. It’s a really interesting dataset, but also a neat way of packaging and distributing data—the scripts Seth uses to generate the database file are included in the repository. # 11th August 2022, 1:02 am
[... 1146 words]
I figured out a GitHub Actions pattern to keep track of a file published somewhere on the internet and automatically open a new repository issue any time the contents of that file changes.[... 1211 words]
pip install command can accept a URL to a zip file or tarball. GitHub provides URLs that can create a zip file of any branch, tag or commit in any repository. Combining these is a really useful trick for maintaining Python packages.
How to push tagged Docker releases to Google Artifact Registry with a GitHub Action. Ben Welsh’s writeup includes detailed step-by-step instructions for getting the mysterious “Workload Identity Federation” mechanism to work with GitHub Actions and Google Cloud. I’ve been dragging my heels on figuring this out for quite a while, so it’s great to see the steps described at this level of detail. # 18th April 2022, 3:41 am
Datasette table diagram using Mermaid (via) Mermaid is a DSL for generating diagrams from plain text, designed to be embedded in Markdown. GitHub just added support for Mermaid to their Markdown pipeline, which inspired me to try it out. Here’s an Observable Notebook I built which uses Mermaid to visualize the relationships between Datasette tables based on their foreign keys. # 14th February 2022, 7:43 pm
GitHub Burndown (via) Neat Observable notebook by Tom MacWright—give it a GitHub access token and the name of a repo and it pulls the details of every issue and plots a burndown chart over time, showing how long issues stay open for. The code is worth spending some time with—the way it fetches data from the paginated JSON API is a really great example of using generators with Observable, and the chart itself is a lovely clear example of Observable Plot. # 10th February 2022, 4:29 pm
I’ve been experimenting with a new variant of Git scraping this week which I’m calling Help scraping. The key idea is to track changes made to CLI tools over time by recording the output of their
--help commands in a Git repository.
I’m maintaining a lot of different projects at the moment. I thought it would be useful to describe the process I use for adding a new feature to one of them, using the new sqlite-utils create-database command as an example.[... 2779 words]
Introducing stack graphs (via) GitHub launched “precise code navigation” for Python today—the first language to get support for this feature. Click on any Python symbol in GitHub’s code browsing views and a box will show you exactly where that symbol was defined—all based on static analysis by a custom parser written in Rust as opposed to executing any Python code directly. The underlying computer science uses a technique called stack graphs, based on scope graphs research from Eelco Visser’s research group at TU Delft. # 9th December 2021, 11:07 pm
At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.[... 2055 words]
I spent the last week mostly surrounded by boxes: we’re completing our move to the new place and life is mostly unpacking now. I did find some time to fix some issues with my CDC vaccination history Datasette instance though.[... 514 words]
This week I broke my streak of not sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.[... 2164 words]
GitHub’s Engineering Team has moved to Codespaces. My absolute dream development environment is one where I can spin up a new, working development environment in seconds—to try something new on a branch, or because I broke something and don’t want to spend time figuring out how to fix it. This article from GitHub explains how they got there: from a half-day setup to a 45 minute bootstrap in a codespace, then to five minutes through shallow cloning and a nightly pre-built Docker image and finally to 10 seconds be setting up “pools of codespaces, fully cloned and bootstrapped, waiting to be connected with a developer who wants to get to work”. # 11th August 2021, 4:53 pm
Running GitHub on Rails 6.0. Back in 2019 Eileen M. Uchitelle explained how GitHub upgraded everything in production to Rails 6.0 within 1.5 weeks of the stable release. There’s a trick in here I really like: they have an automated weekly job which fetches the latest Rails main branch and runs the full GitHub test suite against it, giving them super-early warnings about anything that might break and letting them provide feedback to upstream about unintended regressions. # 6th August 2021, 4:30 pm
A framework for building Open Graph images. GitHub’s new social preview images are generated by a Node.js script that fetches data from their GraphQL API, generates an HTML version of the card and then grabs a PNG snapshot of it using Puppeteer. It takes an average of 280ms to serve an image and generates around 2 million unique images a day. Interestingly, they found that bumping the available RAM from 512MB up to 513MB had a big effect on performance, because Chromium detects devices on 512MB or less and switches some processes from parallel to sequential. # 22nd June 2021, 9:25 pm
Flat Data. New project from the GitHub OCTO (the Office of the CTO, love that backronym) somewhat inspired by my work on Git scraping: I’m really excited to see GitHub embracing git for CSV/JSON data in this way. Flat incorporates a reusable Action for scraping and storing data (using Deno), a VS Code extension for setting up those workflows and a very nicely designed Flat Viewer web app for browsing CSV and JSON data hosted on GitHub. # 19th May 2021, 1:05 am
Behind GitHub’s new authentication token formats (via) This is a really smart design. GitHub’s new tokens use a type prefix of “ghp_” or “gho_” or a few others depending on the type of token, to help support mechanisms that scan for accidental token publication. A further twist is that the last six characters of the tokens are a checksum, which means token scanners can reliably distinguish a real token from a coincidental string without needing to check back with the GitHub database. “One other neat thing about _ is it will reliably select the whole token when you double click on it”—what a useful detail! # 5th April 2021, 9:28 pm
GitHub, by default, writes five replicas of each repository across our three data centers to protect against failures at the server, rack, network, and data center levels. When we need to update Git references, we briefly take a lock across all of the replicas in all of our data centers, and release the lock when our three-phase-commit (3PC) protocol reports success.
How we found and fixed a rare race condition in our session handling. GitHub had a terrifying bug this month where a user reported suddenly being signed in as another user. This is a particularly great example of a security incident report, explaining how GitHub identified the underlying bug, what caused it and the steps they are taking to ensure bugs like that never happen in the future. The root cause was a convoluted sequence of events which could cause a Ruby Hash to be accidentally shared between two requests, caused as a result of a new background thread that was introduced as a performance optimization. # 18th March 2021, 11:06 pm
Making GitHub’s new homepage fast and performant. A couple of really clever tricks in this article by Tobias Ahlin. The first is using IntersectionObserver in conjunction with the video preload=“none” attribute to lazily load a video when it scrolls into view. The second is an ingenious trick to create an efficiently encoded transparent JPEG image: embed the image in a SVG file twice, once as the image and once as a transparency mask. # 29th January 2021, 7:05 pm
Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask) (via) ClickHouse by Yandex is an open source column-oriented data warehouse, designed to run analytical queries against TBs of data. They’ve loaded the full GitHub Archive of events since 2011 into a public instance, which is a great way of both exploring GitHub activity and trying out ClickHouse. Here’s a query I just ran that shows number of watch events per year, for example: SELECT toYear(created_at) as yyyy, count() FROM github_events WHERE event_type = ’WatchEvent’ group by yyyy # 5th January 2021, 1:02 am
At GitHub, we want to protect developer privacy, and we find cookie banners quite irritating, so we decided to look for a solution. After a brief search, we found one: just don’t use any non-essential cookies. Pretty simple, really. 🤔 So, we have removed all non-essential cookies from GitHub, and visiting our website does not send any information to third-party analytics services.
I gave a talk yesterday about personal data warehouses for GitHub’s OCTO Speaker Series, focusing on my Datasette and Dogsheep projects. The video of the talk is now available, and I’m presenting that here along with an annotated summary of the talk, including links to demos and further information.[... 5166 words]