Simon Willison’s Weblog

Subscribe

35 items tagged “git”

2022

The Perfect Commit

For the last few years I’ve been trying to center my work around creating what I consider to be the Perfect Commit. This is a single commit that contains all of the following:

[... 2011 words]

sqlite-comprehend: run AWS entity extraction against content in a SQLite database

I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.

[... 1146 words]

A tiny CI system (via) Christian Ştefănescu shares a recipe for building a tiny self-hosted CI system using Git and Redis. A post-receive hook runs when a commit is pushed to the repo and uses redis-cli to push jobs to a list. Then a separate bash script runs a loop with a blocking “redis-cli blpop jobs” operation which waits for new jobs and then executes the CI job as a shell script. # 26th April 2022, 3:39 pm

Help scraping: track changes to CLI tools by recording their --help using Git

I’ve been experimenting with a new variant of Git scraping this week which I’m calling Help scraping. The key idea is to track changes made to CLI tools over time by recording the output of their --help commands in a Git repository.

[... 978 words]

How I build a feature

I’m maintaining a lot of different projects at the moment. I thought it would be useful to describe the process I use for adding a new feature to one of them, using the new sqlite-utils create-database command as an example.

[... 2779 words]

2021

git-history: a tool for analyzing scraped data collected using Git and SQLite

I described Git scraping last year: a technique for writing scrapers where you periodically snapshot a source of data to a Git repository in order to record changes to that source over time.

[... 2002 words]

2020

Commits are snapshots, not diffs (via) Useful, clearly explained revision of some Git fundamentals. # 17th December 2020, 10:01 pm

nyt-2020-election-scraper. Brilliant application of git scraping by Alex Gaynor and a growing team of contributors. Takes a JSON snapshot of the NYT’s latest election poll figures every five minutes, then runs a Python script to iterate through the history and build an HTML page showing the trends, including what percentage of the remaining votes each candidate needs to win each state. This is the perfect case study in why it can be useful to take a “snapshot if the world right now” data source and turn it into a git revision history over time. # 6th November 2020, 2:24 pm

Git scraping: track changes over time by scraping to a Git repository

Git scraping is the name I’ve given a scraping technique that I’ve been experimenting with for a few years now. It’s really effective, and more people should use it.

[... 963 words]

Weeknotes: Rocky Beaches, Datasette 0.48, a commit history of my database

This week I helped Natalie launch Rocky Beaches, shipped Datasette 0.48 and several releases of datasette-graphql, upgraded the CSRF protection for datasette-upload-csvs and figured out how to get a commit log of changes to my blog by backing up its database to a GitHub repository.

[... 1294 words]

Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate

I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.

[... 703 words]

Web apps are typically continuously delivered, not rolled back, and you don’t have to support multiple versions of the software running in the wild.

This is not the class of software that I had in mind when I wrote the blog post 10 years ago. If your team is doing continuous delivery of software, I would suggest to adopt a much simpler workflow (like GitHub flow) instead of trying to shoehorn git-flow into your team.

Vincent Driessen # 14th May 2020, 1:49 pm

Weeknotes: Archiving coronavirus.data.gov.uk, custom pages and directory configuration in Datasette, photos-to-sqlite

I mainly made progress on three projects this week: Datasette, photos-to-sqlite and a cleaner way of archiving data to a git repository.

[... 1132 words]

Goodbye Zeit Now v1, hello datasette-publish-now—and talking to myself in GitHub issues

This week I’ve been mostly dealing with the finally announced shutdown of Zeit Now v1. And having long-winded conversations with myself in GitHub issues.

[... 2050 words]

2019

Repository driven development (via) I’m already a big fan of keeping documentation and code in the same repo so you can update them both from within the same code review, but this takes it even further: in repository driven development every aspect of the code and configuration needed to define, document, test and ship a service live in the service repository—all the way down to the configurations for reporting dashboards. This sounds like heaven. # 24th July 2019, 8:41 am

Generating a commit log for San Francisco’s official list of trees

San Francisco has a neat open data portal (as do an increasingly large number of cities these days). For a few years my favourite file on there has been Street Tree List, a list of all 190,000 trees in the city maintained by the Department of Public Works.

[... 1051 words]

2018

isomorphic-git (via) A pure-JavaScript implementation of the git protocol and underlying tools which works both server-side (Node.js) AND in the client, using an emulation of the fs API. Given the right CORS headers it can clone a GitHub repository over HTTPS right into your browser. Impressive. # 16th May 2018, 8:54 pm

Telling stories through your commits. Joel Chippendale’s excellent guide to writing a useful commit history. I spend a lot of time on my commit messages, because when I’m trying to understand code later on they are the only form of documentation that is guaranteed to remain up-to-date against the code at that exact point of time. These tips are clear, concise, teadabale and include some great examples. # 13th January 2018, 7:44 pm

2017

Anyone that has me on too high of a pedestal should see me fumbling around with git.

John Carmack # 12th November 2017, 3:50 pm

Exploding Git Repositories. Kate Murphy describes how git is vulnerable to a similar attack to the XML “billion laughs” recursive entity expansion attack—you can create a tiny git repository that acts as a “git bomb”, expanding 12 root objects to over a billion files using recursive blob references. # 12th October 2017, 7:43 pm

2013

What are the differences between “forking,” “cloning,” and downloading the project as a zip file on GitHub?

“fork” creates a copy of the project hosted on your own GitHub account. This is an exclusive Build software better, together. (links to: http://Github.com) feature and not a Git feature.

[... 98 words]

2012

Should I use Dropbox instead of Git for 2 coders? In terms of going really fast and working on things at the same time, I’m thinking it may be uber productive to use Dropbox for it’s instant syncing instead of Git/Github. What are the pros/cons?

Dropbox is definitely the wrong tool for this—you’ll find yourself running in to all sorts of weird problems very quickly if you attempt to use it this way.

[... 119 words]

2010

GitHub: Announcing SVN Support. The best kind of April Fool’s joke: one that works. It’s read-only, but that’s good enough to support referencing GitHub repositories from SVN externals. # 1st April 2010, 11:33 am

A successful Git branching model (via) This looks eminently sensible. The master branch is used for production-ready code, and is only updated by merging from either release branches or emergency hotfix branches. A develop branch is used for integration (from feature branches), and is branched to create release branches when a release is nearly ready. It’s all comprehensively documented and comes with some well-designed diagrams. # 20th January 2010, 7:30 pm

2009

Introducing the YUI 3 Gallery. Write a plugin for YUI3, BSD license it and sign a CLA and Yahoo! will push your module out to their CDN and make it loadable using the YUI().use() statement. They’re coordinating the submissions using GitHub. # 4th November 2009, 11:14 pm

How We Made GitHub Fast. Detailed overview of the new GitHub architecture. It’s a lot more complicated than I would have expected—lots of moving parts are involved in ensuring they can scale horizontally when they need to. Interesting components include nginx, Unicorn, Rails, DRBD, HAProxy, Redis, Erlang, memcached, SSH, git and a bunch of interesting new open source projects produced by the GitHub team such as BERT/Ernie and ProxyMachine. # 21st October 2009, 9:14 pm

homebrew. Exciting alternative to MacPorts for compiling software on OS X—homebrew avoids sudo and defines packages as simple Ruby scripts, shared and distributed using Git. # 21st September 2009, 6:51 pm

Fabric, Django, Git, Apache, mod_wsgi, virtualenv and pip deployment. I’m slowly working my way through this stack at the moment—next stop, fabric. # 28th July 2009, 11:56 am

djangopeople.net on GitHub. I’ve released the source code for Django People, the geographical community site developed last year by myself and Natalie Downe (it hasn’t otherwise been touched since April last year, so it needs porting to Django 1.1). If you want a new feature on the site, implement it and I’ll see about merging it in. # 4th May 2009, 6:12 pm

Dulwich. A pure Python implementation of the Git file format and protocols. Reinforces my impression that a key to Git’s success is stable, well designed and documented on-disk formats. # 16th February 2009, 10:27 pm