Simon Willison’s Weblog

41 items tagged “docker”

2022

Testing Datasette parallel SQL queries in the nogil/python fork. As part of my ongoing research into whether Datasette can be sped up by running SQL queries in parallel I’ve been growing increasingly suspicious that the GIL is holding me back. I know the sqlite3 module releases the GIL and was hoping that would give me parallel queries, but it looks like there’s still a ton of work going on in Python GIL land creating Python objects representing the results of the query. Sam Gross has been working on a nogil fork of Python and I decided to give it a go. It’s published as a Docker image and it turns out trying it out really did just take a few commands... and it produced the desired results, my parallel code started beating my serial code where previously the two had produced effectively the same performance numbers. I’m pretty stunned by this. I had no idea how far along the nogil fork was. It’s amazing to see it in action. # 29th April 2022, 5:45 am

How to push tagged Docker releases to Google Artifact Registry with a GitHub Action. Ben Welsh’s writeup includes detailed step-by-step instructions for getting the mysterious “Workload Identity Federation” mechanism to work with GitHub Actions and Google Cloud. I’ve been dragging my heels on figuring this out for quite a while, so it’s great to see the steps described at this level of detail. # 18th April 2022, 3:41 am

2021

Introduction to heredocs in Dockerfiles (via) This is a fantastic upgrade to Dockerfile syntax, enabled by BuildKit and a new frontend for executing the Dockerfile that can be specified with a #syntax= directive. I often like to create a standalone Dockerfile that works without needing other files from a directory, so being able to use <<EOF syntax to populate configure files from inline blocks of code is really handy. # 22nd November 2021, 5:01 pm

Weeknotes: Apache proxies in Docker containers, refactoring Datasette

Updates to six major projects this week, plus finally some concrete progress towards Datasette 1.0.

[... 1630 words]

aws-lambda-adapter. AWS Lambda added support for Docker containers last year, but with a very weird shape: you can run anything on Lambda that fits in a Docker container, but unlike Google Cloud Run your application doesn’t get to speak HTTP: it needs to run code that listens for proprietary AWS lambda events instead. The obvious way to fix this is to run some kind of custom proxy inside the container which turns AWS runtime events into HTTP calls to a regular web application. Serverlessish and re:Web are two open source projects that implemented this, and now AWS have their own implementation of that pattern, written in Rust. # 28th October 2021, 5:04 am

Weeknotes: Learning Kubernetes, learning Web Components

I’ve been mainly climbing the learning curve for Kubernetes and Web Components this week. I also released Datasette 0.59.1 with Python 3.10 compatibility and an updated Docker image.

[... 1101 words]

We never shipped a great commercial product. The reason for that is we didn’t focus. We tried to do a little bit of everything. It’s hard enough to maintain the growth of your developer community and build one great commercial product, let alone three or four, and it is impossible to do both, but that’s what we tried to do and we spent an enormous amount of money doing it.

Solomon Hykes # 7th September 2021, 2:47 pm

GitHub’s Engineering Team has moved to Codespaces. My absolute dream development environment is one where I can spin up a new, working development environment in seconds—to try something new on a branch, or because I broke something and don’t want to spend time figuring out how to fix it. This article from GitHub explains how they got there: from a half-day setup to a 45 minute bootstrap in a codespace, then to five minutes through shallow cloning and a nightly pre-built Docker image and finally to 10 seconds be setting up “pools of codespaces, fully cloned and bootstrapped, waiting to be connected with a developer who wants to get to work”. # 11th August 2021, 4:53 pm

Best Practices Around Production Ready Web Apps with Docker Compose (via) I asked on Twitter for some tips on Docker Compose and was pointed to this article by Nick Janetakis, which has a whole host of useful tips and patterns I hadn’t encountered before. # 12th June 2021, 2:36 am

Weeknotes: Docker architectures, sqlite-utils 3.7, nearly there with Datasette 0.57

This week I learned a whole bunch about using Docker to emulate different architectures, released sqlite-utils 3.7 and made a ton of progress towards the almost-ready-to-ship Datasette 0.57.

[... 1081 words]

logpaste (via) Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup, it then attempts to restore the SQLite database from S3 and starts a Litestream process in the same container to periodically synchronize changes back up to the S3 bucket. # 17th March 2021, 3:48 pm

2020

New for AWS Lambda – Container Image Support. “You can now package and deploy Lambda functions as container images of up to 10 GB in size”—can’t wait to try this out with Datasette. # 1st December 2020, 5:34 pm

Sandboxing and Workload Isolation (via) Fly.io run other people’s code in containers, so workload isolation is a Big Deal for them. This blog post goes deep into the history of isolation and the various different approaches you can take, and fills me with confidence that the team at Fly.io know their stuff. I got to the bottom and found it had been written by Thomas Ptacek, which didn’t surprise me in the slightest. # 30th July 2020, 10:19 pm

GitHub Actions: Manual triggers with workflow_dispatch (via) New GitHub Actions feature which fills a big gap in the offering: you can now create “workflow dispatch” events which provide a button for manually triggering an action—and you can specify extra UI form fields that can customize how that action runs. This turns Actions into an interactive automation engine for any code that can be wrapped in a Docker container. # 7th July 2020, 4:33 am

datasette-publish-fly (via) Fly is a neat new Docker hosting provider with a very tempting pricing model: Just $2.67/month for their smallest always-on instance, and they give each user $10/month in free credit. datasette-publish-fly is the first plugin I’ve written using the publish_subcommand plugin hook, which allows extra hosting providers to be added as publish targets. Install the plugin and you can run “datasette publish fly data.db” to deploy SQLite databases to your Fly account. # 19th March 2020, 3:40 am

Weeknotes: Datasette Cloud and zero downtime deployments

Yesterday’s piece on Deploying a data API using GitHub Actions and Cloud Run was originally intended to be my weeknotes, but ended up getting a bit too involved.

[... 1428 words]

How to do Zero Downtime Deployments of Docker Containers. I’m determined to get reliable zero-downtime deploys working for a new project, because I know from experience that even a few seconds of downtime during a deploy changes the project mentality from “deploy any time you want” to “don’t deploy too often”. I’m using Docker containers behind Traefik, which means new containers should have traffic automatically balanced to them by Traefik based on their labels. After much fiddling around the pattern described by this article worked best for me: it lets me start a new container, then stop the old one and have Traefik’s “retry” mechanism send any requests to the stopped container over to the new one instead. # 16th January 2020, 11:12 pm

Weeknotes: Improv at Stanford, planning Datasette Cloud

Last week was the first week of the quarter at Stanford—which is called “shopping week” here because students are expected to try different classes to see which ones they are going to stick with.

[... 806 words]

2019

Dockerfile for creating a Datasette of NHS dentist information (via) Really neat Dockerfile example by Alf Eaton that uses multi-stage builds to pull dentist information from the NHS, compile to SQLite using csvs-to-sqlite and serve the results with Datasette. TIL the NHS like to use ¬ as their CSV separator! # 26th April 2019, 2:09 pm

Smaller Python Docker Containers with Multi-Stage Builds and Python Wheels (via) Clear tutorial on how to use Docker’s multi-stage build feature to create smaller final images by taking advantage of Python’s wheel format—so an initial stage can install a full compiler toolchain and compile C dependencies into wheels, then a later stage can install those pre-compiled wheels into a slimmer container without including the C compiler. # 26th April 2019, 2:05 pm

Ministry of Silly Runtimes: Vintage Python on Cloud Run (via) Cloud Run is an exciting new hosting service from Google that lets you define a container using a Dockerfile and then run that container in a “scale to zero” environment, so you only pay for time spent serving traffic. It’s similar to the now-deprecated Zeit Now 1.0 which inspired me to create Datasette. Here Dustin Ingram demonstrates how powerful Docker can be as the underlying abstraction by deploying a web app using a 25 year old version of Python 1.x. # 9th April 2019, 5:33 pm

2018

repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker https://github.com/norvig/pytudes” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm

dive (via) Handy command-line tool (as with so much of the Docker ecosystem it’s written in Go, which means you can download a Darwin binary directly from the GitHub releases page and run it directly on your Mac) for visually exploring the different layers of a given Docker image. # 19th November 2018, 4:41 am

Building smaller Python Docker images

Changes are afoot at Zeit Now, my preferred hosting provider for the past year (see previous posts). They have announced Now 2.0, an intriguing new approach to providing auto-scaling immutable deployments. It’s built on top of lambdas, and comes with a whole host of new constraints: code needs to fit into a 5MB bundle for example (though it looks like this restriction will soon be relaxed a littleupdate November 19th you can now bump this up to 50MB).

[... 1872 words]

elasticsearch-dump. Neat open source utility by TaskRabbit for importing and exporting data in bulk from Elasticsearch. It can copy data from one Elasticsearch cluster directly to another or to an intermediary file, making it a swiss-army knife for migrating data around. I successfully used the “docker run” incantation to execute it without needing to worry about having the correct version of Node.js installed. # 9th April 2018, 10:10 pm

rubber-docker/linux.c. rubber-docker is a workshop that talks through building a simply Docker clone from scratch in Python. I particularly liked this detail: linux.c is a Python extension written in C that exposes a small collection of Linux syscalls that are needed for the project—clone, mount, pivot_root, setns, umount, umount2 and unshare. Just reading through this module gives a really nice overview of how some of Docker’s underlying magic actually work. # 2nd April 2018, 6:18 pm

Cloud-first: Rapid webapp deployment using containers (via) The Research Software Engineering group at ICL have written a tutorial on deploying web apps as Docker containers using Azure and they use Datasette as the example application. # 28th March 2018, 3:50 pm

Andrew Godwin’s www-router Docker container (via) Really clever Docker trick: a container that runs Nginx and uses it to route traffic to other containers based on the hostname—but the hostnames to be routed are configured using environment variables which the run-nginx.py CMD script uses to dynamically construct an nginx config when the container starts. # 21st February 2018, 5:04 am

Building a Full-Text Search App Using Docker and Elasticsearch. Deep, comprehensive tutorial from Patrick Triest showing how to use docker-compose to run three containers (Node API, nginx static content, elasticsearch) and then use that to build a neat Vue.js web search UI against 100 books from Project Gutenberg. # 1st February 2018, 3:41 pm

2017

How to compile and run a Pony program using Docker. My notes on using the Docker ponylang/ponyc container to compile and execute a Pony program without needing to install anything (since Docker will download and run the image the first time you run the command). # 18th December 2017, 9:47 pm