Weeknotes: Fun with Unix domain sockets
13th July 2021
A small enhancement to Datasette this week: I’ve added support for proxying via Unix domain sockets.
This started out as a feature request from Aslak Raanes: #1388: Serve using UNIX domain socket.
I’ve not worked with these much before so it was a good opportunity to learn something new. Unix domain sockets let different processes on the same machine communicate with each other in a way that is similar to TCP, but via a file path instead of a host and port.
I’ve encountered these before with the Docker daemon, which listens on path /var/run/docker.sock and can be communicated with using curl like so:
curl --unix-socket /var/run/docker.sock \
http://localhost/v1.41/containers/json
Plenty more examples in the Docker documentation if you click the “HTTP” tab.
It turns out both nginx and Apache have the ability to proxy traffic to a Unix domain socket rather than to an HTTP port, which makes this a useful mechanism for running backend servers without attaching them to TCP ports.
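To make the "similar to TCP, but via a file path" idea concrete, here is a minimal sketch using only the Python standard library. The helper names (`echo_once`, `round_trip`) and the `demo.sock` filename are made up for illustration; the only real difference from a TCP example is `socket.AF_UNIX` and binding to a path rather than a host and port.

```python
import os
import socket
import tempfile
import threading


def echo_once(server_sock):
    # Accept a single connection and send back whatever arrives, upper-cased
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def round_trip(message):
    # Hypothetical helper: the socket file lives in a throwaway directory
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)  # creates the socket file at `path`
        server.listen(1)
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)  # a file path, where TCP would take (host, port)
        client.sendall(message)
        reply = client.recv(1024)

        client.close()
        worker.join()
        server.close()
        return reply
```

Calling `round_trip(b"hello")` sends the bytes through the socket file and returns the server's reply. This only works on platforms where `socket.AF_UNIX` is available.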
Implementing this in Datasette
Datasette uses the excellent Uvicorn Python web server to serve traffic out of the box, and Uvicorn already includes support for UDS, so adding support to Datasette was pretty easy: here’s the full implementation. I’ve added a new --uds option, so now you can run Datasette like this:
datasette --uds /tmp/datasette.sock fixtures.db
Datasette will “listen” on /tmp/datasette.sock, which means you can run requests via curl like so:
curl --unix-socket /tmp/datasette.sock \
http://localhost/fixtures.json | jq
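The same request can be made from Python without any third-party packages. This is a rough sketch, not anything Datasette ships: `UnixHTTPConnection` and `get_json` are hypothetical names, and the approach is simply to subclass the standard library's `http.client.HTTPConnection` so that it connects to a socket path instead of a TCP port.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket (illustrative only)."""

    def __init__(self, uds_path, timeout=10):
        # "localhost" is only used for the Host header; no DNS lookup happens
        super().__init__("localhost", timeout=timeout)
        self.uds_path = uds_path

    def connect(self):
        # Replace the usual TCP connect with a connect to the socket file
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.uds_path)
        self.sock = sock


def get_json(uds_path, url_path):
    # Fetch url_path over the socket and decode the JSON response body
    conn = UnixHTTPConnection(uds_path)
    try:
        conn.request("GET", url_path)
        response = conn.getresponse()
        return json.loads(response.read())
    finally:
        conn.close()
```

With Datasette running as shown above, `get_json("/tmp/datasette.sock", "/fixtures.json")` should fetch the same JSON as the curl command.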
More importantly, it means you can configure nginx or Apache to proxy to the Datasette server like this (nginx):
daemon off;
events {
  worker_connections 1024;
}
http {
  server {
    listen 80;
    location / {
      proxy_pass http://datasette;
      proxy_set_header Host $host;
    }
  }
  upstream datasette {
    server unix:/tmp/datasette.sock;
  }
}
Or like this (Apache):
ProxyPass / unix:/tmp/datasette.sock|http://localhost/
Writing tests
The implementation was only a few lines of code (to pass the uds option to Uvicorn) but adding a test proved a little more challenging. I used this pytest fixture to spin up a server process:
import subprocess
import tempfile
import time

import pytest


@pytest.fixture(scope="session")
def ds_unix_domain_socket_server(tmp_path_factory):
    socket_folder = tmp_path_factory.mktemp("uds")
    uds = str(socket_folder / "datasette.sock")
    ds_proc = subprocess.Popen(
        ["datasette", "--memory", "--uds", uds],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=tempfile.gettempdir(),
    )
    # Give the server time to start
    time.sleep(1.5)
    # Check it started successfully
    assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
    yield ds_proc, uds
    # Shut it down at the end of the pytest session
    ds_proc.terminate()
I use a similar pattern for some other tests, to exercise the --ssl-keyfile and --ssl-certfile options added in #1221.
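The fixture waits a fixed 1.5 seconds for the server to come up. One possible alternative, sketched here as an assumption rather than anything Datasette does, is to poll the socket path until something accepts a connection. `wait_for_uds` is a hypothetical helper name, and the timeout and interval defaults are arbitrary.

```python
import os
import socket
import time


def wait_for_uds(path, timeout=5.0, interval=0.05):
    """Return True once something accepts connections on `path`, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            try:
                probe.connect(path)
                return True
            except OSError:
                pass  # socket file exists but nothing is listening yet
            finally:
                probe.close()
        time.sleep(interval)
    return False
```

A fixture could call `wait_for_uds(uds)` in place of the sleep and fail early if it returns `False`. The trade-off is a little more code in exchange for not depending on a guessed startup time.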
The test itself looks like this, taking advantage of HTTPX’s ability to make calls against Unix domain sockets:
import socket

import httpx
import pytest


@pytest.mark.serial
@pytest.mark.skipif(
    not hasattr(socket, "AF_UNIX"), reason="Requires socket.AF_UNIX support"
)
def test_serve_unix_domain_socket(ds_unix_domain_socket_server):
    _, uds = ds_unix_domain_socket_server
    transport = httpx.HTTPTransport(uds=uds)
    client = httpx.Client(transport=transport)
    response = client.get("http://localhost/_memory.json")
    assert {
        "database": "_memory",
        "path": "/_memory",
        "tables": [],
    }.items() <= response.json().items()
The skipif decorator avoids running this test on platforms which don’t support Unix domain sockets (which I think includes Windows, see this comment).
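The final assertion in that test compares two `.items()` views with `<=`. Dictionary item views support set-style comparison, so this checks that every expected key/value pair is present in the response while ignoring any extra keys. A small illustration, with `is_subset` as a made-up helper name:

```python
def is_subset(expected, actual):
    # True if every key/value pair in `expected` also appears in `actual`
    return expected.items() <= actual.items()


response_json = {
    "database": "_memory",
    "path": "/_memory",
    "tables": [],
    "private": False,  # extra key, invented for this example
}

matches = is_subset({"database": "_memory", "tables": []}, response_json)
mismatch = is_subset({"database": "other"}, response_json)
```

Here `matches` is `True` and `mismatch` is `False`. This keeps the test from breaking when new keys are added to the JSON output.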
The @pytest.mark.serial decorator applies a “mark” that can be used to selectively run the test. I do this because Datasette’s tests run in CI using pytest-xdist, but that’s not compatible with this way of spinning up a temporary server. Datasette actually runs the tests in GitHub Actions like so:
- name: Run tests
  run: |
    pytest -n auto -m "not serial"
    pytest -m "serial"
The pytest -n auto -m "not serial" line runs almost all of the tests using pytest-xdist across an automatically selected number of processes, but skips the ones marked with @pytest.mark.serial. Then the second line runs the remaining serial tests without any additional concurrency.
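One related detail: pytest warns about custom marks it does not recognise unless they are registered. A minimal sketch of registering the serial mark from a conftest.py follows, assuming the project does not already declare it in its pytest configuration file. The description text and the `SERIAL_MARKER` constant are made up for illustration.

```python
# conftest.py (sketch)

SERIAL_MARKER = "serial: tests that should not run in parallel under pytest-xdist"


def pytest_configure(config):
    # Register the custom mark so that -m "serial" selects known marks only
    config.addinivalue_line("markers", SERIAL_MARKER)
```

`pytest_configure` is a standard pytest hook, and `addinivalue_line("markers", ...)` has the same effect as adding a line to the `markers` setting in the configuration file.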
Documentation and example configuration for this feature can be found in the Running Datasette behind a proxy documentation. Thanks to Aslak for contributing the notes on Apache configuration.