Simon Willison’s Weblog


Weeknotes: Fun with Unix domain sockets

13th July 2021

A small enhancement to Datasette this week: I’ve added support for proxying via Unix domain sockets.

This started out as a feature request from Aslak Raanes: #1388: Serve using UNIX domain socket.

I’ve not worked with these much before so it was a good opportunity to learn something new. Unix domain sockets let different processes on the same machine communicate with each other using an interface similar to TCP, but addressed by a file path instead of a host and port.
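
To make that concrete, here is a minimal sketch using Python’s standard socket module. Two threads stand in for the two processes, the socket lives at a throwaway path in a temporary directory, and the function names are made up for this illustration. It needs a platform with socket.AF_UNIX.

```python
import os
import socket
import tempfile
import threading


def echo_upper_once(server_sock):
    """Accept a single connection and reply with the received bytes upper-cased."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def round_trip(message):
    """Send `message` over a Unix domain socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)  # this creates the socket file on disk
            server.listen(1)
            worker = threading.Thread(target=echo_upper_once, args=(server,))
            worker.start()
            # The client addresses the server by file path, not host and port
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                client.sendall(message)
                reply = client.recv(1024)
            worker.join()
        return reply
```

Apart from the address family and the path passed to bind() and connect(), this is the same code you would write for a TCP socket.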

I’ve encountered these before with the Docker daemon, which listens on path /var/run/docker.sock and can be communicated with using curl like so:

curl --unix-socket /var/run/docker.sock \
  http://localhost/v1.41/containers/json

Plenty more examples in the Docker documentation if you click the ‘HTTP’ tab.
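
The same kind of request can be made from Python without any third-party packages. The sketch below is one way to do it, assuming a server that speaks plain HTTP on the socket and returns JSON. UnixHTTPConnection and get_json are names invented for this example, not part of the standard library.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that connects to a Unix domain socket path."""

    def __init__(self, uds_path, timeout=5.0):
        # The host is only used for the Host: header, like "localhost" in curl
        super().__init__("localhost", timeout=timeout)
        self.uds_path = uds_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.uds_path)
        self.sock = sock


def get_json(uds_path, url_path):
    """GET `url_path` over the socket at `uds_path`, returning (status, parsed JSON)."""
    conn = UnixHTTPConnection(uds_path)
    try:
        conn.request("GET", url_path)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()
```

With suitable permissions on the socket, get_json("/var/run/docker.sock", "/v1.41/containers/json") should be roughly equivalent to the curl command above.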

It turns out both nginx and Apache have the ability to proxy traffic to a Unix domain socket rather than to an HTTP port, which makes this a useful mechanism for running backend servers without attaching them to TCP ports.

Implementing this in Datasette

Datasette uses the excellent Uvicorn Python web server to serve traffic out of the box, and Uvicorn already includes support for Unix domain sockets (UDS), so adding support to Datasette was pretty easy: here’s the full implementation. I’ve added a new --uds option, so now you can run Datasette like this:

datasette --uds /tmp/datasette.sock fixtures.db
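
For a sense of what handing a socket path to Uvicorn looks like, here is a minimal sketch. It is not Datasette’s actual code: app is a tiny stand-in ASGI application and serve is a hypothetical helper. The only real API it relies on is the uds argument to uvicorn.run().

```python
async def app(scope, receive, send):
    """Tiny ASGI app standing in for Datasette (hypothetical)."""
    if scope["type"] != "http":
        return  # ignore lifespan and other non-HTTP scopes
    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        }
    )
    await send(
        {
            "type": "http.response.body",
            "body": b"hello over a Unix domain socket\n",
        }
    )


def serve(uds_path):
    """Serve `app` on a Unix domain socket instead of a TCP host and port."""
    import uvicorn  # third-party; imported here so the app can be used without it

    uvicorn.run(app, uds=uds_path)
```

Calling serve("/tmp/demo.sock") blocks and serves requests on that path until the process is stopped.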

Datasette will “listen” on /tmp/datasette.sock—which means you can run requests via curl like so:

curl --unix-socket /tmp/datasette.sock \
  http://localhost/fixtures.json | jq

More importantly, it means you can configure nginx or Apache to proxy to the Datasette server like this (nginx):

daemon off;
events {
  worker_connections  1024;
}
http {
  server {
    listen 80;
    location / {
      proxy_pass http://datasette;
      proxy_set_header Host $host;
    }
  }
  upstream datasette {
    server unix:/tmp/datasette.sock;
  }
}

Or like this (Apache):

ProxyPass / unix:/tmp/datasette.sock|http://localhost/

Writing tests

The implementation was only a few lines of code (to pass the uds option to Uvicorn) but adding a test proved a little more challenging. I used this pytest fixture to spin up a server process:

import subprocess
import tempfile
import time

import pytest


@pytest.fixture(scope="session")
def ds_unix_domain_socket_server(tmp_path_factory):
    socket_folder = tmp_path_factory.mktemp("uds")
    uds = str(socket_folder / "datasette.sock")
    ds_proc = subprocess.Popen(
        ["datasette", "--memory", "--uds", uds],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=tempfile.gettempdir(),
    )
    # Give the server time to start
    time.sleep(1.5)
    # Check it started successfully: poll() returns None while the process is still running
    assert ds_proc.poll() is None, ds_proc.stdout.read().decode("utf-8")
    yield ds_proc, uds
    # Shut it down at the end of the pytest session
    ds_proc.terminate()

I use a similar pattern for some other tests, to exercise the --ssl-keyfile and --ssl-certfile options added in #1221.
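
The fixed time.sleep(1.5) in the fixture is simple, but it waits the full 1.5 seconds even when the server is ready sooner, and it can still be too short on a slow machine. One possible alternative, which is not what Datasette does, is to poll the socket until something accepts a connection. wait_for_unix_socket is a hypothetical helper and its timeout and interval defaults are arbitrary.

```python
import os
import socket
import time


def wait_for_unix_socket(path, timeout=5.0, interval=0.05):
    """Return True once something accepts connections on `path`, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as probe:
                try:
                    probe.connect(path)
                    return True
                except OSError:
                    pass  # the file exists but nothing is listening yet
        time.sleep(interval)
    return False
```

In a fixture this could replace the sleep with something like assert wait_for_unix_socket(uds), at the cost of a few extra lines.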

The test itself looks like this, taking advantage of HTTPX’s ability to make calls against Unix domain sockets:

import socket

import httpx
import pytest


@pytest.mark.serial
@pytest.mark.skipif(not hasattr(socket, "AF_UNIX"), reason="Requires socket.AF_UNIX support")
def test_serve_unix_domain_socket(ds_unix_domain_socket_server):
    _, uds = ds_unix_domain_socket_server
    transport = httpx.HTTPTransport(uds=uds)
    client = httpx.Client(transport=transport)
    response = client.get("http://localhost/_memory.json")
    assert {
        "database": "_memory",
        "path": "/_memory",
        "tables": [],
    }.items() <= response.json().items()

The skipif decorator avoids running this test on platforms which don’t support Unix domain sockets (which I think includes Windows, see this comment).
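
The assertion at the end of that test relies on a handy property of Python dictionaries: items() views can be compared with <=, which checks that every key/value pair on the left also appears on the right. Here is a small standalone illustration. The extra keys in actual are made up for the example and is_subset is a hypothetical helper.

```python
def is_subset(expected, actual):
    """True if every key/value pair in `expected` also appears in `actual`."""
    return expected.items() <= actual.items()


expected = {"database": "_memory", "path": "/_memory", "tables": []}

# Hypothetical response with extra keys the test does not care about
actual = {
    "database": "_memory",
    "path": "/_memory",
    "tables": [],
    "views": [],
    "private": False,
}

print(is_subset(expected, actual))  # True
print(is_subset({"database": "other"}, actual))  # False
```

This keeps the test from breaking when new keys are added to the JSON, since only the listed pairs have to match.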

The @pytest.mark.serial decorator applies a “mark” that can be used to selectively run the test. I do this because Datasette’s tests run in CI using pytest-xdist, but that’s not compatible with this way of spinning up a temporary server. Datasette actually runs the tests in GitHub Actions like so:

- name: Run tests
  run: |
    pytest -n auto -m "not serial"
    pytest -m "serial"

The pytest -n auto -m "not serial" line runs almost all of the tests using pytest-xdist across an automatically selected number of processes, but skips the ones marked with @pytest.mark.serial. Then the second line runs the remaining serial tests without any additional concurrency.
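
Custom marks like this are normally registered with pytest so that it does not warn about an unknown mark. One way to do that is a pytest_configure hook in conftest.py, sketched below. This is an assumption for illustration: Datasette may register the mark elsewhere, such as in its pytest configuration file, and the description text is made up.

```python
# conftest.py (sketch)


def pytest_configure(config):
    """Register the custom `serial` mark so pytest recognises it."""
    config.addinivalue_line(
        "markers",
        "serial: tests that should not run in parallel under pytest-xdist",
    )
```

With the mark registered, running pytest with --strict-markers will also catch typos such as @pytest.mark.serail.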

Documentation and example configuration for this feature can be found in the Running Datasette behind a proxy documentation. Thanks to Aslak for contributing the notes on Apache configuration.


This is Weeknotes: Fun with Unix domain sockets by Simon Willison, posted on 13th July 2021.
