Weeknotes: Fun with Unix domain sockets
13th July 2021
A small enhancement to Datasette this week: I’ve added support for proxying via Unix domain sockets.
This started out as a feature request from Aslak Raanes: #1388: Serve using UNIX domain socket.
I’ve not worked with these much before so it was a good opportunity to learn something new. Unix domain sockets provide a mechanism whereby different processes on a machine can communicate with each other over a mechanism similar to TCP, but via a file path instead.
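To make the “file path instead of a port” idea concrete, here is a minimal sketch using only Python's standard library socket module. A one-shot echo server binds to a path, and a client connects to that same path. The function name echo_once is a hypothetical choice for illustration, not anything from Datasette.

```python
import os
import socket
import threading


def echo_once(sock_path: str, message: bytes) -> bytes:
    """Start a one-shot echo server on a Unix domain socket and talk to it."""
    # bind() fails if the path already exists, so clear any stale socket file.
    if os.path.exists(sock_path):
        os.unlink(sock_path)

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)  # this creates a socket file at sock_path
    server.listen(1)

    def serve() -> None:
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo back whatever arrived

    thread = threading.Thread(target=serve)
    thread.start()

    # The client addresses the server by file path, not by host and port.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(message)
        reply = client.recv(1024)

    thread.join()
    server.close()
    os.unlink(sock_path)  # closing the socket does not remove the file
    return reply
```

Calling echo_once with a path inside a temporary directory and b"hello" returns b"hello". Note that the socket file lingers on disk until something unlinks it, which is why the sketch removes any stale file before binding.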
I’ve encountered these before with the Docker daemon, which listens on path /var/run/docker.sock and can be communicated with using curl like so:
curl --unix-socket /var/run/docker.sock \
http://localhost/v1.41/containers/json
Plenty more examples in the Docker documentation if you click the ‘HTTP’ tab.
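Under the hood, a request like this is an ordinary HTTP request written down the socket. As a rough illustration (a sketch, not how curl is actually implemented), the hypothetical helper below sends a bare HTTP/1.0 GET over a Unix domain socket with Python's socket module and reads the response until the server closes the connection.

```python
import socket


def http_get_over_uds(sock_path: str, path: str = "/") -> bytes:
    """Send a minimal HTTP/1.0 GET over a Unix domain socket; return the raw response."""
    # Host comes from the URL, as curl does; the socket path, not the
    # hostname, decides which process receives the request.
    request = (
        f"GET {path} HTTP/1.0\r\n"
        "Host: localhost\r\n"
        "\r\n"
    ).encode("ascii")
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(request)
        # With HTTP/1.0 the server closes the connection after responding,
        # so keep reading until recv() returns b"" (EOF).
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Something like http_get_over_uds("/var/run/docker.sock", "/v1.41/containers/json") ought to mirror the curl command above, assuming your user is allowed to read the Docker socket (typically root or members of the docker group).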
It turns out both nginx and Apache have the ability to proxy traffic to a Unix domain socket rather than to an HTTP port, which makes this a useful mechanism for running backend servers without attaching them to TCP ports.
Implementing this in Datasette
Datasette uses the excellent Uvicorn Python web server to serve traffic out of the box, and Uvicorn already includes support for UDS (Unix domain sockets), so adding support to Datasette was pretty easy: here’s the full implementation. I’ve added a new --uds option, so now you can run Datasette like this:
datasette --uds /tmp/datasette.sock fixtures.db
Datasette will “listen” on /tmp/datasette.sock, which means you can run requests via curl like so:
curl --unix-socket /tmp/datasette.sock \
http://localhost/fixtures.json | jq
More importantly, it means you can configure nginx or Apache to proxy to the Datasette server like this (nginx):
daemon off;
events {
    worker_connections 1024;
}
http {
    server {
        listen 80;
        location / {
            proxy_pass http://datasette;
            proxy_set_header Host $host;
        }
    }
    upstream datasette {
        server unix:/tmp/datasette.sock;
    }
}
Or like this (Apache):
ProxyPass / unix:/tmp/datasette.sock|http://localhost/
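As an alternative to the curl command earlier, the same JSON can be fetched from Python. This sketch subclasses the standard library's http.client.HTTPConnection and overrides connect() so it opens an AF_UNIX socket instead of a TCP one. UnixHTTPConnection and get_json are hypothetical names chosen for illustration, and the JSON decoding stands in for piping through jq.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """Hypothetical HTTPConnection that talks to a Unix domain socket.

    The host passed to the parent class only ends up in the Host header.
    """

    def __init__(self, sock_path: str, host: str = "localhost", timeout: float = 10):
        super().__init__(host, timeout=timeout)
        self.sock_path = sock_path

    def connect(self) -> None:
        # Replace the usual TCP connect with an AF_UNIX connect to the file.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.sock_path)
        self.sock = sock


def get_json(sock_path: str, path: str):
    """GET `path` over the Unix socket; return (status, decoded JSON body)."""
    conn = UnixHTTPConnection(sock_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()
```

Assuming Datasette is running with --uds /tmp/datasette.sock as shown above, get_json("/tmp/datasette.sock", "/fixtures.json") should return the status code alongside the decoded JSON. For anything beyond a quick script, HTTPX's HTTPTransport(uds=...), used in the test later in this post, is the more complete option.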
Writing tests
The implementation was only a few lines of code (to pass the uds option to Uvicorn) but adding a test proved a little more challenging. I used this pytest fixture to spin up a server process:
import subprocess
import tempfile
import time

import pytest


@pytest.fixture(scope="session")
def ds_unix_domain_socket_server(tmp_path_factory):
    socket_folder = tmp_path_factory.mktemp("uds")
    uds = str(socket_folder / "datasette.sock")
    ds_proc = subprocess.Popen(
        ["datasette", "--memory", "--uds", uds],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        cwd=tempfile.gettempdir(),
    )
    # Give the server time to start
    time.sleep(1.5)
    # Check it started successfully: poll() returns None while still running
    assert ds_proc.poll() is None, ds_proc.stdout.read().decode("utf-8")
    yield ds_proc, uds
    # Shut it down at the end of the pytest session
    ds_proc.terminate()
I use a similar pattern for some other tests, to exercise the --ssl-keyfile and --ssl-certfile options added in #1221.
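The fixture above waits a fixed 1.5 seconds for the server to start. An alternative, sketched here under the assumption that a successful connect() means the server is ready to serve, is to poll the socket path until it accepts a connection. wait_for_socket is a hypothetical helper, not part of Datasette or pytest.

```python
import socket
import time


def wait_for_socket(sock_path: str, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until something accepts connections on `sock_path`, or give up.

    Hypothetical helper: returns True once connect() succeeds, or False if
    `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as probe:
            try:
                probe.connect(sock_path)
                return True
            except (FileNotFoundError, ConnectionRefusedError):
                # Socket file not created yet, or created but not yet listening.
                time.sleep(interval)
    return False
```

In the fixture, time.sleep(1.5) could then become assert wait_for_socket(uds), keeping the poll() check afterwards. The trade-off is that each successful probe opens and immediately closes a real connection to the server.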
The test itself looks like this, taking advantage of HTTPX’s ability to make calls against Unix domain sockets:
import socket

import httpx
import pytest


@pytest.mark.serial
@pytest.mark.skipif(
    not hasattr(socket, "AF_UNIX"), reason="Requires socket.AF_UNIX support"
)
def test_serve_unix_domain_socket(ds_unix_domain_socket_server):
    _, uds = ds_unix_domain_socket_server
    transport = httpx.HTTPTransport(uds=uds)
    client = httpx.Client(transport=transport)
    response = client.get("http://localhost/_memory.json")
    assert {
        "database": "_memory",
        "path": "/_memory",
        "tables": [],
    }.items() <= response.json().items()
The skipif decorator avoids running this test on platforms which don’t support Unix domain sockets (which I think includes Windows, see this comment).
The @pytest.mark.serial decorator applies a “mark” that can be used to selectively run the test. I do this because Datasette’s tests run in CI using pytest-xdist, but that’s not compatible with this way of spinning up a temporary server. Datasette actually runs the tests in GitHub Actions like so:
- name: Run tests
  run: |
    pytest -n auto -m "not serial"
    pytest -m "serial"
The pytest -n auto -m "not serial" line runs almost all of the tests using pytest-xdist across an automatically selected number of processes, but skips the ones marked with @pytest.mark.serial. Then the second line runs the remaining serial tests without any additional concurrency.
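pytest emits a PytestUnknownMarkWarning for custom marks like serial unless they are registered (and fails outright under --strict-markers). This post doesn't show how Datasette registers the mark; one possible approach, sketched below, is a pytest_configure hook in conftest.py. The marker description text is an assumption.

```python
# conftest.py (sketch): register the custom "serial" marker so pytest
# recognises it instead of warning about an unknown mark.


def pytest_configure(config):
    # addinivalue_line appends to the "markers" ini option, the same list
    # that a `markers =` entry in pytest.ini or setup.cfg would populate.
    config.addinivalue_line(
        "markers",
        "serial: tests that cannot run in parallel under pytest-xdist",
    )
```

With the mark registered, the two pytest commands above can select or exclude these tests with -m without any warnings.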
Documentation and example configuration for this feature can be found in the Running Datasette behind a proxy documentation. Thanks to Aslak for contributing the notes on Apache configuration.
TIL this week