Simon Willison’s Weblog

Subscribe

Deploying an asynchronous Python microservice with Sanic and Zeit Now

14th October 2017

Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP head request against a URL and get back the HTTP headers as JSON. One of our initial use-scase for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use-case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).

At some point in the following decade json-head.appspot.com stopped working. Today I’m bringing it back, mainly as an excuse to try out the combination of Python 3.5 async, the Sanic microframework and Zeit’s brilliant Now deployment platform.

First, a demo. https://json-head.now.sh/?url=https://simonwillison.net/ returns the following:

[
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 18:37:52 GMT",
            "Content-Type": "text/html; charset=utf-8",
            "Connection": "keep-alive",
            "Set-Cookie": "__cfduid=dd0b71b4e89bbaca5b27fa06c0b95af4a1508006272; expires=Sun, 14-Oct-18 18:37:52 GMT; path=/; domain=.simonwillison.net; HttpOnly; Secure",
            "Cache-Control": "s-maxage=200",
            "X-Frame-Options": "SAMEORIGIN",
            "Via": "1.1 vegur",
            "CF-Cache-Status": "HIT",
            "Vary": "Accept-Encoding",
            "Server": "cloudflare-nginx",
            "CF-RAY": "3adca70269a51e8f-SJC",
            "Content-Encoding": "gzip"
        },
        "status": 200,
        "url": "https://simonwillison.net/"
    }
]

Given a URL, json-head.now.sh performs an HTTP HEAD request and returns the resulting status code and the HTTP headers. Results are returned with the Access-Control-Allow-Origin: * header so you can call the API using fetch() or XMLHttpRequest from JavaScript running on any page.

Sanic and Python async/await

A key new feature added to Python 3.5 back in September 2015 was built-in syntactic support for coroutine control via the async/await statements. Python now has some serious credibility as a platform for asynchronous I/O (the concept that got me so excited about Node.js back in 2009). This has lead to an explosion of asynchronous innovation around the Python community.

json-head is the perfect application for async—it’s little more than a dumbed-down HTTP proxy, accepting incoming HTTP requests, making its own requests elsewhere and then returning the results.

Sanic is a Flask-like web framework built specifically to take advantage of async/await in Python 3.5. It’s designed for speed—built on top of uvloop, a Python wrapper for libuv (which itself was originally built to power Node.js). uvloop’s self-selected benchmarks are extremely impressive.

Zeit Now

To host this new microservice, I chose Zeit Now. It’s a truly beautiful piece of software design.

Now lets you treat deployments as immutable. Every time you deploy you get a brand new URL. You can then interact with your deployment directly, or point an existing alias to it if you want a persistent URL for your project.

Deployments are free, and deployed code stays available forever due to some clever engineering behind the scenes.

Best of all: deploying a project takes just a single command: type now and the code in your current directory will be deployed to their cloud and assigned a unique URL.

Now was originally built for Node.js projects, but last August Zeit added Docker support. If the directory you run it in contains a Dockerfile, running now will upload, build and run the corresponding image.

There’s just one thing missing: good examples of how to deploy Python projects to Now using Docker. I’m hoping this article can help fill that gap.

Here’s the complete Dockerfile I’m using for json-head:

FROM python:3
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8006
CMD ["python", "json_head.py"]

I’m using the official Docker Python image as a base, copying the current directory into the image, using pip install to install dependencies and then exposing port 8006 (for no reason other than that’s the port I use for local development environment) and running the json_head.py script. Now is smart enough to forward incoming HTTP traffic on port 80 to the port that was exposed by the container.

If you setup Now yourself (npm install -g now or use one of their installers) you can deploy my code directly from GitHub to your own instance with a single command:

$ now simonw/json-head
> Didn't find directory. Searching on GitHub...
> Deploying GitHub repository "simonw/json-head" under simonw
> Ready! https://simonw-json-head-xqkfgorgei.now.sh (copied to clipboard) [1s]
> Initializing…
> Building
> ▲ docker build
Sending build context to Docker daemon 7.168 kBkB
> Step 1 : FROM python:3
> 3: Pulling from library/python
> ... lots more stuff here ...

Initial implementation

Here’s my first working version of json-head using Sanic:

from sanic import Sanic
from sanic import response
import aiohttp

app = Sanic(__name__)

async def head(session, url):
    try:
        async with session.head(url) as response:
            return {
                'ok': True,
                'headers': dict(response.headers),
                'status': response.status,
                'url': url,
            }
    except Exception as e:
        return {
            'ok': False,
            'error': str(e),
            'url': url,
        }

@app.route('/')
async def handle_request(request):
    url = request.args.get('url')
    if url:
        async with aiohttp.ClientSession() as session:
            head_info = await head(session, url)
            return response.json(
                head_info,
                headers={
                    'Access-Control-Allow-Origin': '*'
                },
            )
    else:
        return response.html('Try /?url=xxx')

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8006)

This exact code is deployed at https://json-head-thlbstmwfi.now.sh/—since Now deployments are free, there’s no reason not to leave work-in-progress examples hosted as throwaway deployments.

In addition to Sanic, I’m also using the handy aiohttp asynchronous HTTP library—which features API design clearly inspired by my all-time favourite HTTP library, requests.

The key new pieces of syntax to understand in the above code are the async and await statements. async def is used to declare a function that acts as a coroutine. Coroutines need to be executed inside an event loop (which Sanic handles for us), but gain the ability to use the await statement.

The await statement is the real magic here: it suspends the current coroutine until the coroutine it is calling has finished executing. It is this that allows us to write asynchronous code without descending into a messy hell of callback functions.

Adding parallel requests

So far we haven’t really taken advantage of what async I/O can do—if every incoming HTTP request results in a single outgoing HTTP response then async may help us scale to serve more incoming requests at once but it’s not really giving us any new functionality.

Executing multiple outbound HTTP requests in parallel is a much more interesting use-case. Let’s add support for multiple ?url= parameters, such as the following:

https://json-head.now.sh/?url=https://simonwillison.net/&url=https://www.google.com/

[
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 19:35:29 GMT",
            "Content-Type": "text/html; charset=utf-8",
            "Connection": "keep-alive",
            "Set-Cookie": "__cfduid=ded486c1faaac166e8ae72a87979c02101508009729; expires=Sun, 14-Oct-18 19:35:29 GMT; path=/; domain=.simonwillison.net; HttpOnly; Secure",
            "Cache-Control": "s-maxage=200",
            "X-Frame-Options": "SAMEORIGIN",
            "Via": "1.1 vegur",
            "CF-Cache-Status": "EXPIRED",
            "Vary": "Accept-Encoding",
            "Server": "cloudflare-nginx",
            "CF-RAY": "3adcfb671c862888-SJC",
            "Content-Encoding": "gzip"
        },
        "status": 200,
        "url": "https://simonwillison.net/"
    },
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 19:35:29 GMT",
            "Expires": "-1",
            "Cache-Control": "private, max-age=0",
            "Content-Type": "text/html; charset=ISO-8859-1",
            "P3P": "CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\"",
            "Content-Encoding": "gzip",
            "Server": "gws",
            "X-XSS-Protection": "1; mode=block",
            "X-Frame-Options": "SAMEORIGIN",
            "Set-Cookie": "1P_JAR=2017-10-14-19; expires=Sat, 21-Oct-2017 19:35:29 GMT; path=/; domain=.google.com",
            "Alt-Svc": "quic=\":443\"; ma=2592000; v=\"39,38,37,35\"",
            "Transfer-Encoding": "chunked"
        },
        "status": 200,
        "url": "https://www.google.com/"
    }
]

We’re now accepting multiple URLs and executing multiple HEAD requests… but Python 3.5 async makes it easy to do this in parallel, so our overall request time should match that of the single longest HEAD request that we triggered.

Here’s an implementation that adds support for multiple, parallel outbound HTTP requests:

@app.route('/')
async def handle_request(request):
    urls = request.args.getlist('url')
    if urls:
        async with aiohttp.ClientSession() as session:
            head_infos = await asyncio.gather(*[
                head(session, url) for url in urls
            ])
            return response.json(
                head_infos,
                headers={'Access-Control-Allow-Origin': '*'},
            )
    else:
        return response.html(INDEX)

We’re using the asyncio module from the Python 3.5 standard library here—in particular the gather function. asyncio.gather takes a list of coroutines and returns a future aggregating their results. This future will resolve (and return to a corresponding await statement) as soon as all of those coroutines have returned their values.

My final code for json-head can be found on GitHub. As I hope I’ve demonstrated, the combination of Python 3.5+, Sanic and Now makes deploying asynchronous Python microservices trivially easy.