Running gunicorn behind nginx on Heroku for buffering and logging
Heroku’s default setup for Django uses the gunicorn application server. Each Heroku dyno can only run a limited number of gunicorn workers, which means a limited number of requests can be served in parallel (around 4 per dyno is a good rule of thumb).
Things get nasty when you have devices on slow connections, such as mobile phones. Heroku’s router buffers headers but it does not buffer response bodies, so a slow device could hold up a gunicorn worker for several seconds. Too many slow devices at once and the site will become unavailable to other users.
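To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes synchronous workers that each handle exactly one request at a time. The function name and the 0.25 second "fast" figure are illustrative assumptions, not measurements. The 4 workers and the several-second hold time come from the description above.

```python
def max_requests_per_second(workers, seconds_per_request):
    """Upper bound on throughput when each request occupies one worker
    for its entire duration (a simplifying assumption)."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return workers / seconds_per_request


# Hypothetical fast clients: each response is fully delivered in 0.25 s.
fast = max_requests_per_second(workers=4, seconds_per_request=0.25)

# Slow clients: each worker is held for 5 s while the body trickles out.
slow = max_requests_per_second(workers=4, seconds_per_request=5)

print(fast)  # 16.0
print(slow)  # 0.8
```

Under these assumptions, a dyno that could serve 16 requests per second drops below one request per second once every worker is tied up by a slow client. A buffering proxy in front of gunicorn is meant to close that gap.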
This issue is explained and discussed here: Don’t use Gunicorn to host your Django sites on Heroku
That article recommends using waitress as an alternative to gunicorn, but in the comments at the bottom of the article people suggest using a Heroku nginx-buildpack as an alternative.
Here is a slightly out-of-date tutorial on getting this all set up: https://koed00.github.io/Heroku_setups/
I used the following commands to set up the buildpacks:
heroku stack:set cedar-14
heroku buildpacks:clear
heroku buildpacks:add https://github.com/beanieboi/nginx-buildpack.git
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-python.git
Unfortunately the nginx buildpack is not yet compatible with the new heroku-16 stack, so until the nginx buildpack has been updated it’s necessary to run the application on the older cedar-14 stack. See this discussion for details: ryandotsmith/nginx-buildpack#68
Adding nginx in this way also gives us the opportunity to fix another limitation of Heroku: its default logging configuration. By default, log lines produced by Heroku (visible using heroku logs --tail or with a logging addon such as Papertrail) look like this:
Oct 01 18:01:06 simonwillisonblog heroku/router: at=info method=GET path="/2017/Oct/1/ship/" host=simonwillison.net request_id=bb22f67e-6924-4e81-b6ad-74d1f465cda7 fwd="2001:8003:74c5:8b00:79e4:80ed:fa85:7b37,220.127.116.11" dyno=web.1 connect=0ms service=338ms status=200 bytes=4523 protocol=http
Notably missing here are both the user-agent string and the referrer header sent by the browser! If you’re a fan of tailing log files, these omissions are pretty disappointing.
The nginx buildpack I’m using loads a default configuration file at config/nginx.conf.erb. By including my own copy of this file I can override the original and define my own custom log format.
Having applied this change, the new log lines look like this:
2017-10-02T01:44:38.762845+00:00 app[web.1]: measure#nginx.service=0.133 request="GET / HTTP/1.1" status_code=200 request_id=8b6402de-d072-42c4-9854-0f71697b30e5 remote_addr="10.16.227.159" forwarded_for="18.104.22.168" forwarded_proto="http" via="1.1 vegur" body_bytes_sent=12666 referer="-" user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
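One nice property of this key=value format is that it is easy to pick apart with a script. The following is a minimal sketch, not part of the setup itself. The function name `parse_log_fields` and the regular expression are assumptions made for illustration, and it expects values to be either bare tokens or double-quoted strings without embedded quotes.

```python
import re

# A key (letters, digits, "_", "#" or ".") followed by "=" and either a
# double-quoted value or a bare token with no whitespace.
PAIR_RE = re.compile(r'([\w#.]+)=("([^"]*)"|\S+)')


def parse_log_fields(line):
    """Return a dict of the key=value fields found in one log line."""
    fields = {}
    for match in PAIR_RE.finditer(line):
        key = match.group(1)
        quoted = match.group(3)  # None when the value was not quoted
        fields[key] = quoted if quoted is not None else match.group(2)
    return fields


# Shortened version of the log line above, used only as sample input.
sample = (
    '2017-10-02T01:44:38.762845+00:00 app[web.1]: '
    'measure#nginx.service=0.133 request="GET / HTTP/1.1" '
    'status_code=200 referer="-" body_bytes_sent=12666'
)

fields = parse_log_fields(sample)
print(fields["request"])      # GET / HTTP/1.1
print(fields["status_code"])  # 200
```

All values come back as strings, so numeric fields such as `status_code` or `body_bytes_sent` would need converting before doing any arithmetic on them.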
This blog entry started life as a commit message.