Running gunicorn behind nginx on Heroku for buffering and logging
2nd October 2017
Heroku’s default setup for Django uses the gunicorn application server. Each Heroku dyno can only run a limited number of gunicorn workers, which means a limited number of requests can be served in parallel (around 4 per dyno is a good rule of thumb).
Where things get nasty is when you have devices on slow connections, like mobile phones. Heroku’s router buffers headers but it does not buffer response bodies, so a slow device can hold up a gunicorn worker for several seconds. With too many slow devices at once, the site becomes unavailable to other users.
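To make the problem concrete, here is a rough back-of-the-envelope sketch. It assumes every request holds one worker for its full duration, and the timings are illustrative numbers rather than measurements from this site.

```python
def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when each request ties up one worker."""
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return workers / seconds_per_request


# Four workers, each request finishing in a quarter of a second (assumed).
fast_clients = max_requests_per_second(4, 0.25)

# Four workers, each stuck for five seconds feeding a slow client (assumed).
slow_clients = max_requests_per_second(4, 5.0)

print(fast_clients, slow_clients)
```

Under these assumptions the same dyno drops from 16 requests per second to less than one once every worker is waiting on a slow connection.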
This issue is explained and discussed here: Don’t use Gunicorn to host your Django sites on Heroku
That article recommends using waitress instead of gunicorn, but in the comments at the bottom of the article people suggest using a Heroku nginx-buildpack as an alternative.
Here is a slightly out-of-date tutorial on getting this all set up: https://koed00.github.io/Heroku_setups/
I used the following commands to set up the buildpacks:
heroku stack:set cedar-14
heroku buildpacks:clear
heroku buildpacks:add https://github.com/beanieboi/nginx-buildpack.git
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-python.git
Unfortunately the nginx buildpack is not yet compatible with the new heroku-16 stack, so until the nginx buildpack has been updated it’s necessary to run the application on the older cedar-14 stack. See this discussion for details: ryandotsmith/nginx-buildpack#68
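With nginx in front, gunicorn needs to listen where nginx expects it rather than on Heroku’s $PORT. The following is a minimal sketch of a gunicorn config file, not the exact file I deployed. The socket path and the "app-initialized" marker file are what I understand the nginx buildpack’s README to expect, so treat both as assumptions and check them against the buildpack version you use. The filename gunicorn.conf.py and the worker count are also just illustrative choices.

```python
# gunicorn.conf.py (hypothetical filename; pass it to gunicorn with -c)
from pathlib import Path

# Assumption: the nginx buildpack proxies to this unix socket.
SOCKET_PATH = "/tmp/nginx.socket"

# Assumption: the buildpack waits for this file before starting nginx.
READY_FILE = Path("/tmp/app-initialized")

# Standard gunicorn settings: bind to a unix socket instead of a TCP port.
bind = "unix:" + SOCKET_PATH
workers = 4  # roughly the per-dyno rule of thumb mentioned earlier


def when_ready(server):
    """gunicorn server hook, called once the master process is ready."""
    READY_FILE.touch()
```

The when_ready hook is a standard gunicorn server hook; creating the marker file there means nginx should only start accepting traffic once gunicorn is actually able to serve it.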
Adding nginx in this way also gives us the opportunity to fix another limitation of Heroku: its default logging configuration. By default, log lines produced by Heroku (visible using heroku logs --tail or with a logging addon such as Papertrail) look like this:
Oct 01 18:01:06 simonwillisonblog heroku/router: at=info
method=GET path="/2017/Oct/1/ship/" host=simonwillison.net
request_id=bb22f67e-6924-4e81-b6ad-74d1f465cda7
fwd="2001:8003:74c5:8b00:79e4:80ed:fa85:7b37,108.162.249.198"
dyno=web.1 connect=0ms service=338ms status=200 bytes=4523 protocol=http
Notably missing here are both the user-agent string and the referrer header sent by the browser! If you’re a fan of tailing log files these omissions are pretty disappointing.
The nginx buildpack I’m using loads a default configuration file at config/nginx.conf.erb. By including my own copy of this file I can override the original and define my own custom log format.
Having applied this change, the new log lines look like this:
2017-10-02T01:44:38.762845+00:00 app[web.1]:
measure#nginx.service=0.133 request="GET / HTTP/1.1" status_code=200
request_id=8b6402de-d072-42c4-9854-0f71697b30e5 remote_addr="10.16.227.159"
forwarded_for="199.188.193.220" forwarded_proto="http" via="1.1 vegur"
body_bytes_sent=12666 referer="-" user_agent="Mozilla/5.0 (Macintosh;
Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/61.0.3163.100 Safari/537.36"
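Because the new format is a series of key=value pairs with quoted values, it is also easy to process with a script. Here is a small sketch of a parser. The function name parse_log_line is my own, and it assumes that double quotes never appear unescaped inside a quoted value.

```python
import re

# key=value, where value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\S+?)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_log_line(line: str) -> dict:
    """Return the key=value fields of one log line as a dictionary."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


# A shortened version of the log line shown above, joined onto one line.
example = (
    '2017-10-02T01:44:38.762845+00:00 app[web.1]: '
    'measure#nginx.service=0.133 request="GET / HTTP/1.1" status_code=200 '
    'referer="-" user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6)"'
)
fields = parse_log_line(example)
print(fields["user_agent"], fields["referer"])
```

The timestamp and the app[web.1]: prefix contain no equals sign, so the parser skips them and only the fields from the custom log format end up in the dictionary.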
This blog entry started life as a commit message.