Simon Willison’s Weblog

Subscribe

Adding GeoDjango to an existing Django project

3rd May 2021

Work on VIAL for Vaccinate The States continues.

I talked about matching last week. I’ve been building more features to support figuring out if a newly detected location is already listed or not, with one of the most significant being the ability to search for locations within a radius of a specific point.

I’ve experimented with a PostgreSQL/Django version of the classic cos/sin/radians query for this but if you’re going to do this over a larger dataset it’s worth using a proper spatial index for it—and GeoDjango has provided tools for this since Django 1.0 in 2008!

I have to admit that outside of a few prototypes I’ve never used GeoDjango extensively myself—partly I’ve not had the right project for it, and in the past I’ve also been put off by the difficulty involved in installing all of the components.

That’s a lot easier in 2021 than it was in 2008. But VIAL is a project in-flight, so here are some notes on what it took to get GeoDjango added to an existing Django project.

Alex Vandiver has been working with me on VIAL and helped figure out quite a few of these steps.

Activating PostgreSQL

The first step was to install the PostGIS PostgreSQL extension. This can be achieved using a Django migration:

from django.contrib.postgres.operations import CreateExtension
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ("my_app", "0108_previous-migration"),
    ]

    operations = [
        CreateExtension("postgis"),
    ]

Most good PostgreSQL hosting already makes this extension available—in our case we are using Google Cloud SQL which supports various extensions, including PostGIS. I use Postgres.app for my personal development environment which bundles PostGIS too.

So far, so painless!

System packages needed by GeoDjango

GeoDjango needs the GEOS, GDAL and PROJ system libraries. Alex added these to our Dockerfile (used for our production deployments) like so:

RUN apt-get update && apt-get install -y \
    binutils \
    gdal-bin \
    libproj-dev \
    && rm -rf /var/lib/apt/lists/*

Adding a point field to a Django model

I already had a Location model, which looked something like this:

class Location(models.Model):
    name = models.CharField()
    # ...
    latitude = models.DecimalField(
        max_digits=9, decimal_places=5
    )
    longitude = models.DecimalField(
        max_digits=9, decimal_places=5
    )

I made three changes to this class: I changed the base class to this:

from django.contrib.gis.db import models as gis_models

class Location(gis_models.Model):
    # ...

I added a point column:

    point = gis_models.PointField(
        blank=True,
        null=True,
        spatial_index=True
    )

And I set up a custom save() method to populate that point field with a point representing the latitude and longitude every time the object was saved:

from django.contrib.gis.geos import Point

# ...

    def save(self, *args, **kwargs):
        # Point is derived from latitude/longitude
        if self.longitude and self.latitude:
            self.point = Point(
                float(self.longitude),
                float(self.latitude),
                srid=4326
            )
        else:
            self.point = None
        super().save(*args, **kwargs)

srid=4326 ensures the point is stored using WGS84—the most common coordinate system for latitude and longitude values across our planet.

Running ./manage.py makemigrations identified the new point Point column and created the corresponding migration for me.

Backfilling the point column with a migration

The .save() method would populate point for changes going forward, but I had 40,000 records that already existed which I needed to backfill. I used this migration to do that:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ("core", "0110_location_point"),
    ]

    operations = [
        migrations.RunSQL(
            sql="""
            update location
            set point = ST_SetSRID(
                ST_MakePoint(
                    longitude, latitude
                ),
                4326
            );""",
            reverse_sql=migrations.RunSQL.noop,
        )
    ]

latitude/longitude/radius queries

With the new point column created and populated, here’s the code I wrote to support simple latitude/longitude/radius queries:

from django.contrib.gis.geos import Point
from django.contrib.gis.measure import Distance

def search_locations(request):
    qs = Location.objects.filter(soft_deleted=False)
    latitude = request.GET.get("latitude")
    longitude = request.GET.get("longitude")
    radius = request.GET.get("radius")
    if latitude and longitude and radius:
        # Validate latitude/longitude/radius
        for value in (latitude, longitude, radius):
            try:
                float(value)
            except ValueError:
                return JsonResponse(
                    {"error": "latitude/longitude/radius should be numbers"}, status=400
                )
        qs = qs.filter(
            point__distance_lt=(
                Point(
                    float(longitude),
                    float(latitude)
                ),
                Distance(m=float(radius)),
            )
        )
    # ... return JSON for locations

In writing up these notes I realize that this isn’t actually the best way to do this, because it fails to take advantage of the spatial index on that column! I’ve filed myself an issue to switch to the spatial-index-friendly dwithin instead.

Getting CI to work

The hardest part of all of this turned out to be getting our CI suites to pass.

We run CI in two places at the moment: GitHub Actions and Google Cloud Build (as part of our continuous deployment setup).

The first error I hit was this one:

psycopg2.errors.UndefinedFile: could not open extension control file "/usr/share/postgresql/13/extension/postgis.control": No such file or directory

It turns out that’s what happens when your PostgreSQL server doesn’t have the PostGIS extension available.

Our GitHub Actions configuration started like this:

name: Run tests

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: vaccinate
        options:
          --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
        ports:
        - 5432:5432
    steps:

The postgres:13 image doesn’t have PostGIS. Swapping that out for postgis/postgis:13-3.1 fixed that (using this image).

Our Cloud Build configuration included this:

  # Start up a postgres for tests
  - id: "start postgres"
    name: "gcr.io/cloud-builders/docker"
    args:
      - "run"
      - "-d"
      - "--network=cloudbuild"
      - "-e"
      - "POSTGRES_HOST_AUTH_METHOD=trust"
      - "--name"
      - "vaccinate-db"
      - "postgres"

  - id: "test image"
    name: "gcr.io/cloud-builders/docker"
    args:
      - "run"
      - "-t"
      - "--network=cloudbuild"
      - "-e"
      - "DATABASE_URL=postgres://postgres@vaccinate-db:5432/vaccinate"
      - "${_IMAGE_NAME}:latest"
      - "pytest"
      - "-v"

I tried swapping out that last postgres argument for postgis/postgis:13-3.1, like I had with the GitHub Actions one... and it failed with this error instead:

django.db.utils.OperationalError: could not connect to server: Connection refused Is the server running on host "vaccinate-db" (192.168.10.3) and accepting TCP/IP connections on port 5432?

This one stumped me. Eventually Alex figured out the problem: the extra extension meant the PostgreSQL was taking slightly longer to start—something that was covered in our GitHub Actions configuration by the pg_isready line. He added this step to our Cloud Build configuration:

  - id: "wait for postgres"
    name: "jwilder/dockerize"
    args: ["dockerize", "-timeout=60s", "-wait=tcp://vaccinate-db:5432"]

It uses jwilder/dockerize to wait until the database container starts accepting connections on port 5432.

Next steps

Now that we have GeoDjango I’m excited to start exploring new capabilities for our software. One thing in particular that interests me is teaching VIAL to backfill the county for a location based on its latitude and longitude—the US Census provide a shapefile of county polygons which I use with Datasette and SpatiaLite in my simonw/us-counties-datasette project, so I’m confident it would work well using PostGIS instead.

Releases this week

TIL this week

This is Adding GeoDjango to an existing Django project by Simon Willison, posted on 3rd May 2021.

Next: Django SQL Dashboard

Previous: One year of TILs