Simon Willison’s Weblog

On genetics, opensource, csv, postgresql, aws, ...


Recent entries

Weeknotes: PG&E outages, and Open Source works! two days ago

My big focus this week was the PG&E outages project. I’m really pleased with how this turned out: the San Francisco Chronicle used data from it for their excellent PG&E outage interactive (mixing in data on wind conditions) and it earned a bunch of interest on Twitter and some discussion on Hacker News.

I gave a talk about the project on Thursday for the Online News Association Silicon Valley meetup and made connections with a bunch of interesting journalists from around the Bay Area.

Open Source works!

A highlight of last week was that a bunch of my projects gained new functionality through open source contributions!


I started a very basic website for my Dogsheep personal analytics project.

I also started running various Dogsheep tools via cron on my personal Dogsheep server, to keep that set of databases automatically updated with my latest activity on various services.

Most excitingly, Tobias Kunze built the first indepedent Dogsheep-style tool: goodreads-to-sqlite!

As the name suggests, it imports your data from Goodreads into a SQLite database. It inspired me to create a Goodreads account which I will be using to track my book reading activity from now on.

Tobias wrote a fantastic blog post introducing the tool which includes some neat example queries and graphs.

In other Dogsheep news, I added an issue-comments command to github-to-sqlite for fetching all issue comments in a repo. My goal is to evolve that tool to the point where it can import all relevant data from all of my repositories and give me a single Datasette-powered dashboard for keeping track of everything in one place.

Tracking PG&E outages by scraping to a git repo five days ago

PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.

As it happens, I’ve been scraping and recording PG&E’s outage data every 10 minutes for the past 4+ months. This data got really interesting over the past two days!

The original data lives in a GitHub repo (more importantly in the commit history of that repo).

Reading JSON in a Git repo isn’t particularly productive, so this afternoon I figured out how to transform that data into a SQLite database and publish it with Datasette.

The result is

The data model: outages and snapshots

The three key tables to understand are outages, snapshots and outage_snapshots.

PG&E assign an outage ID to every outage—where an outage is usually something that affects a few dozen customers. I store these in the outages table.

Every 10 minutes I grab a snapshot of their full JSON file, which reports every single outage that is currently ongoing. I store a record of when I grabbed that snapshot in the snapshots table.

The most interesting table is outage_snapshots. Every time I see an outage in the JSON feed, I record a new copy of its data as an outage_snapshot row. This allows me to reconstruct the full history of any outage, in 10 minute increments.

Here are all of the outages that were represented in snapshot 1269—captured at 4:10pm Pacific Time today.

I can run select sum(estCustAffected) from outage_snapshots where snapshot = 1269 (try it here) to count up the total PG&E estimate of the number of affected customers—it’s 545,706!

I’ve installed datasette-vega which means I can render graphs. Here’s my first attempt at a graph showing the number of estimated customers affected over time.

(I don’t know why there’s a dip towards the end of the graph).

I also defined a SQL view which shows all of the outages from the most recently captured snapshot (usually within the past 10 minutes if the PG&E website hasn’t gone down) and renders them using datasette-cluster-map.

Things to be aware of

There are a huge amount of unanswered questions about this data. I’ve just been looking at PG&E’s JSON and making guesses about what things like estCustAffected means. Without official documentation we can only guess as to how accurate this data is, or how it should be interpreted.

Some things to question:

  • What’s the quality of this data? Does it reflect accurately on what’s actually going on out there?
  • What’s the exact meaning of the different columns—estCustAffected, currentEtor, autoEtor, hazardFlag etc?
  • Various columns (lastUpdateTime, currentEtor, autoEtor) appear to be integer unix timestamps. What timezone were they recorded in? Do they include DST etc?

How it works

I originally wrote the scraper back in October 2017 during the North Bay fires, and moved it to run on Circle CI based on my work building a commit history of San Francisco’s trees.

It’s pretty simple: every 10 minutes a Circle CI job runs which scrapes the JSON feed that powers the PG&E website’s outage map.

The JSON is then committed to my pge-outages GitHub repository, over-writing the existing pge-outages.json file. There’s some code that attempts to generate a human-readable commit message, but the historic data itself is saved in the commit history of that single file.

Building the Datasette

The hardest part of this project was figuring out how to turn a GitHub commit history of changes to a JSON file into a SQLite database for use with Datasette.

After a bunch of prototyping in a Jupyter notebook, I ended up with the schema described above.

The code that generates the database can be found in I used GitPython to read data from the git repository and my sqlite-utils library to create and update the database.


Since this is a large database that changes every ten minutes, I couldn’t use the usual datasette publish trick of packaging it up and re-deploying it to a serverless host (Cloud Run or Heroku or Zeit Now) every time it updates.

Instead, I’m running it on a VPS instance. I ended up trying out Digital Ocean for this, after an enjoyable Twitter conversation about good options for stateful (as opposed to stateless) hosting.

Next steps

I’m putting this out there and sharing it with the California News Nerd community in the hope that people can find interesting stories in there and help firm up my methodology—or take what I’ve done and spin up much more interesting forks of it.

If you build something interesting with this please let me know, via email (swillison is my Gmail) or on Twitter.

Weeknotes: Dogsheep nine days ago

Having figured out my Stanford schedule, this week I started getting back into the habit of writing some code.


Dogsheep is the collective name I’ve given to a suite of tools I’m building around the concept of personal analytics.

I generate a lot of data, and while much of it ends up in the silos of the internet giants, thanks to the GDPR most of those silos now feature an “export a copy of your data” button.

Wouldn’t it be cool if you could convert that data into a SQLite database and then use Datasette to run queries against it?

So that’s what I’m doing! The tools I’ve built so far include:

  • healthkit-to-sqlite for my Apple HealthKit data (mostly collected by my Apple Watch).
  • twitter-to-sqlite, by far the most developed tool. For Dogsheep purposes it lets me import my tweets and the tweets I have favourited, but it’s growing all kinds of other useful features for retrieving and analyzing data from Twitter. More on this in my previous weeknotes.
  • swarm-to-sqlite for my Foursquare Swarm checkins.
  • inaturalist-to-sqlite for my iNaturalist observations.
  • google-takeout-to-sqlite for the wealth of data available from Google Takeout. I’ve barely scratched the surface with this one but it does have the ability to export my location history from Google Maps—215,000 latitude/longitude/timestamp records dating back to select min(timestamp) from location_history July 2015!
  • github-to-sqlite for importing my GitHub repositories and the repositories I have starred.
  • pocket-to-sqlite (built over this weekend) for articles I have saved to Pocket.
  • genome-to-sqlite for my 23andMe genome, because being able to run SQL queries against my genetic code is really funny.

My biggest achievement this week is that my own personal Dogsheep is no longer vaporware! I got a Datasette instance running on a AWS Lightsail VPS (using the systemd recipe from this ticket) and started loading in copies of my data.

Securing this instance well is important. In addition to running it behind datasette-auth-github I’ve followed up on a long-term ambition of deploying something protected by client certificates. If your browser doesn’t have the required certificate installed it gets rejected by nginx before it’s even proxied through to Datasette.

I have the certificate successfully installed on my iPhone and my laptop browsers. Setting this all up wasn’t particularly straight-forward—I ended up following this excellent tutorial by Nathan Wittstock for the nginx part of it, then randomly Googling for tips on installing the certificate in Firefox and on my iPhone.

I think there’s a great opportunity for tooling that makes this much easier. Adding that to my maybe-someday pile.

datasette-auth-github asset caching

Dogsheep is a great opportunity for eating my own dogfood. I’ve been having a lot of fun using it to explore my data... but I noticed that there was a suspicous delay every time I clicked a link. Datasette is mostly JavaScript-free but usually it responds so fast (~100ms or less) that clicking around feels “app-like”—but here I was really feeling the delay, often of more than a second.

I fired up the Firefox Network inspector and quickly spotted what was happening. I have the datasette-vega plugin installed so I can create charts, and that weighs in at nearly 1MB of JavaScript. According to Firefox my browser was loading that entire 1M file from scratch for every page load!

The culprit there turned out to be my datasette-auth-github plugin.

As detailed in issue #6, the ASGI middleware adds a cache-control: private header to every HTTP response—to make sure authentication-required content isn’t accidentally cached by any intermediary proxies (I originally designed Datasette to run effectively behind Varnish and Cloudflare).

This was applied to everything... including the static JavaScript and CSS served up by the plugin!

To fix this, I shipped a new release of datasette-auth-github with a cacheable_prefixes setting, which defaults to /-/static/ and /-/static-plugins/ when run as a Datasette plugin. See issue 47 for details.

Areas of research

Connection pooling

My most significant ongoing project for Datasette right now concerns database connection pooling.

For Datasette Library I want to be able to support potentially hundreds of attached databases, and Datasette Edit calls for the ability to use a writable (as opposed to read-only) connection.

This means I need to get smarter about database connections, and last week I finally made some solid progress on figuring out more advanced connection pooling. Still more work to go but I’m finally feeling unblocked on this after sweating over it for a couple of months without writing any code.

I also added a /-/threads page to Datasette (demo) to make it easier to understand the threads being run by the application.

Recursive CTEs in SQLite

twitter-to-sqlite generates a tweets table with both id and in_reply_to_status_id (nullable) columns.

SQLite supports recursive CTEs, documented here. In theory, this means it should be possible to write SQL queries that analyze Twitter threads—showing me the tweets that are at the bottom of the longest reply chain, for example.

I was feeling lazy and asked if anyone knew how to do that on Twitter. Robin Houston, Matthew Somerville and Piers Cawley all chipped in with useful tips, which I’ve collected in this gist.

I particularly enjoyed learning how Piers uses recursive SQL queries to help run his bakery.


I finally made some progress on getting my head around the Python standard library logging module, a long-time nemesis.

The breakthrough was to sit down and actually read the source code—which helped me understand that the entire point of the library is to be a globally configured logging system that makes careful use of threading primitives to ensure you can log without worrying about concurrency.

I’ve been worrying about that aspect of logging for years, so it’s reassuring to see that Vinay Sajip has that comprehensively covered already.

Easier installation for Datasette

I want journalists to be able to install Datasette. I don’t want to have to talk them through installing xcode to install homebrew to install Python 3 first!

Raffaele Messuti did some great work investigating pyinstaller for this a while ago. I started a conversation on Twitter a few days ago after investigating pex (neat but not quite what I want, since users still have to install Python). From that thread I got some interesting further leads:

  • Freezing your code in the The Hitchhiker’s Guide to Python describes the problem space in general.
  • Briefcase is the part of the BeeWare suite that deals with for converting a Python project into a standalone native application (for a bunch of different platforms).
  • PyOxidizer is a relatively new tool for building standalone executables that bundle a Python interpreter, on top of the Rust and Cargo build ecosystem.
  • Conda Constructor helps you construct an installer for a group of Conda packages (I need to figure out Conda).

Next step: actively try some of these out and see what I can do with them.

My longer term goal is to package Datasette up with Electron and make it available as a regular Windows and OS X application, for people who aren’t comfortable using the command-line directly.


I’ve been mostly working through required reading for my Stanford courses. Strategic Communication in particular has some excellent text books:

Weeknotes: first week of Stanford classes 16 days ago

One of the benefits of the JSK fellowship is that I can take classes and lectures at Stanford, on a somewhat ad-hoc basis (I don’t take exams or earn credits).

With thousands of courses to chose from, figuring out how best to take advantage of this isn’t at all easy—especially since I want to spend a big portion of my time focusing on my fellowship project.

This week was the first week of classes, which Stanford calls “shopping week”—because students are encouraged to try out lots of different things and literally walk out half way through a lecture if they decide it’s not for them! Feels really rude to me, but apparently that’s how it works here.

For this term I’ve settled on four classes:

  • Strategic Communications, at the Stanford Graduate School of Business. This is an extremely highly regarded course on public speaking and effective written communication. As you might expect from a class on public speaking the lectures themselves have been case studies in how to communicate well. I’ve given dozens of conference talks and I’m already learning a huge amount from this that will help me perform better in the future.
  • Classical Guitar. I’m taking this with three other fellows. It turns out my cheap acoustic guitar (bought on an impulse a couple of years ago from Amazon Prime Now) isn’t the correct instrument for this class (Classical Guitars are nylon stringed and a different shape) but the instructor thinks it will be fine for the moment. Great opportunity to do something musical!
  • Biostatistics. I want to firm up my fundamental knowledge of statistics, and I figured learning it from the biology department would be much more interesting than the corresponding maths or computer science classes.
  • Media Innovation. This is a lunchtime series of guest lectures from different professionals in different parts of the media industry. As such it doesn’t have much homework (wow, Stanford courses have a lot of homework) which makes it a good fit for my schedule, and the variety of speakers look to be really informative.

Combined with the JSK afternoon sessions on Monday, Wednesday and Friday I’ll be on campus every weekday, which will hopefully help me build a schedule that incorporates plenty of useful conversations with people about my project, plus actual time to get some code written.

… what with all the shopping for classes, I wrote almost no code at all this week!

I did some experimentation with structlog—I have an unfinished module which can write structlog entries to a SQLite database using sqlite-utils (here’s a Gist) and I’ve been messing around with Python threads in a Jupyter notebook as part of ongoing research into smarter connection pooling for Datasette but aside from that I’ve been concentrating on figuring out Stanford.


Stanford classes come with all sorts of required reading, but I’ve also made some progress on Just Enough Research by Erika Hall (mentioned last week). I’m about half way through and it’s fantastic—really fun to read and packed with useful tips on getting the most out of user interviews and associated techniques. Hopefully I’ll get to start putting it into practice next week!

Weeknotes: Design thinking for journalists, genome-to-sqlite, datasette-atom 26 days ago

I haven’t had much time for code this week: we’ve had a full five day workshop at JSK with Tran Ha (a JSK alumni) learning how to apply Design Thinking to our fellowship projects and generally to challenges facing journalism.

I’ve used aspects of design thinking in building software products, but I’d never really thought about how it could be applied outside of digital product design. It’s been really interesting—especially seeing the other fellows (who, unlike me, are generally not planning to build software during their fellowship) start to apply it to a much wider and more interesting range of problems.

I’ve been commuting in to Stanford on the Caltrain, which did give me a bit of time to work on some code.


I’m continuing to build out a family of tools for personal analytics, where my principle goal is to reclaim the data that various internet companies have collected about me and pull it into a local SQLite database so I can analyze, visualize and generally and have fun with it.

A few years ago I shared my DNA with 23andMe. I don’t think I’d make the decision to do that today: it’s incredibly personal data, and the horror stories about people making unpleasant discoveries about their family trees keep on building. But since I’ve done it, I decided to see if I could extract out some data…

… and it turns out they let you download your entire genome! You can export it as a zipped up TSV file—mine decompresses to 15MB of data (which feels a little small—I know little about genetics, but I’m presuming that’s because the genome they record and share is just the interesting known genetic markers, not the entire DNA sequence—UPDATE: confirmed, thanks @laurencerowe).

So I wrote a quick utility, genome-to-sqlite, which loads the TSV file (directly from the zip or a file you’ve already extracted) and writes it to a simple SQLite table. Load it into Datasette and you can even facet by chromosome, which is exciting!

This is where my knowledge runs out. I’m confident someone with more insight than me could construct some interesting SQL queries against this—maybe one that determines if you are likely to have red hair?—so I’m hoping someone will step in and provide a few examples.

I filed a help wanted issue on GitHub. I also put a request out on Twitter for an UPDATE statement that could turn me into a dinosaur.


This is very much a work-in-progress right now: datasette-atom will be a Datasette plugin that adds .atom as an output format (using the register_output_renderer plugin hook contributed by Russ Garrett a few months ago.

The aim is to allow people to subscribe to the output of a query in their feed reader (and potentially through that via email and other mechanisms)—particularly important for databases which are being updated over time.

It’s a slightly tricky plugin to design because valid Atom feed entries require a globally unique ID, a title and an “updated” date—and not all SQL queries produce obvious candidates for these values. As such, I’m going to have the plugin prompt the user for those fields and then persist them in the feed URL that you subscribe to.

This also means you won’t be able to generate an Atom feed for a query that doesn’t return at least one datetime column. I think I’m OK with that.


I released one new feature for github-to-sqlite this week: the github-to-sqlite repos github.db command, which populates a database table of all of the repositories available to the authenticated user. Or use github-to-sqlite repos github.db dogsheep to pull the repos owned by a specific user or organization.

The command configures a SQLite full-text search index against the repo titles and descriptions, so if you have a lot of GitHub repos (I somehow have nearly 300!) you can search through them and use Datasette to facet them against different properties.

github-to-sqlite currently has two other useful subcommands: starred fetches details of every repository a user has starred, and issues pulls details of the issues (but sadly not yet their comment threads) attached to a repository.


I’m trying to spend more time reading books—so I’m going to start including book stuff in my weeknotes in the hope of keeping myself on track.

I acquired two new books this week:

  • Just Enough Research by Erika Hall (recommended by Tom Coates and Tran Ha), because I need to spent the next few months interviewing as many journalists (and other project stakeholders) as possible to ensure I am solving the right problems for them.
  • Producing Open Source Software by Karl Fogel, because my fellowship goal is to build a thriving open source ecosystem around tooling for data journalism and this book looks like it covers a lot of the topics I need to really do a good job of that.

Next step: actually read them! Hopefully I’ll have some notes to share in next week’s update.

Weeknotes: ONA19, twitter-to-sqlite, datasette-rure one month ago

I’ve decided to start writing weeknotes for the duration of my JSK fellowship. Here goes!

I started the fellowship last week, and this week I’ve been attending the Online News Association conference in New Orleans along with the other fellows.

Natalie and I have had a lot of fun exploring New Orleans, including several niche museums (documented in this Twitter thread, which I expect to continue adding to for years to come).

ONA is the largest digital news conference on the calendar, and as such it’s a great opportunity to meet all kinds of different journalists and get a start on figuring out how best to approach the fellowship.


I’ve been working on this tool for a couple of weeks now, but this week it really started to prove its worth. twitter-to-sqlite is a utility I’m building to fetch data from Twitter and write it into a SQLite database. It has a comprehensive README, as do all of my recent projects: I’ve been doing this for long enough that I know that time spent on documentation will pay me back enormously in just a few months.

My initial goal for this tool was to use it for personal analytics, partly inspired by my Twitter followers project from last year. Then ONA sent out a spreadsheet with the Twitter names of most of the conference attendees and I realised there was an opportunity to use it for something more interesting.

I won’t share the attendee list here because it’s intended to be private to conference attendees, but the short version of what I did with it is this:

First, load the attendee list into a database using csvs-to-sqlite (after first converting the XLS file to CSV using Numbers):

$ csvs-to-sqlite attendees.csv ona.db

Next, fetch the full Twitter profiles for every one of those attendees. I added a --sql option to twitter-to-sqlite for this (issue #8) and ran the following:

$ twitter-to-sqlite users-lookup ona.db --sql="select Twitter from attendees"

Now I can run datasette ona.db and execute full-text searches against the complete Twitter profiles of those attendees—great for figuring out e.g. who is attending the conference from the Washington Post.

I took this a step further: can I identify people I follow on Twitter who are attending the conference? I can pull the list of IDs of people I follow like so:

$ twitter-to-sqlite friends-ids ona.db simonw

Now I can see which of my Twitter friends are attending the conference by loading it into Datasette and using this query:

select * from users
where screen_name in (select Twitter from attendees)
and id in (select followed_id from following where follower_id = 12497)

This is not a particularly efficient SQL query… and it doesn’t matter! The great thing about working against tiny SQLite databases that contain just a few thousand rows of data is that you can nest SQL queries like this with little concern for their performance—this one runs on my laptop in 18.213ms.


SQLite has a REGEXP operator, which is documented thus:

The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If an application-defined SQL function named “regexp” is added at run-time, then the “X REGEXP Y” operator will be implemented as a call to “regexp(Y,X)”.

Python’s sqlite3 module lets you register custom functions, so this should be an easy fix. I’ve been holding off on implementing this in Datasette for a while though because Datasette allows user-provided queries, and Python’s regular expressions have a nasty quality: you can easily construct a regular expression and input that will hang the Python interpreter.

So I needed a safe regular expression library that I could trust not to hang my intepreter on the wrong input.

Google have such a library—re2—but the various Python bindings for it aren’t cleanly installable via pip install.

I asked about this on Twitter and Alex Willmer pointed me to rure-python—a Python wrapper for the Rust regular expression library, which “guarantees linear time searching using finite automata”—exactly what I was looking for!

So I built a quick Datasette plugin: datasette-rure, which provides a regexp() function to enable the REGEXP oaperator, powered by the Rust regular expression engine.

I also added regexp_match(pattern, text, index) and regexp_matches(pattern, text) functions. Here are their examples from the README:

select regexp_match('.*( and .*)', title) as n from articles where n is not null
-- Returns the ' and X' component of any matching titles, e.g.
--     and Recognition
--     and Transitions Their Place
-- etc
select regexp_matches(
    'hello (?P<name>\w+) the (?P<species>\w+)',
    'hello bob the dog, hello maggie the cat, hello tarquin the otter'
-- Returns a JSON array:
-- [{"name": "bob", "species": "dog"},
--  {"name": "maggie", "species": "cat"},
--  {"name": "tarquin", "species": "otter"}]

Returning JSON from a SQLite custom function is a fun trick: it means you can then loop over the returned rows or even join them against other tables within SQL using the bundled json_each() function.

I deployed an interactive demo of Datasette running the plugin. I’ve been trying to do this for all of the plugins I release—datasette-jq is another recent example (the interactive demos are linked from the README).



  • 2018 Central Park Squirrel Census in Datasette (via) The Squirrel Census project released their data! 3,000 squirrel observations in Central Park, each with fur color and latitude and longitude and behavioral observations. I love this data so much. I’ve loaded it into a Datasette running on Glitch. #

14th October 2019

  • μPlot (via) “An exceptionally fast, tiny time series chart. [...] from a cold start it can create an interactive chart containing 150,000 data points in 40ms. [...] at < 10 KB, it’s likely the smallest and fastest time series plotter that doesn’t make use of WebGL shaders or WASM” #
  • goodreads-to-sqlite (via) This is so cool! Tobias Kunze built a Python CLI tool to import your Goodreads data into a SQLite database, inspired by github-to-sqlite and my various other Dogsheep tools. It’s the first Dogsheep style tool I’ve seen that wasn’t built by me—and Tobias’ write-up includes some neat examples of queries you can run against your Goodreads data. I’ve now started using Goodreads and I’m importing my books into my own private Dogsheep Datasette instance. #

7th October 2019

  • SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the  crime scene report, but you somehow lost it. You vaguely remember that the crime  was a murder that occurred sometime on ​Jan.15, 2018 and that it took place in SQL  City. Start by retrieving the corresponding crime scene report from the police  department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser. #

6th October 2019

  • twitter-to-sqlite 0.6, with track and follow. I shipped a new release of my twitter-to-sqlite command-line tool this evening. It now includes experimental features for subscribing to the Twitter streaming API: you can track keywords or follow users and matching Tweets will be written to a SQLite database in real-time as they come in through the API. Since Datasette supports mutable databases now you can run Datasette against the database and run queries against the tweets as they are inserted into the tables. #
  • Streamlit: Turn Python Scripts into Beautiful ML Tools (via) A really interesting new tool / application development framework. Streamlit is designed to help machine learning engineers build usable web frontends for their work. It does this by providing a simple, productive Python environment which lets you declaratively build up a sort-of Notebook style interface for your code. It includes the ability to insert a DataFrame, geospatial map rendering, chart or image into the application with a single Python function call. It’s hard to describe how it works, but the tutorial and demo worked really well for me: “pip install streamlit” and then “streamlit hello” to get a full-featured demo in a browser, then you can run through the tutorial to start building a real interactive application in a few dozen lines of code. #

5th October 2019

  • Get your own Pocket OAuth token (via) I hate it when APIs make you jump through extensive hoops just to get an access token for pulling data directly from your own personal account. I’ve been playing with the Pocket API today and it has a pretty complex OAuth flow, so I built a tiny Flask app on Glitch which helps go through the steps to get an API token for your own personal Pocket account. #
  • Client-Side Certificate Authentication with nginx. I’m intrigued by client-side browser certificates, which allow you to lock down a website such that only browsers with a specific certificate installed can access them. They work on both laptops and mobile phones. I followed the steps in this tutorial and managed to get an nginx instance running which only allows connections from my personal laptop and iPhone. #

4th October 2019

  • NGINX: Authentication Based on Subrequest Result (via) TIL about this neat feature of NGINX: you can use the auth_request directive to cause NGINX to make an HTTP subrequest to a separate authentication server for each incoming HTTP request. The authentication server can see the cookies on the incoming request and tell NGINX if it should fulfill the parent request (via a 2xx status code) or if it should be denied (by returning a 401 or 403). This means you can run NGINX as an authenticating proxy in front of any HTTP application and roll your own custom authentication code as a simple webhook-recieving endpoint. #

3rd October 2019

  • SQL queries don't start with SELECT. This is really useful. Understanding that SELECT (and associated window functions) happen after the WHERE, GROUP BY and HAVING helps explain why you can’t filter a query based on the results of a window function for example. #

28th September 2019

  • Microservices are about scaling teams, not scaling tech

    Petrus Theron #

26th September 2019

  • If you’re a little shy at conferences, speaking is The Best way to break the ice. Nobody talks to you before the talk. Everybody want’s to talk to you afterwards, largely because they have a way in. As such, public speaking is bizarrely good for introverts.

    Andy Budd #

25th September 2019