Simon Willison’s Weblog

On nodejs, observable, explorables, paulford, andrewgodwin, ...


Recent entries

Datasette Demo (video) from the SF Python Meetup one month ago

I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:

I showed how I built San Francisco Tree Search using Datasette, csvs-to-sqlite and data from the San Francisco Department of Public Works.

Analyzing my Twitter followers with Datasette one month ago

I decided to do some ad-hoc analsis of my social network on Twitter this afternoon… and since everything is more fun if you bundle it up into a SQLite database and publish it to the internet I performed the analysis using Datasette.

The end result

Here’s the Datasette database containing all of my Twitter followers:

Much more interesting though are the queries I can now run against it. A few examples:

The thing I find most exciting about this use-case for Datasette is that it allows you to construct entire mini-applications using just a SQL query encoded in a URL. Type queries into the textarea, iterate on them until they do something useful, add some :named parameters (which generate form fields) and bookmark the resulting URL. It’s an incredibly powerful way to build custom interfaces for exploring data.

The rest of this post will describe how I pulled the data from Twitter and turned it into a SQLite database for publication with Datasette.

Fetching my followers

To work with the Twitter API, we first need credentials. Twitter still mostly uses the OAuth 1 model of authentication which is infuriatingly complicated, requiring you to sign parameters using two pairs of keys and secrets. OAuth 2 mostly uses a single access token sent over TLS to avoid the signing pain, but Twitter’s API dates back to the times when API client libraries with robust TLS were not a safe assumption.

Since I have to re-figure out the Twitter API every few years, here’s how I got it working this time. I created a new Twitter app using the form on (which is surprisingly hard to find if you start out on the portal). Having created the app I navigated to the “Keys and Access Tokens” tab, scrolled down and clicked the “Create my access token” button. Then I grabbed the four magic tokens from the following spots on the page:

Twitter application setup

Now in Python I can make properly signed calls to the Twitter API like so:

from requests_oauthlib import OAuth1Session
twitter = OAuth1Session(

The Twitter API has an endpoint for retrieving everyone who follows an account as a paginated JSON list: followers/list. At some point in the past few years Twitter got really stingy with their rate limits—most endpoints, including followers/list only allow 15 requests every 15 minutes! You can request up to 200 followers at a time, but with 15,000 followers that meant the full fetch would take 75 minutes. So I set the following running in a Jupyter notebook and went for a walk with the dog.

from requests_oauthlib import OAuth1Session
import urllib.parse
import time

twitter = OAuth1Session(...)
url = ''

def fetch_followers(cursor=-1):
    r = twitter.get(url + '?'+ urllib.parse.urlencode({
        'count': 200,
        'cursor': cursor
    return r.headers, r.json()

cursor = -1
users = []
while cursor:
    headers, body = fetch_followers(cursor)
    cursor = body['next_cursor']

A couple of hours later I had a users list with 15,281 user dictionaries in it. I wrote that to disk for safe keeping:

import json
json.dump(users, open('twitter-followers.json', 'w'), indent=4)

Converting that JSON into a SQLite database

I wrote some notes on How to turn a list of JSON objects into a Datasette using Pandas a few weeks ago. This works really well, but we need to do a bit of cleanup first: Pandas prefers a list of flat dictionaries, but the Twitter API has given us back some nested structures.

I won’t do a line-by-line breakdown of it, but here’s the code I ended up using. The expand_entities() function replaces Twitter’s ugly links with their expanded display_url alternatives—then clean_user() flattens a nested user into a simple dictionary:

def expand_entities(s, entities):
    for key, ents in entities.items():
        for ent in ents:
            if 'url' in ent:
                replacement = ent['expanded_url'] or ent['url']
                s = s.replace(ent['url'], replacement)
    return s

def clean_user(user):
    if user['description'] and 'description' in user['entities']:
        user['description'] = expand_entities(
            user['description'], user['entities']['description']
    if user['url'] and 'url' in user['entities']:
        user['url'] = expand_entities(user['url'], user['entities']['url'])
    if 'entities' in user:
        del user['entities']
    if 'status' in user:
        del user['status']

for user in users:

I now have a nice flat list of users dictionaries—a subset of which is provided here for illustration.

One additional step: SQLite’s built-in functions for handling date and time prefer ISO formatted timestamps, but previewing the DataFrame in Jupyter shows that the data I pulled from Twitter has dates in a different format altogether. I can fix this with a one-liner using the ever-handy dateutil library:

from dateutil.parser import parse
import pandas as pd
df = pd.DataFrame(users)
df['created_at'] = df['created_at'].apply(lambda s: parse(s).isoformat())

Here’s the before and after:

df.apply() illustrated

Now that the list contains just simple dictionaries, I can load it into a Pandas DataFrame and convert it to a SQLite table like so:

import sqlite3
conn = sqlite3.connect('/tmp/followers.db')
df.to_sql('followers', conn)

Now I can run datasette /tmp/followers.db to preview what I’ve got so far.

Extracting columns and setting up full-text search

This all works fine, but it’s not quite the finished product I demonstrated above. My desired final state has two additional features: common values in the lang, location, time_zone and translator_type columns have been pulled out into lookup tables, and I’ve enabled SQLite full-text search against a subset of the columns.

Normally I would use the -c and -f arguments to my csvs-to-sqlite tool to do this (see my write-up here), but that tool only works against CSV files on disk. I want to work with an in-memory Pandas DataFrame.

So I reverse-engineered my own code and figured out how to apply the same transformations from an interactive Python prompt instead. It ended up looking like this:

from csvs_to_sqlite import utils

conn = sqlite3.connect('/tmp/simonw-twitter-followers.db')

# Define columns I want to refactor:
foreign_keys = {
    'time_zone': ('time_zone', 'value'),
    'translator_type': ('translator_type', 'value'),
    'location': ('location', 'value'),
    'lang': ('lang', 'value'),
new_frames = utils.refactor_dataframes(conn, [df], foreign_keys)

# Save my refactored DataFrame to SQLite
    conn, new_frames[0], 'followers',
    foreign_keys, None, index_fks=True

# Create the full-text search index across these columns:
fts = ['screen_name', 'description', 'name', 'location']
utils.generate_and_populate_fts(conn, ['followers'], fts, foreign_keys)


Final step: publishing with Datasette

Having run datasette /tmp/simonw-twitter-followers.db to confirm locally that I got the results I was looking for, the last step was to publish it to the internet. As always, I used Zeit Now via the datasette publish command for this final step:

tmp $ datasette publish now simonw-twitter-followers.db \
    --title="@simonw Twitter followers, 27 Jan 2018"
> Deploying /private/var/.../datasette under simonw
> Ready! (copied to clipboard) [14s]
> Synced 2 files (11.29MB) [0ms] 
> Initializing…
> Building
> ▲ docker build
Sending build context to Docker daemon 11.85 MBkB
> Step 1 : FROM python:3
> Deployment complete!

Then I ran new alias to assign a permanent, more memorable URL:

 now alias

Datasette Publish: a web app for publishing CSV files as an online database two months ago

I’ve just released Datasette Publish, a web tool for turning one or more CSV files into an online database with a JSON API.

Here’s a demo application I built using Datasette Publish, showing Californian campaign finance data using CSV files released by the California Civic Data Coalition.

And here’s an animated screencast showing exactly how I built it:

Animated demo of Datasette Publish

Datasette Publish combines my Datasette tool for publishing SQLite databases as an API with my csvs-to-sqlite tool for generating them.

It’s built on top of the Zeit Now hosting service, which means anything you deploy with it lives on your own account with Zeit and stays entirely under your control. I used the brand new Zeit API 2.0.

Zeit’s generous free plan means you can try the tool out as many times as you like—and if you want to use it for an API powering a production website you can easily upgrade to a paid hosting plan.

Who should use it

Anyone who has data they want to share with the world!

The fundamental idea behind Datasette is that publishing structured data as both a web interface and a JSON API should be as quick and easy as possible.

The world is full of interesting data that often ends up trapped in PDF blobs or other hard-to-use formats, if it gets published at all. Datasette encourages using SQLite instead: a powerful, flexible format that enables analysis via SQL queries and can easily be shared and hosted online.

Since so much of the data that IS published today uses CSV, this first release of Datasette Publish focuses on CSV conversion above anything else. I plan to add support for other useful formats in the future.

The three areas I’m most excited in seeing adoption of Datasette are data journalism, civic open data and cultural institutions.

Data journalism because when I worked at the Guardian Datasette is the tool I wish I had had for publishing data. When we started the Guardian Datablog we ended up using Google Sheets for this.

Civic open data because it turns out the open data movement mostly won! It’s incredible how much high quality data is published by local and national governments these days. My San Francisco tree search project for example uses data from the Department of Public Works—a CSV of 190,000 trees around the city.

Cultural institutions because the museums and libraries of the world are sitting on enormous treasure troves of valuable information, and have an institutional mandate to share that data as widely as possible.

If you are involved in any of the above please get in touch. I’d love your help improving the Datasette ecosystem to better serve your needs.

How it works

Datasette Publish would not be possible without Zeit Now. Now is a revolutionary approach to hosting: it lets you instantly create immutable deployments with a unique URL, via a command-line tool or using their recently updated API. It’s by far the most productive hosting environment I’ve ever worked with.

I built the main Datasette Publish interface using React. Building a SPA here made a lot of sense, because it allowed me to construct the entire application without any form of server-side storage (aside from Keen for analytics).

When you sign in via Zeit OAuth I store your access token in a signed cookie. Each time you upload a CSV the file is stored directly using Zeit’s upload API, and the file metadata is persisted in JavaScript state in the React app. When you click “publish” the accumulated state is sent to the server where it is used to construct a new Zeit deployment.

The deployment itself consists of the CSV files plus a Dockerfile that installs Python, Datasette, csvs-to-sqlite and their dependencies, then runs csvs-to-sqlite against the CSV files and starts up Datasette against the resulting database.

If you specified a title, description, source or license I generate a Datasette metadata.json file and include that in the deployment as well.

Since free deployments to Zeit are “source code visible”, you can see exactly how the resulting application is structured by visiting (the campaign finance app I built earlier).

Using the Zeit API in this way has the neat effect that I don’t ever store any user data myself—neither the access token used to access your account nor any of the CSVs that you upload. Uploaded files go straight to your own Zeit account and stay under your control. Access tokens are never persisted. The deployed application lives on your own hosting account, where you can terminate it or upgrade it to a paid plan without any further involvement from the tool I have built.

Not having to worry about storing encrypted access tokens or covering any hosting costs beyond the Datasette Publish tool itself is delightful.

This ability to build tools that themselves deploy other tools is fascinating. I can’t wait to see what other kinds of interesting new applications it enables.

Discussion on Hacker News.

Building a location to time zone API with SpatiaLite, OpenStreetMap and Datasette three months ago

Given a latitude and longitude, how can we tell what time zone that point lies within? Here’s how I built a simple JSON API to answer that question, using a combination of data from OpenStreetMap, the SpatiaLite extension for SQLite and my Datasette API tool.


You can try the API out here: feed it a latitude and longitude and it will return the corresponding time zone ID:

Some examples:

The data

I was first introduced to Eric Muller’s tz_world shapefile by Nick Williams at Eventbrite, who used it to build us an internal time zone lookup API on top of MySQL’s geospatial data types. Eric’s project is no longer updated and he recommends Evan Siroky’s timezone-boundary-builder project as an alternative, which derives time zone shapes from OpenStreetMap and makes the resulting data available under the Open Database License. The shapefile itself can be downloaded from the GitHub releases page for the project.

Loading the data into SpatiaLite

SpatiaLite is a powerful open source extension for SQLite, which adds a comprehensive suite of geospatial functions—including the ability to ingest shapefiles, convert them into geometries and run point within queries against the resulting shapes.

The easiest way to get it running on OS X is via Homebrew:

$ brew install spatialite-tools
$ brew install gdal

Having installed SpatiaLite, we can ingest the shapefile using .loadshp combined_shapefile timezones CP1252 23032—here’s the full process, from downloading the shapefile to ingesting it into a new SQLite database file called timezones.db:

$ wget
$ unzip
$ cd dist
$ spatialite timezones.db
SpatiaLite version ..: 4.3.0a   Supported Extensions:
Enter SQL statements terminated with a ";"
spatialite> .loadshp combined_shapefile timezones CP1252 23032
Loading shapefile at 'combined_shapefile' into SQLite table 'timezones'

CREATE TABLE "timezones" (
"tzid" TEXT);
SELECT AddGeometryColumn('timezones', 'Geometry', 23032, 'MULTIPOLYGON', 'XY');

Inserted 414 rows into 'timezones' from SHAPEFILE

Let’s try it out with a query. Here’s the SQL needed to find the time zone for a point in Tokyo:

select tzid
    within(GeomFromText('POINT(139.7819661 35.6631424)'), timezones.Geometry);

Let’s run that in SpatiaLite and see what we get:

spatialite> select tzid
   ...> from
   ...>     timezones
   ...> where
   ...>     within(GeomFromText('POINT(139.7819661 35.6631424)'), timezones.Geometry);

Looks good so far! How long is it taking though? We can find out by running .timer on in the spatialite prompt:

spatialite> .timer on
spatialite> select tzid
   ...> from
   ...>     timezones
   ...> where
   ...>     within(GeomFromText('POINT(139.7819661 35.6631424)'), timezones.Geometry);
CPU Time: user 0.108479 sys 0.064778

That’s a tenth of a second, or 100ms. Fast, but not brilliant.

Speeding it up with a geospatial index

It turns out SpatiaLite includes support for spatial indices, based on SQLite’s R*Tree module. R-Tree indexes can massively accelerate boundary box searches. Our searches are a lot more complex than that, acting as they do against extremely complex polygon shapes—but we can use a boundary box search to dramatically reduce the candidates we need to consider. Let’s create an index against our Geometry column:

SELECT CreateSpatialIndex('timezones', 'geometry');

To take advantage of this index, we need to expand our original SQL to first filter by geometries where their bounding box contains the point we are searching for. SpatiaLite has created an index table called idx_timezones_Geometry against which we can run an R-Tree optimized query. Here’s the SQL we will use:

select tzid
    within(GeomFromText('POINT(139.7819661 35.6631424)'), timezones.Geometry)
    and rowid in (
        SELECT pkid FROM idx_timezones_Geometry
        where xmin < 139.7819661
        and xmax > 139.7819661
        and ymin < 35.6631424
        and ymax > 35.6631424

How does this affect our performance?

spatialite> select tzid
   ...> from
   ...>     timezones
   ...> where
   ...>     within(GeomFromText('POINT(139.7819661 35.6631424)'), timezones.Geometry)
   ...>     and rowid in (
   ...>         SELECT pkid FROM idx_timezones_Geometry
   ...>         where xmin < 139.7819661
   ...>         and xmax > 139.7819661
   ...>         and ymin < 35.6631424
   ...>         and ymax > 35.6631424
   ...>     );
CPU Time: user 0.001541 sys 0.000111

From 100ms down to 1.5ms—nearly a 70x speedup! Not bad at all.

Building and publishing the API with Datasette

Now that we have a fast SQL query for finding a time zone for a latitude and longitude we can use Datasette to turn it into a JSON API.

The simplest way to do that looks like this:

datasette timezones.db \

This will start Datasette on port 8001 and load the SpatiaLite extension. You can then navigate to localhost:8001/timezones in your browser and paste in the SQL query… which should look something like this:‘POINT(139.7819661+35.6631424)’)%2C+timezones.Geometry)%0D%0Aand+rowid+in+(%0D%0A++++SELECT+pkid+FROM+idx_timezones_Geometry%0D%0Awhere+xmin+<+139.7819661%0D%0Aand+xmax+>+139.7819661%0D%0Aand+ymin+<+35.6631424%0D%0Aand+ymax+>+35.6631424%0D%0A++++)%3B

This works (click the JSON link to get the result as JSON) but it’s a little inconvenient to use: you have to construct a URL with the same latitude and longitude repeated in multiple places.

We can improve things using Datasette’s support for SQLite named parameters. Here’s that same SQL query using parameters instead of hard-coded latitude and longitude points:

select tzid
    within(GeomFromText('POINT(' || :longitude || ' ' || :latitude || ')'), timezones.Geometry)
    and rowid in (
        SELECT pkid FROM idx_timezones_Geometry
        where xmin < :longitude
        and xmax > :longitude
        and ymin < :latitude
        and ymax > :latitude)

If you paste this into Datasette it will detect the named parameters and turn them into URL querystring parameters hooked up (in thu UI) to HTML form fields.

Datasette time zone query showing longitude and latitude form fields

To save us from having to include the full SQL in the URL every time we call our new API, let’s take advantage of a new feature introduced in Datasette 0.14: canned queries. These are named, pre-packaged queries that can be defined in a metadata.json file. The file looks like this:

    "title": "OpenStreetMap Time Zone Boundaries",
    "license": "ODbL",
    "license_url": "",
    "source": "timezone-boundary-builder",
    "source_url": "",
    "databases": {
        "timezones": {
            "queries": {
                "by_point": "select tzid\nfrom\n    timezones\nwhere\n    within(GeomFromText('POINT(' || :longitude || ' ' || :latitude || ')'), timezones.Geometry)\n    and rowid in (\n        SELECT pkid FROM idx_timezones_Geometry\n        where xmin < :longitude\n        and xmax > :longitude\n        and ymin < :latitude\n        and ymax > :latitude\n    )"

The canned query is defined as the by_point key in the queries nested object. I’m also adding license and source information here for the project, because it’s good manners.

We can try this in Datasette on our local machine like so:

datasette timezones.db -m metadata.json \

Now visiting localhost:8001/timezones/by_point will provide the interface for the query—and adding .json to the URL will turn it into an API.

Vizualizing time zones with Leaflet and GeoJSON

If you browse around in Datasette you’ll quickly run into a rather unfortunate problem. The localhost:8001/timezones/timezones page, which shows the first 100 rows in the table, takes a shockingly long time to load. When it eventually does load you’ll see why: each record includes an enormous binary string containing its the geometry. On my machine just that one page weighs in at 62MB of HTML!

This is bad: rendering that much HTML in one go can block the event loop and cause the application to become unresponsive. That giant blob of binary data isn’t exactly useful for humans, either.

We can make some major improvements here using another Datasette 0.14 feature: custom templates. Let’s start with a replacement template that shows just the length of the binary string instead of attempting to render it.

We’ll do that by over-riding the _rows_and_columns.html include template, which is used by Datasette on both the table page and the page used to display individual rows. Since we only want to over-ride this template for one particular table we’ll create a file called _rows_and_columns-timezones-timezones.html—the file name means this will only over-ride the template timezones table in our timezones database. Here’s our new template:

            {% for column in display_columns %}
                <th scope="col">{{ column }}</th>
            {% endfor %}
    {% for row in display_rows %}
            {% for cell in row %}
                    {% if cell.column == 'Geometry' %}
                        {{ cell.value|length }} bytes
                    {% else %}
                        {{ cell.value }}
                    {% endif %}
            {% endfor %}
    {% endfor %}

If we put that in a directory called templates/ we can tell Datasette to use it like this:

datasette timezones.db -m metadata.json \
    --load-extension=/usr/local/lib/mod_spatialite.dylib \

Our localhost:8001/timezones/timezones page now looks like this:

Datasette time zone index showing lengths

But wouldn’t it be neat if we could see the actual shapes of these time zones? It turns out that’s actually pretty easy, using the combination of GeoJSON and the Leaflet mapping library.

GeoJSON is a neat, widely supported standard for encoding geographic information such as polygon shapes as JSON. SpatiaLite ships with built-in GeoJSON support in the form of the AsGeoJSON SQL function. We can use that function to turn any of our time zone geometries into a GeoJSON string:

select AsGeoJSON(Geometry) from timezones where tzid = 'Asia/Tokyo';

If you run that with Datasette you’ll get back a string of GeoJSON. You can paste that into to instantly visualize it.

The Leaflet mapping library supports GeoJSON out of the box. We can construct a custom row.html template for our Datasette that loads Leaflet from, uses fetch() to execute the AsGeoJSON query and renders the result in a map on the page. Here’s the result:

Custom Datasette page rendering a GeoJSON map

And here’s the row-timezones-timezones.html template:

{% extends "row.html" %}
{% block extra_head %}
<link rel="stylesheet" href="" integrity="sha512-M2wvCLH6DSRazYeZRIm1JnYyh22purTM+FDB5CsyxtQJYeKq83arPe5wgbNmcFXGqiSH2XR8dT/fJISVA1r/zQ==" crossorigin=""/>
<script src="" integrity="sha512-lInM/apFSqyy1o6s89K4iQUKg6ppXEgsVxT35HbzUupEVRh2Eu9Wdl4tHj7dZO0s1uvplcYGmt3498TtHq+log==" crossorigin=""></script>
#map {
  margin-top: 1em;
  width: 100%;
  height: 400px;
{% endblock %}

{% block content %}
{{ super() }}
<div id="map"></div>
var pk = location.pathname.split('/').slice(-1)[0];
var tiles = L.tileLayer('https://{s}{z}/{x}/{y}.png', {
    maxZoom: 19,
    detectRetina: true,
    attribution: '&copy; <a href="">OpenStreetMap</a> contributors, Points &copy 2012 LINZ'
var sql = 'select AsGeoJSON(Geometry) from timezones where PK_UID = ' + pk;
fetch('/timezones.json?sql=' + encodeURIComponent(sql)).then(r => r.json()).then(d => {
  var map ='map', {layers: [tiles]});
  var geoJSON = JSON.parse(d.rows[0][0]);
  var layer = L.geoJSON(geoJSON)
{% endblock %}

Publishing it to the internet

Normally we would use the datasette publish command to publish our database to Heroku or Zeit Now, but the SpatiaLite dependency means that won’t work for this case. Instead, we need to construct a custom Dockerfile that builds the SpatiaLite module.

Since we’re using Docker, we may as well have the Dockerfile download the shapefiles and build the SpatiaLite database for us all in one go. Here’s a Dockerfile that does exactly that:

FROM python:3.6-slim-stretch

RUN apt update
RUN apt install -y python3-dev gcc spatialite-bin libsqlite3-mod-spatialite wget unzip

RUN pip install

# Download and import the shapefiles
RUN wget \
    && unzip && \
    cd dist && \
    echo ".loadshp combined_shapefile timezones CP1252 23032\nSELECT CreateSpatialIndex('timezones', 'geometry');" | spatialite timezones.db && \
    mv timezones.db .. && \
    cd .. && rm -rf dist && rm

ENV SQLITE_EXTENSIONS /usr/lib/x86_64-linux-gnu/

ADD metadata.json metadata.json

ADD templates templates

RUN datasette inspect timezones.db --inspect-file inspect-data.json


CMD ["datasette", "serve", "timezones.db", "--host", "", "--cors", "--port", "8001", "--inspect-file", "inspect-data.json", "-m", "metadata.json", "--template-dir", "templates"]

The full code, including the supporting templates, can be found in simonw/timezones-api on GitHub.

If you have Docker installed (Docker for Mac is a one-click install package these days, it’s impressively painless) you can build the container like this:

docker build . -t timezones-api
# Wait for the image to build
docker run -p 8001:8001 timezones-api

Then you can visit http://localhost:8001/ to see your freshly built Datasette in your browser.

The easiest way to publish it online is using Zeit Now. Simply run the now command in the directory containing the Dockerfile and Zeit will upload the entire directory, build the container in the cloud and deploy it with a fresh URL. It’s by far the easiest Docker deployment environment I’ve ever used.

Now can even deploy directly from a public GitHub repository… so you can deploy your own copy of the API by running the following command:

$ now simonw/timezones-api
> Didn't find directory. Searching on GitHub...
> Deploying GitHub repository "simonw/timezones-api" under simonw
> Ready! (copied to clipboard) [2s]
> Initializing…

Canned queries + SpatiaLite = instant geospatial APIs

Hopefully this has helped illustrate the ease with which Datasette, SpatiaLite and canned queries can be used to create and publish geospatial APIs. Thanks to projects like OpenStreetMap the world is full of high quality open geospatial data. Go build something cool with it!



  • Protecting Against HSTS Abuse (via) Any web feature that can be used to persist information will eventually be used to build super-cookies. In this case it’s HSTS—a web feature that allows sites to tell browsers “in the future always load this domain over HTTPS even if the request specified HTTP”. The WebKit team caught this being exploited in the wild, by encoding a user identifier in binary across 32 separate sub domains. They have a couple of mitigations in place now—I expect other browser vendors will follow suit. #
  • How to use HDF5 files in Python (via) HDF5: the missing manual. A detailed explanation of the HDF5 file format and how to work with it using the h5py module. HDF5 allows you to efficiently store multiple datasets (plus metatdata about them) in a single file and then load data from them without pulling the entire file into memory—kind of like SQLite but without the SQL support and more optimized for working with arrays. #

17th March 2018

  • Trio Tutorial. Trio is a really nice async library for Python—a simpler alternative to asyncio, with some very clean API design. Best of all, the tutorial is fantastic—it provides a very clear explanation of async/await without diving into the intricacies of coroutines. #

13th March 2018

  • Everyone can now run JavaScript on Cloudflare with Workers. This is such a brilliant piece of software design: Cloudflare took the service workers spec and used it as the basis for their edge-executed JacaScript feature. This means you can run server-side JavaScript in hundreds of edge locations worldwide, applying custom dynamic logic (including additional async cached fetch() calls) with only around 1ms if additional overhead. The pricing model is a steal: $0.50 per million requests with a $5/month minimum. #
  • Being fast and light: Using binary data to optimise libraries on the client and the server. (via) Ada Rose Cannon provides a detailed introduction to ArrayBuffers in JavaScript and describes how she used them for a custom binary protocol to sync the state of 170 Virtual Reality users in the same venue without bringing down the network. #

10th March 2018

  • Consider Bitcoin a grand middle finger. It’s a prank, almost a parody of the global financial system, that turned into a bubble. “You plutocrats of Davos may think you control the global money supply,” the pranksters seem to say. “But humans will make an economy out of anything. Even this!”

    Paul Ford #

  • BAD TRAFFIC: Sandvine’s PacketLogic Devices Used to Deploy Government Spyware in Turkey and Redirect Egyptian Users to Affiliate Ads? “Targeted users in Turkey and Syria who downloaded Windows applications from official vendor websites including Avast Antivirus, CCleaner, Opera, and 7-Zip were silently redirected to malicious versions by way of injected HTTP redirects. This redirection was possible because official websites for these programs, even though they might have supported HTTPS, directed users to non-HTTPS downloads by default.” #
  • Real-time photogrammetry with #ARKit. Astonishing photogrammetry demo by Tim Field using ARKit in iOS 11.3. #
  • I’m still a novice to the healthcare space, but if I walked away with a single insight, it’s that the problems of the US healthcare system are very tractable. The high cost and mixed results are unique to our system. There are incumbents fighting fiercely to maintain the status quo, but no more so than in other industries that technology has overturned. The regulatory environment is complex, but again not uniquely so. There are industries where one has to dig to find the problems that technology is well suited to solve, but US healthcare, an industry that communicates via fax, is not one of them.

    Kellan Elliott-McCrea #

5th March 2018

  • Upgrades to Facebook's link security (via) Facebook have started scanning links shared on the site for HSTS headers, which are used to indicate that an HTTP page is also available over HTTPS and are intended to be cached by browsers such that future HTTP access is automatically retrieved over HTTPS instead. Facebook will now obey those headers itself and link directly to the HTTPS version. What a great idea: all sites with sophisticated link sharing (where links are fetched to retrieve extracts and images for example) should do this as well. #

2nd March 2018

  • Code is like a poem; it’s not just something we write to reach some practical result. Sometimes people that are far from the Redis philosophy suggest using other code written by other authors (frequently in other languages) in order to implement something Redis currently lacks. But to us this is like if Shakespeare decided to end Enrico IV using the Paradiso from the Divina Commedia. Is using any external code a bad idea? Not at all. Like in “One Thousand and One Nights” smaller self contained stories are embedded in a bigger story, we’ll be happy to use beautiful self contained libraries when needed. At the same time, when writing the Redis story we’re trying to write smaller stories that will fit in to other code.

    The Redis Manifesto #

1st March 2018

  • BearID: Bear Face Detector. Comprehensive tutorial on building a computer vision system to identify faces of bears, using dlib and the Histogram of Oriented Gradients (HOG) technique. Bears! #
  • The key to using Wagtail effectively is to recognise that there are multiple roles involved in creating a website: the content author, site administrator, developer and designer. These may well be different people, but they don’t have to be—if you’re using Wagtail to build your personal blog, you’ll probably find yourself hopping between those different roles. Either way, it’s important to be aware of which of those hats you’re wearing at any moment, and to use the right tools for that job. A content author or site administrator will do the bulk of their work through the Wagtail admin interface; a developer or designer will spend most of their time writing Python, HTML or CSS code. This is a good thing: Wagtail isn’t designed to replace the job of programming. Maybe one day someone will come up with a drag-and-drop UI for building websites that’s as powerful as writing code, but Wagtail is not that tool, and does not try to be.

    The Zen of Wagtail #

  • Wagtail 2.0. The leading Django content management system just released it’s 2.0 version—now Django 2.0 and Python 3 only, and with a new rich text editor based on Draft.js. I really like Wagtail—it’s full-feature,d mature, well-documented and philosophically aligned with how I think a CMS should work. I also like this line about the new Python 3 requirement: “Call us reckless neophiles, but we think that, nine years in, Python 3 is looking pretty solid.” #

27th February 2018

  • Responsive Components: a Solution to the Container Queries Problem (via) Philip Walton uses Chrome’s new ResizeObserver API (best described as document.onresize for elements, currently a W3C Editor’s Draft, not yet supported by other browsers) to implement a media-query style mechanism for applying CSS based on the size of the parent container. This is really clever. In the absence of ResizeObserver (which can be polyfilled) it can fall back to showing the narrowest design, which is probably best for mobile anyway. Desktop browsers are better equipped to run the polyfill. #

26th February 2018

  • By far the most important lesson I took out of this game is that whenever there’s behavior that needs to be repeated around to multiple types of entities, it’s better to default to copypasting it than to abstracting/generalizing it too early. This is a very very hard thing to do in practice. As programmers we’re sort of wired to see repetition and want to get rid of it as fast as possible, but I’ve found that that impulse generally creates more problems than it solves. The main problem it creates is that early generalizations are often wrong, and when a generalization is wrong it ossifies the structure of the code around it in a way that is harder to fix and change than if it wasn’t there in the first place.

    SSYGEN #

25th February 2018

  • r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. #
  • kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. #
  • Publishing history has various examples of advertising-only business models. But they are very much the exception. They mainly exist when there are near monopoly barriers to entry into the market which allow publishers to command and defend robust ad rates.

    Josh Marshall #