Simon Willison’s Weblog

November 2017

Nov. 14, 2017

How to train your own Object Detector with TensorFlow’s Object Detector API (via) Dat Tran built a TensorFlow model that can detect raccoons! Impressive results, especially given it was only trained on 200 raccoon images from Google Image search.

# 4:24 am / computer-vision, tensorflow, raccoons

Datasette for Polar Bears. I found a fun dataset of Polar Bear ear tag tracking data put out by the USGS Alaska Science Center and deployed it using datasette in just a couple of minutes—here’s how I did it.

# 5:41 am / datasette, projects

Datasettes · simonw/datasette. I’m collecting examples of datasette-powered APIs on the project wiki.

# 7:39 am / datasette, projects, github

Using SVG as placeholders — More Image Loading Techniques. This is such a good idea: generate a tiny SVG placeholder for an image, and display that until the image itself has loaded. This article explores potential ways of generating those SVGs in some depth.

# 7:19 pm / svg

pillow-simd (via) A “friendly fork” of the Python Pillow image library that takes advantage of SIMD operations on certain CPUs to obtain massive speed-ups—they claim 16 to 40 times faster than ImageMagick.

# 9:42 pm / pil, python

nginx proxy-cache-lock (via) Crucially important feature hidden away in the nginx documentation: proxy_cache_lock enables request coalescing, or dog-pile protection: it means that if a hundred simultaneous requests all suffer the same cache miss, only one request is made to the backend and the answer is then sent back to all hundred requests at once. I’ve leaned heavily on this feature in Varnish for years—useful to know that nginx has the same capability.
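
To make the request coalescing idea concrete, here is a minimal Python sketch of the pattern. It is not how nginx or Varnish implement it; `CoalescingCache` and `slow_backend` are hypothetical names invented for illustration. The first caller to miss on a key becomes the "leader" and fetches from the backend, while every other caller for that key waits and then reads the cached answer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CoalescingCache:
    """Hypothetical cache: one backend fetch per missing key, others wait."""

    def __init__(self, fetch):
        self._fetch = fetch            # function that calls the backend
        self._lock = threading.Lock()
        self._values = {}              # key -> cached response
        self._pending = {}             # key -> Event set when the fetch ends

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._pending.get(key)
            leader = event is None
            if leader:
                event = self._pending[key] = threading.Event()
        if not leader:
            event.wait()
            # Cached by now, or retried if the leader's fetch failed.
            return self.get(key)
        try:
            value = self._fetch(key)
            with self._lock:
                self._values[key] = value
            return value
        finally:
            with self._lock:
                del self._pending[key]
            event.set()


backend_calls = []


def slow_backend(key):
    backend_calls.append(key)
    time.sleep(0.05)                   # simulate a slow upstream response
    return f"response for {key}"


cache = CoalescingCache(slow_backend)
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(cache.get, ["/page"] * 100))

print(len(backend_calls), "backend call(s) for", len(results), "requests")
```

All hundred simulated requests get the same response, but the backend is only asked once.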

# 9:53 pm / nginx, caching, varnish, dogpile

We are actively developing cross datacenter replication (internally we are calling it "cross cluster replication" so you will likely see it referred to this in the future but of course this is subject to change). I can not give a timeframe, but it is one of the top features on the Elasticsearch roadmap.

Jason Tedor

# 10:40 pm / elasticsearch

tuxracer-web. Brilliant Docker hack from David Cooper: just run “docker run -p 8008:80 dtcooper/tuxracer-web” to get Tux Racer (the 3D game) running in your browser, on top of a cunning mix of the noVNC HTML5 VNC client and icecast for sound.

# 11:28 pm / docker

Nov. 15, 2017

Release Notes for Safari Technology Preview 44. The big news is support for the W3C Payment Request API for devices with Apple Pay enabled. Chrome, Firefox and Edge have been working on this as well.

# 11:35 pm / safari

Nov. 16, 2017

Plotly Falcon SQL Client and the Remarkable Datasette SQLite2API Generator. Lovely write-up of Datasette by Tony Hirst. I’m particularly excited about its applications in the world of data journalism so it’s great to see it starting to get attention from people in that sphere.

# 3:04 am / tony-hirst, datasette

Verified cryptography for Firefox 57 (via) Mozilla just became the first browser vendor to ship a formally verified crypto implementation.

# 2:26 pm / cryptography, firefox, mozilla

Datasette 0.12. I just released v0.12 of Datasette. The most exciting new feature is the ability to display a UI for editing named parameters—so you can construct an arbitrarily complex SQL query, include some named parameters and then link directly to it in Datasette to provide a simple interface for changing those parameters. An example involving Australian dogs is included in the release notes.
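
For readers unfamiliar with named parameters, here is a small sketch using Python's built-in `sqlite3` module directly rather than Datasette itself. The `dogs` table and its rows are made up for illustration and are not the example from the release notes. The `:breed` and `:min_age` placeholders in the SQL are filled in from a dictionary, which is the kind of value a form could let you edit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only here to give the query something to run on.
conn.execute("create table dogs (name text, breed text, age integer)")
conn.executemany(
    "insert into dogs values (?, ?, ?)",
    [("Rex", "Kelpie", 3), ("Bluey", "Heeler", 6), ("Matilda", "Kelpie", 9)],
)

# The SQL stays fixed; only the named parameter values change between runs.
sql = """
    select name from dogs
    where breed = :breed and age >= :min_age
    order by name
"""
params = {"breed": "Kelpie", "min_age": 5}
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Changing `params` changes the results without ever editing the SQL string.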

# 3:55 pm / datasette, documentation

gitchangelog. Handy Python utility that can generate a reStructured Text changelog from your git commit log. I used this to help get the Datasette release notes started.

# 4:52 pm / documentation

Introducing security alerts on GitHub. This is huge: GitHub’s dependency graph feature now shows any dependencies that have a known security vulnerability, based on CVE IDs—and you can sign up for notifications of new vulnerabilities as well. Only supports Ruby and JavaScript today, but Python support is coming in 2018.

# 7:48 pm / security, github

django-multitenant (via) Absolutely fascinating Django library for horizontally sharding a database using a multi-tenant pattern, from the team at Citus. In this pattern every relevant table includes a “tenant_id”, and all queries should specifically select against that ID. Once you have that in place, you can shard your rows across multiple different databases and route to the correct database based on the tenant ID, safe in the knowledge that joins will still work provided they are against other rows belonging to the same tenant.
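
The routing idea can be sketched without Django or Citus. This is an illustration of the pattern only, not the django-multitenant API: the two in-memory SQLite databases, the modulo routing rule, and the `stores` and `orders` tables are all assumptions made for the example.

```python
import sqlite3

SHARD_COUNT = 2
# Hypothetical shards: two independent in-memory databases.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("create table stores (tenant_id integer, id integer, name text)")
    db.execute(
        "create table orders "
        "(tenant_id integer, id integer, store_id integer, total_cents integer)"
    )


def shard_for(tenant_id):
    """Route a tenant to its database (simple modulo rule, an assumption)."""
    return shards[tenant_id % SHARD_COUNT]


def add_store(tenant_id, store_id, name):
    shard_for(tenant_id).execute(
        "insert into stores values (?, ?, ?)", (tenant_id, store_id, name)
    )


def add_order(tenant_id, order_id, store_id, total_cents):
    shard_for(tenant_id).execute(
        "insert into orders values (?, ?, ?, ?)",
        (tenant_id, order_id, store_id, total_cents),
    )


def totals_by_store(tenant_id):
    """Join two tables for one tenant; both live on the same shard."""
    sql = """
        select stores.name, sum(orders.total_cents)
        from orders
        join stores
          on stores.tenant_id = orders.tenant_id
         and stores.id = orders.store_id
        where orders.tenant_id = :tenant_id
        group by stores.name
        order by stores.name
    """
    return shard_for(tenant_id).execute(sql, {"tenant_id": tenant_id}).fetchall()


add_store(1, 1, "North")
add_order(1, 1, 1, 1000)
add_order(1, 2, 1, 500)
add_store(2, 1, "South")
add_order(2, 1, 1, 750)

print(totals_by_store(1))
print(totals_by_store(2))
```

Because every row carries its `tenant_id` and a tenant's rows all land on one shard, the join never has to cross databases.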

# 9:12 pm / scaling, postgresql, django

DSF calls for applicants for a Django Fellow. This is a fantastic opportunity: the Django Software Foundation’s fellowship program is providing the opportunity for a new developer to get paid to work on Django 20-40 hours a week. Tim Graham has held this position full-time for the past three years and is looking to scale back to part-time, hence this new opportunity. Applications are due by December 18, 2017.

# 10:31 pm / django

now-ab. Intriguing example of a Zeit Now microservice: now-ab is a Node.js HTTP proxy which proxies through to one of two or more other Now-deployed applications based on a cookie. If you don’t have the cookie, it picks a backend at random and sets the cookie. Admittedly this is the easiest part of implementing A/B testing (the hard part is the analytics: tracking exposures and conversions) but as an example of a microservice architectural pattern this is fascinating.
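
The cookie logic is simple enough to sketch in a few lines. now-ab itself is a Node.js project; this Python version is only an illustration of the decision it describes, and the cookie name, variant keys and backend URLs are hypothetical.

```python
import random
from http.cookies import SimpleCookie

COOKIE_NAME = "ab_backend"  # hypothetical cookie name
BACKENDS = {                # hypothetical backend deployments
    "a": "https://variant-a.example",
    "b": "https://variant-b.example",
}


def choose_backend(cookie_header):
    """Return (backend_url, set_cookie_header_or_None) for one request."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in BACKENDS:
        # Returning visitor: stick with the variant they were given before.
        return BACKENDS[morsel.value], None
    # New visitor: pick a variant at random and remember it in a cookie.
    variant = random.choice(sorted(BACKENDS))
    return BACKENDS[variant], f"{COOKIE_NAME}={variant}; Path=/"


first_backend, set_cookie = choose_backend(None)
# Echo the cookie back, as a browser would on the next request.
second_backend, second_set_cookie = choose_backend(set_cookie.split(";")[0])
print(first_backend, second_backend)
```

A real proxy would then forward the request to the chosen backend; the sketch stops at the routing decision.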

# 11:03 pm / zeit-now, ab-testing, microservices, nodejs

Nov. 17, 2017

If you’re a public data provider—and many large NGOs, government organizations, cultural organizations, historical archives, media organizations, medical orgs, and academic institutions are exactly that—you can publish gigabytes of data, and make it available as an API, and make it easy to browse on the web, too, with extremely low effort. Put it into SQLite, point this little guy at it, and you’ve just radically increased the accessibility and utility of your data. Because messing around in SQL from a web browser is orders of magnitude more immediately useful than downloading a CSV, processing it, and figuring out what comes next.

Paul Ford

# 1:10 am / datasette, sqlite, paul-ford

The Pac-Man Rule at Conferences. This is such a good idea from Eric Holscher: at the conferences he organizes he tells his attendees “When standing as a group of people, always leave room for 1 person to join your group”, to encourage networking and inclusive conversations.

# 2:34 am / conferences, eric-holscher

csvs-to-sqlite: Refactoring columns into separate lookup tables. I just shipped a new version of csvs-to-sqlite with the ability to extract specified columns into a separate SQLite lookup table by passing additional command-line arguments.
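
Here is a rough sketch of what that kind of refactoring does to a table, written against Python's `sqlite3` module. It is not the csvs-to-sqlite implementation or its command-line interface: the `extract_column` helper and the `trees` data are hypothetical, the identifiers are assumed to be trusted, and rows with a NULL in the extracted column would be dropped by the join.

```python
import sqlite3


def extract_column(conn, table, column, lookup_table):
    """Replace a text column with an integer id pointing at a lookup table."""
    names = [row[1] for row in conn.execute(f"pragma table_info({table})")]
    select_list = ", ".join(
        f"{lookup_table}.id as {column}_id" if name == column
        else f"{table}.{name} as {name}"
        for name in names
    )
    with conn:
        conn.execute(
            f"create table {lookup_table} "
            "(id integer primary key, value text unique)"
        )
        conn.execute(
            f"insert into {lookup_table} (value) "
            f"select distinct {column} from {table} order by {column}"
        )
        # Rebuild the table with the text column swapped for the lookup id.
        conn.execute(
            f"create table {table}_new as select {select_list} "
            f"from {table} join {lookup_table} "
            f"on {lookup_table}.value = {table}.{column}"
        )
        conn.execute(f"drop table {table}")
        conn.execute(f"alter table {table}_new rename to {table}")


conn = sqlite3.connect(":memory:")
conn.execute("create table trees (id integer primary key, species text)")
conn.executemany(
    "insert into trees (species) values (?)", [("Oak",), ("Elm",), ("Oak",)]
)
extract_column(conn, "trees", "species", "species")

joined = conn.execute(
    "select trees.id, species.value from trees "
    "join species on species.id = trees.species_id order by trees.id"
).fetchall()
print(joined)
```

Each distinct species is now stored once, and the `trees` table refers to it by id.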

# 6:41 am / projects, csv

Nov. 18, 2017

Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads (via) The problem with sending writes to the primary and balancing reads across replicas is dealing with replica lag: what if you write to the primary and then read from a replica that hasn’t had the new state applied to it yet? Brandur Leach dives deep into an elegant solution using PostgreSQL’s LSNs (log sequence numbers), accessed using pg_last_wal_replay_lsn(). An observer process continuously polls the replicas for their most recently applied LSN and stores them in a table. A column in the Users table then records the min_lsn valid for that user, updating it to the pg_current_wal_lsn() of the primary whenever that user makes a write. Combining the two allows the application to randomly select a replica that is up-to-date for the purposes of a specific user any time it needs to make a read.
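
The selection step can be sketched in plain Python. This is an illustration under assumptions, not code from the article: the replica names and LSN values are invented, the dictionary stands in for the table the observer process maintains, and falling back to the primary when no replica has caught up is an assumption of this sketch. The one PostgreSQL detail relied on is that an LSN prints as two hexadecimal numbers separated by a slash.

```python
import random


def parse_lsn(text):
    """Convert PostgreSQL's 'X/Y' hexadecimal LSN text into an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def eligible_replicas(replica_lsns, user_min_lsn):
    """Replicas whose last replayed LSN is at or past the user's min_lsn."""
    floor = parse_lsn(user_min_lsn)
    return sorted(
        name for name, lsn in replica_lsns.items() if parse_lsn(lsn) >= floor
    )


def pick_replica(replica_lsns, user_min_lsn):
    """Pick a random up-to-date replica, else fall back to the primary."""
    candidates = eligible_replicas(replica_lsns, user_min_lsn)
    return random.choice(candidates) if candidates else "primary"


# Hypothetical snapshot of what the observer process recorded.
replica_lsns = {
    "replica-1": "0/3000060",
    "replica-2": "0/2FFFF00",  # lagging behind the user's last write
    "replica-3": "1/0",
}
user_min_lsn = "0/3000000"     # hypothetical min_lsn stored for the user

print(eligible_replicas(replica_lsns, user_min_lsn))
print(pick_replica(replica_lsns, user_min_lsn))
```

A read for this user can go to any replica in the eligible list, while `replica-2` is skipped until it catches up.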

# 6:42 pm / scaling, replication, postgresql, brandur-leach

Monodraw for macOS (via) Spectacular OS X application for drawing ASCII art diagrams—it’s basically OmniGraffle for ASCII art. The output is beautiful, and it even lets you do things like draw connecting arrows between boxes and then re-arrange the boxes such that the arrows keep up with your changes.

# 6:58 pm / asciiart

Nov. 19, 2017

pythondotorg/admin.py (via) There are some neat tricks in the Django application that powers Python.org (built a few years ago by RevSys). Here’s how their admin app handles creator/last_modified_by user relationships.

# 6:28 am / django

csvs-to-sqlite: Automatically deploy tags as PyPI releases. I learned how to set up Travis CI to automatically deploy new package versions to PyPI when I push a version tag to GitHub (and the tests pass).

# 6:35 am / travis, pypi

Nov. 20, 2017

Medicaid Eligibility | plainlanguage.gov (via) Useful resource for helping government writers use clear language. I love that this is an official US government website written using Jekyll and developed entirely in the open on GitHub; the commit history is fascinating.

# 1:59 am / writing

Publishing the Lahman Baseball Database with Datasette (via) Ted Lawless used Datasette to publish an API for the Lahman Baseball Database—a dataset of players and teams spanning 1871 to 2016, available under a Creative Commons License.

# 9:15 pm / datasette

Nov. 21, 2017

SSL Issuer Popularity. The impressive growth of Let’s Encrypt in one graph: from 4.87% of TLS-enabled domains in May 2016 to 36.68% in November 2017.

# 2:44 pm / ssl

JOY.js (via) Delightful new exploratory programming environment from Nicky Case. Try the Turtle demo, which takes Logo and updates it for the modern web.

# 2:50 pm / explorables, javascript

Of SVG, Minification and Gzip. Delightfully nerdy exploration of tricks you can use to hand-optimize your SVG in order to maximize gzip compression. Premature optimization may be the root of all evil but this is still a great way to learn about how gzip actually works.

# 2:54 pm / svg, gzip, minification

gzthermal (via) “pseudo thermal view of Gzip/Deflate compression efficiency”—neat tool for visualizing gzip compressed data and understanding exactly how run-length encoding and back references apply to a gzipped file.
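
A quick way to see back references paying off, without gzthermal, is to compress repetitive and non-repetitive input of the same length with Python's `zlib` module. This is only a sketch: `zlib.compress` wraps the same DEFLATE algorithm in a zlib container rather than a gzip one, and the SVG-like snippet is made up for the example.

```python
import random
import zlib

# Hypothetical SVG-like markup repeated many times: ideal for back references.
repetitive = b'<circle cx="10" cy="10" r="5"/>' * 50

# Pseudo-random bytes of the same length: almost nothing to refer back to.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

repetitive_size = len(zlib.compress(repetitive, 9))
noise_size = len(zlib.compress(noise, 9))

print("input bytes:       ", len(repetitive))
print("repetitive -> bytes:", repetitive_size)
print("noise      -> bytes:", noise_size)
```

The repeated markup shrinks dramatically because each repeat can be encoded as a pointer to earlier output, while the noise barely compresses at all.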

# 2:56 pm / gzip