Simon Willison’s Weblog

November 2017

Nov. 7, 2017

Something is wrong on the internet. James Bridle takes a fascinating and deeply troubling dive into the world of Kids’ YouTube videos, which appear to be increasingly algorithmically generated and are evolving in a very dark direction.

# 12:40 pm / youtube, james-bridle

Cloud SQL for PostgreSQL adds high availability and replication. Google Cloud Platform now offers PostgreSQL with automatic asynchronous disk-level replication to a separate instance in a different availability zone, via their new “Regional Disks” feature. Between this, Heroku, Citus and Amazon RDS, the appeal of a self-maintained PostgreSQL instance continues to fall.

# 1:49 pm / highavailability, postgresql, google

Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA.

# 2:06 pm / sqlite, open-source, python, github

GOV.UK Registers (via) Canonical sources of “lists of information” intended for use by GDS teams building software for the UK government, but available for anyone. 17 registers are “ready for use”, 45 are “in progress”. Covers things like the FCO’s country list, the official list of prison estates, and DEFRA’s list of public bodies in England that manage drainage systems.

# 3:31 pm / government, open-data, datagov, gov-uk

Feature Visualization (via) Another gorgeous paper published on Distill, the journal that prides itself on including interactive visualizations to help provide clear explanations of machine learning.

# 8:48 pm / machine-learning, explorables

Nov. 8, 2017

ZEIT – 6x Faster Now Uploads with HTTP/2 (via) Fantastic optimization write-up by Pranay Prakash. The Now deployment tool works by computing a hash for every local file in a project, then uploading just the ones that are missing. Pranay switched to uploading over HTTP/2 using the fetch-h2 library and got a 6x speedup for larger projects.
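The hash-then-upload-only-what-is-missing idea can be sketched in a few lines of Python. This is a minimal illustration, not Now's actual implementation: the function names `hash_files` and `files_to_upload` are hypothetical, SHA-1 is an assumed choice of hash, and the set of hashes the server already holds is passed in directly rather than fetched over HTTP/2.

```python
import hashlib
from pathlib import Path


def hash_files(root):
    """Map each file's SHA-1 hex digest to its path relative to root.

    Files with identical content share a digest, so only one of them
    would ever need to be uploaded.
    """
    root = Path(root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            hashes[digest] = path.relative_to(root).as_posix()
    return hashes


def files_to_upload(local_hashes, remote_hashes):
    """Return the paths whose content hash the server does not already have."""
    return sorted(
        path
        for digest, path in local_hashes.items()
        if digest not in remote_hashes
    )
```

Calling `files_to_upload(hash_files("my-project"), known_hashes)` with a hypothetical `known_hashes` set returns only the files that still need to be sent, which is what makes repeat deployments cheap.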

# 1:04 am / zeit-now, nodejs, http2

There's also a huge population of "admins," who as you might guess, administer an organization's salesforce account & data. These folks often start out as office managers or other clerical types, who are handed this responsibility because nobody else wants to do it. Here's where it gets interesting. Admin ➡️ WYSIWYG customizer ➡️ occasional coder ➡️ full time dev is a real pipeline into software development that folks often with just high school degrees are actually taking. This isn't just a narrative pushed by salesforce marketing; I'm meeting these people. They say things like "I love salesforce, it changed my life" with disarming sincerity.

Sarah Mei

# 11:56 am / salesforce, inclusion

Redis streams aren’t exciting for their innovativeness, but rather that they bring building a unified log architecture within reach of a small and/or inexpensive app. Kafka is infamously difficult to configure and get running, and is expensive to operate once you do. [...] Redis on the other hand is probably already in your stack.

Brandur Leach

# 4:23 pm / brandur-leach, redis, kafka

Redis Streams and the Unified Log. In which Brandur Leach explores the new Kafka-style streams functionality coming to Redis 4.0, and shows an example of a robust at-least-once processing architecture built on a combination of Redis streams and PostgreSQL transactions. I really like the pattern of writing log records to a staging table in PostgreSQL first, so that they are committed in the same transaction as the originating state change, then having a separate process read them from that table and publish them to Redis.

# 4:37 pm / brandur-leach, postgresql, redis

spaCy. “Industrial-strength Natural Language Processing in Python”. Exciting alternative to nltk—spaCy is mostly written in Cython, makes bold performance claims and ships with a range of pre-built statistical models covering multiple different languages. The API design is clean and intuitive and spaCy even includes an SVG visualizer that works with Jupyter.

# 4:43 pm / nlp, spacy, python

TensorFlow 101. Concise, readable introduction to TensorFlow, with Python examples you can execute (and visualize) in Jupyter.

# 5:57 pm / python, tensorflow

Eager Execution: An imperative, define-by-run interface to TensorFlow. Lets you evaluate TensorFlow expressions interactively in Python without needing to constantly run tf.Session().run(variable).

# 7:32 pm / tensorflow, python

For Redis 4.2 I'm moving Disque as a Redis module. To do this, Redis modules are getting a fully featured Cluster API. This means that it will be possible, for instance, to write a Redis module that orchestrates N Redis masters, using Raft or any other consensus algorithm, as a single distributed system. This will allow to also model strong guarantees easily.

Salvatore Sanfilippo

# 10:51 pm / redis, salvatore-sanfilippo

How Sentry Receives 20 Billion Events Per Month While Preparing To Handle Twice That. The stack includes RabbitMQ federation, nginx and HAProxy, and Riak as a key/value store; data processing is still mainly Python with a little bit of Rust. As of July 2017 it’s all hosted on Google Cloud Platform.

# 11:32 pm / scaling, rust, sentry

Nov. 9, 2017

When you’ve written the same code 3 times, write a function. When you’ve given the same in-person advice 3 times, write a blog post.

David Robinson

# 7:10 am / blogging

The Booming Japanese Rent-a-Friend Business. “In Japan, you can pay an actor to impersonate your relative, spouse, coworker, or any kind of acquaintance.” Surprisingly poignant interview with a man who runs a company that employs 800 professional actors for almost any conceivable situation.

# 3:28 pm / japan

Use a Render Prop! Michael Jackson makes a convincing argument for using the render prop pattern in React in preference to HOCs (Higher Order Components, the React equivalent of Python class decorators).

# 3:39 pm / react

dhash (via) Python library to calculate the perceptual difference hash for an image. Delightfully simple algorithm that’s fully explained in the README: it works by scaling the image to 8x8 grayscale and then creating a bitmap representing whether each pixel is lighter or darker than the previous one.
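A rough sketch of the comparison step in pure Python follows. It is not the dhash library's API: `difference_hash` and `hamming_distance` are hypothetical names, the input is assumed to be an already-scaled grid of grayscale values (scaling would need an imaging library), and only each pixel's left neighbour is compared. Under those assumptions a grid 9 pixels wide and 8 tall yields 64 bits, since each row of 9 pixels gives 8 comparisons.

```python
def difference_hash(pixels):
    """Pack one bit per adjacent pixel pair: 1 if the right pixel is brighter.

    `pixels` is a list of rows, each row a list of grayscale values.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Only relative brightness matters, so uniformly lightening or darkening an image leaves the hash unchanged, and a small Hamming distance between two hashes suggests visually similar images.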

# 5:44 pm / images, python

Pink Trombone (via) Explorable explanation of the human vocal system by Neil Thapen. View source for some pleasantly readable JavaScript / canvas / window.AudioContext code.

# 6:44 pm / explorables

[On a startup using machine learning to encourage people to get addicted to apps] This technology is so unethical it needs to be criminalized globally, before it evolves into a Fermi Paradox solution... and I generally DON'T think banning technologies is a good idea. But this is like a neuroscience homebrew dirty nuke.

Charlie Stross

# 10:50 pm / charlie-stross

Nov. 10, 2017

Interactive Database of the World’s River Basins (via) “This database provides the first-ever compilation of the world’s river basins developed specifically for corporate disclosure. It features a comprehensive list of river basins worldwide, including their names, boundaries, and other helpful information.”

# 3:07 pm / gis

Exploring Line Lengths in Python Packages. Interesting exploration of the impact of the 79 character length limit rule of thumb on various Python packages, and a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib.
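The underlying measurement is simple enough to sketch with the standard library alone. This is a stand-in for the pandas and matplotlib approach in the article, not a copy of it, and the function names `line_length_counts` and `fraction_over_limit` are hypothetical.

```python
from collections import Counter


def line_length_counts(source):
    """Count how many lines of `source` have each length."""
    return Counter(len(line) for line in source.splitlines())


def fraction_over_limit(source, limit=79):
    """Return the fraction of lines longer than `limit` characters."""
    lengths = [len(line) for line in source.splitlines()]
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > limit) / len(lengths)
```

The `Counter` returned by `line_length_counts` is exactly the data a histogram would plot, with line length on the x-axis and number of lines on the y-axis.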

# 3:34 pm / jupyter, pandas, python, visualization

Nov. 11, 2017

Inside a low budget consumer hardware espionage implant. Fascinating break-down of a cheap ($25) USB charging cable with a hidden microphone and SIM that can be activated via SMS message or by calling the device.

# 8:04 pm / espionage

I’ve seen two different start-ups now, who hold personal data about customers in their “immutable log”. “How are you planning to handle GDPR requirements and removal of data?” – turns out the answer is often “Er – we haven’t thought about that.” Cue a sad face when I tell them that if they don’t modify their immutable log they’re automatically out of compliance.

Alex Hudson

# 8:57 pm / logs

Nov. 12, 2017

Any engineer who observes a bias in a system and chooses not to pro-actively correct for it is either a bad engineer or they stand to benefit from the bias. So much of engineering is about compensating, trimming, and equalizing imperfections out of real systems: wrap a feedback loop around it, and force the error function to zero.

Bunnie Huang

# 3:48 pm / bunniehuang

Anyone that has me on too high of a pedestal should see me fumbling around with git.

John Carmack

# 3:50 pm / git

Nov. 13, 2017

simonw/csvs-to-sqlite. I built a simple tool for bulk converting multiple CSV files into a SQLite database.
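The core idea can be sketched with just the standard library. This is an illustration under stated assumptions, not how csvs-to-sqlite itself is implemented: `load_csv` is a hypothetical helper, the first CSV row is assumed to be a header, and every value is stored as text without any type detection.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for use in SQLite SQL."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header row and insert the remaining rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(quote_identifier(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # create the table and insert the rows in one transaction
        conn.execute(
            "CREATE TABLE {} ({})".format(quote_identifier(table), columns)
        )
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(
                quote_identifier(table), placeholders
            ),
            reader,
        )
```

Calling `load_csv` once per file against the same connection gives one table per CSV in a single database.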

# 6:49 am / csv, sqlite, github, datasette, projects

CSS element() function (via) Vincent De Oliveira explores the CSS element() function in a post from 2 years ago. Sadly, the function is still only supported by Firefox. It lets you render any HTML element as a CSS background image, enabling folding effects, animated backgrounds, live previews and more.

# 2:34 pm / css

Entering the Quantum Era—How Firefox got fast again and where it’s going to get faster. I’ve been trying out the beta of Firefox 57 and it’s fantastic. All of that work on Servo and Rust is definitely paying off!

# 4:34 pm / rust, firefox, servo

Datasette: instantly create and publish an API for your SQLite databases

I just shipped the first public version of datasette, a new tool for creating and publishing JSON APIs for SQLite databases.

[... 968 words]