Simon Willison’s Weblog

Items in Feb, 2018


Responsive Components: a Solution to the Container Queries Problem (via) Philip Walton uses Chrome’s new ResizeObserver API (best described as document.onresize for elements, currently a W3C Editor’s Draft, not yet supported by other browsers) to implement a media-query style mechanism for applying CSS based on the size of the parent container. This is really clever. In the absence of ResizeObserver (which can be polyfilled) it can fall back to showing the narrowest design, which is probably best for mobile anyway. Desktop browsers are better equipped to run the polyfill. # 27th February 2018, 1:21 pm

By far the most important lesson I took out of this game is that whenever there’s behavior that needs to be repeated around to multiple types of entities, it’s better to default to copypasting it than to abstracting/generalizing it too early. This is a very very hard thing to do in practice. As programmers we’re sort of wired to see repetition and want to get rid of it as fast as possible, but I’ve found that that impulse generally creates more problems than it solves. The main problem it creates is that early generalizations are often wrong, and when a generalization is wrong it ossifies the structure of the code around it in a way that is harder to fix and change than if it wasn’t there in the first place.

SSYGEN # 26th February 2018, 5:23 am

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. # 25th February 2018, 4:58 pm
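To make the "format() in reverse" idea concrete, here is a toy sketch using only the standard library. It is not the parse library's implementation or API: `toy_parse` is a hypothetical helper that handles plain `{name}` fields only, by turning the template into a regular expression.

```python
import re


def toy_parse(template, text):
    """Extract named {field} values from text; return None if it doesn't match.

    Hypothetical helper for illustration only: supports bare {name} fields,
    with none of the type conversions or format specs of the real library.
    """
    pattern = ""
    pos = 0
    for field in re.finditer(r"\{(\w+)\}", template):
        # Literal text between fields has to match exactly
        pattern += re.escape(template[pos:field.start()])
        # Each field becomes a non-greedy named capture group
        pattern += "(?P<{}>.+?)".format(field.group(1))
        pos = field.end()
    pattern += re.escape(template[pos:])
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None


template = "{name} is {age} years old"
rendered = template.format(name="Alice", age=30)
extracted = toy_parse(template, rendered)
print(extracted)
```

The same template string is used for both directions: `format()` fills the fields in, and the toy parser pulls them back out as strings.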

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

Publishing history has various examples of advertising-only business models. But they are very much the exception. They mainly exist when there are near monopoly barriers to entry into the market which allow publishers to command and defend robust ad rates.

Josh Marshall # 25th February 2018, 4:03 pm

github-trending-repos (via) This is a really clever hack: Vitaliy Potapov built a system for subscribing to a weekly digest of trending GitHub repos in your favourite languages entirely on top of the existing GitHub issues notification system. Find the issue for your particular language and hit “subscribe” and you’ll get an email (or push notification depending on how you get your issue notifications) once a week with the latest trends. The implementation is a 220 line Node.js script which runs on a daily and weekly schedule using Circle CI, so Vitaliy doesn’t even have to host or pay for any of the underlying infrastructure. It’s brilliant. # 23rd February 2018, 5:36 pm

I am pleased to inform all of you that there is a notorious black market maple syrup seller and he looks exactly like the image you get in your head when someone first says the phrase “a notorious black market maple syrup seller” to you.

Brian Grubb # 23rd February 2018, 3:56 pm

GitHub: Weak cryptographic standards removal notice. GitHub deprecated TLSv1 and TLSv1.1 yesterday. I like how they handled the deprecation: they disabled the protocols for one hour on February 8th in order to (hopefully) warn people by triggering errors in automated processes, then disabled them completely a couple of weeks later. # 23rd February 2018, 3:41 pm

I’ve Just Launched “Pwned Passwords” V2 With Half a Billion Passwords for Download (via) Troy Hunt has collected 501,636,842 passwords from a wide collection of major breaches. He suggests using them to build a password strength checker that can say “your password has been used by 53,274 other people”. The full collection is available as a list of SHA1 hashes (brute-force reversible but at least slightly obfuscated) in an 8GB file or as an API. Where things get really clever is the API design: you send just the first 5 characters of the SHA1 hash of the user’s password and the API responds with the full list of several hundred hashes that match that prefix. This lets you build a checking feature without sharing full passwords with a remote service, if you don’t want to host the full 8GB of data yourself. # 22nd February 2018, 7:24 pm
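A rough sketch of the client side of that prefix lookup, with no network call. The response format used here (one `SUFFIX:COUNT` pair per line) is an assumption for illustration, and `sample_body` is made-up data rather than a real API response.

```python
import hashlib


def split_sha1(password):
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, range_body):
    """Look up our suffix in a range response; 0 means it was not listed.

    Assumes one "SUFFIX:COUNT" pair per line (an assumption, see above).
    """
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0


prefix, suffix = split_sha1("password")

# Only `prefix` would ever be sent to the remote service.
# Made-up response body standing in for what the service might return:
sample_body = "\n".join([
    "0" * 35 + ":3",
    suffix + ":53274",
    "F" * 35 + ":1",
])

seen = count_in_range(suffix, sample_body)
print("prefix sent:", prefix)
print("times seen:", seen)
```

The comparison against the returned suffixes happens locally, so the service only ever learns the 5 character prefix.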

s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google. # 21st February 2018, 5:54 pm

A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you. # 21st February 2018, 5:31 am

Andrew Godwin’s www-router Docker container (via) Really clever Docker trick: a container that runs Nginx and uses it to route traffic to other containers based on the hostname—but the hostnames to be routed are configured using environment variables which the CMD script uses to dynamically construct an nginx config when the container starts. # 21st February 2018, 5:04 am
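A minimal sketch of the general trick, not Andrew's actual script: build an nginx config from environment variables at startup. The `ROUTE_*` naming scheme and the `hostname=upstream` value format are hypothetical, invented here for illustration.

```python
def build_nginx_config(environ):
    """Render one nginx server block per ROUTE_* variable.

    Hypothetical convention: ROUTE_<NAME>="<hostname>=<upstream host:port>".
    """
    blocks = []
    for key in sorted(environ):
        if not key.startswith("ROUTE_"):
            continue
        hostname, upstream = environ[key].split("=", 1)
        blocks.append(
            "server {\n"
            "    listen 80;\n"
            "    server_name %s;\n"
            "    location / {\n"
            "        proxy_pass http://%s;\n"
            "        proxy_set_header Host $host;\n"
            "    }\n"
            "}\n" % (hostname, upstream)
        )
    return "\n".join(blocks)


# In a container entrypoint this would be os.environ
example_env = {
    "PATH": "/usr/bin",
    "ROUTE_BLOG": "blog.example.com=blog:8000",
    "ROUTE_WIKI": "wiki.example.com=wiki:8080",
}
config = build_nginx_config(example_env)
print(config)
```

A startup script along these lines would write the result to the nginx config path and then start nginx, so adding a route only means changing the container's environment.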

Photos from our tour of the amazing bone collection of Ray Bandar. Ray Bandar (1927-2017) was an artist, scientist, naturalist and an incredibly prolific collector of bones. His collection is in the process of moving to the California Academy of Sciences but Natalie managed to land us a private tour led by his great nephew. The collection is truly awe-inspiring, and a testament to an extraordinary life lived following a very particular passion. # 21st February 2018, 4:58 am

Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first. # 20th February 2018, 2:39 pm

Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well. # 20th February 2018, 12:30 am

The whole story is basically that Facebook gets so much traffic that they started convincing publishers to post things on Facebook. For a long time, that was fine. People posted things on Facebook, then you would click those links and go to their websites. But then, gradually, Facebook started exerting more and more control of what was being seen, to the point that they, not our website, essentially became the main publishers of everyone’s content. Today, there’s no reason to go to a comedy website that has a video if that video is just right on Facebook. And that would be fine if Facebook compensated those companies for the ad revenue that was generated from those videos, but because Facebook does not pay publishers, there quickly became no money in making high-quality content for the internet.

Matt Klinman # 7th February 2018, 3:51 pm

Googlebot’s Javascript random() function is deterministic. random() as executed by Googlebot returns the same predictable sequence. More interestingly, Googlebot runs a much faster timer for setTimeout and setInterval. As Tom Anthony points out, “Why actually wait 5 seconds when you are a bot?” # 7th February 2018, 2:41 am

owlsnearme source code on GitHub. Here’s the source code for our new project. It’s a single-page React application that pulls all of its data from the iNaturalist API. We built it this weekend with the SuperbOwl kick-off as a hard deadline so it’s not the most beautiful React code, but it’s a nice demonstration of how React (and create-react-app in particular) can be used for rapid development. # 4th February 2018, 10:33 pm

Owls Near Me. Back in 2010 Natalie and I shipped—a website for finding your nearest owls, using data from the sadly deceased WildlifeNearYou (RIP). To celebrate #SuperbOwl Sunday we rebuilt the same concept on top of the excellent iNaturalist API. Search for a place to see which owls have been spotted there, or click the magic button to geolocate your device and see which owls have been spotted in your nearby area! # 4th February 2018, 10:26 pm

Conditional aggregation in Django 2.0 (via) I hadn’t realised how clever this new Django ORM feature by Tom Forbes is. It lets you build an aggregation against a subset of rows, e.g. Client.objects.aggregate(regular=Count('pk', filter=Q(account_type=Client.REGULAR))). If you are using PostgreSQL it translates it into a fast FILTER (WHERE ...) clause, while other databases emulate the same behaviour using a CASE statement. # 3rd February 2018, 9:38 pm
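To show what the CASE emulation looks like at the SQL level, here is a small sketch using the standard library's sqlite3 module rather than Django. The `client` table and its rows are made up for illustration, and the SQL is hand-written, not what the ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client (id INTEGER PRIMARY KEY, account_type TEXT)"
)
conn.executemany(
    "INSERT INTO client (account_type) VALUES (?)",
    [("regular",), ("gold",), ("regular",), ("platinum",)],
)

# CASE with no ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
# so each COUNT only sees its own subset of rows.
# On PostgreSQL the first column could instead be written as:
#   COUNT(id) FILTER (WHERE account_type = 'regular')
regular, gold = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN account_type = 'regular' THEN id END),
        COUNT(CASE WHEN account_type = 'gold' THEN id END)
    FROM client
    """
).fetchone()
conn.close()

print(regular, gold)
```

Both counts come back from a single pass over the table, which is the point of conditional aggregation.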

Imagine a Simon Says style game where I present an article found on the web on a projector. Students research for two to three minutes, then respond by standing or staying seated to signal if they believe the article is true or fake. My students absolutely loved the game. Some refused to go to recess until I gave them another chance to figure out the next article I had queued.

Scott Bedley # 3rd February 2018, 1:29 pm

How I made a Who’s On First subset database. Inspired by Paul Ford on Twitter, I tried out a new trick with SQLite: connect to a database containing JSON, attach a brand new empty database file using “attach database”, then populate it using INSERT INTO ... SELECT plus the json_extract() function to extract out a subset of the JSON properties into a new table in the new database. # 3rd February 2018, 5:25 am
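The steps described above can be sketched with Python's sqlite3 module. The table names, column names and JSON documents here are hypothetical stand-ins rather than the real Who's On First schema, and the sketch assumes your SQLite build includes the JSON functions.

```python
import json
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.db")
subset_path = os.path.join(workdir, "subset.db")

# isolation_level=None means autocommit, so ATTACH never runs
# inside an open transaction
conn = sqlite3.connect(source_path, isolation_level=None)

# A source database holding whole JSON documents (made-up sample data)
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO places (body) VALUES (?)",
    [
        (json.dumps({"name": "San Francisco", "country": "US", "extra": 1}),),
        (json.dumps({"name": "London", "country": "GB", "extra": 2}),),
    ],
)

# Attach a brand new, empty database file under the schema name "subset"
conn.execute("ATTACH DATABASE ? AS subset", (subset_path,))
conn.execute(
    "CREATE TABLE subset.places (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)

# Copy across only the JSON properties we care about
conn.execute(
    """
    INSERT INTO subset.places (id, name, country)
    SELECT id, json_extract(body, '$.name'), json_extract(body, '$.country')
    FROM places
    """
)
conn.close()

# The new file is now a standalone database with plain columns
subset = sqlite3.connect(subset_path)
rows = subset.execute("SELECT name, country FROM places ORDER BY id").fetchall()
subset.close()
print(rows)
```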

Just switched to {window.localStorage.getItem('debug') && <pre>{JSON.stringify(this.state, null, 2)}</pre>} - now I can ship to production and turn on debugging in my console with localStorage.setItem('debug', 1)

@simonw # 3rd February 2018, 5:23 am

asgiref: AsyncToSync and SyncToAsync (via) Andrew’s classes in asgiref that can turn a synchronous callable into an awaitable (that runs in a thread pool) or an awaitable callable into a synchronous callable, using Python 3 futures and asyncio. # 2nd February 2018, 7:06 pm
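A simplified sketch of the two directions using only the standard library. This is not asgiref's implementation: `to_async` and `to_sync` are hypothetical names, and this `to_sync` only works when no event loop is already running in the calling thread.

```python
import asyncio
import functools
import time


def to_async(func):
    """Wrap a blocking callable so it can be awaited (runs in a thread pool)."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        # None selects the event loop's default thread pool executor
        return await loop.run_in_executor(None, call)
    return wrapper


def to_sync(coro_func):
    """Wrap a coroutine function so plain synchronous code can call it."""
    @functools.wraps(coro_func)
    def wrapper(*args, **kwargs):
        # Starts a fresh event loop, runs the coroutine, then closes the loop
        return asyncio.run(coro_func(*args, **kwargs))
    return wrapper


@to_async
def slow_square(n):
    time.sleep(0.01)  # stands in for blocking work such as a database call
    return n * n


@to_sync
async def gather_squares(numbers):
    # The blocking calls run concurrently in worker threads
    return await asyncio.gather(*(slow_square(n) for n in numbers))


result = gather_squares([1, 2, 3])
print(result)
```

The interesting part of the real classes is everything this sketch leaves out, such as coping with code that is already running inside an event loop.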

Channels 2.0. Andrew just shipped Channels 2.0, a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means it’s turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface. # 2nd February 2018, 6:19 pm

Family fun with deepfakes. Or how I got my wife onto the Tonight Show. deepfakes is dystopian nightmare technology: take a few thousand photos of two different people with similar shaped faces and you can produce an extremely realistic video where you swap one person’s face for the other. Unsurprisingly it’s being used for porn. This is a pleasantly SFW explanation of how it works, complete with a demo where Sven Charleer swaps his wife Elke for Anne Hathaway on the Tonight Show. # 2nd February 2018, 4:06 pm

Using in Your (Django) Project. Includes this neat trick: if you list in the setup(scripts=) argument you can call it from e.g. cron using the full path to within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first. # 2nd February 2018, 12:33 pm

How the Citus distributed database rebalances your data. Citus is a fascinating implementation of database sharding built on top of PostgreSQL primitives. PostgreSQL 10 introduced extremely flexible logical replication. In this post Craig Kerstiens explains how Citus uses this new ability to re-balance shards (e.g. when you move from two to four physical PostgreSQL nodes) without downtime. # 1st February 2018, 10:50 pm

Building a Full-Text Search App Using Docker and Elasticsearch. Deep, comprehensive tutorial from Patrick Triest showing how to use docker-compose to run three containers (Node API, nginx static content, elasticsearch) and then use that to build a neat Vue.js web search UI against 100 books from Project Gutenberg. # 1st February 2018, 3:41 pm

What we need to do is come up with a way to help people understand that there are ways to never be lost again, and to listen to any music you want, and to video chat with someone on the other side of the world, without them having to feel disquieted about it. That it’s not OK that you’re made to feel weirded out. That it’s possible for there to be alternatives. That having to feel someone rooting around in your life is not a price you should have to pay.

Stuart Langridge # 1st February 2018, 2:03 pm