Simon Willison’s Weblog

Subscribe

Blogmarks in Feb, 2018

Filters: Type: blogmark × Year: 2018 × Month: Feb × Sorted by date


Responsive Components: a Solution to the Container Queries Problem (via) Philip Walton uses Chrome’s new ResizeObserver API (best described as document.onresize for elements, currently a W3C Editor’s Draft, not yet supported by other browsers) to implement a media-query style mechanism for applying CSS based on the size of the parent container. This is really clever. In the absence of ResizeObserver (which can be polyfilled) it can fall back to showing the narrowest design, which is probably best for mobile anyway. Desktop browsers are better equipped to run the polyfill. # 27th February 2018, 1:21 pm

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. # 25th February 2018, 4:58 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

github-trending-repos (via) This is a really clever hack: Vitaliy Potapov built a system for subscribing to a weekly digest of trending GitHub repos in your favourite languages entirely on top of the existing GitHub issues notification system. Find the issue for your particular language and hit “subscribe” and you’ll get an email (or push notification depending on how you get your issue notifications) once a week with the latest trends. The implementation is a 220 line Node.js script which runs on a daily and weekly schedule using Circle CI, so Vitaliy doesn’t even have to host or pay for any of the underlying infrastructure. It’s brilliant. # 23rd February 2018, 5:36 pm

GitHub: Weak cryptographic standards removal notice. GitHub deprecated TLSv1 and TLSv1.1 yesterday. I like how they handled the deprecation: they disabled the protocols for one hour on February 8th in order to (hopefully) warm people by triggering errors in automated processes, then disabled them completely a couple of weeks later. # 23rd February 2018, 3:41 pm

I’ve Just Launched “Pwned Passwords” V2 With Half a Billion Passwords for Download (via) Troy Hunt has collected 501,636,842 passwords from a wide collection of major breaches. He suggests using the to build a password strength checker that can say “your password has been used by 53,274 other people”. The full collection is available as a list of SHA1 codes (brute-force reversible but at least slightly obfuscated) in an 8GB file or as an API. Where things get really clever is the API design: you send just the first 5 characters of the SHA1 hash of the user’s password and the API responds with the full list of several hundred hashes that match that prefix. This lets you build a checking feature without sharing full passwords with a remote service, if you don’t want to host the full 8GB of data yourself. # 22nd February 2018, 7:24 pm

s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google. # 21st February 2018, 5:54 pm

A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you. # 21st February 2018, 5:31 am

Andrew Godwin’s www-router Docker container (via) Really clever Docker trick: a container that runs Nginx and uses it to route traffic to other containers based on the hostname—but the hostnames to be routed are configured using environment variables which the run-nginx.py CMD script uses to dynamically construct an nginx config when the container starts. # 21st February 2018, 5:04 am

Photos from our tour of the amazing bone collection of Ray Bandar. Ray Bandar (1927-2017) was an artist, scientist, naturalist and an incredibly prolific collector of bones. His collection is in the process of moving to the California Academy of Sciences but Natalie managed to land us a private tour lead by his great nephew. The collection is truly awe-inspiring, and a testament to an extraordinary life lived following a very particular passion. # 21st February 2018, 4:58 am

Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first. # 20th February 2018, 2:39 pm

Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” ‪This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well.‬ # 20th February 2018, 12:30 am

Googlebot’s Javascript random() function is deterministic. random() as executed by Googlebot returns the same predicable sequence. More interestingly, Googlebot runs a much faster timer for setTimeout and setInterval—as Tom Anthony points out, “Why actually wait 5 seconds when you are a bot?” # 7th February 2018, 2:41 am

owlsnearme source code on GitHub. Here’s the source code for our new owlsnearme.com project. It’s a single-page React application that pulls all of its data from the iNaturalist API. We built it this weekend with the SuperbOwl kick-off as a hard deadline so it’s not the most beautiful React code, but it’s a nice demonstration of how React (and create-react-app in particular) can be used for rapid development. # 4th February 2018, 10:33 pm

Owls Near Me. Back in 2010 Natalie and I shipped owlsnearyou.com—a website for finding your nearest owls, using data from the sadly deceased WildlifeNearYou (RIP). To celebrate #SuperbOwl Sunday we rebuilt the same concept on top of the excellent iNaturalist API. Search for a place to see which owls have been spotted there, or click the magic button to geolocate your device and see which owls have been spotted in your nearby area! # 4th February 2018, 10:26 pm

Conditional aggregation in Django 2.0 (via) I hadn’t realised how clever this new Django ORM feature by Tom Forbes is. It lets you build an aggregation against a subset of rows, e.g. Client.objects.aggregate(regular=Count(’pk’, filter=Q(account_type=Client.REGULAR)))—then if you are using PostgreSQL it translates it into a fast FILTER WHERE clause, while other databases emulate the same behaviour using a CASE statement. # 3rd February 2018, 9:38 pm

How I made a Who’s On First subset database. Inspired by Paul Ford on Twitter, I tried out a new trick with SQLite: connect to a database containing JSON, attach a brand new empty database file using “attach database”, then populate it using INSERT INTO ... SELECT plus the json_extract() function to extract out a subset of the JSON properties into a new table in the new database. # 3rd February 2018, 5:25 am

asgiref: AsyncToSync and SyncToAsync (via) Andrew’s classes in asgiref that can turn a synchronous callable into an awaitable (that runs in a thread pool) or an awaitable callable into a synchronous callable, using Python 3 futures and asyncio. # 2nd February 2018, 7:06 pm

Channels 2.0. Andrew just shipped Channels 2.0—a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means its turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface. # 2nd February 2018, 6:19 pm

Family fun with deepfakes. Or how I got my wife onto the Tonight Show. deepfakes is dystopian nightmare technology: take a few thousand photos of two different people with similar shaped faces and you can produce an extremely realistic video where you swap one person’s face for the other. Unsurprisingly it’s being used for porn. This is a pleasantly SFW explanation of how it works, complete with a demo where Sven Charleer swaps his wife Elke for Anne Hathaway on the Tonight Show. # 2nd February 2018, 4:06 pm

Using setup.py in Your (Django) Project. Includes this neat trick: if you list manage.py in the setup(scripts=) argument you can call it from e.g. cron using the full path to manage.py within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first. # 2nd February 2018, 12:33 pm

How the Citus distributed database rebalances your data. Citus is a fascinating implementation of database sharding built on top of PostgreSQL primitives. PostgreSQL 10 introduced extremely flexible logical replication—in this post Craig Kerstiens explains how Citus use this new ability to re-balance shards (e.g. when you move from two to four physical PostgreSQL nodes) without downtime. # 1st February 2018, 10:50 pm

Building a Full-Text Search App Using Docker and Elasticsearch. Deep, comprehensive tutorial from Patrick Triest showing how to use docker-compose to run three containers (Node API, nginx static content, elasticsearch) and then use that to build a neat Vue.js web search UI against 100 books from Project Gutenberg. # 1st February 2018, 3:41 pm

Observable notebook: San Francisco trees from Datasette. I used an Observable notebook to rebuild my San Francisco tree search demo against a Datasette API of a CSV of trees published by the SF Department of Public Works. The map updates live as you type a query, and every cell can be toggled to view the underlying source code. # 1st February 2018, 12:37 am