Blogmarks
Filters: Sorted by date
pillow-simd (via) A “friendly fork” of the Python Pillow image library that takes advantage of SIMD operations on certain CPUs to obtain massive speed-ups—they claim 16 to 40 times faster than ImageMagick.
Using SVG as placeholders — More Image Loading Techniques. This is such a good idea: generate a tiny SVG placeholder for an image, and display that until the image itself has loaded. This article explores potential ways of generating those SVGs in some depth.
Datasettes · simonw/datasette. I’m collecting examples of datasette-powered APIs on the project wiki.
Datasette for Polar Bears. I found a fun dataset of Polar Bear ear tag tracking data put out by the USGS Alaska Science Center and deployed it using datasette in just a couple of minutes—here’s how I did it.
How to train your own Object Detector with TensorFlow’s Object Detector API (via) Dat Tran built a TensorFlow model that can detect raccoons! Impressive results, especially given it was only trained on 200 raccoon images from Google Image search.
Entering the Quantum Era—How Firefox got fast again and where it’s going to get faster. I’ve been trying out the beta of Firefox 57 and it’s fantastic. All of that work on Servo and Rust is definitely paying off!
CSS element() function (via) Vincent De Oliveira explores the CSS element function (2 years ago), which sadly is still only supported by Firefox. It lets you render any HTML element as a CSS background image, enabling folding effects, animated backgrounds, live previews and more.
simonw/csvs-to-sqlite. I built a simple tool for bulk converting multiple CSV files into a SQLite database.
Inside a low budget consumer hardware espionage implant. Fascinating break-down of a cheap ($25) USB charging cable with a hidden microphone and SIM that can be activated via SMS message or by calling the device.
Exploring Line Lengths in Python Packages. Interesting exploration of the impact if the 79 character length limit rule of thumb on various Python packages—and a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib.
Interactive Database of the World’s River Basins (via) “This database provides the first-ever compilation of the world’s river basins developed specifically for corporate disclosure. It features a comprehensive list of river basins worldwide, including their names, boundaries, and other helpful information.”
Pink Trombone (via) Explorable explanation of the human vocal system by Neil Thapen. View source for some pleasantly readable JavaScript / canvas / window.AudioContext code.
dhash (via) Python library to calculate the perceptual difference hash for an image. Delightfully simple algorithm that’s fully explained in the README—it works by scaling the image to 8x8 grayscale and then creating a bitmap representing of each pixel is lighter or darker than the previous one.
Use a Render Prop! Michael Jackson makes a convincing argument for using the render prop pattern in React in preference to HOCs (Higher Order Components, the React equivalent of Python class decorators).
The Booming Japanese Rent-a-Friend Business. “In Japan, you can pay an actor to impersonate your relative, spouse, coworker, or any kind of acquaintance.”. Surprisingly poignant interview with a man who runs a company that employs 800 professional actors for almost any conceivable situation.
How Sentry Receives 20 Billion Events Per Month While Preparing To Handle Twice That. RabbitMQ federation, nginx and HAProxy, Riak as a key/value store, data processing is still mainly Python with a little bit of Rust. As of July 2017 it’s all hosted on Google Cloud Platform.
Eager Execution: An imperative, define-by-run interface to TensorFlow. Lets you evaluate TensorFlow expressions interactively in Python without needing to constantly run tf.Session().run(variable).
TensorFlow 101. Concise, readable introduction to TensorFlow, with Python examples you can execute (and visualize) in Jupyter.
spaCy. “Industrial-strength Natural Language Processing in Python”. Exciting alternative to nltk—spaCy is mostly written in Cython, makes bold performance claims and ships with a range of pre-built statistical models covering multiple different languages. The API design is clean and intuitive and spaCy even includes an SVG visualizer that works with Jupyter.
Redis Streams and the Unified Log. In which Brandur Leach explores the new Kafka-style streams functionality coming to Redis 4.0, and shows an example of a robust at-least once processing architecture built on a combination of Redis streams and PostgreSQL transactions. I really like the pattern of writing log records to a staging table in PostgreSQL first in order to bundle them up in the same transaction as the originating state change, then have a separate process read them from that table and publish them to Redis.
ZEIT – 6x Faster Now Uploads with HTTP/2 (via) Fantastic optimization write-up by Pranay Prakash. The Now deployment tool works by computing a hash for every local file in a project, then uploading just the ones that are missing. Pranay switched to uploading over HTTP/2 using the fetch-h2 library and got a 6x speedup for larger projects.
Feature Visualization (via) Another gorgeous paper published on Distill, the journal that prides itself on including interactive visualizations to help provide clear explanations of machine learning.
GOV.UK Registers (via) Canonical sources of “lists of information” intended for use by GDS teams building software for the UK government, but available for anyone. 17 registers are “ready for use”, 45 are “in progress”. Covers things like the FCO’s country list, the official list of prison estates, and DEFRA’s list of public bodies in England that manage drainage systems.
Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA.
Cloud SQL for PostgreSQL adds high availability and replication. Google Cloud Platform now offers PostgreSQL with automatic asynchronous disk-level replication to a separate instance in a different availability zone, via their new “Regional Disks“ feature. Between this, Heroku, Citus and Amazon RDS the appeal of a self-maintained PostgreSQL instance continues to fall.
Something is wrong on the internet. James Bridle takes a fascinating and deeply troubling dive into the world of Kids’ YouTube videos, which appear to be increasingly algorithmically generated and are evolving in a very dark direction.
How Balanced does Database Migrations with Zero-Downtime. I’m fascinated by the idea of “pausing” traffic during a blocking site maintenance activity (like a database migration) and then un-pausing when the operation is complete—so end clients just see some of their requests taking a few seconds longer than expected. I first saw this trick described by Braintree. Balanced wrote about a neat way of doing this just using HAproxy, which lets you live reconfigure the maxconns to your backend down to zero (causing traffic to be queued up) and then bring the setting back up again a few seconds later to un-pause those requests.
Secondary indexing with Redis. I haven’t seen this section of the official Redis documentation before, and it’s absolutely fantastic—well worth reading the whole thing. It talks through various ways in which you can set up indexes in Redis, mainly by leaning on sorted sets—which it turns out will binary lexicographically sort items with the same score. This makes it easy to implement autocomplete with Redis—but if you use them creatively you can implement subject/predicate/object graph searches or even N-dimensional range queries as well.
I’m a Unicorn. I got to try out Animoji on an iPhone X, and it was amazing.
How technology helped a blind athlete run free at the New York Marathon. Fascinating piece on technology to help blind people better navigate the world—combing GPS and chest-mounted ultrasonic sonar.