Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

Alt-texts: The Ultimate Guide. By Daniel Göransson, a web developer with vision impairment who uses a screen reader. This is the best, most practical guide to writing image alt text I’ve seen. Just one of the neat tips contained within: consider ending your alt text in a period, so the screen user knows to pause.

# 6th November 2017, 4:54 pm / accessibility, alt-text

walrus. Fascinating collection of Python utilities for working with Redis, by Charles Leifer. There are a ton of interesting ideas in here. It starts with Python object wrappers for Redis so you can interact with lists, sets, sorted sets and Redis hashes using Python-like objects. Then it gets really interesting: walrus ships with implementations of autocomplete, rate limiting, a graph engine (using a sorted set hexastore) and an ORM-style models mechanism which manages secondary indexes and even implements basic full-text search.

# 6th November 2017, 1:14 am / python, redis, charles-leifer

direnv (via) A shell extension (for bash, zsh and others) which can automatically set and unset environment variables when you cd into specific directories. Useful for managing things like a project’s GOPATH or automatically activating Python virtual environments.

# 5th November 2017, 7:59 pm / bash, shell, zsh

Landsat on AWS (via) TIL Amazon make data from the Landsat 8 satellite available for free on S3 (though they are no doubt hoping you’ll pay for EC2 instances to process the data). “All new Landsat 8 scenes are made available each day, often within hours of production. The satellite images the entire Earth every 16 days at a roughly 30 meter resolution”.

# 5th November 2017, 7:56 pm / aws, satellite

Try hosting on PyPy by simonw. I had a go at hosting my blog on PyPy. Thanks to the combination of Travis CI, Sentry and Heroku it was pretty easy to give it a go—I had to swap psycopg2 for psycopg2cffi and switch to the currently undocumented pypy3-5.8.0 Heroku runtime (pypy3-5.5.0 is only compatible with Python 3.3, which Django 2.0 does not support). I ran it in production for a few minutes and didn’t get any Sentry errors but did end up using more Heroku dyno memory than I’m comfortable with—see the graph I posted in a comment. I’m going to stick with CPython 3.6 for the moment. Amusingly I did almost all of the work on this on my phone! Travis CI means it’s easy to create and test a branch through GitHub’s web UI, and deploying a tested branch to Heroku is then just a button click.

# 5th November 2017, 7:17 pm / pypy, python, heroku, travis, sentry

Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts.

# 5th November 2017, 3:26 pm / python, scipy

Docker.qcow2 never shrinks—disk space usage leak in docker for mac (via) Interesting year-long thread on disk usage by Docker for Mac, including a bunch of potential workarounds for if it swallows too much disk space.

# 5th November 2017, 3:06 pm / docker

On Being A Senior Engineer. Thoughts on characteristics of a mature engineer from John Allspaw back in 2012. So much good thinking in here—my favourite piece of writing on the subject.

# 5th November 2017, 6:16 am / john-allspaw, careers

Docker Containers on the Desktop (via) Jessie Frazelle’s classic explanation from 2015 of how she runs every desktop application on her Linux machine in its own Docker container.

# 5th November 2017, 4:16 am / linux, docker

gillyb/sensitive: A native desktop version of the kibana sense plugin. I love using the Sense UI for developing against Elasticsearch, but it’s infuriatingly hard to obtain these days. You can install it as a Kibana plugin but I work with multiple Elasticsearch instances and I don’t want to have to get it installed on all of them. Until recently I was using a Chrome extension for it, but that’s now been disabled as containing malware and removed from the Chrome extension store. I’ve now switched to Sensitive, which packages Sense up as a native OS X application using Electron.

# 4th November 2017, 7:35 pm / elasticsearch, electron

Animoji karaoke performing Bohemian Rhapsody (via) Animoji just might be the most important advance in computer science in a decade.

# 4th November 2017, 7:29 pm / emoji

My blog: Items tagged “askmetafilter”. I’ve imported all 55 of my answers to questions on Ask MetaFilter (to accompany my previous Quora import) going back to 2005.

# 4th November 2017, 4:43 am / ask-metafilter

The Story Behind the Chicago Newspaper That Bought a Bar (via) Absolutely fascinating story—the Chicago Sun-Times bought a bar back in 1976 to investigate corrupt city inspectors, staffing it with journalists and with photographers hidden in a back room.

# 3rd November 2017, 3:27 pm / journalism

Connecting to Google Sheets with Python. Useful guide to interacting with Google Sheets via the gspread python library, including how to work with Google’s unintuitive “service account keys”.

# 3rd November 2017, 4:13 am / googlespreadsheet, python

How Adversarial Attacks Work. Adversarial attacks against machine learning classifiers involve constructing an input that deliberately produces the wrong classification. This article shows how these can be constructed, and includes examples generated using PyTorch which produce a sports car that gets identified as a toaster and a photo of Sylvester Stallone that gets classified as Keanu Reeves.

# 2nd November 2017, 8:25 pm / machine-learning, python

A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more.

# 2nd November 2017, 1:23 am / pandas, python, sqlite, jupyter

Hemingway Editor. Hemingway is a web-based editor that applies style checks to your writing. It looks for complicated words, unnecessary adverbs and sentences that are hard to read. It highlighted the previous sentence as hard to read. It gave this whole paragraph a Grade 8 readability score.

# 1st November 2017, 8:38 pm / writing

Live htop. Neat, simplest-thing-that-could-possibly-work implementation of a tool that continually pipes the output of the htop command to a browser over a WebSocket. The htopgen.sh scripts loops every 2 seconds, runs htop, pipes it through a utility to convert the output to HTML and writes that to a file. Then the server.js Node.js script watches for changes to that file and pipes the entire file contents to the browser via socket.io. The index.html page in the browser subscribes to the WebSocket and updates the entire page using innerHTML every time it receives an event.

# 1st November 2017, 6:07 pm / nodejs, websockets

Oxford Deep NLP 2017 course (via) Slides, course description and links to lecture videos for the 2017 Deep Natural Language Processing course at the University of Oxford presented by a team from Google DeepMind.

# 31st October 2017, 8:39 pm / google, machine-learning, nlp

Squeezing every drop of performance out of a Django app on Heroku. Ben Firshman describes some lesser known tricks for scaling Django on Heroku—in particular, using gunicorn gevent asynchronous workers and setting up PostgreSQL connection pooling using django-db-geventpool.

# 31st October 2017, 2:08 pm / django, postgresql, heroku

Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data.

# 30th October 2017, 8:53 pm / postgresql, redis, replication, scaling, gitlab

Fun facts about Rust’s growing popularity. TIL Rust is used “on hundreds of millions of desktops via Dropbox”. And Rust developers are called Rustaceans.

# 30th October 2017, 4:45 am / dropbox, rust

Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas.

# 29th October 2017, 4:58 pm / open-data, pandas, python, jupyter

Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas.

# 29th October 2017, 4:47 pm / geospatial, python

How to teach technical concepts with cartoons. Julia Evans: “This post is about a few patterns I use when illustrating ideas about computers. If you are interested in using drawings to teach people about your very favorite computer topics, hopefully this will help you!”

# 28th October 2017, 2:55 pm / teaching, julia-evans

profiling. “An interactive continuous Python profiler”. This is really neat—simply run “profiling myscript.py” to get an interactive, navigable console-based profile inspector at the end of your script... or run “profiling live-profile mywebverver.py” to see a live, updating profile of a long-running process. Has options for statistical profiling as well, which has a much lower overhead in exchange for a less accurate view of what is going on.

# 28th October 2017, 1:06 am / profiling, python

Benefit of TEXT with CHECK over VARCHAR(X) in PostgreSQL. Brandur suggests using email TEXT CHECK (char_length(email) <= 255) to define a column with a length limit in PostgreSQL over VARCHAR(255) because TEXT and VARCHAR are equally performant but a CHECK length can be changed later on without locking the table, whereas a VARCHAR requires an ALTER TABLE with an exclusive lock.

# 28th October 2017, 12:59 am / postgresql, brandur-leach

Implementing Stripe-like Idempotency Keys in Postgres (via) Having clients send “idempotency keys” with API requests in order to be able to safely retry them if something’s goes wrong is a really neat trick for making transactional APIs more robust. Here Brandur Leach talks implementation strategies.

# 27th October 2017, 5:51 pm / api-design, idempotency, postgresql, brandur-leach, stripe

Dynamically changing favicons with JavaScript. I just spotted GitHub using this trick to swap out the favicon on a pull request that hadn’t passed its tests yet for github.com/favicon-failure.ico with a little cross icon in the corner.

# 26th October 2017, 7:56 pm / favicons

Contributors to python/cpython, Aug 5, 1990—Oct 26, 2017. I love how the graphs on this page summarize the history of the last 27 years of Python development, showing exactly when each core contributor was most active.

# 26th October 2017, 6:58 pm / python

Years

Tags