Simon Willison’s Weblog

October 2017

Oct. 22, 2017

Parse shell one-liners with pyparsing. Neat introduction to the pyparsing library, both for parsing tokens into labeled sections and constructing an AST from them.

# 1:35 pm / parsing, python

System Requirements For SQLite. Document describing the high-level goals and objectives of SQLite. Like everything to do with SQLite this exhibits some incredibly well thought out software engineering. I particularly like “S80000: SQLite shall exhibit ductile failure characteristics” where ductile is described in opposition to brittle: a ductile system begins showing signs of trouble well in advance of failure.

# 9:53 pm / sqlite

Oct. 23, 2017

hupper (via) Handy Python module for adding “live reload” development support to just about anything. I’m using it with Sanic—I run “hupper -m app” and it starts up my code in app.py and automatically reloads it any time any of the corresponding files changes on disk.

# 12:34 am / python, sanic

Can You Afford It?: Real-world Web Performance Budgets. Alex Russell’s magnum opus on web performance budgets in 2017. He proposes a baseline testing device equivalent to a $200 Android phone on a slow 3G network emulated at 400ms RTT/400Kbps transfer and encourages a goal of 5s time-to-interactive on first load and 2s TTI for subsequent views. This means around 130KB of gzipped JavaScript, which is challenging but not impossible with modern JavaScript frameworks.
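A rough sanity check of those numbers, as a sketch rather than Alex Russell’s own calculation. It uses only the figures quoted above and assumes 1KB = 1000 bytes; the round_trips parameter is a hypothetical stand-in for connection setup costs, not a number from the article.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the post.
LINK_KBPS = 400          # emulated link speed in kilobits per second
RTT_SECONDS = 0.4        # emulated round-trip time
FIRST_LOAD_TTI = 5.0     # time-to-interactive goal for first load, in seconds
JS_BUDGET_KB = 130       # gzipped JavaScript; assumes 1KB = 1000 bytes


def transfer_seconds(size_kb, link_kbps=LINK_KBPS):
    """Seconds needed to move size_kb kilobytes over the link, ignoring latency."""
    return size_kb * 8 / link_kbps


def remaining_seconds(size_kb, round_trips):
    """Time left in the first-load budget after transfer and round trips.

    round_trips is a hypothetical count of network round trips (DNS, TCP,
    TLS and so on); pick a value that matches your own setup.
    """
    return FIRST_LOAD_TTI - transfer_seconds(size_kb) - round_trips * RTT_SECONDS


if __name__ == "__main__":
    print(transfer_seconds(JS_BUDGET_KB))
    print(remaining_seconds(JS_BUDGET_KB, round_trips=3))
```

Transferring 130KB at 400Kbps takes about 2.6 seconds on its own, so more than half of the 5 second budget is gone before the phone has parsed or executed anything.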

# 1:51 pm / javascript, web-performance, alex-russell

Getting Your First 10 Customers (via) There is so much good advice embedded in this article by Patrick McKenzie, and it constantly comes back to the theme of doing whatever it takes to get to your first ten paying customers.

# 5:36 pm / startups, sales, patrick-mckenzie

We made a sale for Appointment Reminder from someone whose only way of getting data into the system was to fax it to us. Guess the cheat. If you guess "CEO signs up for HelloFax, receives the fax, and types 600 patient names and phone numbers by hand" you have good instincts.

Patrick McKenzie

# 5:45 pm / sales, patrick-mckenzie

Oct. 24, 2017

Saying Goodbye to Firebug. Firebug has reached end-of-life as a separate extension, replaced by the built-in Firefox Developer Tools. It’s hard to overstate how influential Firebug has been in helping to turn front-end web development into a respected and productive engineering discipline.

# 10:38 pm / firebug, firefox

Oct. 25, 2017

Introducing Faktory. A brand new worker queue system from Mike Perham, the author of Sidekiq for Ruby. It’s written in Go on top of RocksDB and is explicitly designed to support clients and workers in multiple different languages.

# 3:09 am / go, queues, sidekiq

arxiv-vanity (via) Beautiful new project from Ben Firshman and Andreas Jansson: “Arxiv Vanity renders academic papers from Arxiv as responsive web pages so you don’t have to squint at a PDF”. It works by pulling the raw LaTeX source code from Arxiv and rendering it to HTML using a heavily customized Pandoc workflow. The real fun is in the architecture: it’s a Django app running on Heroku which fires up on-demand Hyper.sh Docker containers for each individual rendering job.

# 8:06 pm / docker, science, ben-firshman, django, pdf

Oct. 26, 2017

Elaborate Halloween Costume Tips from a 19th-Century Guide to Fancy Dress (via) The gilded age had some ridiculous parties. Here are highlights of the most popular costume guide of the era, now available on the Internet Archive.

# 2:01 pm / internet-archive, history

Early this year, the U.S. intelligence community named RT and Sputnik as implementing state-sponsored Russian efforts to interfere with and disrupt the 2016 Presidential election, which is not something we want on Twitter.

Twitter PublicPolicy

# 2:38 pm / twitter

Contributors to python/cpython, Aug 5, 1990—Oct 26, 2017. I love how the graphs on this page summarize the history of the last 27 years of Python development, showing exactly when each core contributor was most active.

# 6:58 pm / python

Dynamically changing favicons with JavaScript. I just spotted GitHub using this trick: on a pull request that hadn’t passed its tests yet, it swapped the favicon for github.com/favicon-failure.ico, which has a little cross icon in the corner.

# 7:56 pm / favicon

Oct. 27, 2017

I’ve heard managers and teams mandating 100% code coverage for applications. That’s a really bad idea. The problem is that you get diminishing returns on our tests as the coverage increases much beyond 70% (I made that number up… no science there). Why is that? Well, when you strive for 100% all the time, you find yourself spending time testing things that really don’t need to be tested. Things that really have no logic in them at all (so any bugs could be caught by ESLint and Flow). Maintaining tests like this actually really slow you and your team down.

Kent C. Dodds

# 6:20 am / testing

Implementing Stripe-like Idempotency Keys in Postgres (via) Having clients send “idempotency keys” with API requests in order to be able to safely retry them if something goes wrong is a really neat trick for making transactional APIs more robust. Here Brandur Leach talks implementation strategies.
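To make the basic idea concrete, here is a minimal sketch. It is not Brandur’s implementation: it uses Python’s built-in sqlite3 module as a stand-in for Postgres, and the table names and the create_charge function are hypothetical. A production version would also need to handle concurrent requests and partial failures.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical example tables."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE charges (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return db


def create_charge(db, idempotency_key, amount):
    """Create a charge at most once per key; a retry gets the stored response."""
    with db:  # single transaction: commits on success, rolls back on error
        row = db.execute(
            "SELECT response FROM idempotency_keys WHERE key = ?",
            (idempotency_key,),
        ).fetchone()
        if row is not None:
            # This key has been seen before, so replay the saved response
            # instead of charging a second time.
            return row[0]
        cursor = db.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
        response = f"charge {cursor.lastrowid} for {amount}"
        db.execute(
            "INSERT INTO idempotency_keys (key, response) VALUES (?, ?)",
            (idempotency_key, response),
        )
        return response
```

Calling create_charge twice with the same key returns the same response and leaves a single row in the charges table, which is what makes a client retry safe.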

# 5:51 pm / api-design, idempotency, postgresql, brandur-leach, stripe

In the Bay Area, we have a collection of fiefdoms. Villages are parading as cities, addressing problems myopically. For example, Brisbane (a city of 5,000 people between San Francisco and SFO) is currently blocking a large housing development for local reasons. It’s NIMBY-ism on a broad scale – a regional tragedy of the commons.

Justin Krause

# 7:22 pm / san-francisco

Late night dining near Great American Music Hall

Tommy’s Joynt is a couple of blocks away and is a San Francisco institution—great comfort food, inexpensive, crammed with personality and open late.

[... 40 words]

Oct. 28, 2017

Benefit of TEXT with CHECK over VARCHAR(X) in PostgreSQL. Brandur suggests using email TEXT CHECK (char_length(email) <= 255) to define a column with a length limit in PostgreSQL over VARCHAR(255) because TEXT and VARCHAR are equally performant but a CHECK length can be changed later on without locking the table, whereas a VARCHAR requires an ALTER TABLE with an exclusive lock.
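A small sketch of what a TEXT column with a CHECK length limit looks like in practice. This uses Python’s built-in sqlite3 module as a stand-in, so it says nothing about PostgreSQL’s locking behaviour, and SQLite spells the function length() rather than char_length(). The users table and try_insert helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stand-in for the PostgreSQL definition quoted above; SQLite uses length().
db.execute("CREATE TABLE users (email TEXT CHECK (length(email) <= 255))")


def try_insert(email):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        with db:
            db.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint fails.
        return False
```

A 255 character value is accepted and a 256 character value is rejected, the same limit a VARCHAR(255) would enforce.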

# 12:59 am / postgresql, brandur-leach

profiling. “An interactive continuous Python profiler”. This is really neat: simply run “profiling myscript.py” to get an interactive, navigable console-based profile inspector at the end of your script... or run “profiling live-profile mywebserver.py” to see a live, updating profile of a long-running process. Has options for statistical profiling as well, which has a much lower overhead in exchange for a less accurate view of what is going on.

# 1:06 am / profiling, python

How to teach technical concepts with cartoons. Julia Evans: “This post is about a few patterns I use when illustrating ideas about computers. If you are interested in using drawings to teach people about your very favorite computer topics, hopefully this will help you!”

# 2:55 pm / teaching, julia-evans

Oct. 29, 2017

Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas.

# 4:47 pm / gis, python

Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas.

# 4:58 pm / pandas, open-data, jupyter, python

Oct. 30, 2017

Fun facts about Rust’s growing popularity. TIL Rust is used “on hundreds of millions of desktops via Dropbox”. And Rust developers are called Rustaceans.

# 4:45 am / rust, dropbox

Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data.
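The replica lag trick can be sketched in a few lines. This is an illustration rather than GitLab’s code: a dictionary stands in for the redis keys, the WAL positions are passed in as strings instead of being queried from PostgreSQL, and falling back to the primary when the replica is behind is my assumption. The Router class and its method names are hypothetical.

```python
def parse_lsn(lsn):
    """Convert a PostgreSQL WAL position such as '0/3000060' to an integer.

    The text form is two hexadecimal numbers separated by a slash: the high
    32 bits and the low 32 bits of the position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class Router:
    """Decide per user whether a replica is fresh enough to read from."""

    def __init__(self):
        # Stand-in for redis: user id -> WAL position of that user's last write.
        self.last_write_lsn = {}

    def record_write(self, user_id, primary_lsn):
        """Remember the primary's WAL position after the user writes."""
        self.last_write_lsn[user_id] = primary_lsn

    def pick_database(self, user_id, replica_replay_lsn):
        """Return 'replica' if it has replayed the user's last write."""
        needed = self.last_write_lsn.get(user_id)
        if needed is None:
            return "replica"  # no recorded write, so any replica will do
        if parse_lsn(replica_replay_lsn) >= parse_lsn(needed):
            return "replica"
        return "primary"  # assumed fallback while the replica catches up
```

In a real setup replica_replay_lsn would come from calling pg_last_xlog_replay_location() on the replica.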

# 8:53 pm / scaling, postgresql, redis, gitlab, replication

Oct. 31, 2017

Squeezing every drop of performance out of a Django app on Heroku. Ben Firshman describes some lesser known tricks for scaling Django on Heroku—in particular, using gunicorn gevent asynchronous workers and setting up PostgreSQL connection pooling using django-db-geventpool.

# 2:08 pm / postgresql, heroku, django

Oxford Deep NLP 2017 course (via) Slides, course description and links to lecture videos for the 2017 Deep Natural Language Processing course at the University of Oxford presented by a team from Google DeepMind.

# 8:39 pm / machine-learning, nlp, google
