Simon Willison’s Weblog

Subscribe

Items in Oct, 2017

Filters: Year: 2017 × Month: Oct × Sorted by date


Oxford Deep NLP 2017 course (via) Slides, course description and links to lecture videos for the 2017 Deep Natural Language Processing course at the University of Oxford presented by a team from Google DeepMind. # 31st October 2017, 8:39 pm

Squeezing every drop of performance out of a Django app on Heroku. Ben Firshman describes some lesser known tricks for scaling Django on Heroku—in particular, using gunicorn gevent asynchronous workers and setting up PostgreSQL connection pooling using django-db-geventpool. # 31st October 2017, 2:08 pm

Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data. # 30th October 2017, 8:53 pm

Fun facts about Rust’s growing popularity. TIL Rust is used “on hundreds of millions of desktops via Dropbox”. And Rust developers are called Rustaceans. # 30th October 2017, 4:45 am

Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas. # 29th October 2017, 4:58 pm

Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas. # 29th October 2017, 4:47 pm

How to teach technical concepts with cartoons. Julia Evans: “This post is about a few patterns I use when illustrating ideas about computers. If you are interested in using drawings to teach people about your very favorite computer topics, hopefully this will help you!” # 28th October 2017, 2:55 pm

profiling. “An interactive continuous Python profiler”. This is really neat—simply run “profiling myscript.py” to get an interactive, navigable console-based profile inspector at the end of your script... or run “profiling live-profile mywebverver.py” to see a live, updating profile of a long-running process. Has options for statistical profiling as well, which has a much lower overhead in exchange for a less accurate view of what is going on. # 28th October 2017, 1:06 am

Benefit of TEXT with CHECK over VARCHAR(X) in PostgreSQL. Brandur suggests using email TEXT CHECK (char_length(email) <= 255) to define a column with a length limit in PostgreSQL over VARCHAR(255) because TEXT and VARCHAR are equally performant but a CHECK length can be changed later on without locking the table, whereas a VARCHAR requires an ALTER TABLE with an exclusive lock. # 28th October 2017, 12:59 am

Late night dining near Great American Music Hall

Tommy’s Joynt is a couple of blocks away and is a San Francisco institution—great comfort food, inexpensive, crammed with personality and open late.

[... 40 words]

In the Bay Area, we have a collection of fiefdoms. Villages are parading as cities, addressing problems myopically. For example, Brisbane (a city of 5,000 people between San Francisco and SFO) is currently blocking a large housing development for local reasons. It’s NIMBY-ism on a broad scale – a regional tragedy of the commons.

Justin Krause # 27th October 2017, 7:22 pm

Implementing Stripe-like Idempotency Keys in Postgres (via) Having clients send “idempotency keys” with API requests in order to be able to safely retry them if something’s goes wrong is a really neat trick for making transactional APIs more robust. Here Brandur Leach talks implementation strategies. # 27th October 2017, 5:51 pm

I’ve heard managers and teams mandating 100% code coverage for applications. That’s a really bad idea. The problem is that you get diminishing returns on our tests as the coverage increases much beyond 70% (I made that number up… no science there). Why is that? Well, when you strive for 100% all the time, you find yourself spending time testing things that really don’t need to be tested. Things that really have no logic in them at all (so any bugs could be caught by ESLint and Flow). Maintaining tests like this actually really slow you and your team down.

Kent C. Dodds # 27th October 2017, 6:20 am

Dynamically changing favicons with JavaScript. I just spotted GitHub using this trick to swap out the favicon on a pull request that hadn’t passed its tests yet for github.com/favicon-failure.ico with a little cross icon in the corner. # 26th October 2017, 7:56 pm

Contributors to python/cpython, Aug 5, 1990—Oct 26, 2017. I love how the graphs on this page summarize the history of the last 27 years of Python development, showing exactly when each core contributor was most active. # 26th October 2017, 6:58 pm

Early this year, the U.S. intelligence community named RT and Sputnik as implementing state-sponsored Russian efforts to interfere with and disrupt the 2016 Presidential election, which is not something we want on Twitter.

Twitter PublicPolicy # 26th October 2017, 2:38 pm

Elaborate Halloween Costume Tips from a 19th-Century Guide to Fancy Dress (via) The gilded age had some ridiculous parties. Here are highlights of the most popular costume guide of the era, now available on the Internet Archive. # 26th October 2017, 2:01 pm

arxiv-vanity (via) Beautiful new project from Ben Firshman and Andreas Jansson: “Arxiv Vanity renders academic papers from Arxiv as responsive web pages so you don’t have to squint at a PDF”. It works by pulling the raw LaTeX source code from Arxiv and rendering it to HTML using a heavily customized Pandoc workflow. The real fun is in the architecture: it’s a Django app running on Heroku which fires up on-demand Hyper.sh Docker containers for each individual rendering job. # 25th October 2017, 8:06 pm

Introducing Faktory. A brand new worker queue system from Mike Perham, the author of Sidekiq for Ruby. It’s written in Go on top of RocksDB and is explicitly designed to support clients and workers in multiple different languages. # 25th October 2017, 3:09 am

Saying Goodbye to Firebug. Firebug has reached end-of-life as a separate extension, replaced by the built-in Firefox Developer Tools. It’s hard to overstate how influential Firebug has been in helping to turn front-end web development into a respected and productive engineering discipline. # 24th October 2017, 10:38 pm

We made a sale for Appointment Reminder from someone whose only way of getting data into the system was to fax it to us. Guess the cheat. If you guess “CEO signs up for HelloFax, receives the fax, and types 600 patient names and phone numbers by hand” you have good instincts.

Patrick McKenzie # 23rd October 2017, 5:45 pm

Getting Your First 10 Customers (via) There is so much good advice embedded in this article by Patrick McKenzie, and it constantly comes back to the theme of doing whatever it takes to get to your first ten paying customers. # 23rd October 2017, 5:36 pm

Can You Afford It?: Real-world Web Performance Budgets. Alex Russell’s magnum opus on web performance budgets in 2017. He proposes a baseline testing device equivalent to a $200 Android phone on a slow 3G network emulated at 400ms RTT/400Kbps transfer and encourages a goal of 5s time-to-interactive on first load and 2s TTI for subsequent views. This means around 130kb of gzipped JavaScript—challenging but not impossible with modern JavaScript frameworks. # 23rd October 2017, 1:51 pm

hupper (via) Handy Python module for adding “live reload” development support to just about anything. I’m using it with Sanic—I run “hupper -m app” and it starts up my code in app.py and automatically reloads it any time any of the corresponding files changes on disk. # 23rd October 2017, 12:34 am

System Requirements For SQLite. Document describing the high level goals and objectives of SQLite. Like everything to do with SQLite this exhibits some incredibly well thought out software engineering. I particularly like “S80000: SQLite shall exhibit ductile failure characteristics“ where ductile is described in opposition to brittle: a ductile system begins showing signs of trouble well in advance of failure. # 22nd October 2017, 9:53 pm

Parse shell one-liners with pyparsing. Neat introduction to the pyparsing library, both for parsing tokens into labeled sections and constructing an AST from them. # 22nd October 2017, 1:35 pm

SurviveJS—Webpack (via) Free online book about Webpack. I’ve read the first couple of chapters and it looks like a concise, well constructed guide to a key component of the modern JavaScript stack. # 22nd October 2017, 1:22 pm

It’s Not a Feature Problem—Avoiding Startup Tarpits (via) “When we turned on paid advertising for the first time the increase we had a sizable increase in signups. We always feared that a new user would just churn because of what we perceived as deficiencies in the product. While there were users who churned for that reason, it was never the nightmare scenario that we imagined.” # 22nd October 2017, 12:53 pm

I am currently documenting a language called Seenku, spoken by fewer than 15,000 people in the rolling hills of southwestern Burkina Faso in West Africa. Like Chinese, it is a tonal language, meaning the pitch on which a word is pronounced can radically alter its meaning. For instance, tsu can mean “thatch” when pronounced with an extra low pitch, but “hippopotamus” when pronounced with falling pitch. In fact, pitch plays such a huge role in Seenku that it can be “spoken” through music alone, most notably on the traditional xylophone.

Laura McPherson # 22nd October 2017, 12:49 pm

Getting the Most out of Sqlite3 with Python. A couple of neat tricks I didn’t know: you can skip cursors entirely by calling .execute and .executemany directly on the connection object, and you can use the connection object as a context manager to execute transactions using a “with” block. # 22nd October 2017, 12:35 pm