Simon Willison’s Weblog

Items in 2019

Filters: Year: 2019 ×

Lessons from 6 software rewrite stories (via) Herb Caudill takes on the classic idea that rewriting from scratch is “the single worst strategic mistake that any software company can make” and investigates it through the lens of six well-chosen examples: Netscape 6, Basecamp Classic/2/3, Visual Studio/VS Code, Gmail/Inbox, FogBugz/Wasabi/Trello, and finally FreshBooks/BillSpring. Each story has details I had never heard before, and the lessons and conclusions are deeply insightful. # 19th February 2019, 9:55 pm

parameterized. I love the @parametrize decorator in pytest, which lets you run the same test multiple times against multiple parameters. The only catch is that the decorator in pytest doesn’t work for old-style unittest TestCase tests, which means you can’t easily add it to test suites that were built using the older model. I just found out about parameterized which works with unittest tests whether or not you are running them using the pytest test runner. # 19th February 2019, 9:05 pm

The Eleven Laws of Showrunning (via) Fascinating essay on how to run a modern TV show by Javier Grillo-Marxuach. Being a showrunner basically involves running a 100+ person startup with a 7 digit budget, almost immovable deadlines, high maintenance activist investors and you’re still expected to write some of the scripts! So many useful lessons here about management, creativity and delegation: almost everything in here is relevant to product management, startup founding and engineering management as well. # 19th February 2019, 7:27 pm

This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual memory operations to provably reduce fragmentation, breaking the classical Robson bounds with high probability. Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory of consumption of Firefox by 16% and Redis by 39%.

Mesh: Compacting Memory Management for C/C++ Applications # 18th February 2019, 3:26 pm

Discussion about Altavista on Hacker News. Fascinating thread on Hacker News where Bryant Durrell, a former Director from Altavista provides some insider thoughts on how they lost against Google. # 16th February 2019, 6:57 pm

If you want the fastest website despite implementation difficulty, the answer is: SSR behind a CDN with assets in best compression formats (webp, Brotli, woff2) served over http2 (or 3) from same origin with JS as enhancement only

Mike Sherov # 15th February 2019, 7:12 pm

Data science is different now (via) Detailed examination of the current state of the job market for data science. Boot camps and university courses have produced a growing volume of junior data scientists seeking work, but the job market is much more competitive than many expected—especially for those without prior experience. Meanwhile the job itself is much more about data cleanup and software engineering skills: machine learning models and applied statistics end up being a small portion of the actual work. # 15th February 2019, 3:36 pm

Vitess (via) I remember looking at Vitess when it was first released by YouTube in 2012. The idea of a proven horizontally scalable sharding mechanism for MySQL was exciting, but I was put off by the need for a custom Go or Java client library. Apparently that changed with Vitess 2.1 in April 2017, the first version to introduce a MySQL protocol compatible proxy which can be connected to by existing code written in any language. Vitess 3.0 came out last December so now the MySQL proxy layer is much more stable. Vitess is used in production by a bunch of other companies now (including Slack and Square) so it’s definitely worth a closer look. # 14th February 2019, 5:35 am

Operations engineering does not consist of firefighting your shitty software, it is the science of delivering value to users.

Charity Majors # 14th February 2019, 1:12 am

django-zombodb (via) The hardest part of working with an external search engine like Elasticsearch is always keeping that index synchronized with your relational database. ZomboDB is a PostgreSQL extension which lets you create a new type of index backed by an external Elasticsearch cluster. Updated rows will be pushed to the index automatically, and custom SQL syntax can then be used to execute searches. django-zombodb is a brand new library by Flávio Juvenal which integrates ZomboDB directly into the Django ORM, letting you add Elasticsearch-backed functionality with just a few lines of extra configuration. It even includes custom Django migrations for enabling the extension in PostgreSQL! # 13th February 2019, 10:14 pm

Private blockchains are completely uninteresting. (By this, I mean systems that use the blockchain data structure but don’t have the above three elements.) In general, they have some external limitation on who can interact with the blockchain and its features. These are not anything new; they’re distributed append-only data structures with a list of individuals authorized to add to it. Consensus protocols have been studied in distributed systems for more than 60 years. Append-only data structures have been similarly well covered. They’re blockchains in name only, and -- as far as I can tell -- the only reason to operate one is to ride on the blockchain hype.

Bruce Schneier # 12th February 2019, 7:14 pm

socrata2sql (via) Phenomenal new open source tool released by Andrew Chavez at the Dallas Morning News. Socrata is the open data portal software used by huge numbers of local governments worldwide. socrata2sql is a tool that interacts with the standard Socrata API and can use it to suck down a dataset and save it as a SQLite, PostgreSQL, MySQL or other SQLAlchemy-supported database. I just tried this and it took a single command to create a SQLite database of every police arrest in Dallas in the past five years. # 8th February 2019, 3:27 pm

db-to-sqlite (via) I just released version 0.2 of a tiny CLI utility I’ve been working on. It builds on top of SQLAlchemy and lets you connect to any SQLAlchemy-supported database and convert the data from it to a local SQLite database file. The new --all option will mirror all available tables (including foreign key relationships), or you can use --sql to save the results of custom SQL queries. # 8th February 2019, 6:08 am

Questions for a new technology. Kellan poses 8 questions which should be asked of any technology that is being proposed for inclusion in an existing tech stack. I’m particularly fond of “Will this solution kill and eat the solution that it replaces?”. My rule of thumb these days is that new technology either needs to make something possible that isn’t possible at all with the existing stack, or it needs to represent at least a 3X productivity improvement in order to compensate for the switching and retraining costs across a large team. # 6th February 2019, 4:10 am

The Datasette Ecosystem. I’ve written a page of documentation that introduces the wider Datasette Ecosystem: csvs-to-sqlite, sqlite-utils, db-to-sqlite, dbf-to-sqlite, markdown-to-sqlite and a full collection of Datasette plugins. # 1st February 2019, 4:41 am

Datasette 0.27 (via) The latest release of Datasette introduces an option to output tables and SQL query results as newline-delimited JSON—plus a new “datasette plugins” command for listing available plugins. # 1st February 2019, 4:39 am

Everyone is angry about CSS again. I’m not even going to try to summarize the arguments. However it always seems to boil down to the fact that CSS is simultaneously too easy to bother with, yet so hard it needs to be wrapped up in a ball of JavaScript in case it scares the horses.

Rachel Andrew # 30th January 2019, 11:14 pm

Since Mozilla moved on from Firefox OS, its derivatives have shipped on an order of magnitude more devices than during its entire time under Mozilla’s leadership and it has gone on to form the basis of the third largest and fastest growing mobile operating system in the world.

Ben Francis # 28th January 2019, 12:59 pm

websocketd (via) Delightfully clever piece of design: “It’s like CGI, twenty years later, for WebSockets”. Simply run “websocketd --port=8080 my-program” and it will start up a WebSocket server on port 8080 and fire up a new process running your script every time it sees a new WebSocket connection. Standard in and standard out are automatically hooked up to the socket connection. Since it spawns a new process per connection this won’t work well with thousands of connections but for smaller scale projects it’s an excellent addition to the toolbok—and since it’s written in Go there are pre-compiled binaries available for almost everything. # 26th January 2019, 2:38 am

Practical Deep Learning for Coders 2019 (via) The deep learning evening course I took a few months ago has now been shared online in full, and it’s outstanding. “After the first lesson you’ll be able to train a state-of-the-art image classification model on your own data”—can confirm: after just the first lesson I built a bobcat v.s. cougar classifier using photos from iNaturalist. The biggest thing I learned from the course is how powerful transfer learning is. I used to think you needed a huge amount of data to get good results from deep learning. That’s no longer true: you can take an existing model (eg ResNet for image classification) and train on top of it. ResNet can classify images as 1,000 classes (house, cat, etc)—training an extra few hundred images of e.g. Bobcats vs Cougars only takes a couple of minutes on a GPU and can give you crazily accurate results. It works because the pre-trained model can already pick up really subtle details—fur patterns, ear shapes etc—so you only need to train a few more layers on it for it to be able to classify against the patterns in your new set of training images. And this doesnt just work for image classification! Natural language processing benefits from transfer learning too: take an existing model trained on the entire corpus of Wikipedia (so it knows patterns from sentence structures) and you can build IMDB sentiment analysis on top. That’s in lesson 4. # 26th January 2019, 12:32 am

Subscribe to my blog on Telegram (via) I created a Telegram bot that’s subscribed to my Atom feed, so if you want to get notifications when I post to my blog you can do that using Telegram now. # 20th January 2019, 4:11 am

togeojson (via) Handy JavaScript library and command-mine tool for converting KML and GPX to GeoJSON, by Tom MacWright # 18th January 2019, 11:50 pm

[On 5G] This is the great thing about the decentralized, permissionless innovation of the internet—telcos don’t need to decide in advance what the use cases are, any more than Intel had to decide what the use cases for faster CPUs would be.

Ben Evans # 18th January 2019, 6:55 am

SQLite in 2018: A state of the art SQL dialect (via) In 2018 SQLite gained boolean literals, window functions, filter clauses, upserts and the ability to rename a column. If you want to try it out the latest official datasetteproject/datasette Docker image now bundles SQLite 3.26. # 15th January 2019, 4:21 pm

An Interactive Introduction to Fourier Transforms (via) I love interactive exploitable explanations and this is the best I’ve seen in a while: Jez Swanson breaks down exactly what a Fourier transform does, first by letting you interactively draw and deconstruct wave patterns and then by showing Epicycles andcexplsining JPEG compression. All with not a formula in sight! # 12th January 2019, 2:55 am

Usable Data (via) A Paul Ford essay from February 2016 in which he advocates for SQLite as the ideal format for sharing interesting data. I don’t know how I missed this one—it predates Datasette, but it perfectly captures the benefits that I’m trying to expose with the project. “In my dream universe, there would be a massive searchable torrent site filled with open, explorable data sets, in SQLite format, some with full text search indexes already in place.” # 11th January 2019, 6:33 pm

Exploring search relevance algorithms with SQLite

SQLite isn’t just a fast, high quality embedded database: it also incorporates a powerful full-text search engine in the form of the FTS4 and FTS5 extensions. You’ve probably used these a bunch of times already: many iOS, Android and desktop applications use SQLite under-the-hood and use it to implement their built-in search.

[... 1398 words]

Launching LiteCLI (via) Really neat alternative command-line client for SQLite, written in Python and using the same underlying framework as the similar pgcli (PostgreSQL) and mycli (MySQL) tools. Provides really intuitive autocomplete against table names, columns and other bits and pieces of SQLite syntax. Installation is as easy as “pip install litecli”. # 5th January 2019, 11:16 pm