Simon Willison’s Weblog


Filters: Type: blogmark ×

SQLite in 2018: A state of the art SQL dialect (via) In 2018 SQLite gained boolean literals, window functions, filter clauses, upserts and the ability to rename a column. If you want to try it out the latest official datasetteproject/datasette Docker image now bundles SQLite 3.26. # 15th January 2019, 4:21 pm

An Interactive Introduction to Fourier Transforms (via) I love interactive exploitable explanations and this is the best I’ve seen in a while: Jez Swanson breaks down exactly what a Fourier transform does, first by letting you interactively draw and deconstruct wave patterns and then by showing Epicycles andcexplsining JPEG compression. All with not a formula in sight! # 12th January 2019, 2:55 am

Usable Data (via) A Paul Ford essay from February 2016 in which he advocates for SQLite as the ideal format for sharing interesting data. I don’t know how I missed this one—it predates Datasette, but it perfectly captures the benefits that I’m trying to expose with the project. “In my dream universe, there would be a massive searchable torrent site filled with open, explorable data sets, in SQLite format, some with full text search indexes already in place.” # 11th January 2019, 6:33 pm

Launching LiteCLI (via) Really neat alternative command-line client for SQLite, written in Python and using the same underlying framework as the similar pgcli (PostgreSQL) and mycli (MySQL) tools. Provides really intuitive autocomplete against table names, columns and other bits and pieces of SQLite syntax. Installation is as easy as “pip install litecli”. # 5th January 2019, 11:16 pm

The Friendship That Made Google Huge. The New Yorker profiles Jeff Dean and Sanjay Ghemawat, Google’s first and only level 11 Senior Fellows. This is some of the best writing on complex software engineering topics (map-reduce, Tensor Flow and the like) aimed at a general audience that I’ve ever seen. Also a very compelling case study in pair programming. # 31st December 2018, 3:56 am

benfred/py-spy (via) A Python port of Julia Evans’ rbspy profiler, which she describes as “probably better” than the original. I just tried it out and it’s really impressive: it’s written in Rust but has precompiled binaries so you can just run “pip install py-spy” to install it. Shows live output in the terminal while your program is running and also includes the option to generate neat SVG flame graphs. # 29th December 2018, 5:18 am

Fast Autocomplete Search for Your Website (via) I wrote a tutorial for the 24 ways advent calendar on building fast autocomplete search for a website on top of Datasette and SQLite. I built the demo against 24 ways itself—I used wget to recursively fetch all 330 articles as HTML, then wrote code in a Jupyter notebook to extract the raw data from them (with BeautifulSoup) and load them into SQLite using my sqlite-utils Python library. I deployed the resulting database using Datasette, then wrote some vanilla JavaScript to implement autocomplete using fast SQL queries against the Datasette JSON API. # 19th December 2018, 12:26 am

Develop Your Naturalist Superpowers with Observable Notebooks and iNaturalist (via) Natalie’s article for this year’s 24 ways advent calendar shows how you can use Observable notebooks to quickly build interactive visualizations against web APIs. She uses the iNaturalist API to show species of Nudibranchs that you might see in a given month, plus a Vega-powered graph of sightings over the course of the year. This really inspired me to think harder about how I can use Observable to solve some of my API debugging needs, and I’ve already spun up a couple of private Notebooks to exercise new APIs that I’m building at work. It’s a huge productivity boost. # 18th December 2018, 10:39 pm

PEP 8016 -- The Steering Council Model (via) The votes are in and Python has a new governance model, partly inspired by the model used by the Django Software Foundation. A core elected council of five people (with a maximum of two employees from any individual company) will oversee the project. # 17th December 2018, 4:02 pm

Pampy: Pattern Matching for Python (via) Ingenious implementation of Erlang/Rust style pattern matching in just 150 lines of extremely cleanly designed and well-tested Python. # 17th December 2018, 7:14 am “NIP.IO maps <anything>.<IP Address> to the corresponding <IP Address>, even maps to”—looks useful. is a different service that does the same thing. Being able to put anything at the start looks handy for testing systems that handle different subdomains. # 12th December 2018, 6:18 pm

The _repr_html_ method in Jupyter notebooks (via) Today I learned that if you add a _repr_html_ method returning a string of HTML to any Python class Jupyter notebooks will render that HTML inline to represent that object. # 12th December 2018, 6:09 pm

Things About Real-World Data Science Not Discussed In MOOCs and Thought Pieces (via) Really good article, pointing out that carefully optimizing machine learning models is only a small part of the day-to-day work of a data scientist: cleaning up data, building dashboards, shipping models to production, deciding on trade-offs between performance and production and considering the product design and ethical implementations of what you are doing make up a much larger portion of the job. # 11th December 2018, 8:51 pm

How Does React Tell a Class from a Function? (via) Understanding the answer requires a deep understanding of the history of the JavaScript class system plus how Babel and other tools work with it. “For an API to be simple to use, you often need to consider the language semantics (possibly, for several languages, including future directions), runtime performance, ergonomics with and without compile-time steps, the state of the ecosystem and packaging solutions, early warnings, and many other things. The end result might not always be the most elegant, but it must be practical.” # 3rd December 2018, 1:30 am

Software Sprawl, The Golden Path, and Scaling Teams With Agency (via) This is smart: the “golden path” approach to encouraging a standard stack within a large engineering organization. If you build using the components on the golden path you get guaranteed ongoing support and as much free monitoring/tooling as can possibly be provided. I also really like the suggestion that this should be managed by a “council” of senior engineers with one member of the council rotated out every quarter to keep things from getting stale and cabal-like. # 2nd December 2018, 7:40 pm

repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm

BigInt: arbitrary-precision integers in JavaScript (via) The BigInt specification is now supported in Chrome—but it hasn’t yet made it to other browsers. The Chrome team have a really interesting solution: they’ve released a JSBI library which you can use to do BigInt calculations in any browser today, and an accompanying Babel plugin which can rewrite calls to that library into BigInt syntax once browser support catches up. I’ve never seen a library that includes a tool for refactoring itself into oblivion before. # 28th November 2018, 2:22 am

AWS Ground Station – Ingest and Process Data from Orbiting Satellites. OK this is cool. “Instead of building your own ground station or entering in to a long-term contract, you can make use of AWS Ground Station on an as-needed, pay-as-you-go basis. [...] You don’t need to build or maintain antennas, and can focus on your work or research. We’re starting out with a pair of ground stations today, and will have 12 in operation by mid-2019. Each ground station is associated with a particular AWS Region; the raw analog data from the satellite is processed by our modem digitizer into a data stream (in what is formally known as VITA 49 baseband or VITA 49 RF over IP data streams) and routed to an EC2 instance that is responsible for doing the signal processing to turn it into a byte stream.” # 28th November 2018, 1:04 am

Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well. # 19th November 2018, 6:25 pm

dive (via) Handy command-line tool (as with so much of the Docker ecosystem it’s written in Go, which means you can download a Darwin binary directly from the GitHub releases page and run it directly on your Mac) for visually exploring the different layers of a given Docker image. # 19th November 2018, 4:41 am

Squoosh. This is by far the most useful example of web assembly I’ve seen so far: Squoosh is a progressive web app for image optimization (JPEG, PNG, GIF, SVG and more) which uses emscripten-compiled versions of best in breed image codec implementations to provide a browser interface for applying and previewing those optimizations. # 12th November 2018, 11:15 pm

Tracking Jupyter: Newsletter, the Third... (via) Tony Hirst’s tracking Jupyter newsletter is fantastic. The Jupyter ecosystem is incredibly exciting and fast moving at the moment as more and more groups discover how productive it is, and Tony’s newsletter is a wealth of information on what’s going on out there. # 9th November 2018, 5:42 pm

A Netflix Web Performance Case Study (via) Fascinating description of how Netflix knocked the 3G loading times of their homepage in half for logged-out users by rendering the React templates on the server-side and using the bare amount of vanilla JavaScript necessary to get the homepage interactive—then XHR prefetching the full React code needed to power the subsequent signup flow. Via Alex Russell, who tweets “I’m increasingly optimistic that we can cap JS emissions by quarantining legacy frameworks to the server side.” # 6th November 2018, 8:54 pm

Optimizing Django Admin Paginator. The Django admin paginator uses a count(*) to calculate the total number of rows, so it knows how many pages to display. This makes it unpleasantly slow over large datasets. Haki Benita has an ingenious solution: drop in a custom paginator which uses the PostgreSQL “SET LOCAL statement_timeout TO 200” statement first, then if a timeout error is raised returns 9999999999 as the count instead. This means small tables get accurate page counts and giant tables load display in the admin within a reasonable time period. # 6th November 2018, 6:17 pm

11 barriers to coding in the open and how to overcome them (via) “Terence Eden, open standards lead at GDS, also gave a talk about overcoming barriers to coding in the open”—an intriguing recap of that talk revealing exactly how the UK government have been encouraging a culture of coding in the open and going open source first. # 5th November 2018, 8:53 pm

Twitter conversation about long-term pre-paid archival storage. I kicked off a conversation on Twitter yesterday about long-time archival storage of web content: “Anyone know of a web hosting provider where I can pay a lump sum of money to host a file at a reliable URL essentially forever? Is this even remotely feasible?”. The thread is really interesting—this is definitely an unsolved problem, and it’s clear that the challenge is more organizational (how do you create an entity that can keep this kind of promise—does it need to be some kind of foundation or trust?) than technical. # 5th November 2018, 6:50 pm

Apple’s New Map (via) Map nerds rejoice! Justin O’Beirne had written another spectacularly illustrated essay about web cartography, this time examining the iOS 12 upgrade to Apple Maps in most of California and a little bit of Nevada. # 3rd November 2018, 9:28 pm

Svelte RFC 0001: Reactive assignments (via) Svelte is a really interesting JavaScript framework: it offers a similar component-based developer experience to React but does it while delivering a tiny amount of code-to-the-browser thanks to being built entirely around a compiler that generates the minimum code necessary. In this RFC the Svelte team propose taking this approach even further, by generating code to accompany every relevant variable assignment that can trigger the corresponding view update. The document also has a very clear explanation of how React, Vue and current Svelte differ in their solutions to the challenge of updating the visible HTML view when the corresponding state changes. # 3rd November 2018, 4:58 pm

jantic/DeOldify (via) “A Deep Learning based project for colorizing and restoring old images”. Delightful (and well documented) project that uses a Self-Attention Generative Adversarial Network to colorize old black and white photos, with extremely impressive results. Built on an older version of the fastai library, and trained by running for several days on a 1080TI graphics card. # 2nd November 2018, 11:13 am