Simon Willison’s Weblog

Subscribe

February 2020

Feb. 5, 2020

Weeknotes: Shaving yaks for Datasette Cloud

I’ve been shaving a lot of yaks, but I’m finally ready to for other people to start kicking the tires on the MVP of Datasette Cloud.

[... 392 words]

Feb. 7, 2020

Deep learning isn’t hard anymore. This article does a great job of explaining how transfer learning is unlocking a new wave of innovation around deep learning. Previously if you wanted to train a model you needed vast amounts if data and thousands of dollars of compute time. Thanks to transfer learning you can now take an existing model (such as GPT2) and train something useful on top of it that’s specific to a new domain in just minutes it hours, with only a few hundred or a few thousand new labeled samples.

# 8:47 am / machine-learning, transferlearning

Feb. 8, 2020

We write a lot of JavaScript at Basecamp, but we don’t use it to create “JavaScript applications” in the contemporary sense. All our applications have server-side rendered HTML at their core, then add sprinkles of JavaScript to make them sparkle. [...] It allows us to party with productivity like days of yore. A throwback to when a single programmer could make rapacious progress without getting stuck in layers of indirection or distributed systems. A time before everyone thought the holy grail was to confine their server-side application to producing JSON for a JavaScript-based client application.

David Heinemeier Hansson

# 8:10 am / dhh, javascript

Feb. 11, 2020

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 933 words]

Feb. 13, 2020

A group of software engineers gathered around a whiteboard are a joint cognitive system. The scrawls on the board are spatial cues for building a shared model of a complex system.

Eric Dobbs

# 6:48 pm / collaboration

Feb. 14, 2020

pup. This is a great idea: a command-line tool for parsing HTML on stdin using CSS selectors. It’s like jq but for HTML. Supports a sensible collection of selectors and has a number of output options for the selected nodes, including plain text and JSON. It also works as a simple pretty-printer for HTML.

# 4:25 pm / parsing, html, cli

Feb. 19, 2020

Things I learned about shapefiles building shapefile-to-sqlite

Visit Things I learned about shapefiles building shapefile-to-sqlite

The latest in my series of x-to-sqlite tools is shapefile-to-sqlite. I learned a whole bunch of things about the ESRI shapefile format while building it.

[... 1,073 words]

Feb. 21, 2020

So next time someone is giving you feedback about something you made, think to yourself that to win means getting two or three insights, ideas, or suggestions that you are excited about, and that you couldn’t think up on your own.

Juliette Cezzar

# 1:04 am / design

Feb. 23, 2020

Why Google invested in providing Google Fonts for free. Fascinating comment from former Google Fonts team member Raph Levien. In short: text rendered as PNGs hurt Google Search, fonts were a delay in the transition from Flash, Google Docs needed them to better compete with Office and anything that helps create better ads is easy to find funding for.

# 2:13 pm / typography, google

Feb. 24, 2020

I’ve really come to appreciate that performance isn’t just some property of a tool independent from its functionality or its feature set. Performance — in particular, being notably fast — is a feature in and of its own right, which fundamentally alters how a tool is used and perceived.

Nelson Elhage

# 2:32 pm / performance

Feb. 26, 2020

Weeknotes: Datasette Writes

As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.

[... 604 words]

Feb. 28, 2020

Wildcard: Spreadsheet-Driven Customization of Web Applications (via) What a fascinating collection of ideas. Wildcard is a browser extension (currently using Tampermonkey and sadly not yet available to try out) which lets you add “spreadsheet-driven customization” to any web application. Watching the animated screenshots in the videos helps explain what this mean—essentially it’s a two-way scraping trick, where content on the page (e.g. Airbnb listings) are extracted into a spreadsheet-like table interface using JavaScript—but then interactions you make in that spreadsheet like filtering and sorting are reflected back on the original page. It even has the ability to serve editable cells by mapping them to form inputs on the page. Lots to think about here.

# 7:39 pm / spreadsheets, greasemonkey

2020 » February

MTWTFSS
     12
3456789
10111213141516
17181920212223
242526272829