Simon Willison’s Weblog

Blogmarks in Aug

airtable-export. I wrote a command-line utility for exporting data from Airtable and dumping it to disk as YAML, JSON or newline delimited JSON files. This means you can backup an Airtable database from a GitHub Action and get a commit history of changes made to your data. # 29th August 2020, 9:48 pm

California Protected Areas Database in Datasette (via) I built this yesterday: it’s a Datasette interface on top of the CPAD 2020 GIS database of protected areas in California maintained by GreenInfo Network. This was a useful excuse to build a GitHub Actions flow that builds a SpatiaLite database using my shapefile-to-sqlite tool, and I fixed a few bugs in my datasette-leaflet-geojson plugin as well. # 21st August 2020, 11:15 pm

Announcing the Consortium for Python Data API Standards (via) Interesting effort to unify the fragmented DataFrame API ecosystem, where increasing numbers of libraries offer APIs inspired by Pandas that imitate each other but aren’t 100% compatible. The announcement includes some very clever code to support the effort: custom tooling to compare the existing APIs, and an ingenious GitHub Actions setup to run traces (via sys.settrace), derive type signatures and commit those generated signatures back to a repository. # 19th August 2020, 5:48 am
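
As a rough illustration of the tracing idea (not the Consortium's actual tooling), here is a minimal sketch that uses sys.settrace to record the argument types a function is called with. The `observed` registry, the `tracer` function and the `add` example are all hypothetical names chosen for this sketch.

```python
import sys
from collections import defaultdict

# Hypothetical registry: function name -> set of observed argument-type tuples.
observed = defaultdict(set)


def tracer(frame, event, arg):
    # The global trace function receives a "call" event for each new Python frame.
    if event == "call":
        code = frame.f_code
        # Positional argument names come first in co_varnames.
        arg_names = code.co_varnames[:code.co_argcount]
        arg_types = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
        observed[code.co_name].add(arg_types)
    # Returning None skips line-by-line tracing inside the frame.
    return None


def add(a, b):
    # Hypothetical function whose call signatures we want to observe.
    return a + b


sys.settrace(tracer)
try:
    add(1, 2)
    add("x", "y")
finally:
    # Always switch tracing off again, even if a traced call raises.
    sys.settrace(None)

for name, signatures in observed.items():
    for signature in sorted(signatures):
        print(f"{name}({', '.join(signature)})")
```

A real setup would presumably run an existing test suite under the tracer and aggregate the results, rather than calling functions by hand.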

Datasette 0.46 (via) I just released Datasette 0.46 with a security fix for an issue involving CSRF tokens on canned query pages, plus a new debugging tool, improved file downloads and a bunch of other smaller improvements. # 9th August 2020, 4:57 pm

Pysa: An open source static analysis tool to detect and prevent security issues in Python code (via) Interesting new static analysis tool for auditing Python for security vulnerabilities—things like SQL injection and os.execute() calls. Built by Facebook and tested extensively on Instagram, a multi-million line Django application. # 7th August 2020, 8:50 pm

Design Docs at Google. Useful description of the format used for software design docs at Google—informal documents of between 3 and 20 pages that outline the proposed design of a new project, discuss trade-offs that were considered and solicit feedback before the code starts to be written. # 7th August 2020, 4:31 pm

Zero Downtime Release: Disruption-free Load Balancing of a Multi-Billion User Website (via) I remain fascinated by techniques for zero downtime deployment—once you have it working it makes shipping changes to your software so much less stressful, which means you can iterate faster and generally be much more confident in shipping code. Facebook have invested vast amounts of effort into getting this right, and their new paper for the ACM SIGCOMM conference goes into detail about how it all works. # 5th August 2020, 3:27 am

How a Cheese Goes Extinct (via) Ruby Tandoh writes for the New Yorker about the culture, history and anthropology of cheesemaking through the lens of the British cheese industry. I learned that two of my favourite British cheeses, Tymsboro and Innes Log, have sadly ceased production. Beautifully written. # 2nd August 2020, 5:51 pm

sqlite-utils 2.14 (via) I finally figured out porter stemming with SQLite full-text search today—it turns out it’s as easy as adding tokenize='porter' to the CREATE VIRTUAL TABLE statement. So I just shipped sqlite-utils 2.14 with a tokenize= option (plus the ability to insert binary file data from stdin). # 1st August 2020, 9:19 pm
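
A minimal sketch of what that option does, using Python's built-in sqlite3 module directly rather than sqlite-utils. It assumes the underlying SQLite build includes FTS5; the `docs` table and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# tokenize='porter' wraps the default tokenizer with the Porter stemmer,
# so "running" and "dogs" are indexed as "run" and "dog".
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter')")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("Running with dogs",), ("A quiet cat",)],
)

# Query terms are stemmed the same way, so "run dog" matches the first row.
matches = [
    row[0]
    for row in conn.execute("SELECT body FROM docs WHERE docs MATCH ?", ("run dog",))
]
print(matches)
```

Without the porter tokenizer the same query would find nothing, because "run" and "running" would be indexed as different terms.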

James Bennett on why Django should not support JWT in core (via) The topic of adding JWT support to Django core comes up occasionally—here’s James Bennett’s detailed argument for not doing that. The short version is that the JWT specification isn’t just difficult to implement securely: it’s fundamentally flawed, which results in things like five implementations in three different languages all manifesting the same vulnerability. Third party modules exist that add JWT support to Django, but baking it into core would act as a form of endorsement and Django’s philosophy has always been to encourage people towards best practices. # 1st August 2020, 12:28 am

Subsume JSON a.k.a. JSON ⊂ ECMAScript (via) TIL that JSON isn’t a subset of ECMAScript after all! “In ES2018, ECMAScript string literals couldn’t contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters, because they are considered to be line terminators even in that context.” # 15th August 2019, 10:30 am
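
To make the mismatch concrete, here is a small Python sketch. It shows that a raw U+2028 inside a string is valid JSON, then escapes both separators so the output is also safe to paste into JavaScript source on engines that predate this change. The `js_safe_json` helper is a hypothetical name, not part of any library.

```python
import json

LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"

# A JSON document containing a raw (unescaped) U+2028 inside a string.
raw_json = '"before' + LINE_SEPARATOR + 'after"'

# JSON allows this, so parsing succeeds.
parsed = json.loads(raw_json)


def js_safe_json(value):
    """Serialize to JSON, escaping the two separators older JS engines reject."""
    text = json.dumps(value, ensure_ascii=False)
    return text.replace(LINE_SEPARATOR, "\\u2028").replace(
        PARAGRAPH_SEPARATOR, "\\u2029"
    )


safe = js_safe_json(parsed)
print(safe)
```

The escaped form is still valid JSON and parses back to the same string, so the helper loses nothing.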

OPP (Other People’s Problems) (via) Camille Fournier provides a comprehensive guide to picking your battles: in a large organization how can you navigate the enormous array of problems you can see that you’d like to fix, especially when so many of those problems aren’t directly in your area of control? # 7th August 2019, 1:58 pm

Optimizing for the mobile web: Moving from Angular to Preact. Grubhub reduced their mobile web load times from 9-11s to 3-4s by replacing Angular with Preact (and replacing other libraries such as lodash with native JavaScript code). The conversion took 6 months and involved running Angular and Preact simultaneously during the transition—not a huge additional overhead as Preact itself is only 4KB. They used TypeScript throughout and credit it with providing a great deal of confidence and productivity to the overall refactoring. # 5th August 2019, 12:26 pm

Working with many-to-many relationships in sqlite-utils (via) I just released sqlite-utils 1.9 with syntactic sugar support for creating many-to-many relationships for records stored in SQLite databases. # 4th August 2019, 3:57 am

Logs vs. metrics: a false dichotomy (via) Nick Stenning discusses the differences between logs and metrics: most notably that metrics can be derived from logs but logs cannot be reconstituted starting with time-series metrics. # 3rd August 2019, 4:46 pm
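
The one-way relationship can be sketched in a few lines of Python: given raw log events, a per-minute request count falls out of a simple aggregation, but nothing in the resulting counts lets you recover the individual requests. The log format and the `requests_per_minute` function are assumptions made up for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines: timestamp, method, path, status.
log_lines = [
    "2019-08-03T16:46:05 GET /index 200",
    "2019-08-03T16:46:40 GET /missing 404",
    "2019-08-03T16:47:10 GET /index 200",
]


def requests_per_minute(lines):
    """Derive a time-series metric (count per minute and status) from log events."""
    counts = Counter()
    for line in lines:
        timestamp, _method, _path, status = line.split()
        # Truncate the timestamp to the minute to form the time-series bucket.
        minute = datetime.fromisoformat(timestamp).replace(second=0)
        counts[(minute.isoformat(), status)] += 1
    return counts


metrics = requests_per_minute(log_lines)
for (minute, status), count in sorted(metrics.items()):
    print(minute, status, count)
```

The paths and exact timestamps are discarded by the aggregation, which is why the reverse direction is not possible.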

PyPI now supports uploading via API token (via) All of my open source Python libraries are set up to automatically deploy new tagged releases as PyPI packages using Circle CI or Travis, but I’ve always felt a bit uncomfortable about sharing my PyPI password with those CI platforms to get this to work. PyPI just added scoped authentication tokens, which means I can issue a token that’s only allowed to upload a specific project and see an audit log of when that token was last used. # 1st August 2019, 4:03 pm

Advice for a new executive, by Chad Dickerson (via) Lara Hogan shares the advice she was given by Chad Dickerson (CTO and then CEO of Etsy) when she first became VP Engineering at Kickstarter. There is so much good material in here. I can vouch for the “peer support group” recommendation: Natalie and I benefited from that through Y Combinator and ended up building our own founder peer support group when we moved our startup back to London. Having a confidential trusted group with which to discuss the challenges of growing a company was invaluable. # 31st August 2018, 1:45 pm

Serverless for data scientists (via) Slides and accompanying notes from a talk by Mike Lee Williams at PyBay, providing an overview of Zappa and diving a bit more deeply into pywren, which makes it trivial to parallelize a function across a set of AWS Lambda instances (serverless Python map() execution essentially). I really like this format for sharing presentations—I used something similar for my own PyBay talk. # 25th August 2018, 11:01 pm

Computational and Inferential Thinking: The Foundations of Data Science. Free online textbook written for the UC Berkeley Foundations of Data Science class. The examples are all provided as Jupyter notebooks, using the mybinder web application to allow students to launch interactive notebooks for any of the examples without having to install any software on their own machines. # 25th August 2018, 10:13 pm

The Future of Notebooks: Lessons from JupyterCon (via) It sounds like reactive notebooks (where cells keep track of their dependencies on other cells and re-evaluate when those update) were a hot topic at JupyterCon this year. # 25th August 2018, 9:55 pm

Honeycomb changelog (via) Too few hosted services have detailed user-facing changelogs. This one from Honeycomb (a metrics, tracing and observability platform) is a particularly great example. I especially like the use of animated screenshots, something I’ve been evangelizing pretty heavily recently for internal communication at work. # 25th August 2018, 3:12 am

I don’t like Jupyter Notebooks—a presentation by Joel Grus (via) Fascinating talk by Joel Grus at the Jupyter conference in New York. He highlights some of the drawbacks of the Jupyter way of working, including the huge confusion that can come from the ability to execute cells out of order (something Observable notebooks solve brilliantly using spreadsheet-style reactive cell associations). He also makes strong arguments that notebooks encourage a way of working that discourages people from producing stable, repeatable and well tested code. # 25th August 2018, 3:04 am

jq recipes. Remy Sharp’s handy collection of jq recipes, each one linking to an interactive demo on jqterm.com. I thought jq was just for extracting values from a JSON document—I hadn’t realized how powerful it was for modifying and extending those documents as well. # 22nd August 2018, 3:23 pm

6 Great Uses of the Spread Operator. As I’ve been getting more comfortable with 2018-era JavaScript, the spread operator and object destructuring are two of the features I have found most interesting. # 22nd August 2018, 3:17 pm

Slides, notes and links from my Datasette talk at PyBay (via) I presented a session about Datasette at the PyBay conference in San Francisco this morning. I talked about the project itself and demonstrated ways of creating and publishing databases using csvs-to-sqlite, Datasette Publish and my new sqlite-utils library. # 19th August 2018, 11:23 pm

Observable Tutorial 2: Dog pictures (via) Observable have a neat new set of tutorials on how to get started with their reactive notebooks. You don’t even need to sign up for the service: they have a “Scratchpad” link in their navigation bar now which lets you spin up a test notebook with one click. # 18th August 2018, 7:55 pm

Redux vs. The React Context API. Nice explanation of the new Context API in React 16.3, which provides an easy way for passing props down through a tree of components without needing to explicitly pass the prop at every level of the tree. The comparison with Redux doubles as a useful explanation of the value that Redux provides. # 18th August 2018, 6:51 pm

Beyond Interactive: Notebook Innovation at Netflix. Netflix have been investing heavily in their internal Jupyter notebooks infrastructure: it’s now the most popular tool for working with data at Netflix. They also use parameterized notebooks to make it easy to create templates for reusable operations, and scheduled notebooks for recurring tasks. “When a Spark or Presto job executes from the scheduler, the source code is injected into a newly-created notebook and executed. That notebook then becomes an immutable historical record, containing all related artifacts — including source code, parameters, runtime config, execution logs, error messages, and so on.” # 18th August 2018, 5:55 pm

Text to Image (via) Ridiculously entertaining demo by Cris Valenzuela that feeds any text you type to a neural network that then attempts to generate an image for your text. # 18th August 2018, 5:33 pm

Compiling SQLite for use with Python Applications (via) Charles Leifer’s recent tutorial on how to compile and build the latest SQLite (with window function support) for use from Python via his pysqlite3 library. # 15th August 2018, 3:51 pm