Items in 2020
Last week’s weeknotes took the form of my Personal Data Warehouses: Reclaiming Your Data talk write-up, which represented most of what I got done that week. This week I mainly worked on datasette-indieauth, but I also gave a keynote at PyCon Argentina and released a version of datasette-graphql with a small security fix.[... 718 words]
datasette-graphql 1.2 (via) A new release of the datasette-graphql plugin, fixing a minor security flaw: previous versions of the plugin could expose the schema (but not the actual data) of tables in databases that were otherwise protected by Datasette’s permission system. # 21st November 2020, 10:21 pm
I Lived Through A Stupid Coup. America Is Having One Now (via) If, like me, you have been avoiding the word “coup” since it feels like a clear over-reaction to what’s going on, I challenge you to read this piece and not change your mind. # 21st November 2020, 1:21 pm
The open secret Jennings filled me in on is that OpenStreetMap (OSM) is now at the center of an unholy alliance of the world’s largest and wealthiest technology companies. The most valuable companies in the world are treating OSM as critical infrastructure for some of the most-used software ever written. The four companies in the inner circle— Facebook, Apple, Amazon, and Microsoft— have a combined market capitalization of over six trillion dollars.
The trouble with transaction.atomic (via) David Seddon provides a detailed explanation of Django’s nestable transaction.atomic() context manager and describes a gotcha that can occur if you lose track of whether your code is already running in a transaction block, since you may be working with savepoints instead—along with some smart workarounds. # 20th November 2020, 3:57 pm
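The gotcha can be reproduced without Django. This is a minimal sketch using Python's built-in sqlite3 module in place of transaction.atomic(), so it illustrates the savepoint behaviour rather than Django's implementation. The record_payment helper and the payments table are hypothetical names invented for the illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK statements below run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, note TEXT)")


def record_payment(conn, note):
    """Hypothetical helper that tries to record a payment atomically.

    If a transaction is already open it can only create a savepoint,
    so its work is not durable until the outer transaction commits.
    Returns True when it ran nested inside an existing transaction.
    """
    nested = conn.in_transaction
    conn.execute("SAVEPOINT record_payment" if nested else "BEGIN")
    conn.execute("INSERT INTO payments (note) VALUES (?)", (note,))
    conn.execute("RELEASE record_payment" if nested else "COMMIT")
    return nested


# Called on its own: a real transaction, committed immediately.
standalone_was_nested = record_payment(conn, "standalone")

# Called inside an outer transaction: only a savepoint is released.
conn.execute("BEGIN")
inner_was_nested = record_payment(conn, "inside outer block")
conn.execute("ROLLBACK")  # the outer block fails later on

notes = [row[0] for row in conn.execute("SELECT note FROM payments")]
print(notes)  # ['standalone']
```

The second call returns normally, yet its row disappears when the outer transaction rolls back. That is the situation the article warns about: the helper cannot tell from its own code whether its "commit" was durable.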
Internet Archive Software Library: Flash (via) A fantastic new initiative from the Internet Archive: they’re now archiving Flash (.swf) files and serving them for modern browsers using Ruffle, a Flash Player emulator written in Rust and compiled to WebAssembly. They are fully interactive and audio works too. Considering the enormous quantity of creative material released in Flash over the decades this helps fill a big hole in the Internet’s cultural memory. # 19th November 2020, 9:19 pm
Security vulnerability in datasette-indieauth: Implementation trusts the “me” field returned by the authorization server without verifying it. I spotted a critical security vulnerability in my new datasette-indieauth plugin: it accepted the “me” profile URL value returned from the authorization server in the final step of the IndieAuth flow without verifying it, which means a malicious server could imitate any user. I’ve shipped 1.1 with a fix and posted a security advisory to the GitHub repository. # 19th November 2020, 9:14 pm
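One way to guard against this class of bug is to confirm that the profile URL returned as "me" really delegates to the authorization server that was used. The sketch below is an assumption about how such a check could look, not the code shipped in datasette-indieauth 1.1. The function names are hypothetical, and discover_endpoint stands in for whatever fetches a profile page and reads the authorization endpoint it advertises.

```python
from urllib.parse import urlsplit


def canonical(url):
    """Lower-case the scheme and host, and treat an empty path as '/'."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def me_is_trustworthy(returned_me, used_endpoint, discover_endpoint):
    """Accept returned_me only if that profile delegates to used_endpoint.

    discover_endpoint is a hypothetical callable that takes a profile URL
    and returns the authorization endpoint it advertises, or None.
    """
    advertised = discover_endpoint(returned_me)
    if advertised is None:
        return False
    return canonical(advertised) == canonical(used_endpoint)


# Fake discovery results standing in for real HTTP requests.
profiles = {
    "https://alice.example/": "https://auth.example/authorize",
    "https://victim.example/": "https://other-auth.example/authorize",
}
lookup = profiles.get

print(me_is_trustworthy("https://alice.example/", "https://auth.example/authorize", lookup))
print(me_is_trustworthy("https://victim.example/", "https://auth.example/authorize", lookup))
```

In this sketch a malicious server at auth.example that claims to speak for victim.example is rejected, because victim.example advertises a different authorization endpoint.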
IndieAuth is a spiritual successor to OpenID, developed and maintained by the IndieWeb community and based on OAuth 2. This weekend I attended IndieWebCamp East Coast and was inspired to try my hand at an implementation. datasette-indieauth is the result, a new plugin which enables IndieAuth logins to a Datasette instance.[... 1225 words]
Amstelvar (via) A real showcase of what variable fonts can do: this open source font by David Berlow has 17 different variables controlling many different aspects of the font. # 17th November 2020, 3:24 pm
Ok Google: please publish your DKIM secret keys (via) The DKIM standard allows email providers such as Gmail to include cryptographic headers that protect against spoofing, proving that an email was sent by a specific host and has not been tampered with. But it has an unintended side effect: if someone’s email is leaked (as happened to John Podesta in 2016) DKIM headers can be used to prove the validity of the leaked emails. This makes DKIM an enabling factor for blackmail and other security breach related crimes. Matthew Green proposes a neat solution: providers like Gmail should rotate their DKIM keys frequently and publish the PRIVATE key after rotation. By enabling spoofing of past email headers they would provide deniability for victims of leaks, fixing this unintended consequence of the DKIM standard. # 16th November 2020, 10:02 pm
CoronaFaceImpact (via) Variable fonts are fonts that can be customized by passing in additional parameters, which is done in CSS using the font-variation-settings property. Here’s a variable font that shows multiple effects of Covid-19 lockdown on a bearded face, created by Friedrich Althausen. # 15th November 2020, 10:41 pm
Hunting for Malicious Packages on PyPI (via) Jordan Wright installed all 268,000 Python packages from PyPI in containers, and ran Sysdig to capture syscalls made during installation to see if any of them were making extra network calls or reading or writing from the filesystem. Absolutely brilliant piece of security engineering and research. # 14th November 2020, 4:48 am
I gave a talk yesterday about personal data warehouses for GitHub’s OCTO Speaker Series, focusing on my Datasette and Dogsheep projects. The video of the talk is now available, and I’m presenting that here along with an annotated summary of the talk, including links to demos and further information.[... 5150 words]
Intent to Remove: HTTP/2 and gQUIC server push (via) The Chrome / Blink team announce their intent to remove HTTP/2 server push support, where servers can start pushing an asset to a client before it has been requested. It’s been in browsers for over five years now and adoption is terrible. “Over the past 28 days [...] 99.97% of connections never received a pushed stream that got matched with a request [...] These numbers are exactly the same as in June 2019”. Datasette serves redirects with Link: preload headers that cause smart proxies (like Cloudflare) to push the redirected page to the client along with the redirect, but I don’t expect to miss that optimization if it quietly stops working. # 12th November 2020, 1:44 am
Natalie and I decided to escape San Francisco for election week, and have been holed up in Fort Bragg on the Northern California coast. I’ve mostly been on vacation, but I did find time to make some significant changes to sqlite-utils. Plus notes on an exciting Git scraping project.[... 603 words]
nyt-2020-election-scraper. Brilliant application of git scraping by Alex Gaynor and a growing team of contributors. Takes a JSON snapshot of the NYT’s latest election poll figures every five minutes, then runs a Python script to iterate through the history and build an HTML page showing the trends, including what percentage of the remaining votes each candidate needs to win each state. This is the perfect case study in why it can be useful to take a “snapshot of the world right now” data source and turn it into a git revision history over time. # 6th November 2020, 2:24 pm
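The "percentage of the remaining votes needed" figure is simple arithmetic once each snapshot is parsed. Here is a rough sketch, assuming every remaining vote goes to one of the two leading candidates. The field names and numbers are invented for illustration and are not taken from the NYT data or from the scraper's own script.

```python
def share_needed(candidate_votes, opponent_votes, remaining_votes):
    """Fraction of the remaining votes the candidate must exceed to win.

    Solves c + x*r > o + (1 - x)*r for x, which gives
    x > (o - c + r) / (2*r). Assumes a two-way split of remaining votes.
    """
    if remaining_votes <= 0:
        return None  # nothing left to count
    return (opponent_votes - candidate_votes + remaining_votes) / (2 * remaining_votes)


# Hypothetical snapshots, oldest first, as they might be read back
# from successive commits in a git-scraped repository.
history = [
    {"trailing": 100, "leading": 120, "remaining": 100},
    {"trailing": 150, "leading": 160, "remaining": 40},
]

trend = [
    share_needed(snap["trailing"], snap["leading"], snap["remaining"])
    for snap in history
]
print(trend)  # [0.6, 0.625]
```

A result above 1.0 would mean the trailing candidate can no longer catch up under this assumption, and a result of 0 or below would mean the candidate has already won.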
selenium-wire. Really useful scraping tool: enhances the Python Selenium bindings to run against a proxy which then allows Python scraping code to look at captured requests—great for if a site you are working with triggers Ajax requests and you want to extract data from the raw JSON that came back. # 2nd November 2020, 6:58 pm
I shipped Datasette 0.51 today, with a new visual design, plugin hooks for adding navigation options, better handling of binary data, URL building utility methods and better support for running Datasette behind a proxy. It’s a lot of stuff! Here are the annotated release notes.[... 2020 words]
Defining Data Intuition. Ryan T. Harter, Principal Data Scientist at Mozilla, defines data intuition as “a resilience to misleading data and analyses”. He also introduces the term “data-stink” as a similar term to “code smell”, where your intuition should lead you to distrust analysis that exhibits certain characteristics without first digging in further. I strongly believe that data reports should include a link to the raw methodology and numbers so that they can be more easily vetted, and so that data-stink can be investigated with the least amount of resistance. # 29th October 2020, 3:14 pm
Seniors generally report having more trust in the people around them, a characteristic that may make them more credulous of information that comes from friends and family. There is also the issue of context: Misinformation appears in a stream that also includes baby pictures, recipes and career updates. Users may not expect to toggle between light socializing and heavy truth-assessing when they’re looking at their phone for a few minutes in line at the grocery store.
Apple now receives an estimated $8 billion to $12 billion in annual payments — up from $1 billion a year in 2014 — in exchange for building Google’s search engine into its products. It is probably the single biggest payment that Google makes to anyone and accounts for 14 to 21 percent of Apple’s annual profits.
I’ve been writing my talk for PyCon Argentina this week, which has proved surprisingly time consuming. I hope to have that wrapped up soon—I’m pre-recording it, which it turns out is much more work than preparing a talk to stream live.[... 630 words]
OCTO Speaker Series: Simon Willison—Personal Data Warehouses: Reclaiming Your Data. I’m giving a talk in the GitHub OCTO (Office of the CTO) speaker series about Datasette and my Dogsheep personal analytics project. You can register for free here—the stream will be on Thursday November 12, 2020 at 8:30am PST (4:30pm GMT). # 23rd October 2020, 3 am
CG-SQL (via) This is the toolkit the Facebook Messenger team wrote to bring stored procedures to SQLite. It implements a custom version of the T-SQL language which it uses to generate C code that can then be compiled into a SQLite module. # 22nd October 2020, 6:25 pm
Project LightSpeed: Rewriting the Messenger codebase for a faster, smaller, and simpler messaging app (via) Facebook rewrote their iOS messaging app earlier this year, dropping it from 1.7m lines of code to 360,000 and reducing the binary size to a quarter of what it was. A key part of the new app’s architecture is much heavier reliance on SQLite to coordinate data between views, and to dynamically configure how different views are displayed. They even built their own custom system to add stored procedures to SQLite so they could execute portable business logic inside the database. # 22nd October 2020, 6:22 pm
Writing the code to sign data with a private key and verify it with a public key would have been easier to get correct than correctly invoking the JWT library. In fact, the iOS app (which gets this right) doesn’t use a JWT library at all, but manages to verify using a public key in fewer lines of code than the Android app takes to incorrectly use a JWT library!