Simon Willison’s Weblog

Subscribe

October 2020

Oct. 2, 2020

Weeknotes: Datasette column actions, plus three new plugins

Visit Weeknotes: Datasette column actions, plus three new plugins

A renewed emphasis on building out Datasette Cloud has produced three new plugins this week: datasette-dateutil, datasette-import-table and datasette-edit-schema, plus a major improvement to Datasette’s default interface for browsing tables.

[... 1,093 words]

Oct. 4, 2020

Potential new elevator pitch / tagline for Datasette: The best way to publish data online (via) One of the biggest challenges I’ve had with Datasette is compressing it into a single elevator pitch or tagline that helps answer the question “what does this software do?”—the project does a lot of different things, so finding the right angle for explaining it has proved really difficult. I’m workshopping a new tagline over on the Datasette discussion forum—feedback, suggestions and challenges very welcome!

# 12:03 am / marketing, pitching, datasette

Oct. 5, 2020

I’ve often joked with other internet culture reporters about what I call the “normie tipping point.” In every emerging internet trend, there is a point at which “normies” — people who don’t spend all day online, and whose brains aren’t rotted by internet garbage — start calling, texting and emailing us to ask what’s going on. Why are kids eating Tide Pods? What is the Momo Challenge? Who is Logan Paul, and why did he film himself with a dead body?

The normie tipping point is a joke, but it speaks to one of the thorniest questions in modern journalism, specifically on this beat: When does the benefit of informing people about an emerging piece of misinformation outweigh the possible harms?

Kevin Roose

# 3:40 pm / journalism

Oct. 7, 2020

Running Datasette on DigitalOcean App Platform (via) I spent some time with DigitalOcean’s new App Platform today, which is a Heroku-style PaaS that starts at $5/month. It looks like it could be a really good fit for Datasette. Disk is ephemeral, but if you’re publishing read-only data that doesn’t matter since you can build the SQLite database as part of the deployment and bundle it up in the Docker/Kubernetes container.

# 2:52 am / deployment, heroku, datasette, digitalocean

Bedrock: The SQLitening (via) Back in March 2018 www.mozilla.org switched over to running on Django using SQLite! They’re using the same pattern I’ve been exploring with Datasette: their SQLite database is treated as a read-only cache by their frontend servers, and a new SQLite database is built by a separate process and fetched onto the frontend machines every five minutes by a scheduled task. They have a healthcheck page which shows the latest version of the database and when it was fetched, and even lets you download the 25MB SQLite database directly (I’ve been exploring it using Datasette).

# 11:47 pm / django, mozilla, sqlite, datasette, baked-data

Oct. 9, 2020

Git scraping: track changes over time by scraping to a Git repository

Visit Git scraping: track changes over time by scraping to a Git repository

Git scraping is the name I’ve given a scraping technique that I’ve been experimenting with for a few years now. It’s really effective, and more people should use it.

[... 963 words]

Datasette 0.50: The annotated release notes

Visit Datasette 0.50: The annotated release notes

I released Datasette 0.50 this morning, with a new user-facing column actions menu feature and a way for plugins to make internal HTTP requests to consume the JSON API of their parent Datasette instance.

[... 792 words]

Animating a commit based Sudoku game using Puppeteer (via) This is really clever. There’s a GitHub repo that tracks progress in a game of Sudoku: Anish Karandikar wrote code which iterates through the game board state commit by commit, uses that state to generate an HTML table, passes that table to Puppeteer using a data: URI, renders a PNG of each stage and then concatenates those PNGs together into an animated GIF using the gifencoder Node.js library.

# 10:28 pm / datauri, gifs, puppeteer

Oct. 10, 2020

Weeknotes: Mainly Datasette 0.50

Most of what I’ve been up to this week is covered in Datasette 0.50: The annotated release notes and Git scraping: track changes over time by scraping to a Git repository.

[... 196 words]

Datasette Weekly (via) I’m trying something new: I’ve decided to start an email newsletter called the Datasette Weekly (I’m already worried I’ll regret that weekly promise) which will share news about Datasette and the Datasette ecosystem, plus tips and tricks for getting the most out of Datasette and SQLite.

# 7:05 pm / email, projects, datasette

Datasette Weekly: Datasette 0.50, git scraping, extracting columns (via) The first edition of the new Datasette Weekly newsletter—covering Datasette 0.50, Git scraping, extracting columns with sqlite-utils and featuring datasette-graphql as the first “plugin of the week”

# 9 pm / email, projects, sqlite, graphql, datasette, git-scraping, sqlite-utils

It’s probably a bad idea to risk paying your ransom, though — the US Treasury Dept has issued clarifying guidance that companies paying off ransomware, and all companies facilitating the payment, can be charged with sanctions violations if the bitcoins end up at North Korea or sanctioned cybercrime groups.

David Gerard

# 9:17 pm / crime, bitcoin

Oct. 12, 2020

evernote-to-sqlite (via) The latest tool in my Dogsheep series of utilities for personal analytics: evernote-to-sqlite takes Evernote note exports en their ENEX XML format and loads them into a SQLite database. Embedded images are loaded into a BLOB column and the output of their cloud-based OCR system is added to a full-text search index. Notes have a latitude and longitude which means you can visualize your notes on a map using Datasette and datasette-cluster-map.

# 12:38 am / projects, sqlite, datasette, dogsheep

xml-analyser. In building evernote-to-sqlite I dusted off an ancient (2009) project I built that scans through an XML file and provides a summary of what elements are present in the document and how they relate to each other. I’ve now packaged it up as a CLI app and published it on PyPI.

# 12:41 am / projects, xml

Oct. 16, 2020

Building an Evernote to SQLite exporter

Visit Building an Evernote to SQLite exporter

I’ve been using Evernote for over a decade, and I’ve long wanted to export my data from it so I can do interesting things with it.

[... 1,879 words]

Weeknotes: evernote-to-sqlite, Datasette Weekly, scrapers, csv-diff, sqlite-utils

Visit Weeknotes: evernote-to-sqlite, Datasette Weekly, scrapers, csv-diff, sqlite-utils

This week I built evernote-to-sqlite (see Building an Evernote to SQLite exporter), launched the Datasette Weekly newsletter, worked on some scrapers and pushed out some small improvements to several other projects.

[... 827 words]

Oct. 19, 2020

The stampede of the affluent into grim-faced, highly competitive sports has been a tragicomedy of perverse incentives and social evolution in unequal times: a Darwinian parable of the mayhem that can ensue following the discovery of even a minor advantage. Like a peacock rendered nearly flightless by gaudy tail feathers, the overserved athlete is the product of a process that has become maladaptive, and is now harming the very blue-chip demographic it was supposed to help.

Ruth S. Barrett

# 3:09 pm / sports

Dogsheep: Personal analytics with Datasette. The second edition of my new Datasette Weekly newsletter, talks about Dogsheep, Dogsheep Beta, Datasette 1.0 and features datasette-cluster-map as the plugin of the week.

# 4:38 pm / datasette, dogsheep

Oct. 21, 2020

Pikchr. Interesting new project from SQLite creator D. Richard Hipp. Pikchr is a new mini language for describing visual diagrams, designed to be embedded in Markdown documentation. It’s already enabled for the SQLite forum. Implementation is a no-dependencies C library and output is SVG.

# 4:02 pm / c, sqlite, svg, markdown, d-richard-hipp

Proof of concept: sqlite_utils magic for Jupyter (via) Tony Hirst has been experimenting with building a Jupyter “magic” that adds special syntax for using sqlite-utils to insert data and run queries. Query results come back as a Pandas DataFrame, which Jupyter then displays as a table.

# 5:26 pm / pandas, sqlite, tony-hirst, jupyter, sqlite-utils

Writing the code to sign data with a private key and verify it with a public key would have been easier to get correct than correctly invoking the JWT library. In fact, the iOS app (which gets this right) doesn’t use a JWT library at all, but manages to verify using a public key in fewer lines of code than the Android app takes to incorrectly use a JWT library!

James 'zofrex' Sanderson

# 9:34 pm / security, jwt

Oct. 22, 2020

Project LightSpeed: Rewriting the Messenger codebase for a faster, smaller, and simpler messaging app (via) Facebook rewrote their iOS messaging app earlier this year, dropping it from 1.7m lines of code to 360,000 and reducing the binary size to a quarter of what it was. A key part of the new app’s architecture is much heavier reliance on SQLite to coordinate data between views, and to dynamically configure how different views are displayed. They even built their own custom system to add stored procedures to SQLite so they could execute portable business logic inside the database.

# 6:22 pm / facebook, sqlite

CG-SQL (via) This is the toolkit the Facebook Messenger team wrote to bring stored procedures to SQLite. It implements a custom version of the T-SQL language which it uses to generate C code that can then be compiled into a SQLite module.

# 6:25 pm / c, facebook, sqlite

Oct. 23, 2020

OCTO Speaker Series: Simon Willison—Personal Data Warehouses: Reclaiming Your Data. I’m giving a talk in the GitHub OCTO (Office of the CTO) speaker series about Datasette and my Dogsheep personal analytics project. You can register for free here—the stream will be on Thursday November 12, 2020 at 8:30am PST (4:30pm GMT).

# 3 am / github, speaking, datasette, dogsheep

Oct. 24, 2020

Weeknotes: incremental improvements

I’ve been writing my talk for PyCon Argentina this week, which has proved surprisingly time consuming. I hope to have that wrapped up soon—I’m pre-recording it, which it turns out is much more work than preparing a talk to stream live.

[... 630 words]

Oct. 26, 2020

Apple now receives an estimated $8 billion to $12 billion in annual payments — up from $1 billion a year in 2014 — in exchange for building Google’s search engine into its products. It is probably the single biggest payment that Google makes to anyone and accounts for 14 to 21 percent of Apple’s annual profits.

Apple, Google and a Deal That Controls the Internet

# 3:10 am / apple, google

Oct. 29, 2020

Seniors generally report having more trust in the people around them, a characteristic that may make them more credulous of information that comes from friends and family. There is also the issue of context: Misinformation appears in a stream that also includes baby pictures, recipes and career updates. Users may not expect to toggle between light socializing and heavy truth-assessing when they’re looking at their phone for a few minutes in line at the grocery store.

Michael Hobbes

# 3:06 pm / social-media, disinformation

Defining Data Intuition. Ryan T. Harter, Principal Data Scientist at Mozilla defines data intuition as “a resilience to misleading data and analyses”. He also introduces the term “data-stink” as a similar term to “code smell”, where your intuition should lead you to distrust analysis that exhibits certain characteristics without first digging in further. I strongly believe that data reports should include a link the raw methodology and numbers to ensure they can be more easily vetted—so that data-stink can be investigated with the least amount of resistance.

# 3:14 pm / analytics, mozilla, datascience

2020 » October

MTWTFSS
   1234
567891011
12131415161718
19202122232425
262728293031