Simon Willison’s Weblog

Subscribe

May 2021

May 2, 2021

Query Engines: Push vs. Pull (via) Justin Jaffray (who has worked on Materialize) explains the difference between push and pull query execution engines using some really clear examples built around JavaScript generators.

# 2:49 am / generators, databases, javascript

One year of TILs

Just over a year ago I started tracking TILs, inspired by Josh Branchaud’s collection. I’ve since published 148 TILs across 43 different topics. It’s a great format!

[... 224 words]

Hosting SQLite databases on Github Pages (via) I've seen the trick of running SQLite compiled to WASM in the browser before, but here it comes with an incredibly clever bonus trick: it uses SQLite's page structure to fetch subsets of the database file via HTTP range requests, which means you can run indexed SQL queries against a 600MB database file while only fetching a few MBs of data over the wire. Absolutely brilliant. Tucked away at the end of the post is another neat trick: making the browser DOM available to SQLite as a virtual table, so you can query and update the DOM of the current page using SQL!

# 6:55 pm / sqlite, webassembly

May 3, 2021

Adding GeoDjango to an existing Django project

Work on VIAL for Vaccinate The States continues.

[... 1,503 words]

May 4, 2021

Practical SQL for Data Analysis (via) This is a really great SQL tutorial: it starts with the basics, but quickly moves on to a whole array of advanced PostgreSQL techniques - CTEs, window functions, efficient sampling, rollups, pivot tables and even linear regressions executed directly in the database using regr_slope(), regr_intercept() and regr_r2(). I picked up a whole bunch of tips for things I didn't know you could do with PostgreSQL here.

# 3:11 am / sql, postgresql, haki-benita

Observable Plot (via) This is huge: a brand new high-level JavaScript visualization library from Mike Bostock, the author of D3—partially inspired by Vega-Lite which I’ve used enthusiastically in the past. First impressions are that this is a big step forward for quickly building high-quality visualizations. It’s released under the ISC license which is “functionally equivalent to the BSD 2-Clause and MIT licenses”.

# 4:28 pm / open-source, visualization, observable, mike-bostock, d3, observable-plot

Plot & Vega-Lite. Useful documentation comparing the brand new Observable Plot to Vega-Lite, complete with examples of how to achieve the same thing in both libraries.

# 4:32 pm / d3, observable, visualization, observable-plot

cinder: Instagram’s performance oriented fork of CPython (via) Instagram forked CPython to add some performance-oriented features they wanted, including a method-at-a-time JIT compiler and a mechanism for eagerly evaluating coroutines (avoiding the overhead of creating a coroutine if an awaited function returns a value without itself needing to await). They’re open sourcing the code to help start conversations about implementing some of these features in CPython itself. I particularly enjoyed the warning that accompanies the repo: this is not intended to be a supported release, and if you decide to run it in production you are on your own!

# 10:13 pm / open-source, python

May 5, 2021

A museum bot (via) Shawn Graham built a Twitter bot, using R, which tweets out random items from the collection at the Canadian Science and Technology Museum—using a Datasette instance that he’s running based on a CSV export of their collections data.

# 7:09 pm / twitter, datasette, museums

May 10, 2021

Django SQL Dashboard

Visit Django SQL Dashboard

I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.

[... 2,171 words]

May 12, 2021

New Major Versions Released! Flask 2.0, Werkzeug 2.0, Jinja 3.0, Click 8.0, ItsDangerous 2.0, and MarkupSafe 2.0. Huge set of releases from the Pallets team. Python 3.6+ required and comprehensive type annotations. Flask now supports async views, Jinja async templates (used extensively by Datasette) “no longer requires patching”, Click has a bunch of new code around shell tab completion, ItsDangerous supports key rotation and so much more.

# 5:37 pm / flask, jinja, python, async

Async functions require an event loop to run. Flask, as a WSGI application, uses one worker to handle one request/response cycle. When a request comes in to an async view, Flask will start an event loop in a thread, run the view function there, then return the result.

Each request still ties up one worker, even for async views. The upside is that you can run async code within a view, for example to make multiple concurrent database queries, HTTP requests to an external API, etc. However, the number of requests your application can handle at one time will remain the same.

Using async and await in Flask 2.0

# 5:59 pm / async, flask, python

May 13, 2021

Folks think s3 is static assets hosting but really it's a consistent and highly available key value store with first class blob support

Brian LeRoux

# 3:01 pm / s3

May 14, 2021

Powering the Python Package Index in 2021. PyPI now serves “nearly 900 terabytes over more than 2 billion requests per day”. Bandwidth is donated by Fastly, a value estimated at 1.8 million dollars per month! Lots more detail about how PyPI has evolved over the past years in this post by Dustin Ingram.

# 4:50 am / pypi, python, fastly, dustin-ingram

May 17, 2021

geocode-sqlite. Neat command-line Python utility by Chris Amico: point it at a SQLite database file and it will add latitude and longitude columns and populate them by geocoding one or more of the other fields, using your choice from four currently supported geocoders.

# 1:15 am / geocoding, sqlite, chris-amico

No feigning surprise (via) Don’t feign surprise if someone doesn’t know something that you think they should know. Even better: even if you are surprised, don’t let them know! “When people feign surprise, it’s usually to make them feel better about themselves and others feel worse.”

# 4:30 pm / julia-evans, teaching, communication

May 18, 2021

Weeknotes: Velma, more Django SQL Dashboard

Matching locations for Vaccinate The States, fun with GeoJSON and more improvements to Django SQL Dashboard.

[... 555 words]

May 19, 2021

Flat Data. New project from the GitHub OCTO (the Office of the CTO, love that backronym) somewhat inspired by my work on Git scraping: I’m really excited to see GitHub embracing git for CSV/JSON data in this way. Flat incorporates a reusable Action for scraping and storing data (using Deno), a VS Code extension for setting up those workflows and a very nicely designed Flat Viewer web app for browsing CSV and JSON data hosted on GitHub.

# 1:05 am / git-scraping, github, deno

May 24, 2021

How to look at the stack with gdb. Useful short tutorial on gdb from first principles.

# 6:23 pm / c, julia-evans, debugger

HackSoft Django styleguide: services and selectors. HackSoft’s Django styleguide uses the terms “services” and “selectors”. Services are functions that live in services.py and perform business logic operations such as creating new entities that might span multiple Django models. Selectors live in selectors.py and perform more complex database read operations, such as returning objects in a way that respects visibility permissions.

# 7:17 pm / django

May 25, 2021

Weeknotes: Spinning back up on Datasette

I’ve been somewhat distracted from Datasette for the past couple of months, thanks to my work on VIAL and the accompanying open source project django-sql-dashboard. This week I scraped back some time to work on Datasette.

[... 401 words]

May 26, 2021

M1RACLES: M1ssing Register Access Controls Leak EL0 State. You need to read (or at least scan) all the way to the bottom: this security disclosure is a masterpiece. It not only describes a real flaw in the M1 silicon but also deconstructs the whole culture of over-hyped name-branded vulnerability reports. The TLDR is that you don’t really need to worry about this one, and if you’re writing this kind if thing up for a news article you should read all the way to the end first!

# 3:25 pm / journalism, security

May 28, 2021

explain.dalibo.com (via) By far the best tool I’ve seen for turning the output of PostgreSQL EXPLAIN ANALYZE into something I can actually understand—produces a tree visualization which includes clear explanations of what each step (such as a “Index Only Scan Node”) actually means.

# 5:41 pm / postgresql

2021 » May

MTWTFSS
     12
3456789
10111213141516
17181920212223
24252627282930
31