Simon Willison’s Weblog

Subscribe

January 2023

Jan. 26, 2023

Guppe Groups. This is a really neat mechanism for helping build topic-oriented communities on Mastodon: follow @any-group-name@a.gup.pe to join (or create) a group, then that account will re-broadcast any messages from people in that group who mention the group in their message.

I found it via the histodons group. I was pondering how something like this might work this recently, so it’s great to see someone has built it already.

# 7:45 pm / mastodon, fediverse

Jan. 27, 2023

Exploring MusicCaps, the evaluation data released to accompany Google’s MusicLM text-to-music model

Visit Exploring MusicCaps, the evaluation data released to accompany Google's MusicLM text-to-music model

Google Research just released MusicLM: Generating Music From Text. It’s a new generative AI model that takes a descriptive prompt and produces a “high-fidelity” music track. Here’s the paper (and a more readable version using arXiv Vanity).

[... 1,323 words]

babelmark3 (via) I found this tool today while investigating an bug in Datasette’s datasette-render-markdown plugin: it lets you run a fragment of Markdown through dozens of different Markdown libraries across multiple different languages and compare the results. Under the hood it works with a registry of API URL endpoints for different implementations, most of which are encrypted in the configuration file on GitHub because they are only intended to be used by this comparison tool.

# 11:34 pm / apis, markdown

graphql-voyager. Neat tool for producing an interactive graph visualization of any GraphQL API. Click “Change schema” and then “Introspection” and it will give you a GraphQL query you can run against your own API—copy and paste back the JSON results and the visualizer will show you how your API fits together. I tested this against a datasette-graphql instance and it worked exactly as described.

# 11:58 pm / graphql

Jan. 28, 2023

sqlite-fastrand. Alex Garcia just dropped three new SQLite extensions, and I’m going to link to all of them. The first is sqlite-fastrand, which adds new functions for generating random numbers (and alphanumeric characters too). Impressively, these out-perform the default SQLite random() and randomblob() functions by about 1.6-2.6x, thanks to being built on the Rust fastrand crate which builds on wyhash, an extremely fast (though not cryptographically secure) hashing function.

# 3:41 am / rust, sqlite, alex-garcia

sqlite-ulid. Alex Garcia’s sqlite-ulid adds lightning-fast SQL functions for generating ULIDs—Universally Unique Lexicographically Sortable Identifiers. These work like UUIDs but are smaller and faster to generate, and can be canonically encoded as a URL-safe 26 character string (UUIDs are 36 characters). Again, this builds on a Rust crate—ulid-rs—and can generate 1 million byte-represented ULIDs with the ulid_bytes() function in just 88.4ms.

# 3:45 am / uuid, sqlite, alex-garcia, rust

sqlite-jsonschema. “A SQLite extension for validating JSON objects with JSON Schema”, building on the jsonschema Rust crate. SQLite and JSON are already a great combination—Alex suggests using this extension to implement check constraints to validate JSON columns before inserting into a table, or just to run queries finding existing data that doesn’t match a given schema.

# 3:50 am / jsonschema, rust, sqlite, json, alex-garcia

Cyber (via) “Cyber is a new language for fast, efficient, and concurrent scripting.” Lots of interesting ideas in here, but the one that really caught my eye is that its designed to be easily embedded into other languages and “will allow the host to insert gas mileage checks in user scripts. This allows the host to control how long a script can run”—my dream feature for implementing a safe, sandboxed extension mechanism! Cyber is implemented using Zig and LLVM.

# 4:25 am / sandboxing, zig, programming-languages, llvm

We’ve built many tools for publishing to the web - but I want to make the claim that we have underdeveloped the tools and platforms for publishing collections, indexes and small databases. It’s too hard to build these kinds of experiences, too hard to maintain them and a lack of collaborative tools.

Tom Critchlow

# 4:43 pm / datasette

Jan. 29, 2023

Examples of sites built using Datasette (via) I gave the examples page on the Datasette website a significant upgrade today: it now includes screenshots (taken using shot-scraper) of six projects chosen to illustrate the variety of problems Datasette can be used to tackle.

# 3:40 am / projects, shot-scraper, datasette

datasette-scraper walkthrough on YouTube (via) datasette-scraper is Colin Dellow’s new plugin that turns Datasette into a powerful web scraping tool, with a web UI based on plugin-driven customizations to the Datasette interface. It’s really impressive, and this ten minute demo shows quite how much it is capable of: it can crawl sitemaps and fetch pages, caching them (using zstandard with optional custom dictionaries for extra compression) to speed up subsequent crawls... and you can add your own plugins to extract structured data from crawled pages and save it to a separate SQLite table!

# 5:23 am / scraping, datasette, plugins, colin-dellow

Jan. 30, 2023

datasette-scraper, Big Local News and other weeknotes

Visit datasette-scraper, Big Local News and other weeknotes

In addition to exploring the new MusicCaps training and evaluation data I’ve been working on the big Datasette JSON refactor, and getting excited about a Datasette project that I didn’t work on at all.

[... 1,744 words]

2023 » January

MTWTFSS
      1
2345678
9101112131415
16171819202122
23242526272829
3031