Simon Willison’s Weblog

Subscribe

Items tagged rust in 2023

Filters: Year: 2023 × rust × Sorted by date


ast-grep (via) There are a lot of interesting things about this year-old project.

sg (an alias for ast-grep) is a CLI tool for running AST-based searches against code, built in Rust on top of the Tree-sitter parsing library. You can run commands like this:

sg -p ’await await_me_maybe($ARG)’ datasette --lang python

To search the datasette directory for code that matches the search pattern, in a syntax-aware way.

It works across 19 different languages, and can handle search-and-replace too, so it can work as a powerful syntax-aware refactoring tool.

My favourite detail is how it’s packaged. You can install the CLI utility using Homebrew, Cargo, npm or pip/pipx—each of which will give you a CLI tool you can start running. On top of that it provides API bindings for Rust, JavaScript and Python! # 10th December 2023, 7:56 pm

Writing Python like it’s Rust (via) Fascinating article by Jakub Beránek describing in detail patterns for using type annotations in Python inspired by working in Rust. I learned new tricks about both languages from reading this. # 21st May 2023, 12:18 am

Rye. Armin Ronacher’s take on a Python packaging tool. There are a lot of interesting ideas in this one—it’s written in Rust, configured using pyproject.toml and has some very strong opinions, including completely hiding pip from view and insisting you use “rye add package” instead. Notably, it doesn’t use the system Python at all: instead, it downloads a pre-compiled standalone Python from Gregory Szorc’s python-build-standalone project—the same approach I used for the Datasette Desktop Electron app.

Armin warns that this is just an exploration, with no guarantees of future maintenance—and even has an issue open titled “Should Rye exist?” # 24th April 2023, 4:02 am

How Discord Stores Trillions of Messages (via) This is a really interesting case-study. Discord migrated from MongoDB to Cassandra back in 2016 to handle billions of messages. Today they’re handling trillions, and they completed a migration from Cassandra to Scylla, a Cassandra-like data store written in C++ (as opposed to Cassandra’s Java) to help avoid problems like GC pauses. In addition to being a really good scaling war story this has some interesting details about their increased usage of Rust. As a fan of request coalescing (which I’ve previously referred to as dogpile prevention) I particularly liked this bit:

“Our data services sit between the API and our ScyllaDB clusters. They contain roughly one gRPC endpoint per database query and intentionally contain no business logic. The big feature our data services provide is request coalescing. If multiple users are requesting the same row at the same time, we’ll only query the database once. The first user that makes a request causes a worker task to spin up in the service. Subsequent requests will check for the existence of that task and subscribe to it. That worker task will query the database and return the row to all subscribers.” # 8th March 2023, 7:07 pm

The technology behind GitHub’s new code search (via) I’ve been a beta user of the new GitHub code search for a while and I absolutely love it: you really can run a regular expression search across the entire of GitHub, which is absurdly useful for both finding code examples of under-documented APIs and for seeing how people are using open source code that you have released yourself. It turns out GitHub built their own search engine for this from scratch, called Blackbird. It’s implemented in Rust and makes clever use of sharded ngram indexes—not just trigrams, because it turns out those aren’t quite selective enough for a corpus that includes a lot of three letter keywords like “for”.

I also really appreciated the insight into how they handle visibility permissions: they compile those into additional internal search clauses, resulting in things like “RepoIDs(...) or PublicRepo()” # 6th February 2023, 6:38 pm

sqlite-jsonschema. “A SQLite extension for validating JSON objects with JSON Schema”, building on the jsonschema Rust crate. SQLite and JSON are already a great combination—Alex suggests using this extension to implement check constraints to validate JSON columns before inserting into a table, or just to run queries finding existing data that doesn’t match a given schema. # 28th January 2023, 3:50 am

sqlite-ulid. Alex Garcia’s sqlite-ulid adds lightning-fast SQL functions for generating ULIDs—Universally Unique Lexicographically Sortable Identifiers. These work like UUIDs but are smaller and faster to generate, and can be canonically encoded as a URL-safe 26 character string (UUIDs are 36 characters). Again, this builds on a Rust crate—ulid-rs—and can generate 1 million byte-represented ULIDs with the ulid_bytes() function in just 88.4ms. # 28th January 2023, 3:45 am

sqlite-fastrand. Alex Garcia just dropped three new SQLite extensions, and I’m going to link to all of them. The first is sqlite-fastrand, which adds new functions for generating random numbers (and alphanumeric characters too). Impressively, these out-perform the default SQLite random() and randomblob() functions by about 1.6-2.6x, thanks to being built on the Rust fastrand crate which builds on wyhash, an extremely fast (though not cryptographically secure) hashing function. # 28th January 2023, 3:41 am

datasette-granian (via) Granian is a new Python web server—similar to Gunicorn—written in Rust. I built a small plugin that adds a “datasette granian” command starting a Granian server that serves Datasette’s ASGI application, using the same pattern as my existing datasette-gunicorn plugin. # 20th January 2023, 2:12 am

Servo to Advance in 2023 (via) This is excellent news: Serve, the browser-in-Rust project started by Mozilla in 2012 that produced the Rust programming language, is getting re-activated with four new full-time developers provided by Igalia.

Igalia are a fascinating organization—I hadn’t realized quite how influential they’ve been until I read their Wikipedia page just now

They’ve been around since 2001, and “in 2019 they were the #2 committers to both the WebKit and Chromium codebases and in the top 10 contributors to Gecko/Servo”—including implementing and maintaining CSS Grid Layout! # 16th January 2023, 5:08 pm

Introducing sqlite-xsv: The Fastest CSV Parser for SQLite. Alex Garcia continues to push the boundaries of SQLite extensions. This new extension in Rust wraps the lightning fast Rust csv crate and provides a new csv_reader() virtual table that can handle regular, gzipped and zstd compressed files. # 14th January 2023, 9:54 pm