Simon Willison’s Weblog

Blogmarks

Introducing sqlite-xsv: The Fastest CSV Parser for SQLite. Alex Garcia continues to push the boundaries of SQLite extensions. This new extension in Rust wraps the lightning fast Rust csv crate and provides a new csv_reader() virtual table that can handle regular, gzipped and zstd compressed files.

# 14th January 2023, 9:54 pm / csv, sqlite, rust, alex-garcia, zstd

Examples of floating point problems (via) I learned so much practical stuff from this post by Julia Evans. There are no 32-bit floating point numbers between 262144.0 and 262144.03125, which breaks code that attempts to keep incrementing by 0.01. I knew about the JavaScript tweet ID problem (JavaScript can’t handle numbers like 1612850010110005250) but I didn’t realize it affected jq as well. Lots more great examples in here.

# 13th January 2023, 3:41 pm / javascript, jq, julia-evans
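Both of the floating point problems mentioned in that entry are easy to reproduce. This is a minimal sketch of my own, not code from Julia's post: `to_float32` is a hypothetical helper that uses `struct` to round a Python float (which is 64-bit) to the nearest 32-bit float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# At 262144.0 (2**18) neighbouring float32 values are 0.03125 apart,
# so adding 0.01 rounds straight back to where we started.
stuck = to_float32(262144.0 + 0.01)
next_up = to_float32(262144.0 + 0.03125)

# The tweet ID problem: a 64-bit float cannot hold this integer exactly,
# which is what happens when JavaScript parses it as a Number.
tweet_id = 1612850010110005250
as_double = int(float(tweet_id))

print(stuck == 262144.0)      # the increment was lost
print(next_up)                # the next representable float32
print(as_double == tweet_id)  # the ID did not survive the round trip
```

A loop that keeps adding 0.01 to a float32 counter would stop making progress at this point, which is the failure described in the post.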

Mapping Python to LLVM (via) Codon is a fascinating new entry in the “compile Python code to something else” world—this time targeting LLVM. Ariya Shajii describes in great detail how it pulls this off, including tricks such as transforming Python generators to LLVM coroutines. Codon doesn’t promise that all Python code will work—it’s best thought of as a Python-like language which can be used to create compiled modules which can then be imported back into regular Python projects.

# 10th January 2023, 2:08 am / compilers, llvm, python

Retiring Pinafore (via) Nolan Lawson built Pinafore, which became my default Mastodon client on both desktop and mobile over the past month. He thoughtfully explains why he’s ending his involvement in the project, and why, for trust reasons, he’s not planning on handing over the reins to someone else. Pinafore is everything I want a good SPA to be: it loads fast, works offline and packs a whole lot of functionality into a tiny package. I’m sad to see Nolan’s involvement come to an end. It’s a superb piece of software.

# 10th January 2023, 2:05 am / javascript, mastodon, nolan-lawson

Petals (via) The challenge with large language models in the same scale ballpark as GPT-3 is that they’re large, really large. Far too big to run on a single machine at home. Petals is a fascinating attempt to address that problem: it works a little bit like BitTorrent, in that each user of Petals runs a subset of the overall language model on their machine and participates in a larger network to run inference across potentially hundreds of distributed GPUs. I tried it just now in Google Colab and it worked exactly as advertised, after downloading an 8GB subset of the 352GB BLOOM-176B model.

# 2nd January 2023, 11:29 pm / ai, gpt-3, generative-ai, llms, bloom, gpus

nanoGPT. “The simplest, fastest repository for training/finetuning medium-sized GPTs”—by Andrej Karpathy, in about 600 lines of Python.

# 2nd January 2023, 11:27 pm / python, ai, gpt-3, andrej-karpathy, generative-ai, llms

In 2022, web3 went just great. Molly White’s essential roundup of 2022 in cryptocurrency. “$4.27 billion was stolen in various hacks and scams this year alone”.

# 1st January 2023, 5:13 am / web3, blockchain, molly-white

Draw SVG rope using JavaScript (via) Delightful interactive tutorial by Stanko Tadić showing how to render an illustration of a rope using SVG, starting with a path. The way the tutorial is presented is outstanding.

# 31st December 2022, 5:31 pm / graphics, javascript, svg, explorables

Reverse Prompt Engineering for Fun and (no) Profit (via) swyx pulls off some impressive prompt leak attacks to reverse engineer the new AI features that just got added to Notion. He concludes that “Prompts are like clientside JavaScript. They are shipped as part of the product, but can be reverse engineered easily, and the meaningful security attack surface area is exactly the same.”

# 28th December 2022, 8:56 pm / gpt-3, prompt-engineering, prompt-injection, swyx, generative-ai, llms

Detailed comment on HN describing how Second Life works these days. “There are about 27,500 live regions today, each with its own simulator program, always on even if nobody is using it. Each simulator program takes about one CPU and under 4GB on a server.”

# 24th December 2022, 11:57 pm / secondlife

Speech-to-text with Whisper: How I Use It & Why. Sumana Harihareswara’s in-depth review of Whisper, the shockingly effective open source speech-to-text transcription model released by OpenAI a few months ago. Includes an extremely thoughtful section considering the ethics of using this model, with some of the most insightful short-form writing I’ve seen on AI model ethics generally.

# 22nd December 2022, 9:49 pm / ethics, ai, openai, whisper, ai-ethics, speech-to-text

Boring Python: code quality. James Bennett provides an opinionated guide to setting up Python tools for linting, code formatting and other code quality concerns. Of particular interest to me is his section on packaging checks, which introduces a whole bunch of new-to-me tools that can help avoid accidentally shipping broken packages to PyPI.

# 20th December 2022, 7:55 pm / james-bennett, packaging, python

Playing with ActivityPub (via) Tom MacWright describes his attempts to build the simplest possible ActivityPub publication—for a static site powered by Jekyll, where he used Netlify functions to handle incoming subscriptions (storing them in PlanetScale via their Deno API library) and wrote a script which loops through and notifies all of his subscriptions every time he publishes something new.

# 10th December 2022, 12:58 am / deno, tom-macwright, mastodon, activitypub

Data-driven performance optimization with Rust and Miri (via) Useful guide to some Rust performance optimization tools. Miri can be used to dump out a detailed JSON profile of a program which can then be opened and explored using the Chrome browser’s performance tool.

# 9th December 2022, 5:19 pm / chrome, performance, rust

Introducing sqlite-loadable-rs: A framework for building SQLite Extensions in Rust. Alex Garcia has built a new Rust library for creating SQLite extensions—initially supporting custom scalar functions, virtual tables and table functions and with more types of extension coming soon. This looks very easy to use, partly because the documentation and examples are already delightfully thorough, especially for an initial release.

# 7th December 2022, 11:08 pm / sqlite, rust, alex-garcia

talk.wasm (via) “Talk with an Artificial Intelligence in your browser”. Absolutely stunning demo which loads the Whisper speech recognition model (75MB) and a GPT-2 model (240MB) and executes them both in your browser via WebAssembly, then uses the Web Speech API to talk back to you. The result is a full speak-with-an-AI interface running entirely client-side. GPT-2 sadly mostly generates gibberish but the fact that this works at all is pretty astonishing.

# 7th December 2022, 10:52 pm / ai, webassembly, gpt-3, openai, generative-ai, whisper, speech-to-text

I Taught ChatGPT to Invent a Language (via) Dylan Black talks ChatGPT through the process of inventing a new language, with its own grammar. Really fun example of what someone with a deep understanding of both the capabilities of language models and some other field (in this case linguistics) can achieve with an extended prompting session.

# 6th December 2022, 7:30 pm / linguistics, gpt-3, openai, prompt-engineering, generative-ai, chatgpt, llms

Understanding a Protocol. Andrew’s latest notes on how ActivityPub and Mastodon work under the hood, based on his extensive development work building out Takahē.

# 6th December 2022, 12:50 am / andrew-godwin, mastodon, activitypub

Building A Virtual Machine inside ChatGPT (via) Jonas Degrave presents a remarkable example of a creative use of ChatGPT: he prompts it to behave as if it was a Linux shell, then runs increasingly complex sequences of commands against it and gets back surprisingly realistic results. By the end of the article he’s getting it to hallucinate responses to curl API requests run against imagined API versions of itself.

# 5th December 2022, 1:43 am / ai, gpt-3, openai, generative-ai, chatgpt, llms

three.js examples: webgl_postprocessing_pixel (via) Neat new example for three.js that uses a pixel-shader postprocessor to apply an isometric pixel-art feel to a 3D scene.

# 1st December 2022, 9:57 pm / 3d, javascript, pixelart, webgl

Scaling Mastodon: The Compendium (via) Hazel Weakly’s collection of notes on scaling Mastodon, covering PostgreSQL, Sidekiq, Redis, object storage and more.

# 29th November 2022, 5:46 am / postgresql, redis, scaling, mastodon, sidekiq

Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results. Stable Diffusion 2.0 is out, and it’s a very different model from 1.4/1.5. It’s trained using a new text encoder (OpenCLIP, in place of OpenAI’s CLIP) which means a lot of the old tricks—notably using “Greg Rutkowski” to get high quality fantasy art—no longer work. What DOES work, incredibly well, is negative prompting—saying things like “cyberpunk forest by Salvador Dali” but negative on “trees, green”. Max Woolf explores negative prompting in depth in this article, including how to combine it with textual inversion.

# 29th November 2022, 1:22 am / max-woolf, stable-diffusion, prompt-engineering, generative-ai, text-to-image

An Interactive Guide to Flexbox. Joshua Comeau built this fantastic guide to CSS flexbox layouts, with interactive examples of all of the properties. This is a really useful tour of the layout model.

# 26th November 2022, 2:50 am / css

Microsoft Flight Simulator: WebAssembly (via) This is such a smart application of WebAssembly: it can now be used to write extensions for Microsoft Flight Simulator, which means you can run code from untrusted sources safely in a sandbox. I’m really looking forward to more of this kind of usage—I love the idea of finally having a robust sandbox for running things like plugins.

# 24th November 2022, 2:08 am / microsoft, webassembly

Building a BFT JSON CRDT (via) Jacky Zhao describes their project to build a CRDT library for JSON data in Rust, and includes a thorough explanation of what CRDTs are and how they work. “I write this blog post mostly as a note to my past self, distilling a lot of what I’ve learned since into a blog post I wish I had read before going in”—the best kind of blog post!

# 21st November 2022, 7:56 pm / json, rust, crdt

Every remaining website using the .museum TLD (via) Jonty did a survey of every one of the 1,134 domains using the .museum TLD, which dates back to 2001 and is managed by The Museum Domain Management Association.

# 20th November 2022, 12:53 am / domains, museums

Datasette Lite: Loading JSON data (via) I added a new feature to Datasette Lite: you can now pass it the URL to a JSON file (hosted on a CORS-compatible hosting provider such as GitHub or GitHub Gists) and it will load that file into a database table for you. It expects an array of objects, but if your file has an object as the root it will search through it looking for the first key that is an array of objects and load those instead.

# 18th November 2022, 6:43 pm / json, projects, datasette, datasette-lite, cors
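The row-finding rule described in that entry can be sketched in a few lines of Python. This is an illustration of the idea, not the actual Datasette Lite code: `find_rows` and `is_array_of_objects` are hypothetical names, and I am assuming that only the top-level keys are searched and that an empty list does not count.

```python
import json


def is_array_of_objects(value):
    """True for a non-empty list whose items are all JSON objects (dicts)."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(item, dict) for item in value)
    )


def find_rows(data):
    """Return the list of objects to load as table rows, or None."""
    if is_array_of_objects(data):
        return data
    if isinstance(data, dict):
        # Assumption: only top-level keys are checked, in document order.
        for value in data.values():
            if is_array_of_objects(value):
                return value
    return None


doc = json.loads('{"meta": {"count": 2}, "items": [{"id": 1}, {"id": 2}]}')
rows = find_rows(doc)
print(rows)
```

Here the `meta` key is skipped because it is an object rather than an array, so the two objects under `items` become the rows.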

fasiha/yamanote (via) Yamanote is “a guerrilla bookmarking server” by Ahmed Fasih. It works using a bookmarklet that grabs a full serialized copy of the page (the innerHTML of both the head and body elements) and passes it to the server, which stores it in a SQLite database. The files are then served with a `Content-Security-Policy: default-src 'self'` header to prevent stored pages from fetching ANY external assets when they are viewed.

# 16th November 2022, 3:48 am / bookmarks, sqlite, content-security-policy
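Here is a rough sketch of that store-then-serve pattern. It is not how Yamanote itself is implemented: the `pages` table, `save_page` and `serve_page` are all hypothetical, and the response is returned as a plain tuple rather than going through a real web framework.

```python
import sqlite3

# The header that stops a stored page from loading anything off-origin.
CSP_HEADER = ("Content-Security-Policy", "default-src 'self'")


def save_page(db, url, head_html, body_html):
    """Store the serialized head and body that the bookmarklet sent."""
    db.execute(
        "INSERT INTO pages (url, head, body) VALUES (?, ?, ?)",
        (url, head_html, body_html),
    )


def serve_page(db, url):
    """Return (status, headers, html) for a stored page."""
    row = db.execute(
        "SELECT head, body FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return 404, [], ""
    html = f"<!DOCTYPE html><html><head>{row[0]}</head><body>{row[1]}</body></html>"
    headers = [("Content-Type", "text/html; charset=utf-8"), CSP_HEADER]
    return 200, headers, html


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, head TEXT, body TEXT)")
save_page(
    db,
    "https://example.com/",
    "<title>Example</title>",
    '<img src="https://cdn.example.com/a.png">',
)
status, headers, html = serve_page(db, "https://example.com/")
print(status, headers)
```

With that header in place a browser should refuse to fetch the `cdn.example.com` image in the stored copy, since it comes from a different origin than the server.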

JSON Changelog with SQLite (via) One of my favourite database challenges is how to track changes to rows over time. This is a neat recipe from 2018 which uses SQLite triggers and the SQLite JSON functions to serialize older versions of the rows and store them in TEXT columns.

# 16th November 2022, 3:41 am / databases, json, sqlite
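A minimal sketch of the trigger-plus-JSON approach, using Python's `sqlite3` module. The `users` and `users_changelog` tables and the trigger name are made up for illustration and are not taken from the linked recipe. It assumes a SQLite build that includes the JSON functions.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

    -- Hypothetical changelog table: one row per superseded version.
    CREATE TABLE users_changelog (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
        old_row TEXT
    );

    -- Before the new values replace the old ones, serialize the old row
    -- as JSON and store it in a TEXT column.
    CREATE TRIGGER users_track_update AFTER UPDATE ON users
    BEGIN
        INSERT INTO users_changelog (user_id, old_row)
        VALUES (
            OLD.id,
            json_object('id', OLD.id, 'name', OLD.name, 'email', OLD.email)
        );
    END;
    """
)

db.execute("INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com')")
db.execute("UPDATE users SET email = 'ada@example.org' WHERE id = 1")

old_versions = [
    json.loads(row[0])
    for row in db.execute("SELECT old_row FROM users_changelog ORDER BY id")
]
print(old_versions)
```

Each update adds one changelog row holding the values as they were before the change, so the full history of a row can be reconstructed by reading them back in order.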

Home invasion: Mastodon’s Eternal September begins. Hugh Rundle’s thoughtful write-up of the impact of the massive influx of new users from Twitter on the existing Mastodon community. If you’re new to Mastodon (like me) you should read this and think carefully about how best to respectfully integrate with your new online space.

# 11th November 2022, 12:47 am / mastodon
