Simon Willison’s Weblog

Subscribe

Friday, 16th August 2024

Fly: We’re Cutting L40S Prices In Half (via) Interesting insider notes from Fly.io on customer demand for GPUs:

If you had asked us in 2023 what the biggest GPU problem we could solve was, we’d have said “selling fractional A100 slices”. [...] We guessed wrong, and spent a lot of time working out how to maximize the amount of GPU power we could deliver to a single Fly Machine. Users surprised us. By a wide margin, the most popular GPU in our inventory is the A10.

[…] If you’re trying to do something GPU-accelerated in response to an HTTP request, the right combination of GPU, instance RAM, fast object storage for datasets and model parameters, and networking is much more important than getting your hands on an H100.

# 4:44 am / fly

Datasette 1.0a15. Mainly bug fixes, but a couple of minor new features:

  • Datasette now defaults to hiding SQLite "shadow" tables, as seen in extensions such as SQLite FTS and sqlite-vec. Virtual tables that it makes sense to display, such as FTS core tables, are no longer hidden. Thanks, Alex Garcia. (#2296)
  • The Datasette homepage is now duplicated at /-/, using the default index.html template. This ensures that the information on that page is still accessible even if the Datasette homepage has been customized using a custom index.html template, for example on sites like datasette.io. (#2393)

Datasette also now serves more user-friendly CSRF pages, an improvement which required me to ship asgi-csrf 0.10.

# 5:06 am / csrf, projects, releases, datasette

LLMs are bad at returning code in JSON (via) Paul Gauthier's Aider is a terminal-based coding assistant which works against multiple different models. As part of developing the project Paul runs extensive benchmarks, and his latest shows an interesting result: LLMs are slightly less reliable at producing working code if you request that code be returned as part of a JSON response.

Coding skill by model and code wrapping strategy - four models, each showing their pass rate % average of five runs. Claude 3.5 Sonnet gets 60.5% with Markdown, 54.1% with JSON. DeepSeek-Coder V2 0724 gets 60.6% with Markdown, 51.1% with JSON. GPT-4o-2024-05-13 gets 60.0% with Markdown, 59.6% with JSON. GPT-4o-2024-08-06 gets 60.8% with Markdown, 57.6% with JSON, and 56.9% with JSON (strict). Markdown consistently performs better than JSON across all models.

The May release of GPT-4o is the closest to a perfect score - the August appears to have regressed slightly, and the new structured output mode doesn't help and could even make things worse (though that difference may not be statistically significant).

Paul recommends using Markdown delimiters here instead, which are less likely to introduce confusing nested quoting issues.

# 5:04 pm / json, ai, prompt-engineering, generative-ai, llms, aider, evals

Having worked at Microsoft for almost a decade, I remember chatting with their security people plenty after meetings. One interesting thing I learned is that Microsoft (and all the other top tech companies presumably) are under constant Advanced Persistent Threat from state actors. From literal secret agents getting jobs and working undercover for a decade+ to obtain seniority, to physical penetration attempts (some buildings on MS campus used to have armed security, before Cloud server farms were a thing!).

com2kid

# 9:11 pm / microsoft, security

datasette-checkbox. I built this fun little Datasette plugin today, inspired by a conversation I had in Datasette Office Hours.

If a user has the update-row permission and the table they are viewing has any integer columns with names that start with is_ or should_ or has_, the plugin adds interactive checkboxes to that table which can be toggled to update the underlying rows.

This makes it easy to quickly spin up an interface that allows users to review and update boolean flags in a table.

Animated demo showing checkboxes in columns for is_done, should_be_deleted and is_happy - checking the checkboxes shows an updated message next to each one which then fades away.

I have ambitions for a much more advanced version of this, where users can do things like add or remove tags from rows directly in that table interface - but for the moment this is a neat starting point, and it only took an hour to build (thanks to help from Claude to build an initial prototype, chat transcript here).

# 9:28 pm / plugins, projects, datasette, claude-3-5-sonnet

Whither CockroachDB? (via) CockroachDB - previously Apache 2.0, then BSL 1.1 - announced on Wednesday that they were moving to a source-available license.

Oxide use CockroachDB for their product's control plane database. That software is shipped to end customers in an Oxide rack, and it's unacceptable to Oxide for their customers to think about the CockroachDB license.

Oxide use RFDs - Requests for Discussion - internally, and occasionally publish them (see rfd1) using their own custom software.

They chose to publish this RFD that they wrote in response to the CockroachDB license change, describing in detail the situation they are facing and the options they considered.

Since CockroachDB is a critical component in their stack which they have already patched in the past, they're opting to maintain their own fork of a recent Apache 2.0 licensed version:

The immediate plan is to self-support on CochroachDB 22.1 and potentially CockroachDB 22.2; we will not upgrade CockroachDB beyond 22.2. [...] This is not intended to be a community fork (we have no current intent to accept outside contributions); we will make decisions in this repository entirely around our own needs. If a community fork emerges based on CockroachDB 22.x, we will support it (and we will specifically seek to get our patches integrated), but we may or may not adopt it ourselves: we are very risk averse with respect to this database and we want to be careful about outsourcing any risk decisions to any entity outside of Oxide.

The full document is a fascinating read - as Kelsey Hightower said:

This is engineering at its finest and not a single line of code was written.

# 10:06 pm / databases, open-source, software-engineering, oxide