<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: performance</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/performance.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-03-13T03:44:34+00:00</updated><author><name>Simon Willison</name></author><entry><title>Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations</title><link href="https://simonwillison.net/2026/Mar/13/liquid/#atom-tag" rel="alternate"/><published>2026-03-13T03:44:34+00:00</published><updated>2026-03-13T03:44:34+00:00</updated><id>https://simonwillison.net/2026/Mar/13/liquid/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Shopify/liquid/pull/2056"&gt;Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PR from Shopify CEO Tobias Lütke against Liquid, Shopify's open source Ruby template engine, which was somewhat inspired by Django when Tobi first created it &lt;a href="https://simonwillison.net/2005/Nov/6/liquid/"&gt;back in 2005&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Tobi found dozens of new performance micro-optimizations using a variant of &lt;a href="https://github.com/karpathy/autoresearch"&gt;autoresearch&lt;/a&gt;, Andrej Karpathy's new system for having a coding agent run hundreds of semi-autonomous experiments to find new effective techniques for training &lt;a href="https://github.com/karpathy/nanochat"&gt;nanochat&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Tobi's implementation started two days ago with this &lt;a href="https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.md"&gt;autoresearch.md&lt;/a&gt; prompt file and an &lt;a href="https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.sh"&gt;autoresearch.sh&lt;/a&gt; script for the agent to run to execute the test suite and report on benchmark scores.&lt;/p&gt;
&lt;p&gt;The PR now lists &lt;a href="https://github.com/Shopify/liquid/pull/2056/commits"&gt;93 commits&lt;/a&gt; from around 120 automated experiments. The PR description lists what worked in detail - some examples:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Replaced StringScanner tokenizer with &lt;code&gt;String#byteindex&lt;/code&gt;.&lt;/strong&gt; Single-byte &lt;code&gt;byteindex&lt;/code&gt; searching is ~40% faster than regex-based &lt;code&gt;skip_until&lt;/code&gt;. This alone reduced parse time by ~12%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pure-byte &lt;code&gt;parse_tag_token&lt;/code&gt;.&lt;/strong&gt; Eliminated the costly &lt;code&gt;StringScanner#string=&lt;/code&gt; reset that was called for every &lt;code&gt;{% %}&lt;/code&gt; token (878 times). Manual byte scanning for tag name + markup extraction is faster than resetting and re-scanning via StringScanner. [...]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cached small integer &lt;code&gt;to_s&lt;/code&gt;.&lt;/strong&gt; Pre-computed frozen strings for 0-999 avoid 267 &lt;code&gt;Integer#to_s&lt;/code&gt; allocations per render.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
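&lt;p&gt;The cached &lt;code&gt;to_s&lt;/code&gt; trick translates directly to other languages. Here's a minimal Python sketch of the same idea - pre-computing the strings for 0-999 so rendering never re-allocates them (the actual PR does this in Ruby, with frozen strings so the cached objects can be shared safely):&lt;/p&gt;

```python
# Sketch of the "cached small integer to_s" optimization, translated to
# Python for illustration. Pre-compute the string form of 0-999 once so
# hot rendering paths reuse the same objects instead of allocating.
SMALL_INT_STRINGS = tuple(str(i) for i in range(1000))

def int_to_s(n: int) -> str:
    """Return a cached string for 0-999, falling back to str() otherwise."""
    if n in range(1000):  # True only for ints 0..999
        return SMALL_INT_STRINGS[n]  # shared, pre-computed object
    return str(n)
```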
&lt;p&gt;This all added up to a 53% improvement on benchmarks - truly impressive for a codebase that's been tweaked by hundreds of contributors over 20 years.&lt;/p&gt;
&lt;p&gt;I think this illustrates a number of interesting ideas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Having a robust test suite - in this case 974 unit tests - is a &lt;em&gt;massive unlock&lt;/em&gt; for working with coding agents. This kind of research effort would not be possible without first having a tried and tested suite of tests.&lt;/li&gt;
&lt;li&gt;The autoresearch pattern - where an agent brainstorms a multitude of potential improvements and then experiments with them one at a time - is really effective.&lt;/li&gt;
&lt;li&gt;If you provide an agent with a benchmarking script, "make it faster" becomes an actionable goal.&lt;/li&gt;
&lt;li&gt;CEOs can code again! Tobi has always been more hands-on than most, but this is a much more significant contribution than anyone would expect from the leader of a company with 7,500+ employees. I've seen this pattern play out a lot over the past few months: coding agents make it feasible for people in high-interruption roles to productively work with code again.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here's Tobi's &lt;a href="https://github.com/tobi"&gt;GitHub contribution graph&lt;/a&gt; for the past year, showing a significant uptick following that &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November 2025 inflection point&lt;/a&gt; when coding agents got really good.&lt;/p&gt;
&lt;p&gt;&lt;img alt="1,658 contributions in the last year - scattered lightly through Jun, Aug, Sep, Oct and Nov and then picking up significantly in Dec, Jan, and Feb." src="https://static.simonwillison.net/static/2026/tobi-contribs.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;He used &lt;a href="https://github.com/badlogic/pi-mono"&gt;Pi&lt;/a&gt; as the coding agent and released a new &lt;a href="https://github.com/davebcn87/pi-autoresearch"&gt;pi-autoresearch&lt;/a&gt; plugin in collaboration with David Cortés, which maintains state in an &lt;code&gt;autoresearch.jsonl&lt;/code&gt; file &lt;a href="https://github.com/Shopify/liquid/blob/3182b7c1b3758b0f5fe2d0fcc71a48bbcb11c946/autoresearch.jsonl"&gt;like this one&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://x.com/tobi/status/2032212531846971413"&gt;@tobi&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rails"&gt;rails&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruby"&gt;ruby&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="performance"/><category term="rails"/><category term="ruby"/><category term="ai"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="agentic-engineering"/><category term="november-2025-inflection"/><category term="tobias-lutke"/></entry><entry><title>How uv got so fast</title><link href="https://simonwillison.net/2025/Dec/26/how-uv-got-so-fast/#atom-tag" rel="alternate"/><published>2025-12-26T23:43:15+00:00</published><updated>2025-12-26T23:43:15+00:00</updated><id>https://simonwillison.net/2025/Dec/26/how-uv-got-so-fast/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html"&gt;How uv got so fast&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Andrew Nesbitt provides an insightful teardown of why &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt; is so much faster than &lt;code&gt;pip&lt;/code&gt;. It's not nearly as simple as just "they rewrote it in Rust" - &lt;code&gt;uv&lt;/code&gt; gets to skip a huge amount of Python packaging history (which &lt;code&gt;pip&lt;/code&gt; needs to implement for backwards compatibility) and benefits enormously from work over recent years that makes it possible to resolve dependencies across most packages without having to execute the code in &lt;code&gt;setup.py&lt;/code&gt; using a Python interpreter.&lt;/p&gt;
&lt;p&gt;Two notes that caught my eye that I hadn't understood before:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;HTTP range requests for metadata.&lt;/strong&gt; &lt;a href="https://packaging.python.org/en/latest/specifications/binary-distribution-format/"&gt;Wheel files&lt;/a&gt; are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. None of this requires Rust.&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compact version representation&lt;/strong&gt;. uv packs versions into u64 integers where possible, making comparison and hashing fast. Over 90% of versions fit in one u64. This is micro-optimization that compounds across millions of comparisons.&lt;/p&gt;
&lt;/blockquote&gt;
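&lt;p&gt;The zip layout is what makes the range-request trick work: the "central directory" listing lives at the end of the archive. Here's a self-contained Python sketch that builds a small zip in memory, then parses the End Of Central Directory (EOCD) record from just the final bytes - the same first step a client performs after fetching the tail of a wheel with an HTTP Range request:&lt;/p&gt;

```python
# Build a tiny zip in memory, then locate and parse the EOCD record
# using only the last few KB - mimicking what a Range-request client does.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo/METADATA", "Name: demo\nVersion: 1.0\n")
data = buf.getvalue()

tail = data[-65536:]  # a real client would fetch this slice via HTTP Range
pos = tail.rfind(b"PK\x05\x06")  # EOCD signature

# EOCD layout (little-endian): signature, disk numbers, entry counts,
# central directory size, central directory offset, comment length.
total_entries = int.from_bytes(tail[pos + 10:pos + 12], "little")
cd_size = int.from_bytes(tail[pos + 12:pos + 16], "little")
cd_offset = int.from_bytes(tail[pos + 16:pos + 20], "little")
print(total_entries, cd_size, cd_offset)
```

With the offset and size in hand, a second range request fetches the central directory itself, which names every file in the archive - including the &lt;code&gt;*.dist-info/METADATA&lt;/code&gt; entry uv is after.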
&lt;p&gt;I wanted to learn more about these tricks, so I fired up &lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/"&gt;an asynchronous research task&lt;/a&gt; and told it to check out the &lt;code&gt;astral-sh/uv&lt;/code&gt; repo, find the Rust code for both of those features and try porting it to Python to help me understand how it works.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/research/tree/main/http-range-wheel-metadata"&gt;the report that it wrote for me&lt;/a&gt;, the &lt;a href="https://github.com/simonw/research/pull/57"&gt;prompts I used&lt;/a&gt; and the &lt;a href="https://gistpreview.github.io/?0f04e4d1a240bfc3065df5082b629884/index.html"&gt;Claude Code transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can try &lt;a href="https://github.com/simonw/research/blob/main/http-range-wheel-metadata/wheel_metadata.py"&gt;the script&lt;/a&gt; it wrote for extracting metadata from a wheel using HTTP range requests like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;uv run --with httpx https://raw.githubusercontent.com/simonw/research/refs/heads/main/http-range-wheel-metadata/wheel_metadata.py https://files.pythonhosted.org/packages/8b/04/ef95b67e1ff59c080b2effd1a9a96984d6953f667c91dfe9d77c838fc956/playwright-1.57.0-py3-none-macosx_11_0_arm64.whl -v&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The Playwright wheel there is ~40MB. Adding &lt;code&gt;-v&lt;/code&gt; at the end causes the script to spit out verbose details of how it fetched the data - &lt;a href="https://gist.github.com/simonw/a5ef83b6e4605d2577febb43fa9ad018"&gt;which looks like this&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Key extract from that output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[1] HEAD request to get file size...
    File size: 40,775,575 bytes
[2] Fetching last 16,384 bytes (EOCD + central directory)...
    Received 16,384 bytes
[3] Parsed EOCD:
    Central directory offset: 40,731,572
    Central directory size: 43,981
    Total entries: 453
[4] Fetching complete central directory...
    ...
[6] Found METADATA: playwright-1.57.0.dist-info/METADATA
    Offset: 40,706,744
    Compressed size: 1,286
    Compression method: 8
[7] Fetching METADATA content (2,376 bytes)...
[8] Decompressed METADATA: 3,453 bytes

Total bytes fetched: 18,760 / 40,775,575 (100.0% savings)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The section of the report &lt;a href="https://github.com/simonw/research/tree/main/http-range-wheel-metadata#bonus-compact-version-representation"&gt;on compact version representation&lt;/a&gt; is interesting too. Here's how it illustrates sorting version numbers correctly based on their custom u64 representation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Sorted order (by integer comparison of packed u64):
  1.0.0a1 (repr=0x0001000000200001)
  1.0.0b1 (repr=0x0001000000300001)
  1.0.0rc1 (repr=0x0001000000400001)
  1.0.0 (repr=0x0001000000500000)
  1.0.0.post1 (repr=0x0001000000700001)
  1.0.1 (repr=0x0001000100500000)
  2.0.0.dev1 (repr=0x0002000000100001)
  2.0.0 (repr=0x0002000000500000)
&lt;/code&gt;&lt;/pre&gt;
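&lt;p&gt;Here's a simplified Python re-creation of that packing idea - not uv's exact bit layout, just invented field widths - showing how encoding the release segments plus a dev/pre/post marker into a single integer makes plain integer comparison agree with version ordering:&lt;/p&gt;

```python
# Hypothetical simplified packing: major/minor/micro plus a release-kind
# field and a pre/post number, in descending significance. The kind values
# mirror the ordering visible in the report output above (dev before
# pre-releases, final release between rc and post).
KIND = {"dev": 1, "a": 2, "b": 3, "rc": 4, "": 5, "post": 7}

def pack(major, minor, micro, kind="", number=0):
    """Pack a version into one int; field widths here are illustrative."""
    return (
        major * 2**40 + minor * 2**32 + micro * 2**24
        + KIND[kind] * 2**16 + number
    )

versions = [
    pack(1, 0, 0, "a", 1),     # 1.0.0a1
    pack(1, 0, 0),             # 1.0.0
    pack(1, 0, 0, "post", 1),  # 1.0.0.post1
    pack(1, 0, 1),             # 1.0.1
    pack(2, 0, 0, "dev", 1),   # 2.0.0.dev1
]
assert versions == sorted(versions)  # integer order == version order
```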


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sorting"&gt;sorting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="python"/><category term="sorting"/><category term="rust"/><category term="uv"/><category term="http-range-requests"/></entry><entry><title>Python 3.14 Is Here. How Fast Is It?</title><link href="https://simonwillison.net/2025/Oct/8/python-314-is-here-how-fast-is-it/#atom-tag" rel="alternate"/><published>2025-10-08T18:36:33+00:00</published><updated>2025-10-08T18:36:33+00:00</updated><id>https://simonwillison.net/2025/Oct/8/python-314-is-here-how-fast-is-it/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.miguelgrinberg.com/post/python-3-14-is-here-how-fast-is-it"&gt;Python 3.14 Is Here. How Fast Is It?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Miguel Grinberg uses some basic benchmarks (like &lt;code&gt;fib(40)&lt;/code&gt;) to test the new Python 3.14 on Linux and macOS and finds some substantial speedups over Python 3.13 - around 27% faster.&lt;/p&gt;
&lt;p&gt;The optional JIT didn't make a meaningful difference to his benchmarks. On a threaded benchmark he got a 3.09x speedup with 4 threads using the free-threading build - for Python 3.13 the free-threading build only provided a 2.2x improvement.&lt;/p&gt;
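&lt;p&gt;The benchmark style is easy to reproduce: time a deliberately CPU-bound recursive &lt;code&gt;fib&lt;/code&gt; call and compare across interpreters. A minimal sketch, using a smaller &lt;code&gt;n&lt;/code&gt; than the article's &lt;code&gt;fib(40)&lt;/code&gt; so it finishes quickly:&lt;/p&gt;

```python
# Naive recursive Fibonacci: pure interpreter work, no I/O, which is why
# it's a common (if crude) yardstick for interpreter speedups.
import time

def fib(n: int) -> int:
    if n in (0, 1):
        return n
    return fib(n - 1) + fib(n - 2)

start = time.perf_counter()
result = fib(25)
elapsed = time.perf_counter() - start
print(f"fib(25) = {result} in {elapsed:.3f}s")
```

Running the same script under two interpreters (e.g. &lt;code&gt;python3.13&lt;/code&gt; and &lt;code&gt;python3.14&lt;/code&gt;) and comparing the elapsed times is the whole methodology.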

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/p0iw9e/python_3_14_is_here_how_fast_is_it"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gil"&gt;gil&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="gil"/><category term="performance"/><category term="python"/></entry><entry><title>Postgres LISTEN/NOTIFY does not scale</title><link href="https://simonwillison.net/2025/Jul/11/postgres-listen-notify/#atom-tag" rel="alternate"/><published>2025-07-11T04:39:42+00:00</published><updated>2025-07-11T04:39:42+00:00</updated><id>https://simonwillison.net/2025/Jul/11/postgres-listen-notify/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.recall.ai/blog/postgres-listen-notify-does-not-scale"&gt;Postgres LISTEN/NOTIFY does not scale&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I think this headline is justified. &lt;a href="https://www.recall.ai/"&gt;Recall.ai&lt;/a&gt;, a provider of meeting transcription bots, noticed that their PostgreSQL instance was being bogged down by heavy concurrent writes.&lt;/p&gt;
&lt;p&gt;After some spelunking they found &lt;a href="https://github.com/postgres/postgres/blob/a749c6f18fbacd05f432cd29f9e7294033bc666f/src/backend/commands/async.c#L940-L955"&gt;this comment&lt;/a&gt; in the PostgreSQL source explaining that transactions with a pending notification take out a global lock against the entire PostgreSQL instance (represented by database 0) to ensure "that queue entries appear in commit order".&lt;/p&gt;
&lt;p&gt;Moving away from LISTEN/NOTIFY to trigger actions on changes to rows gave them a significant performance boost under high write loads.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44490510"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="performance"/><category term="postgresql"/></entry><entry><title>Serving 200 million requests per day with a cgi-bin</title><link href="https://simonwillison.net/2025/Jul/5/cgi-bin-performance/#atom-tag" rel="alternate"/><published>2025-07-05T23:28:31+00:00</published><updated>2025-07-05T23:28:31+00:00</updated><id>https://simonwillison.net/2025/Jul/5/cgi-bin-performance/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://jacob.gold/posts/serving-200-million-requests-with-cgi-bin/"&gt;Serving 200 million requests per day with a cgi-bin&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Jake Gold tests how well 90s-era CGI works today, using a Go + SQLite CGI program running on a 16-thread AMD 3700X.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Using CGI on modest hardware, it’s possible to serve 2400+ requests per second or 200M+ requests per day.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I got my start in web development with CGI back in the late 1990s - I was a huge fan of &lt;a href="https://web.archive.org/web/20010509081826/http://www.amphibianweb.com/newspro/"&gt;NewsPro&lt;/a&gt;, which was effectively a weblog system before anyone knew what a weblog was.&lt;/p&gt;
&lt;p&gt;CGI works by starting, executing and terminating a process for every incoming request. The nascent web community quickly learned that this was a bad idea, and invented technologies like PHP and &lt;a href="https://en.wikipedia.org/wiki/FastCGI"&gt;FastCGI&lt;/a&gt; to help avoid that extra overhead and keep code resident in-memory instead.&lt;/p&gt;
&lt;p&gt;This lesson ended up baked into my brain, and I spent the next twenty years convinced that you should &lt;em&gt;never&lt;/em&gt; execute a full process as part of serving a web page.&lt;/p&gt;
&lt;p&gt;Of course, computers in those two decades got a &lt;em&gt;lot&lt;/em&gt; faster. I finally overcame that twenty-year core belief in 2020, when &lt;a href="https://simonwillison.net/2020/Nov/28/datasette-ripgrep/"&gt;I built datasette-ripgrep&lt;/a&gt;, a Datasette plugin that shells out to the lightning fast &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;ripgrep&lt;/a&gt; CLI tool (written in Rust) to execute searches. It worked great!&lt;/p&gt;
&lt;p&gt;As was &lt;a href="https://news.ycombinator.com/item?id=44464272#44465143"&gt;pointed out on Hacker News&lt;/a&gt;, part of CGI's problem back then was that we were writing web scripts in languages like Perl, Python and Java which had not been designed for lightning fast startup speeds. Using Go and Rust today helps make CGI-style requests a whole lot more effective.&lt;/p&gt;
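&lt;p&gt;It's easy to sanity-check the process-startup claim on your own machine: spawn a trivial child process in a loop and measure the rate. This Python sketch is just a measurement harness, not a real web benchmark - numbers vary enormously by language, machine and OS (a CPython child pays far more startup cost than a Go or Rust binary would):&lt;/p&gt;

```python
# Spawn a trivial child process N times and report the spawn rate -
# a crude stand-in for CGI's process-per-request overhead.
import subprocess
import sys
import time

N = 20
start = time.perf_counter()
for _ in range(N):
    out = subprocess.run(
        [sys.executable, "-c", "print('ok')"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
elapsed = time.perf_counter() - start
print(f"{N / elapsed:.0f} spawns/sec (last child output: {out!r})")
```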
&lt;p&gt;Jake notes that CGI-style request handling is actually a great way to take advantage of multiple CPU cores:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;These days, we have servers with 384 CPU threads. Even a small VM can have 16 CPUs. The CPUs and memory are much faster as well.&lt;/p&gt;
&lt;p&gt;Most importantly, CGI programs, because they run as separate processes, are excellent at taking advantage of many CPUs!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Maybe we should start coding web applications like it's 1998, albeit with Go and Rust!&lt;/p&gt;
&lt;p&gt;&lt;small&gt;To clarify, I don't think most people should do this. I just think it's interesting that it's not as bad an idea as it was ~25 years ago.&lt;/small&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44464272"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cgi"&gt;cgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/go"&gt;go&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="cgi"/><category term="go"/><category term="performance"/><category term="sqlite"/></entry><entry><title>python-importtime-graph</title><link href="https://simonwillison.net/2025/Jun/20/python-importtime-graph/#atom-tag" rel="alternate"/><published>2025-06-20T19:31:45+00:00</published><updated>2025-06-20T19:31:45+00:00</updated><id>https://simonwillison.net/2025/Jun/20/python-importtime-graph/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/kmichel/python-importtime-graph"&gt;python-importtime-graph&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I was exploring why a Python tool was taking over a second to start running and I learned about the &lt;code&gt;python -X importtime&lt;/code&gt; feature, &lt;a href="https://docs.python.org/3/using/cmdline.html#cmdoption-X"&gt;documented here&lt;/a&gt;. Adding that option causes Python to spit out a text tree showing the time spent importing every module.&lt;/p&gt;
&lt;p&gt;I tried that like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python -X importtime -m llm plugins
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That's for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; running 41 different plugins. Here's &lt;a href="https://gist.github.com/simonw/5b7ee41b5ee324105f23ee695d4c0906"&gt;the full output&lt;/a&gt; from that command, which starts like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import time: self [us] | cumulative | imported package
import time:        77 |         77 |   _io
import time:        19 |         19 |   marshal
import time:       131 |        131 |   posix
import time:       363 |        590 | _frozen_importlib_external
import time:       450 |        450 |   time
import time:       110 |        559 | zipimport
import time:        64 |         64 |     _codecs
import time:       252 |        315 |   codecs
import time:       277 |        277 |   encodings.aliases
&lt;/code&gt;&lt;/pre&gt;
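&lt;p&gt;Each trace line follows the pattern &lt;code&gt;import time: &amp;lt;self us&amp;gt; | &amp;lt;cumulative us&amp;gt; | &amp;lt;module&amp;gt;&lt;/code&gt;, which makes it easy to post-process yourself. Here's a sketch that captures a trace for a small import and lists the slowest modules by self time:&lt;/p&gt;

```python
# Capture an importtime trace (written to stderr) for a small import,
# parse the pipe-delimited lines, and rank modules by self time.
import subprocess
import sys

proc = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True, text=True, check=True,
)
rows = []
for line in proc.stderr.splitlines():
    # Skip non-trace lines and the "self [us] | cumulative | ..." header.
    if not line.startswith("import time:") or "self [us]" in line:
        continue
    self_us, _cum, module = line[len("import time:"):].split("|")
    rows.append((int(self_us), module.strip()))

rows.sort(reverse=True)
for self_us, module in rows[:3]:
    print(f"{self_us:>8} us  {module}")
```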
&lt;p&gt;Kevin Michel built this excellent tool for visualizing these traces as a treemap. It runs in a browser - visit &lt;a href="https://kmichel.github.io/python-importtime-graph/"&gt;kmichel.github.io/python-importtime-graph/&lt;/a&gt; and paste in the trace to get the visualization.&lt;/p&gt;
&lt;p&gt;Here's what I got for that LLM example trace:&lt;/p&gt;
&lt;p&gt;&lt;img alt="An illegibly dense treemap" src="https://static.simonwillison.net/static/2025/llm-importtime.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, it's pretty dense! Here's &lt;a href="https://static.simonwillison.net/static/2025/llm-importtime.svg"&gt;the SVG version&lt;/a&gt; which is a lot more readable, since you can zoom in to individual sections.&lt;/p&gt;
&lt;p&gt;Zooming in it looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Performance profiling treemap visualization showing execution times for Python libraries and modules, with color-coded rectangular blocks sized proportionally to their execution time, displaying &amp;quot;Total: 2845.828 ms&amp;quot; at the top with major components like &amp;quot;lim.cli: 2256.275 ms&amp;quot; and &amp;quot;openai: 150.043 ms&amp;quot;" src="https://static.simonwillison.net/static/2025/llm-importtime-zoom.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualization"&gt;visualization&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="python"/><category term="visualization"/></entry><entry><title>Making PyPI's test suite 81% faster</title><link href="https://simonwillison.net/2025/May/1/making-pypis-test-suite-81-faster/#atom-tag" rel="alternate"/><published>2025-05-01T21:32:18+00:00</published><updated>2025-05-01T21:32:18+00:00</updated><id>https://simonwillison.net/2025/May/1/making-pypis-test-suite-81-faster/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.trailofbits.com/2025/05/01/making-pypis-test-suite-81-faster/"&gt;Making PyPI&amp;#x27;s test suite 81% faster&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fantastic collection of tips from Alexis Challande on speeding up a Python CI workflow.&lt;/p&gt;
&lt;p&gt;I've used &lt;a href="https://github.com/pytest-dev/pytest-xdist"&gt;pytest-xdist&lt;/a&gt; to run tests in parallel (across multiple cores) before, but the following tips were new to me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;COVERAGE_CORE=sysmon pytest --cov=myproject&lt;/code&gt; tells &lt;a href="https://coverage.readthedocs.io/en/7.8.0/"&gt;coverage.py&lt;/a&gt; on Python 3.12 and higher to use the new &lt;a href="https://docs.python.org/3/library/sys.monitoring.html#module-sys.monitoring"&gt;sys.monitoring&lt;/a&gt; mechanism, which knocked their test execution time down from 58s to 27s.&lt;/li&gt;
&lt;li&gt;Setting &lt;code&gt;testpaths = ["tests/"]&lt;/code&gt; in &lt;code&gt;pytest.ini&lt;/code&gt; lets &lt;code&gt;pytest&lt;/code&gt; skip scanning other folders when trying to find tests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;python -X importtime ...&lt;/code&gt; shows a trace of exactly how long every package took to import. I could have done with this last week when I was trying to &lt;a href="https://github.com/simonw/llm/issues/949"&gt;debug slow LLM startup time&lt;/a&gt; which turned out to be caused by heavy imports.&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/1jb4l7/making_pypi_s_test_suite_81_faster"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pypi"&gt;pypi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="pypi"/><category term="python"/><category term="pytest"/></entry><entry><title>Double-keyed Caching: How Browser Cache Partitioning Changed the Web</title><link href="https://simonwillison.net/2025/Jan/9/browser-cache-partitioning/#atom-tag" rel="alternate"/><published>2025-01-09T19:00:56+00:00</published><updated>2025-01-09T19:00:56+00:00</updated><id>https://simonwillison.net/2025/Jan/9/browser-cache-partitioning/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://addyosmani.com/blog/double-keyed-caching/"&gt;Double-keyed Caching: How Browser Cache Partitioning Changed the Web&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Addy Osmani provides a clear explanation of how &lt;a href="https://developer.chrome.com/blog/http-cache-partitioning"&gt;browser cache partitioning&lt;/a&gt; has changed the landscape of web optimization tricks.&lt;/p&gt;
&lt;p&gt;Prior to 2020, linking to resources on a shared CDN could provide a performance boost as the user's browser might have already cached that asset from visiting a previous site.&lt;/p&gt;
&lt;p&gt;This opened up privacy attacks, where a malicious site could use the presence of cached assets (based on how long they take to load) to reveal details of sites the user had previously visited.&lt;/p&gt;
&lt;p&gt;Browsers now maintain a separate cache-per-origin. This has had less of an impact than I expected: Chrome's numbers show just a 3.6% increase in overall cache miss rate and 4% increase in bytes loaded from the network.&lt;/p&gt;
&lt;p&gt;The most interesting implication here relates to domain strategy: hosting different aspects of a service on different subdomains now incurs additional cache-related performance costs compared to keeping everything under the same domain.&lt;/p&gt;
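&lt;p&gt;The mechanics are easy to model: the cache key changes from the resource URL alone to the pair (top-level site, resource URL), so two sites embedding the same CDN asset no longer share an entry. A toy Python model of the double key:&lt;/p&gt;

```python
# Toy model of a double-keyed browser cache: entries are keyed on
# (embedding top-level site, resource URL) rather than URL alone.
cache: dict = {}

def fetch(top_level_site: str, url: str, network: dict) -> bytes:
    key = (top_level_site, url)  # the "double key"
    if key not in cache:
        cache[key] = network[url]  # cache miss: go to the network
    return cache[key]

network = {"https://cdn.example/jquery.js": b"/* js */"}
fetch("https://a.example", "https://cdn.example/jquery.js", network)
fetch("https://b.example", "https://cdn.example/jquery.js", network)
print(len(cache))  # two partitioned entries for the same asset
```

Under the pre-2020 single-key scheme the second &lt;code&gt;fetch&lt;/code&gt; would have been a cache hit; now it re-downloads, which is exactly the modest extra network cost Chrome's numbers describe.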

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=42630192"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/caching"&gt;caching&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/web-performance"&gt;web-performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/addy-osmani"&gt;addy-osmani&lt;/a&gt;&lt;/p&gt;



</summary><category term="browsers"/><category term="caching"/><category term="performance"/><category term="web-performance"/><category term="addy-osmani"/></entry><entry><title>How we think about Threads’ iOS performance</title><link href="https://simonwillison.net/2024/Dec/29/threads-ios-performance/#atom-tag" rel="alternate"/><published>2024-12-29T21:45:14+00:00</published><updated>2024-12-29T21:45:14+00:00</updated><id>https://simonwillison.net/2024/Dec/29/threads-ios-performance/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://engineering.fb.com/2024/12/18/ios/how-we-think-about-threads-ios-performance/"&gt;How we think about Threads’ iOS performance&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This article by Dave LaMacchia and Jason Patterson provides an incredibly deep insight into what effective performance engineering looks like for an app with 100s of millions of users.&lt;/p&gt;
&lt;p&gt;I always like hearing about custom performance metrics with their own acronyms. Here we are introduced to &lt;strong&gt;%FIRE&lt;/strong&gt; - the portion of people who experience a &lt;em&gt;frustrating image-render experience&lt;/em&gt; (based on how long an image takes to load after the user scrolls it into the viewport), &lt;strong&gt;TTNC&lt;/strong&gt; (&lt;em&gt;time-to-network content&lt;/em&gt;) measuring time from app launch to fresh content visible in the feed, and &lt;strong&gt;cPSR&lt;/strong&gt; (&lt;em&gt;creation-publish success rate&lt;/em&gt;) for how often a user manages to post content that they started to create.&lt;/p&gt;
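&lt;p&gt;A metric like %FIRE boils down to a threshold over per-render latencies. Here's a hypothetical sketch of the computation - the 500ms cutoff is an invented placeholder, not Meta's actual definition:&lt;/p&gt;

```python
# Hypothetical %FIRE-style metric: the share of image renders whose
# viewport-to-visible latency exceeds a "frustrating" threshold.
def percent_fire(latencies_ms: list, threshold_ms: float = 500) -> float:
    frustrating = sum(1 for t in latencies_ms if t > threshold_ms)
    return 100 * frustrating / len(latencies_ms)

print(percent_fire([120, 80, 900, 450, 2000]))  # two of five renders over 500 ms
```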
&lt;p&gt;This article introduced me to the concept of a &lt;strong&gt;boundary test&lt;/strong&gt;, described like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A boundary test is one where we measure extreme ends of a boundary to learn what the effect is. In our case, we introduced a slight bit of latency when a small percentage of our users would navigate to a user profile, to the conversion view for a post, or to their activity feed. &lt;/p&gt;
&lt;p&gt;This latency would allow us to extrapolate what the effect would be if we similarly &lt;em&gt;improved&lt;/em&gt; how we delivered content to those views.&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;We learned that iOS users don’t tolerate a lot of latency. The more we added, the less often they would launch the app and the less time they would stay in it. With the smallest latency injection, the impact was small or negligible for some views, but the largest injections had negative effects across the board. People would read fewer posts, post less often themselves, and in general interact less with the app. Remember, we weren’t injecting latency into the core feed, either; just into the profile, permalink, and activity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's a whole lot more in there, including details of their custom internal performance logger (SLATE, the “Systemic LATEncy” logger) and several case studies of surprising performance improvements made with the assistance of their metrics and tools, plus some closing notes on how Swift concurrency is being adopted throughout Meta.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/raf.eco/post/3lehpzyipic2c"&gt;Rafe Colburn&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/threads"&gt;threads&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ios"&gt;ios&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/meta"&gt;meta&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/swift"&gt;swift&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="threads"/><category term="ios"/><category term="meta"/><category term="swift"/></entry><entry><title>Using Rust in non-Rust servers to improve performance</title><link href="https://simonwillison.net/2024/Oct/23/using-rust-in-non-rust-servers/#atom-tag" rel="alternate"/><published>2024-10-23T15:45:42+00:00</published><updated>2024-10-23T15:45:42+00:00</updated><id>https://simonwillison.net/2024/Oct/23/using-rust-in-non-rust-servers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pretzelhammer/rust-blog/blob/master/posts/rust-in-non-rust-servers.md"&gt;Using Rust in non-Rust servers to improve performance&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Deep dive into different strategies for optimizing part of a web server application - in this case written in Node.js, but the same strategies should work for Python as well - by integrating with Rust in different ways.&lt;/p&gt;
&lt;p&gt;The example app renders QR codes, initially using the pure JavaScript &lt;a href="https://www.npmjs.com/package/qrcode"&gt;qrcode&lt;/a&gt; package. That ran at 1,464 req/sec, but switching it to calling a tiny Rust CLI wrapper around the &lt;a href="https://crates.io/crates/qrcode"&gt;qrcode crate&lt;/a&gt; using Node.js &lt;code&gt;spawn()&lt;/code&gt; increased that to 2,572 req/sec.&lt;/p&gt;
&lt;p&gt;This is yet another reminder to me that I need to get over my &lt;code&gt;cgi-bin&lt;/code&gt; era bias that says that shelling out to another process during a web request is a bad idea. It turns out modern computers can quite happily spawn and terminate 2,500+ processes a second!&lt;/p&gt;
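&lt;p&gt;As a quick sanity check of that claim, here's a minimal Python sketch that measures process-spawn throughput by repeatedly running the POSIX &lt;code&gt;true&lt;/code&gt; command and waiting for it to exit - the same shape as a web request shelling out to a small CLI tool. It assumes a Unix-like system; the Rust QR binary from the article would slot in the same way.&lt;/p&gt;

```python
import subprocess
import time

N = 200  # number of child processes to spawn

start = time.perf_counter()
for _ in range(N):
    # Spawn a trivial process and wait for it to exit, as a web request
    # shelling out to a small CLI tool would.
    subprocess.run(["true"], check=True)
elapsed = time.perf_counter() - start

print(f"Spawned {N} processes in {elapsed:.2f}s ({N / elapsed:.0f}/sec)")
```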
&lt;p&gt;The article optimizes further first through a Rust library compiled to WebAssembly (2,978 req/sec) and then through a Rust function exposed to Node.js as a native library (5,490 req/sec), then finishes with a full Rust rewrite of the server that replaces Node.js entirely, running at 7,212 req/sec.&lt;/p&gt;
&lt;p&gt;Full source code to accompany the article is available in the &lt;a href="https://github.com/pretzelhammer/using-rust-in-non-rust-servers"&gt;using-rust-in-non-rust-servers&lt;/a&gt; repository.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/slviv2/using_rust_non_rust_servers_improve"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nodejs"&gt;nodejs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;&lt;/p&gt;



</summary><category term="javascript"/><category term="nodejs"/><category term="performance"/><category term="scaling"/><category term="rust"/></entry><entry><title>Cerebras Inference: AI at Instant Speed</title><link href="https://simonwillison.net/2024/Aug/28/cerebras-inference/#atom-tag" rel="alternate"/><published>2024-08-28T04:14:00+00:00</published><updated>2024-08-28T04:14:00+00:00</updated><id>https://simonwillison.net/2024/Aug/28/cerebras-inference/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://cerebras.ai/blog/introducing-cerebras-inference-ai-at-instant-speed"&gt;Cerebras Inference: AI at Instant Speed&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New hosted API for Llama running at absurdly high speeds: "1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B".&lt;/p&gt;
&lt;p&gt;How are they running so fast? Custom hardware. Their &lt;a href="https://cerebras.ai/product-chip/"&gt;WSE-3&lt;/a&gt; is 57x &lt;em&gt;physically larger&lt;/em&gt; than an NVIDIA H100, and has 4 trillion transistors, 900,000 cores and 44GB of memory all on one enormous chip.&lt;/p&gt;
&lt;p&gt;Their &lt;a href="https://inference.cerebras.ai/"&gt;live chat demo&lt;/a&gt; just returned me a response at 1,833 tokens/second. Their API currently has a waitlist.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=41369705"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cerebras"&gt;cerebras&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="cerebras"/><category term="llm-performance"/></entry><entry><title>Optimizing Datasette (and other weeknotes)</title><link href="https://simonwillison.net/2024/Aug/22/optimizing-datasette/#atom-tag" rel="alternate"/><published>2024-08-22T15:46:43+00:00</published><updated>2024-08-22T15:46:43+00:00</updated><id>https://simonwillison.net/2024/Aug/22/optimizing-datasette/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been working with Alex Garcia on an experiment involving using &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; to explore FEC contributions. We currently have an 11GB SQLite database - trivial for SQLite to handle, but at the upper end of what I've comfortably explored with Datasette in the past.&lt;/p&gt;
&lt;p&gt;This was just the excuse I needed to dig into some optimizations! The next Datasette alpha release will feature some significant speed improvements for working with large tables - they're available on the &lt;code&gt;main&lt;/code&gt; branch already.&lt;/p&gt;
&lt;h3 id="datasette-tracing"&gt;Datasette tracing&lt;/h3&gt;
&lt;p&gt;Datasette has had a &lt;code&gt;?_trace=1&lt;/code&gt; feature for a while. It's only available if you run Datasette with the &lt;code&gt;trace_debug&lt;/code&gt; setting enabled - which you can do like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;datasette -s trace_debug 1 mydatabase.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then any request with &lt;code&gt;?_trace=1&lt;/code&gt; added to the URL will return a JSON blob at the end of the page showing every SQL query that was executed, how long it took and a truncated stack trace showing the code that triggered it.&lt;/p&gt;
&lt;p&gt;Scroll to the bottom of &lt;a href="https://latest.datasette.io/fixtures?_trace=1"&gt;https://latest.datasette.io/fixtures?_trace=1&lt;/a&gt; for an example.&lt;/p&gt;
&lt;p&gt;The JSON isn't very pretty. &lt;a href="https://datasette.io/plugins/datasette-pretty-traces"&gt;datasette-pretty-traces&lt;/a&gt; is a plugin I built to fix that - it turns that JSON into a much nicer visual representation.&lt;/p&gt;
&lt;p&gt;As I dug into tracing I found a nasty bug in the trace mechanism. It was meant to quietly give up on pages longer than 256KB, in order to avoid having to spool potentially megabytes of data into memory rather than streaming it to the client. That code had a bug: the user would get a blank page instead! &lt;a href="https://github.com/simonw/datasette/issues/2404"&gt;I fixed that first&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The next problem was that SQL queries that terminated with an error - including the crucial "query interrupted" error raised when a query took longer than the Datasette configured time limit - were not being included in the trace. That's &lt;a href="https://github.com/simonw/datasette/issues/2405"&gt;fixed too&lt;/a&gt;, and I &lt;a href="https://github.com/simonw/datasette-pretty-traces/issues/8"&gt;upgraded datasette-pretty-traces&lt;/a&gt; to render those errors with a pink background:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-pretty-traces-error.jpg" alt="Screenshot showing the new UI - a select * from no_table query is highlighted in pink and has an expanded box with information about where that call was made in the Python code and how long it took. Other queries show a bar indicating how long they took to run." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This gave me all the information I needed to track down those other performance problems.&lt;/p&gt;
&lt;h4 id="rule-of-thumb-don-t-scan-more-than-10-000-rows"&gt;Rule of thumb: don't scan more than 10,000 rows&lt;/h4&gt;
&lt;p&gt;SQLite is fast, but you can still run into performance problems if you ask it to scan too many rows.&lt;/p&gt;
&lt;p&gt;Going forward, I'm introducing a new target for Datasette development: never scan more than 10,000 rows without a user explicitly requesting that scan.&lt;/p&gt;
&lt;p&gt;The most common time this happens is with a &lt;code&gt;select count(*)&lt;/code&gt; query. Datasette likes to display the number of rows in a table, and when you run a SQL query it likes to show you how many total rows match even when only displaying a subset of them in the paginated interface.&lt;/p&gt;
&lt;p&gt;These counts are shown in two key places: on the list of tables in a database, and on the table view itself.&lt;/p&gt;
&lt;p&gt;Counts are protected by Datasette's query time limit mechanism. On the table listing page this was configured such that if a count took longer than 5ms it would be skipped and "Many rows" displayed instead. It turns out this mechanism isn't as reliable as I had hoped, maybe due to the overhead of cancelling the query. Given enough large tables, those cancelled count queries could still add up to user-visible latency problems on that page.&lt;/p&gt;
&lt;p&gt;Here's the pattern I turned to that fixed the performance problem:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-c1"&gt;count&lt;/span&gt;(&lt;span class="pl-k"&gt;*&lt;/span&gt;) &lt;span class="pl-k"&gt;from&lt;/span&gt; (
    &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; libfec_SA16 &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;10001&lt;/span&gt;
)&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This nested query first limits the table to 10,001 rows, then counts them. If the count is less than 10,001 we know that the count is entirely accurate. If it's exactly 10,001 we can show "&amp;gt;10,000 rows" in the UI.&lt;/p&gt;
&lt;p&gt;Capping the number of scanned rows to 10,000 for any of these counts makes a &lt;em&gt;huge&lt;/em&gt; difference in the performance of these pages!&lt;/p&gt;
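&lt;p&gt;Here's what that pattern looks like driven from Python's &lt;code&gt;sqlite3&lt;/code&gt; module - a minimal sketch of the approach, not Datasette's actual implementation:&lt;/p&gt;

```python
import sqlite3

LIMIT = 10_000

conn = sqlite3.connect(":memory:")
conn.execute("create table big (id integer primary key)")
conn.executemany("insert into big (id) values (?)", ((i,) for i in range(25_000)))

# Count at most LIMIT + 1 rows: if we get LIMIT + 1 back we know the real
# count is bigger, without ever scanning the rest of the table.
(n,) = conn.execute(
    "select count(*) from (select 1 from big limit ?)", (LIMIT + 1,)
).fetchone()

label = f">{LIMIT:,} rows" if n > LIMIT else f"{n:,} rows"
print(label)  # ">10,000 rows" for this 25,000 row table
```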
&lt;p&gt;But what about those table pages? Showing "&amp;gt;10,000 rows" is a bit of a cop-out, especially if the question the user wants to answer is "how many rows are in this table / match this filter?"&lt;/p&gt;
&lt;p&gt;I addressed that in &lt;a href="https://github.com/simonw/datasette/issues/2408"&gt;issue #2408&lt;/a&gt;: Datasette still truncates the count at 10,000 on initial page load, but users now get a "count all" link they can click to execute the full count.&lt;/p&gt;
&lt;p&gt;The link goes to a SQL query page that runs the query, but I've also added a bit of progressive enhancement JavaScript to run that query and update the page in-place when the link is clicked. Here's what that looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-count.gif" alt="Animated demo - the pgae shows  /&gt;10,000 rows with a count all link. Clicking that replaces it with the text counting... which then replaces the entire count text with 23,036,621 rows." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;In the future I may add various caching mechanisms so that counts that have been calculated can be displayed elsewhere in the UI without having to re-run the expensive queries. I may also incorporate SQL triggers for updating exact denormalized counts in a &lt;code&gt;_counts&lt;/code&gt; table, &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#python-api-cached-table-counts"&gt;as implemented in sqlite-utils&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="optimized-facet-suggestions"&gt;Optimized facet suggestions&lt;/h4&gt;
&lt;p&gt;The other feature that was really hurting performance was facet suggestions.&lt;/p&gt;
&lt;p&gt;Datasette &lt;a href="https://docs.datasette.io/en/latest/facets.html"&gt;Facets&lt;/a&gt; are a really powerful way to quickly explore data. They can be applied to any column by the user, but to make the feature more visible Datasette suggests facets that might be a good fit for the current table by looking for things like columns that only contain 3 unique values.&lt;/p&gt;
&lt;p&gt;The suggestion code was designed with performance in mind - it uses tight time limits (governed by the &lt;a href="https://docs.datasette.io/en/latest/settings.html#facet-suggest-time-limit-ms"&gt;facet_suggest_time_limit_ms&lt;/a&gt; setting, defaulting to 50ms) and attempts to use other SQL tricks to quickly decide if a facet should be considered or not.&lt;/p&gt;
&lt;p&gt;I found a couple of tricks to dramatically speed these up against larger tables as well.&lt;/p&gt;
&lt;p&gt;First, I've started enforcing that new 10,000 row limit for facet suggestions too - each suggestion query now considers a maximum of 10,000 rows, even on tables with millions of items. These are only suggestions, so occasionally seeing one that a full table scan would not have produced is a reasonable trade-off.&lt;/p&gt;
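&lt;p&gt;A rough sketch of that idea in Python - loosely modelled on the behaviour described here, with made-up thresholds rather than Datasette's real ones:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table posts (status text, body text)")
conn.executemany(
    "insert into posts values (?, ?)",
    ((("new", "open", "done")[i % 3], f"body {i}") for i in range(50_000)),
)


def suggest_facet(conn, table, column, scan_limit=10_000, max_distinct=30):
    # Only look at the first scan_limit rows: a facet suggestion does not
    # need to be perfect, so capping the scan keeps this query fast even
    # against tables with millions of rows.
    (distinct,) = conn.execute(
        f"select count(distinct {column}) "
        f"from (select {column} from {table} limit ?)",
        (scan_limit,),
    ).fetchone()
    return distinct > 1 and max_distinct >= distinct


print(suggest_facet(conn, "posts", "status"))  # True  (3 distinct values)
print(suggest_facet(conn, "posts", "body"))    # False (every row distinct)
```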
&lt;p&gt;Secondly, I spotted &lt;a href="https://github.com/simonw/datasette/issues/2407"&gt;a gnarly bug&lt;/a&gt; in the way the date facet suggestion works. The previous query looked like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(column_to_test) &lt;span class="pl-k"&gt;from&lt;/span&gt; ( 
    &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; mytable
)
&lt;span class="pl-k"&gt;where&lt;/span&gt; column_to_test glob &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;????-??-*&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;100&lt;/span&gt;;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;limit 100&lt;/code&gt; was meant to restrict it to considering 100 rows... but that didn't actually work! If a table with 20 million rows had NO rows that matched the glob pattern, the query would still scan all 20 million rows.&lt;/p&gt;
&lt;p&gt;The new query looks like this, and fixes the problem:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;date&lt;/span&gt;(column_to_test) &lt;span class="pl-k"&gt;from&lt;/span&gt; ( 
    &lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; mytable &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;100&lt;/span&gt;
)
&lt;span class="pl-k"&gt;where&lt;/span&gt; column_to_test glob &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;????-??-*&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Moving the limit to the inner query causes the SQL to only run against the first 100 rows, as intended.&lt;/p&gt;
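&lt;p&gt;You can observe that difference directly from Python by registering a user-defined function that counts how many rows the &lt;code&gt;where&lt;/code&gt; clause actually evaluates - a demonstration device for this post, not something from Datasette itself:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (column_to_test text)")
conn.executemany(
    "insert into mytable values (?)",
    ((f"not-a-date-{i}",) for i in range(10_000)),  # no row matches the glob
)

scanned = 0


def probe(value):
    # Called once for every row the WHERE clause evaluates, so it counts scans.
    global scanned
    scanned += 1
    return value


conn.create_function("probe", 1, probe)

# Outer limit: the glob matches nothing, so every row gets scanned.
scanned = 0
conn.execute(
    "select column_to_test from mytable "
    "where probe(column_to_test) glob '????-??-*' limit 100"
).fetchall()
outer_scans = scanned
print(outer_scans)  # 10000

# Inner limit: only the first 100 rows are ever considered.
scanned = 0
conn.execute(
    "select column_to_test from (select * from mytable limit 100) "
    "where probe(column_to_test) glob '????-??-*'"
).fetchall()
inner_scans = scanned
print(inner_scans)  # 100
```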
&lt;p&gt;Thanks to these optimizations running Datasette against a database with huge tables now feels snappy and responsive. Expect them in an alpha release soon.&lt;/p&gt;
&lt;h4 id="on-the-blog"&gt;On the blog&lt;/h4&gt;
&lt;p&gt;I'm trying something new for the rest of my weeknotes. Since I'm investing a lot more effort in my link blog, I'm including a digest of everything I've linked to since the last edition. I &lt;a href="https://observablehq.com/@simonw/weeknotes"&gt;updated my weeknotes Observable notebook&lt;/a&gt; to help generate these, after &lt;a href="https://gist.github.com/simonw/d7f4f2950b426839f36713ed0ecf8c5d"&gt;prompting Claude&lt;/a&gt; to help prototype a bunch of different approaches.&lt;/p&gt;
&lt;p&gt;The following section was generated by this code - it includes everything I've posted, grouped by the most "interesting" tag assigned to each post. I'll likely iterate on this a bunch more in the future.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;openai&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/6/openai-structured-outputs"&gt;OpenAI: Introducing Structured Outputs in the API&lt;/a&gt; - 2024-08-06&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/gpt-4o-system-card"&gt;GPT-4o System Card&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/sqlite-vec"&gt;Using sqlite-vec with embeddings in sqlite-utils and Datasette&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;javascript&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/6/observable-plot-waffle-mark"&gt;Observable Plot: Waffle mark&lt;/a&gt; - 2024-08-06&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/18/reckoning"&gt;Reckoning&lt;/a&gt; - 2024-08-18&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;python&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/6/cibuildwheel"&gt;cibuildwheel 2.20.0 now builds Python 3.13 wheels by default&lt;/a&gt; - 2024-08-06&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/django-http-debug"&gt;django-http-debug, a new Django app mostly written by Claude&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/pep-750"&gt;PEP 750 – Tag Strings For Writing Domain-Specific Languages&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/13/mlx-whisper"&gt;mlx-whisper&lt;/a&gt; - 2024-08-13&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/17/python-m-pytest"&gt;Upgrading my cookiecutter templates to use python -m pytest&lt;/a&gt; - 2024-08-17&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/writing-your-pyproject-toml"&gt;Writing your pyproject.toml&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/uv-unified-python-packaging"&gt;uv: Unified Python packaging&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/21/usrbinenv-uv-run"&gt;#!/usr/bin/env -S uv run&lt;/a&gt; - 2024-08-21&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/21/armin-ronacher"&gt;Armin Ronacher: There is an elephant in the room which is that As...&lt;/a&gt; - 2024-08-21&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/22/light-the-torch"&gt;light-the-torch&lt;/a&gt; - 2024-08-22&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;security&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/7/google-ai-studio-data-exfiltration-demo"&gt;Google AI Studio data exfiltration demo&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/12/smuggling-queries-at-the-protocol-level"&gt;SQL Injection Isn't Dead: Smuggling Queries at the Protocol Level&lt;/a&gt; - 2024-08-12&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/14/living-off-microsoft-copilot"&gt;Links and materials for Living off Microsoft Copilot&lt;/a&gt; - 2024-08-14&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/15/adam-newbold"&gt;Adam Newbold: [Passkeys are] something truly unique, because ba...&lt;/a&gt; - 2024-08-15&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/com2kid"&gt;com2kid: Having worked at Microsoft for almost a decade, I...&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/data-exfiltration-from-slack-ai"&gt;Data Exfiltration from Slack AI via indirect prompt injection&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/sql-injection-like-attack-on-llms-with-special-tokens"&gt;SQL injection-like attack on LLMs with special tokens&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/21/dangers-of-ai-agents-unfurling"&gt;The dangers of AI agents unfurling hyperlinks and what to do about it&lt;/a&gt; - 2024-08-21&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;llm&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/7/q-what-do-i-title-this-article"&gt;q What do I title this article?&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;prompt-engineering&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/7/braggoscope-prompts"&gt;Braggoscope Prompts&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/using-gpt-4o-mini-as-a-reranker"&gt;Using gpt-4o-mini as a reranker&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/llms-are-bad-at-returning-code-in-json"&gt;LLMs are bad at returning code in JSON&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;andrej-karpathy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/andrej-karpathy"&gt;Andrej Karpathy: The RM [Reward Model] we train for LLMs is just a...&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;projects&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/convert-claude-json-to-markdown"&gt;Share Claude conversations by converting their JSON to Markdown&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/datasette-10a15"&gt;Datasette 1.0a15&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/datasette-checkbox"&gt;datasette-checkbox&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/18/fix-covidsewage-bot"&gt;Fix @covidsewage bot to handle a change to the underlying website&lt;/a&gt; - 2024-08-18&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;anthropic&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/8/gemini-15-flash-price-drop"&gt;Gemini 1.5 Flash price drop&lt;/a&gt; - 2024-08-08&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/14/prompt-caching-with-claude"&gt;Prompt caching with Claude&lt;/a&gt; - 2024-08-14&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/15/alex-albert"&gt;Alex Albert: Examples are the #1 thing I recommend people use ...&lt;/a&gt; - 2024-08-15&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/20/introducing-zed-ai"&gt;Introducing Zed AI&lt;/a&gt; - 2024-08-20&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;sqlite&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/9/high-precision-datetime-in-sqlite"&gt;High-precision date/time in SQLite&lt;/a&gt; - 2024-08-09&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/13/django-querystring-template-tag"&gt;New Django {% querystring %} template tag&lt;/a&gt; - 2024-08-13&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;ethics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/10/where-facebooks-ai-slop-comes-from"&gt;Where Facebook's AI Slop Comes From&lt;/a&gt; - 2024-08-10&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;jon-udell&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/10/jon-udell"&gt;Jon Udell: Some argue that by aggregating knowledge drawn fr...&lt;/a&gt; - 2024-08-10&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;browsers&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/ladybird-set-to-adopt-swift"&gt;Ladybird set to adopt Swift&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;explorables&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/11/transformer-explainer"&gt;Transformer Explainer&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;ai-assisted-programming&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/12/tom-macwright"&gt;Tom MacWright: But [LLM assisted programming] does make me wonde...&lt;/a&gt; - 2024-08-12&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;hacker-news&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/12/dang"&gt;dang: We had to exclude [dead] and eventually even just...&lt;/a&gt; - 2024-08-12&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;design&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/13/ai-designers"&gt;Help wanted: AI designers&lt;/a&gt; - 2024-08-13&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;prompt-injection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/14/simple-prompt-injection-template"&gt;A simple prompt injection template&lt;/a&gt; - 2024-08-14&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;fly&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/fly-were-cutting-l40s-prices-in-half"&gt;Fly: We're Cutting L40S Prices In Half&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;open-source&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/16/whither-cockroachdb"&gt;Whither CockroachDB?&lt;/a&gt; - 2024-08-16&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;game-design&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/18/the-door-problem"&gt;“The Door Problem”&lt;/a&gt; - 2024-08-18&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;whisper&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/19/whisperfile"&gt;llamafile v0.8.13 (and whisperfile)&lt;/a&gt; - 2024-08-19&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;go&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://simonwillison.net/2024/Aug/19/migrating-mess-with-dns"&gt;Migrating Mess With DNS to use PowerDNS&lt;/a&gt; - 2024-08-19&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="releases"&gt;Releases&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-pretty-traces/releases/tag/0.5"&gt;datasette-pretty-traces 0.5&lt;/a&gt;&lt;/strong&gt; - 2024-08-21&lt;br /&gt;Prettier formatting for ?_trace=1 traces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils-ask/releases/tag/0.1a0"&gt;sqlite-utils-ask 0.1a0&lt;/a&gt;&lt;/strong&gt; - 2024-08-19&lt;br /&gt;Ask questions of your data with LLM assistance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-checkbox/releases/tag/0.1a2"&gt;datasette-checkbox 0.1a2&lt;/a&gt;&lt;/strong&gt; - 2024-08-16&lt;br /&gt;Add interactive checkboxes to columns in Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette/releases/tag/1.0a15"&gt;datasette 1.0a15&lt;/a&gt;&lt;/strong&gt; - 2024-08-16&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-csrf/releases/tag/0.10"&gt;asgi-csrf 0.10&lt;/a&gt;&lt;/strong&gt; - 2024-08-15&lt;br /&gt;ASGI middleware for protecting against CSRF attacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/datasette/datasette-pins/releases/tag/0.1a3"&gt;datasette-pins 0.1a3&lt;/a&gt;&lt;/strong&gt; - 2024-08-07&lt;br /&gt;Pin databases, tables, and other items to the Datasette homepage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/django-http-debug/releases/tag/0.2"&gt;django-http-debug 0.2&lt;/a&gt;&lt;/strong&gt; - 2024-08-07&lt;br /&gt;Django app for creating endpoints that log incoming request and return mock data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="tils"&gt;TILs&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/sqlite/sqlite-vec"&gt;Using sqlite-vec with embeddings in sqlite-utils and Datasette&lt;/a&gt; - 2024-08-11&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://til.simonwillison.net/django/pytest-django"&gt;Using pytest-django with a reusable Django application&lt;/a&gt; - 2024-08-07&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="performance"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="weeknotes"/></entry><entry><title>Quoting Nikita Melkozerov</title><link href="https://simonwillison.net/2024/Jul/13/nikita-melkozerov/#atom-tag" rel="alternate"/><published>2024-07-13T23:44:04+00:00</published><updated>2024-07-13T23:44:04+00:00</updated><id>https://simonwillison.net/2024/Jul/13/nikita-melkozerov/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/meln1k/status/1812116658300817477"&gt;&lt;p&gt;My architecture is a monolith written in  Go (this is intentional, I sacrificed scalability to improve my shipping  speed), and this is where SQLite shines. With a DB located on the local NVMe disk, a 5$ VPS can deliver a whopping 60K reads and 20K writes per second.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/meln1k/status/1812116658300817477"&gt;Nikita Melkozerov&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/go"&gt;go&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="go"/><category term="performance"/><category term="sqlite"/></entry><entry><title>Quoting D. Richard Hipp</title><link href="https://simonwillison.net/2024/Apr/30/d-richard-hipp/#atom-tag" rel="alternate"/><published>2024-04-30T13:59:50+00:00</published><updated>2024-04-30T13:59:50+00:00</updated><id>https://simonwillison.net/2024/Apr/30/d-richard-hipp/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=40206752#40209833"&gt;&lt;p&gt;Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.&lt;/p&gt;
&lt;p&gt;So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost 3% or less. That's not very much, considering the size, complexity, and portability costs involved.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=40206752#40209833"&gt;D. Richard Hipp&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/d-richard-hipp"&gt;d-richard-hipp&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="sqlite"/><category term="d-richard-hipp"/></entry><entry><title>Optimizing SQLite for servers</title><link href="https://simonwillison.net/2024/Mar/31/optimizing-sqlite-for-servers/#atom-tag" rel="alternate"/><published>2024-03-31T20:16:23+00:00</published><updated>2024-03-31T20:16:23+00:00</updated><id>https://simonwillison.net/2024/Mar/31/optimizing-sqlite-for-servers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://kerkour.com/sqlite-for-servers"&gt;Optimizing SQLite for servers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Sylvain Kerkour's comprehensive set of lessons learned running SQLite for server-based applications.&lt;/p&gt;
&lt;p&gt;There's a lot of useful stuff in here, including detailed coverage of the different recommended &lt;code&gt;PRAGMA&lt;/code&gt; settings.&lt;/p&gt;
&lt;p&gt;There was also a tip I haven't seen before about &lt;code&gt;BEGIN IMMEDIATE&lt;/code&gt; transactions:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By default, SQLite starts transactions in &lt;code&gt;DEFERRED&lt;/code&gt; mode: they are considered read only. They are upgraded to a write transaction that requires a database lock in-flight, when a query containing a write/update/delete statement is issued.&lt;/p&gt;
&lt;p&gt;The problem is that by upgrading a transaction after it has started, SQLite will immediately return a &lt;code&gt;SQLITE_BUSY&lt;/code&gt; error without respecting the &lt;code&gt;busy_timeout&lt;/code&gt; previously mentioned, if the database is already locked by another connection.&lt;/p&gt;
&lt;p&gt;This is why you should start your transactions with &lt;code&gt;BEGIN IMMEDIATE&lt;/code&gt; instead of only &lt;code&gt;BEGIN&lt;/code&gt;. If the database is locked when the transaction starts, SQLite will respect &lt;code&gt;busy_timeout&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
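&lt;p&gt;A minimal sketch of that pattern using Python's &lt;code&gt;sqlite3&lt;/code&gt; module (the &lt;code&gt;counters&lt;/code&gt; table here is hypothetical):&lt;/p&gt;

```python
import sqlite3

# Sketch: start write transactions with BEGIN IMMEDIATE so that
# busy_timeout is respected if another connection holds the lock.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; manage transactions manually
conn.execute("PRAGMA busy_timeout = 5000")  # wait up to 5s for a lock instead of failing instantly
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")

conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
try:
    conn.execute("UPDATE counters SET value = value + 1 WHERE name = 'hits'")
    conn.execute("COMMIT")
except sqlite3.Error:
    conn.execute("ROLLBACK")
    raise

print(conn.execute("SELECT value FROM counters WHERE name = 'hits'").fetchone()[0])  # prints 1
```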

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://lobste.rs/s/rsagpv/sqlite_for_servers"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-busy"&gt;sqlite-busy&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="performance"/><category term="sql"/><category term="sqlite"/><category term="sqlite-busy"/></entry><entry><title>DiskCache</title><link href="https://simonwillison.net/2024/Mar/19/diskcache/#atom-tag" rel="alternate"/><published>2024-03-19T15:43:18+00:00</published><updated>2024-03-19T15:43:18+00:00</updated><id>https://simonwillison.net/2024/Mar/19/diskcache/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/grantjenks/python-diskcache"&gt;DiskCache&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Grant Jenks built DiskCache as an alternative caching backend for Django (also usable without Django), using a SQLite database on disk. The performance numbers are impressive—it even beats memcached in microbenchmarks, due to avoiding the need to access the network.&lt;/p&gt;

&lt;p&gt;The source code (particularly in &lt;code&gt;core.py&lt;/code&gt;) is a great case study in SQLite performance optimization, after five years of iteration on making it all run as fast as possible.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=39750077#39754972"&gt;Hacker News comment&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="performance"/><category term="python"/><category term="sqlite"/></entry><entry><title>Quoting Charlie Marsh</title><link href="https://simonwillison.net/2024/Feb/4/charlie-marsh/#atom-tag" rel="alternate"/><published>2024-02-04T19:41:16+00:00</published><updated>2024-02-04T19:41:16+00:00</updated><id>https://simonwillison.net/2024/Feb/4/charlie-marsh/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/charliermarsh/status/1754216198517014627"&gt;&lt;p&gt;Sometimes, performance just doesn't matter. If I make some codepath in Ruff 10x faster, but no one ever hits it, I'm sure it could get some likes on Twitter, but the impact on users would be meaningless.&lt;/p&gt;
&lt;p&gt;And yet, it's good to care about performance everywhere, even when it doesn't matter. Caring about performance is cultural and contagious. Small wins add up. Small losses add up even more.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/charliermarsh/status/1754216198517014627"&gt;Charlie Marsh&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff"&gt;ruff&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/charlie-marsh"&gt;charlie-marsh&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="ruff"/><category term="charlie-marsh"/></entry><entry><title>Batch size one billion: SQLite insert speedups, from the useful to the absurd</title><link href="https://simonwillison.net/2023/Sep/26/batch-size-one-billion/#atom-tag" rel="alternate"/><published>2023-09-26T17:31:54+00:00</published><updated>2023-09-26T17:31:54+00:00</updated><id>https://simonwillison.net/2023/Sep/26/batch-size-one-billion/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://voidstar.tech/sqlite_insert_speed/"&gt;Batch size one billion: SQLite insert speedups, from the useful to the absurd&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Useful, detailed review of ways to maximize the performance of inserting a billion integers into a SQLite database table.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=37655261"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="sqlite"/></entry><entry><title>How CPython Implements and Uses Bloom Filters for String Processing</title><link href="https://simonwillison.net/2023/Sep/16/how-cpython-implements-and-uses-bloom-filters-for-string-process/#atom-tag" rel="alternate"/><published>2023-09-16T22:32:37+00:00</published><updated>2023-09-16T22:32:37+00:00</updated><id>https://simonwillison.net/2023/Sep/16/how-cpython-implements-and-uses-bloom-filters-for-string-process/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://codeconfessions.substack.com/p/cpython-bloom-filter-usage"&gt;How CPython Implements and Uses Bloom Filters for String Processing&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fascinating dive into Python string internals by Abhinav Upadhyay. It turns out CPython uses very simple bloom filters in several parts of the core string methods, to solve problems like splitting on newlines - there are actually eight codepoints that could represent a newline, and a tiny bloom filter can rule a character out in a single operation, so the eight full comparisons only run when that first check signals a possible match.
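&lt;p&gt;The idea can be sketched in a few lines of Python - a 64-bit mask over the low bits of each codepoint (the exact hash CPython uses may differ):&lt;/p&gt;

```python
# Toy sketch of the technique (not CPython's exact implementation):
# a 64-bit integer mask acts as a bloom filter over line-break codepoints.
NEWLINES = "\n\r\x0b\x0c\x1c\x1d\x1e\x85"  # eight codepoints treated as line breaks

mask = 0
for ch in NEWLINES:
    mask |= 1 << (ord(ch) & 63)  # the "hash" is just the low 6 bits of the codepoint

def might_be_newline(ch):
    # One AND per character: False means definitely not a newline,
    # True means run the full comparisons to confirm.
    return bool(mask & (1 << (ord(ch) & 63)))

assert all(might_be_newline(c) for c in NEWLINES)  # no false negatives
print(might_be_newline("a"))  # prints False
```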


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bloom-filters"&gt;bloom-filters&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="bloom-filters"/><category term="performance"/><category term="python"/></entry><entry><title>Quoting Andrej Karpathy</title><link href="https://simonwillison.net/2023/Feb/4/andrej-karpathy/#atom-tag" rel="alternate"/><published>2023-02-04T00:08:18+00:00</published><updated>2023-02-04T00:08:18+00:00</updated><id>https://simonwillison.net/2023/Feb/4/andrej-karpathy/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/karpathy/status/1621578354024677377"&gt;&lt;p&gt;The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/karpathy/status/1621578354024677377"&gt;Andrej Karpathy&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="ai"/><category term="gpt-3"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/></entry><entry><title>Data-driven performance optimization with Rust and Miri</title><link href="https://simonwillison.net/2022/Dec/9/data-driven-performance-optimization-with-rust-and-miri/#atom-tag" rel="alternate"/><published>2022-12-09T17:19:14+00:00</published><updated>2022-12-09T17:19:14+00:00</updated><id>https://simonwillison.net/2022/Dec/9/data-driven-performance-optimization-with-rust-and-miri/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://medium.com/source-and-buggy/data-driven-performance-optimization-with-rust-and-miri-70cb6dde0d35"&gt;Data-driven performance optimization with Rust and Miri&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Useful guide to some Rust performance optimization tools. Miri can be used to dump out a detailed JSON profile of a program which can then be opened and explored using the Chrome browser’s performance tool.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=33921731"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/chrome"&gt;chrome&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;&lt;/p&gt;



</summary><category term="chrome"/><category term="performance"/><category term="rust"/></entry><entry><title>Efficient Pagination Using Deferred Joins</title><link href="https://simonwillison.net/2022/Aug/16/efficient-pagination-using-deferred-joins/#atom-tag" rel="alternate"/><published>2022-08-16T17:35:27+00:00</published><updated>2022-08-16T17:35:27+00:00</updated><id>https://simonwillison.net/2022/Aug/16/efficient-pagination-using-deferred-joins/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://aaronfrancis.com/2022/efficient-pagination-using-deferred-joins"&gt;Efficient Pagination Using Deferred Joins&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Surprisingly simple trick for speeding up deep &lt;code&gt;OFFSET x LIMIT y&lt;/code&gt; pagination queries, which get progressively slower as you paginate deeper into the data. Instead of applying the offset and limit directly, apply them to a &lt;code&gt;select id from ...&lt;/code&gt; query that fetches just the IDs, then either use a join or run a separate &lt;code&gt;select * from table where id in (...)&lt;/code&gt; query to fetch the full records for that page.
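&lt;p&gt;A quick sketch of the deferred-join version against SQLite (the &lt;code&gt;items&lt;/code&gt; table is hypothetical):&lt;/p&gt;

```python
import sqlite3

# Sketch of the deferred-join trick: paginate over just the ids,
# then join back to fetch the full rows for that page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items (payload) VALUES (?)",
    [(f"row {i}",) for i in range(1000)],
)

page_size, page = 10, 50  # a deep page where plain OFFSET/LIMIT starts to hurt
rows = conn.execute(
    """
    SELECT items.* FROM items
    JOIN (SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?) AS page
      ON items.id = page.id
    """,
    (page_size, page * page_size),
).fetchall()
print(rows[0])  # prints (501, 'row 500')
```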

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://planetscale.com/blog/fastpage-faster-offset-pagination-for-rails-apps"&gt;Introducing FastPage: Faster offset pagination for Rails apps&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="sql"/></entry><entry><title>Announcing Pyston-lite: our Python JIT as an extension module</title><link href="https://simonwillison.net/2022/Jun/8/pyston-lite/#atom-tag" rel="alternate"/><published>2022-06-08T17:58:11+00:00</published><updated>2022-06-08T17:58:11+00:00</updated><id>https://simonwillison.net/2022/Jun/8/pyston-lite/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.pyston.org/2022/06/08/announcing-pyston-lite-our-python-jit-as-an-extension-module/"&gt;Announcing Pyston-lite: our Python JIT as an extension module&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The Pyston JIT can now be installed in any Python 3.8 virtual environment by running &lt;code&gt;pip install pyston_lite_autoload&lt;/code&gt;, which includes a hook to automatically inject the JIT. I just tried a very rough benchmark against Datasette (&lt;code&gt;ab -n 1000 -c 10&lt;/code&gt;) and got 391.20 requests/second without the JIT compared to 404.10 requests/second with it.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=31670120"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/jit"&gt;jit&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="jit"/><category term="performance"/><category term="python"/></entry><entry><title>Compiling Black with mypyc</title><link href="https://simonwillison.net/2022/May/31/compiling-black-with-mypyc/#atom-tag" rel="alternate"/><published>2022-05-31T23:24:16+00:00</published><updated>2022-05-31T23:24:16+00:00</updated><id>https://simonwillison.net/2022/May/31/compiling-black-with-mypyc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ichard26.github.io/blog/2022/05/31/compiling-black-with-mypyc-part-1/"&gt;Compiling Black with mypyc&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Richard Si is a Black contributor who recently obtained a 2x performance boost by compiling Black using the mypyc tool from the mypy project, which uses Python type annotations to generate a compiled C version of the Python logic. He wrote up this fantastic three-part series describing in detail how he achieved this, including plenty of tips on Python profiling and clever optimization tricks.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/llanga/status/1531741163539005449"&gt;Łukasz Langa&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mypy"&gt;mypy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/black"&gt;black&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="python"/><category term="mypy"/><category term="black"/></entry><entry><title>Mypyc</title><link href="https://simonwillison.net/2022/Jan/30/mypyc/#atom-tag" rel="alternate"/><published>2022-01-30T01:31:12+00:00</published><updated>2022-01-30T01:31:12+00:00</updated><id>https://simonwillison.net/2022/Jan/30/mypyc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mypyc.readthedocs.io/en/latest/introduction.html"&gt;Mypyc&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Spotted this in the Black release notes: “Black is now compiled with mypyc for an overall 2x speed-up”. Mypyc is a tool that compiles Python modules (written in a subset of Python) to C extensions—similar to Cython but using just Python syntax, taking advantage of type annotations to perform type checking and type inference. It’s part of the mypy type checking project, which has been using it since 2019 to gain a 4x performance improvement over regular Python.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://black.readthedocs.io/en/stable/change_log.html#id1"&gt;Black release notes&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mypy"&gt;mypy&lt;/a&gt;&lt;/p&gt;



</summary><category term="c"/><category term="performance"/><category term="python"/><category term="mypy"/></entry><entry><title>Tricking Postgres into using an insane – but 200x faster – query plan</title><link href="https://simonwillison.net/2022/Jan/18/tricking-postgres/#atom-tag" rel="alternate"/><published>2022-01-18T20:53:01+00:00</published><updated>2022-01-18T20:53:01+00:00</updated><id>https://simonwillison.net/2022/Jan/18/tricking-postgres/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://spacelift.io/blog/tricking-postgres-into-using-query-plan"&gt;Tricking Postgres into using an insane – but 200x faster – query plan&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of &lt;code&gt;EXPLAIN (FORMAT JSON, ANALYZE)&lt;/code&gt; using the explain.dalibo.com visualization tool.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/optimization"&gt;optimization&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="postgresql"/><category term="optimization"/></entry><entry><title>Weeknotes: datasette-tiddlywiki, filters_from_request</title><link href="https://simonwillison.net/2021/Dec/24/datasette-tiddlywiki/#atom-tag" rel="alternate"/><published>2021-12-24T07:08:03+00:00</published><updated>2021-12-24T07:08:03+00:00</updated><id>https://simonwillison.net/2021/Dec/24/datasette-tiddlywiki/#atom-tag</id><summary type="html">
    &lt;p&gt;I made some good progress on &lt;a href="https://simonwillison.net/2021/Dec/16/eternal-refactor/"&gt;the big refactor&lt;/a&gt; this week, including extracting some core logic out into a new Datasette plugin hook. I also got distracted by &lt;a href="https://tiddlywiki.com/"&gt;TiddlyWiki&lt;/a&gt; and released a new Datasette plugin that lets you run TiddlyWiki inside Datasette.&lt;/p&gt;
&lt;h4&gt;datasette-tiddlywiki&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://tiddlywiki.com/"&gt;TiddlyWiki&lt;/a&gt; is a fascinating and unique project. Jeremy Ruston has been working on it for 17 years now and I've still not seen another piece of software that works even remotely like it.&lt;/p&gt;
&lt;p&gt;It's a full-featured wiki that's implemented entirely as a single 2.3MB page of HTML and JavaScript, with a plugin system that allows it to be extended in all sorts of interesting ways.&lt;/p&gt;
&lt;p&gt;The most unique feature of TiddlyWiki is how it persists data. You can create a brand new wiki by opening &lt;a href="https://tiddlywiki.com/empty.html"&gt;tiddlywiki.com/empty.html&lt;/a&gt; in your browser, making some edits... and then clicking the circle-tick "Save changes" button to download a copy of the page with your changes baked into it! Then you can open that up on your own computer and keep on using it.&lt;/p&gt;
&lt;p&gt;There's actually a lot more to TiddlyWiki persistence than that: The &lt;a href="https://tiddlywiki.com/#GettingStarted"&gt;GettingStarted&lt;/a&gt; guide lists dozens of options that vary depending on operating system and browser - it's worth browsing through them just to marvel at how much innovation has happened around the project just in the persistence space.&lt;/p&gt;
&lt;p&gt;One of the options is to run a little server that implements the &lt;a href="https://tiddlywiki.com/#WebServer%20API"&gt;WebServer API&lt;/a&gt; and persists data sent via PUT requests. SQLite is an obvious candidate for a backend, and Datasette makes it pretty easy to provide APIs on top of SQLite... so I decided to experiment with building a Datasette plugin that offers a full persistent TiddlyWiki experience.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://datasette.io/plugins/datasette-tiddlywiki"&gt;datasette-tiddlywiki&lt;/a&gt; is the result.&lt;/p&gt;
&lt;p&gt;You can try it out by running &lt;code&gt;datasette install datasette-tiddlywiki&lt;/code&gt; and then &lt;code&gt;datasette tiddlywiki.db --create&lt;/code&gt; to start the server (with a &lt;code&gt;tiddlywiki.db&lt;/code&gt; SQLite database that will be created if it does not already exist.)&lt;/p&gt;
&lt;p&gt;Then navigate to &lt;code&gt;http://localhost:8001/-/tiddlywiki&lt;/code&gt; to start interacting with your new TiddlyWiki. Any changes you make there will be persisted to the &lt;code&gt;tiddlywiki&lt;/code&gt; database.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2021/datasette-tiddlywiki-loop.gif" alt="Animated demo showing creating a new tiddler" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I had &lt;a href="https://github.com/simonw/datasette-tiddlywiki/issues/2"&gt;a running research issue&lt;/a&gt; that I updated as I was figuring out how to build it - all sorts of fun TiddlyWiki links and TILs are embedded in that thread. The issue started out in my private "notes" GitHub repository but &lt;a href="https://til.simonwillison.net/github/transfer-issue-private-to-public"&gt;I transferred it&lt;/a&gt; to the &lt;code&gt;datasette-tiddlywiki&lt;/code&gt; repository after I had created and published the first version of the plugin.&lt;/p&gt;
&lt;h4&gt;filters_from_request() plugin hook&lt;/h4&gt;
&lt;p&gt;My big breakthrough in the ongoing &lt;a href="https://github.com/simonw/datasette/issues/1518"&gt;Datasette Table View refactor project&lt;/a&gt; was a realization that I could simplify the table logic by extracting some of it out into a new plugin hook.&lt;/p&gt;
&lt;p&gt;The new hook is called &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#filters-from-request-request-database-table-datasette"&gt;filters_from_request&lt;/a&gt;. It acknowledges that the primary goal of &lt;a href="https://latest.datasette.io/fixtures/facetable"&gt;the table page&lt;/a&gt; is to convert query string parameters - like &lt;code&gt;?_search=tony&lt;/code&gt; or &lt;code&gt;?id__gte=6&lt;/code&gt; or &lt;code&gt;?_where=id+in+(1,+2+,3)&lt;/code&gt; - into SQL where clauses.&lt;/p&gt;
&lt;p&gt;(Here's a &lt;a href="https://docs.datasette.io/en/stable/json_api.html#table-arguments"&gt;full list of supported table arguments&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;So that's what &lt;code&gt;filters_from_request()&lt;/code&gt; does - given a &lt;code&gt;request&lt;/code&gt; object it can return SQL clauses that should be added to the &lt;code&gt;WHERE&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Datasette now uses those internally to implement &lt;code&gt;?_where=&lt;/code&gt; and &lt;code&gt;?_search=&lt;/code&gt; and &lt;code&gt;?_through=&lt;/code&gt;, see &lt;a href="https://github.com/simonw/datasette/blob/0.60a1/datasette/filters.py"&gt;datasette/filters.py&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I always try to accompany a new plugin hook with a plugin that actually uses it - in this case I've been updating &lt;a href="https://github.com/simonw/datasette-leaflet-freedraw"&gt;datasette-leaflet-freedraw&lt;/a&gt; to use that hook to add a "draw a shape on a map to filter this table" interface to any table that it detects has a SpatiaLite geometry column. There's a demo of that here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://calands.datasettes.com/calands/CPAD_2020a_SuperUnits?_freedraw=%7B%22type%22%3A%22MultiPolygon%22%2C%22coordinates%22%3A%5B%5B%5B%5B-121.92627%2C37.73597%5D%2C%5B-121.83838%2C37.68382%5D%2C%5B-121.64063%2C37.45742%5D%2C%5B-121.57471%2C37.19533%5D%2C%5B-121.81641%2C36.80928%5D%2C%5B-122.146%2C36.63316%5D%2C%5B-122.56348%2C36.65079%5D%2C%5B-122.89307%2C36.79169%5D%2C%5B-123.06885%2C36.96745%5D%2C%5B-123.09082%2C37.33522%5D%2C%5B-123.0249%2C37.562%5D%2C%5B-122.91504%2C37.77071%5D%2C%5B-122.71729%2C37.92687%5D%2C%5B-122.58545%2C37.96152%5D%2C%5B-122.10205%2C37.96152%5D%2C%5B-121.92627%2C37.73597%5D%5D%5D%5D%7D"&gt;https://calands.datasettes.com/calands/CPAD_2020a_SuperUnits?_freedraw=%7B%22type%22%3A%22MultiPolygon%22%2C%22coordinates%22%3A%5B%5B%5B%5B-121.92627%2C37.73597%5D%2C%5B-121.83838%2C37.68382%5D%2C%5B-121.64063%2C37.45742%5D%2C%5B-121.57471%2C37.19533%5D%2C%5B-121.81641%2C36.80928%5D%2C%5B-122.146%2C36.63316%5D%2C%5B-122.56348%2C36.65079%5D%2C%5B-122.89307%2C36.79169%5D%2C%5B-123.06885%2C36.96745%5D%2C%5B-123.09082%2C37.33522%5D%2C%5B-123.0249%2C37.562%5D%2C%5B-122.91504%2C37.77071%5D%2C%5B-122.71729%2C37.92687%5D%2C%5B-122.58545%2C37.96152%5D%2C%5B-122.10205%2C37.96152%5D%2C%5B-121.92627%2C37.73597%5D%5D%5D%5D%7D&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2021/datasette-leaflet-freedraw-loop.gif" alt="Animated demo of drawing a shape on a map and then submitting the form to see items within that map region" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Note the new custom &lt;code&gt;?_freedraw={...}&lt;/code&gt; parameter which accepts a GeoJSON polygon and uses it to filter the table - that's implemented using the new hook.&lt;/p&gt;
&lt;p&gt;This isn't in a full Datasette release yet, but it's available in the &lt;a href="https://github.com/simonw/datasette/releases/tag/0.60a1"&gt;Datasette 0.60a1 alpha&lt;/a&gt; (added in &lt;a href="https://github.com/simonw/datasette/releases/tag/0.60a0"&gt;0.60a0&lt;/a&gt;) if you want to try it out.&lt;/p&gt;
&lt;h4&gt;Optimizing populate_table_schemas()&lt;/h4&gt;
&lt;p&gt;I introduced the &lt;a href="https://datasette.io/plugins/datasette-pretty-traces"&gt;datasette-pretty-traces&lt;/a&gt; plugin &lt;a href="https://simonwillison.net/2021/Dec/16/eternal-refactor/"&gt;last week&lt;/a&gt; - it makes it much easier to see the queries that are running on any given Datasette page.&lt;/p&gt;
&lt;p&gt;This week I realized it wasn't tracking write queries, so &lt;a href="https://github.com/simonw/datasette/issues/1568"&gt;I added support for that&lt;/a&gt; - and discovered that on first page load after starting up Datasette spends a &lt;em&gt;lot&lt;/em&gt; of time populating its own internal database containing schema information (see &lt;a href="https://simonwillison.net/2020/Dec/27/weeknotes-datasette-internals/"&gt;Weeknotes: Datasette internals&lt;/a&gt; from last year.)&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2021/datasette-trace-many-writes.png" alt="Example trace showing a cavalcade of write SQL" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I opened &lt;a href="https://github.com/simonw/datasette/issues/1555"&gt;a tracking ticket&lt;/a&gt; and made a bunch of changes to optimize this. The new code in &lt;a href="https://github.com/simonw/datasette/blob/8c401ee0f054de2f568c3a8302c9223555146407/datasette/utils/internal_db.py"&gt;datasette/utils/internal_db.py&lt;/a&gt; uses two new documented internal methods:&lt;/p&gt;
&lt;h4&gt;db.execute_write_script() and db.execute_write_many()&lt;/h4&gt;
&lt;p&gt;These are the new methods that were created as part of the optimization work. They are documented here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/internals.html#await-db-execute-write-script-sql-block-true"&gt;await db.execute_write_script(sql, block=True)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/internals.html#await-db-execute-write-many-sql-params-seq-block-true"&gt;await db.execute_write_many(sql, params_seq, block=True)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;They are Datasette's async wrappers around the Python &lt;code&gt;sqlite3&lt;/code&gt; module's &lt;code&gt;executemany()&lt;/code&gt; and &lt;code&gt;executescript()&lt;/code&gt; methods.&lt;/p&gt;
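&lt;p&gt;For illustration, here's what the underlying &lt;code&gt;sqlite3&lt;/code&gt; calls look like (the table and column names here are made up):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# executescript(): run a block of several SQL statements in one call
conn.executescript("""
    CREATE TABLE catalog_columns (table_name TEXT, name TEXT);
    CREATE INDEX idx_cols ON catalog_columns (table_name);
""")

# executemany(): one prepared statement executed against many parameter rows
conn.executemany(
    "INSERT INTO catalog_columns VALUES (?, ?)",
    [("posts", "id"), ("posts", "title"), ("tags", "name")],
)

print(conn.execute("SELECT COUNT(*) FROM catalog_columns").fetchone()[0])  # prints 3
```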
&lt;p&gt;I also made a breaking change to Datasette's existing &lt;code&gt;execute_write()&lt;/code&gt; and &lt;code&gt;execute_write_fn()&lt;/code&gt; methods: their &lt;code&gt;block=&lt;/code&gt; argument now defaults to &lt;code&gt;True&lt;/code&gt;, where it previously defaulted to &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Prior to this change, &lt;code&gt;db.execute_write(sql)&lt;/code&gt; would put the passed SQL in a queue to be executed once the write connection became available... and then return control to the calling code, whether or not that SQL had actually run - a fire-and-forget mechanism for executing SQL.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;block=True&lt;/code&gt; option would change it to blocking until the query had finished executing.&lt;/p&gt;
&lt;p&gt;Looking at my own code, I realized I had &lt;em&gt;never once&lt;/em&gt; used the fire-and-forget mechanism: I always used &lt;code&gt;block=True&lt;/code&gt; to ensure the SQL had finished writing before I moved on.&lt;/p&gt;
&lt;p&gt;So clearly &lt;code&gt;block=True&lt;/code&gt; was a better default. I made that change in &lt;a href="https://github.com/simonw/datasette/issues/1579"&gt;issue 1579&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is technically a breaking change... but I used the new &lt;a href="https://cs.github.com/"&gt;GitHub code search&lt;/a&gt; to see if anyone was using it in a way that would break and could only find one example of it in code not written by me, &lt;a href="https://github.com/mfa/datasette-webhook-write/blob/e82440f372a2f2e3ed27d1bd34c9fa3a53b49b94/datasette_webhook_write/__init__.py#L89"&gt;in datasette-webhook-write&lt;/a&gt; - and since they use &lt;code&gt;block=True&lt;/code&gt; there anyway this update won't break their code.&lt;/p&gt;
&lt;p&gt;If I'd released Datasette 1.0 I would still consider this a breaking change and bump the major version number, but thankfully I'm still in the 0.x range where I can be a bit less formal about this kind of thing!&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-tiddlywiki"&gt;datasette-tiddlywiki&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-tiddlywiki/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2021-12-23
&lt;br /&gt;Run TiddlyWiki in Datasette and save Tiddlers to a SQLite database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/asyncinject"&gt;asyncinject&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/asyncinject/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/asyncinject/releases"&gt;4 releases total&lt;/a&gt;) - 2021-12-21
&lt;br /&gt;Run async workflows using pytest-fixtures-style dependency injection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.60a1"&gt;0.60a1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;104 releases total&lt;/a&gt;) - 2021-12-19
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-pretty-traces"&gt;datasette-pretty-traces&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-pretty-traces/releases/tag/0.3.1"&gt;0.3.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-pretty-traces/releases"&gt;5 releases total&lt;/a&gt;) - 2021-12-19
&lt;br /&gt;Prettier formatting for ?_trace=1 traces&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/spatialite/minimal-spatialite-database-in-python"&gt;Creating a minimal SpatiaLite database with Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/python/safe-output-json"&gt;Safely outputting JSON&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/python/annotated-dataklasses"&gt;Annotated explanation of David Beazley's dataklasses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/cloudflare/robots-txt-cloudflare-workers"&gt;Adding a robots.txt using Cloudflare workers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github/transfer-issue-private-to-public"&gt;Transferring a GitHub issue from a private to a public repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tiddlywiki"&gt;tiddlywiki&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="geospatial"/><category term="performance"/><category term="plugins"/><category term="projects"/><category term="tiddlywiki"/><category term="datasette"/><category term="weeknotes"/></entry><entry><title>Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool</title><link href="https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/#atom-tag" rel="alternate"/><published>2021-08-06T06:05:15+00:00</published><updated>2021-08-06T06:05:15+00:00</updated><id>https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/#atom-tag</id><summary type="html">
    &lt;p&gt;Earlier this week I released &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-14"&gt;sqlite-utils 3.14&lt;/a&gt; with a powerful new command-line tool: &lt;code&gt;sqlite-utils convert&lt;/code&gt;, which applies a conversion function to data stored in a SQLite column.&lt;/p&gt;
&lt;p&gt;Anyone who works with data will tell you that 90% of the work is cleaning it up. Running command-line conversions against data in a SQLite file turns out to be a really productive way to do that.&lt;/p&gt;
&lt;h4&gt;Transforming a column&lt;/h4&gt;
&lt;p&gt;Here's a simple example. Say someone gave you data with numbers that are formatted with commas - like &lt;code&gt;3,044,502&lt;/code&gt; - in a &lt;code&gt;count&lt;/code&gt; column in a &lt;code&gt;states&lt;/code&gt; table.&lt;/p&gt;
&lt;p&gt;You can strip those commas out like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils convert states.db states count \
    'value.replace(",", "")'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;convert&lt;/code&gt; command takes four arguments: the database file, the name of the table, the name of the column and a string containing a fragment of Python code that defines the conversion to be applied.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Animated demo using sqlite-utils convert to strip out commas" src="https://static.simonwillison.net/static/2021/sqlite-convert-demo.gif" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The conversion function can be anything you can express with Python. If you want to import extra modules you can do so using &lt;code&gt;--import module&lt;/code&gt; - here's an example that wraps text using the &lt;a href="https://docs.python.org/3/library/textwrap.html"&gt;textwrap&lt;/a&gt; module from the Python standard library:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils convert content.db articles content \
    '"\n".join(textwrap.wrap(value, 100))' \
    --import=textwrap
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can consider this analogous to using &lt;code&gt;Array.map()&lt;/code&gt; in JavaScript, or running a transformation using a list comprehension in Python.&lt;/p&gt;
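&lt;p&gt;To illustrate that analogy, here's the same comma-stripping transformation written as a plain Python list comprehension (a sketch with made-up sample values, not part of the tool itself):&lt;/p&gt;

```python
# The transformation sqlite-utils convert applies to a column, expressed
# as a list comprehension over some sample values:
values = ["3,044,502", "1,234"]
converted = [value.replace(",", "") for value in values]
print(converted)
```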
&lt;h4&gt;Custom functions in SQLite&lt;/h4&gt;
&lt;p&gt;Under the hood, the tool takes advantage of a powerful SQLite feature: the ability to &lt;a href="https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.create_function"&gt;register custom functions&lt;/a&gt; written in Python (or other languages) and call them from SQL.&lt;/p&gt;
&lt;p&gt;The text wrapping example above works by executing the following SQL:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;update&lt;/span&gt; articles &lt;span class="pl-k"&gt;set&lt;/span&gt; content &lt;span class="pl-k"&gt;=&lt;/span&gt; convert_value(content)&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;convert_value(value)&lt;/code&gt; is a custom SQL function, compiled as Python code and then made available to the database connection.&lt;/p&gt;
&lt;p&gt;The equivalent code using just the Python standard library would look like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;sqlite3&lt;/span&gt;
&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;textwrap&lt;/span&gt;

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;convert_value&lt;/span&gt;(&lt;span class="pl-s1"&gt;value&lt;/span&gt;):
    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s"&gt;"&lt;span class="pl-cce"&gt;\n&lt;/span&gt;"&lt;/span&gt;.&lt;span class="pl-en"&gt;join&lt;/span&gt;(&lt;span class="pl-s1"&gt;textwrap&lt;/span&gt;.&lt;span class="pl-en"&gt;wrap&lt;/span&gt;(&lt;span class="pl-s1"&gt;value&lt;/span&gt;, &lt;span class="pl-c1"&gt;100&lt;/span&gt;))

&lt;span class="pl-s1"&gt;conn&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sqlite3&lt;/span&gt;.&lt;span class="pl-en"&gt;connect&lt;/span&gt;(&lt;span class="pl-s"&gt;"content.db"&lt;/span&gt;)
&lt;span class="pl-s1"&gt;conn&lt;/span&gt;.&lt;span class="pl-en"&gt;create_function&lt;/span&gt;(&lt;span class="pl-s"&gt;"convert_value"&lt;/span&gt;, &lt;span class="pl-c1"&gt;1&lt;/span&gt;, &lt;span class="pl-s1"&gt;convert_value&lt;/span&gt;)
&lt;span class="pl-s1"&gt;conn&lt;/span&gt;.&lt;span class="pl-en"&gt;execute&lt;/span&gt;(&lt;span class="pl-s"&gt;"update articles set content = convert_value(content)"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;sqlite-utils convert&lt;/code&gt; works by &lt;a href="https://github.com/simonw/sqlite-utils/blob/cc90745f4e8bb1ac57d8ee973863cfe00c2e4fe5/sqlite_utils/cli.py#L2019-L2028"&gt;compiling the code argument&lt;/a&gt; to a Python function, registering it with the connection and executing the above SQL query.&lt;/p&gt;
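&lt;p&gt;A minimal sketch of that compile-and-register trick looks like this. This is a simplified assumption of the approach, not the actual sqlite-utils implementation - the &lt;code&gt;compile_fragment&lt;/code&gt; helper is invented for illustration:&lt;/p&gt;

```python
# Sketch: turn a code fragment like 'value.replace(",", "")' into a
# callable, register it as a custom SQL function, and run the UPDATE.
# Simplified assumption of the technique, not sqlite-utils' real code.
import sqlite3

def compile_fragment(code):
    # Wrap the expression in a function body and exec it, then return
    # the resulting function object from the namespace.
    source = "def fn(value):\n    return " + code
    namespace = {}
    exec(source, namespace)
    return namespace["fn"]

fn = compile_fragment('value.replace(",", "")')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (count TEXT)")
conn.execute("INSERT INTO states VALUES ('3,044,502')")
conn.create_function("convert_value", 1, fn)
conn.execute("UPDATE states SET count = convert_value(count)")
result = conn.execute("SELECT count FROM states").fetchone()[0]
print(result)
```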
&lt;h4&gt;Splitting columns into multiple other columns&lt;/h4&gt;
&lt;p&gt;Sometimes when I'm working with a table I find myself wanting to split a column into multiple other columns.&lt;/p&gt;
&lt;p&gt;A classic example is locations - if a &lt;code&gt;location&lt;/code&gt; column contains &lt;code&gt;latitude,longitude&lt;/code&gt; values I'll often want to split that into separate &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt; columns, so I can visualize the data with &lt;a href="https://datasette.io/plugins/datasette-cluster-map"&gt;datasette-cluster-map&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--multi&lt;/code&gt; option lets you do that using &lt;code&gt;sqlite-utils convert&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils convert data.db places location '
latitude, longitude = value.split(",")
return {
    "latitude": float(latitude),
    "longitude": float(longitude),
}' --multi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;--multi&lt;/code&gt; tells the command to expect the Python code to return dictionaries. It will then create new columns in the database corresponding to the keys in those dictionaries and populate them using the results of the transformation.&lt;/p&gt;
&lt;p&gt;If the &lt;code&gt;places&lt;/code&gt; table started with just a &lt;code&gt;location&lt;/code&gt; column, after running the above command the new table schema will look like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;CREATE TABLE [places] (
    [location] &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
    [latitude] FLOAT,
    [longitude] FLOAT
);&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;Common recipes&lt;/h4&gt;
&lt;p&gt;This new feature in &lt;code&gt;sqlite-utils&lt;/code&gt; actually started life as a separate tool entirely, called &lt;a href="https://github.com/simonw/sqlite-transform"&gt;sqlite-transform&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Part of the rationale for adding it to &lt;code&gt;sqlite-utils&lt;/code&gt; was to avoid confusion between what that tool did and the &lt;a href="https://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-table/"&gt;sqlite-utils transform&lt;/a&gt; tool, which does something completely different (applies table transformations that aren't possible using SQLite's default &lt;code&gt;ALTER TABLE&lt;/code&gt; statement). Somewhere along the line I messed up the naming of the two tools!&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sqlite-transform&lt;/code&gt; bundles a number of useful &lt;a href="https://github.com/simonw/sqlite-transform/blob/main/README.md#parsedate-and-parsedatetime"&gt;default transformation recipes&lt;/a&gt;, in addition to allowing arbitrary Python code. I ended up making these available in &lt;code&gt;sqlite-utils convert&lt;/code&gt; by exposing them as functions that can be called from the command-line code argument like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils convert my.db articles created_at \
    'r.parsedate(value)'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Implementing them as Python functions in this way meant I didn't need to invent a new command-line mechanism for passing in additional options to the individual recipes - instead, parameters are passed like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils convert my.db articles created_at \
    'r.parsedate(value, dayfirst=True)'
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Also available in the sqlite_utils Python library&lt;/h4&gt;
&lt;p&gt;Almost every feature that is exposed by the &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html"&gt;sqlite-utils command-line tool&lt;/a&gt; has a matching API in the &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html"&gt;sqlite_utils Python library&lt;/a&gt;. &lt;code&gt;convert&lt;/code&gt; is no exception.&lt;/p&gt;
&lt;p&gt;The Python API lets you perform operations like the following:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sqlite_utils&lt;/span&gt;.&lt;span class="pl-v"&gt;Database&lt;/span&gt;(&lt;span class="pl-s"&gt;"dogs.db"&lt;/span&gt;)

&lt;span class="pl-s1"&gt;db&lt;/span&gt;[&lt;span class="pl-s"&gt;"dogs"&lt;/span&gt;].&lt;span class="pl-en"&gt;convert&lt;/span&gt;(&lt;span class="pl-s"&gt;"name"&lt;/span&gt;, &lt;span class="pl-k"&gt;lambda&lt;/span&gt; &lt;span class="pl-s1"&gt;value&lt;/span&gt;: &lt;span class="pl-s1"&gt;value&lt;/span&gt;.&lt;span class="pl-en"&gt;upper&lt;/span&gt;())&lt;/pre&gt;
&lt;p&gt;Any Python callable can be passed to &lt;code&gt;convert&lt;/code&gt;, and it will be applied to every value in the specified column - again, like using &lt;code&gt;map()&lt;/code&gt; to apply a transformation to every item in an array.&lt;/p&gt;
&lt;p&gt;You can also use the Python API to perform more complex operations like the following two examples:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;# Convert title to upper case only for rows with id &amp;gt; 20&lt;/span&gt;
&lt;span class="pl-s1"&gt;table&lt;/span&gt;.&lt;span class="pl-en"&gt;convert&lt;/span&gt;(
    &lt;span class="pl-s"&gt;"title"&lt;/span&gt;,
    &lt;span class="pl-k"&gt;lambda&lt;/span&gt; &lt;span class="pl-s1"&gt;v&lt;/span&gt;: &lt;span class="pl-s1"&gt;v&lt;/span&gt;.&lt;span class="pl-en"&gt;upper&lt;/span&gt;(),
    &lt;span class="pl-s1"&gt;where&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"id &amp;gt; :id"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;where_args&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;{&lt;span class="pl-s"&gt;"id"&lt;/span&gt;: &lt;span class="pl-c1"&gt;20&lt;/span&gt;}
)

&lt;span class="pl-c"&gt;# Create two new columns, "upper" and "lower",&lt;/span&gt;
&lt;span class="pl-c"&gt;# and populate them from the converted title&lt;/span&gt;
&lt;span class="pl-s1"&gt;table&lt;/span&gt;.&lt;span class="pl-en"&gt;convert&lt;/span&gt;(
    &lt;span class="pl-s"&gt;"title"&lt;/span&gt;,
    &lt;span class="pl-k"&gt;lambda&lt;/span&gt; &lt;span class="pl-s1"&gt;v&lt;/span&gt;: {
        &lt;span class="pl-s"&gt;"upper"&lt;/span&gt;: &lt;span class="pl-s1"&gt;v&lt;/span&gt;.&lt;span class="pl-en"&gt;upper&lt;/span&gt;(),
        &lt;span class="pl-s"&gt;"lower"&lt;/span&gt;: &lt;span class="pl-s1"&gt;v&lt;/span&gt;.&lt;span class="pl-en"&gt;lower&lt;/span&gt;()
    }, &lt;span class="pl-s1"&gt;multi&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;
)&lt;/pre&gt;
&lt;p&gt;See the &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#converting-data-in-columns"&gt;full documentation for table.convert()&lt;/a&gt; for more options.&lt;/p&gt;
&lt;h4 id="blog-performance"&gt;A more sophisticated example: analyzing log files&lt;/h4&gt;
&lt;p&gt;I used the new &lt;code&gt;sqlite-utils convert&lt;/code&gt; command earlier today, to debug a performance issue with my blog.&lt;/p&gt;
&lt;p&gt;Most of my blog traffic is served via Cloudflare with a 15 minute cache timeout - but occasionally I'll hit an uncached page, and those pages had started to feel not quite as snappy as I'd expect.&lt;/p&gt;
&lt;p&gt;So I dipped into the Heroku dashboard, and saw this pretty sad looking graph:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Performance graph showing 95th percentile of 17s and max of 23s" src="https://static.simonwillison.net/static/2021/sad-performance.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Somehow my 50th percentile was nearly 10 seconds, and my maximum page response time was 23 seconds! Something was clearly very wrong.&lt;/p&gt;
&lt;p&gt;I use NGINX as part of my Heroku setup to buffer responses (see &lt;a href="https://simonwillison.net/2017/Oct/2/nginx-heroku/"&gt;Running gunicorn behind nginx on Heroku for buffering and logging&lt;/a&gt;), and I have custom NGINX configuration to write to the Heroku logs - mainly to work around a limitation in Heroku's default logging where it fails to record full user-agents or referrer headers.&lt;/p&gt;
&lt;p&gt;I extended that configuration to record the NGINX &lt;code&gt;request_time&lt;/code&gt;, &lt;code&gt;upstream_response_time&lt;/code&gt;, &lt;code&gt;upstream_connect_time&lt;/code&gt; and &lt;code&gt;upstream_header_time&lt;/code&gt; variables, which I hoped would help me figure out what was going on.&lt;/p&gt;
&lt;p&gt;After &lt;a href="https://github.com/simonw/simonwillisonblog/commit/dd0faaa64c0e361ae1d760894e201cac7b0224a4"&gt;applying that change&lt;/a&gt; I started seeing Heroku log lines that looked like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;2021-08-05T17:58:28.880469+00:00 app[web.1]: measure#nginx.service=4.212 request="GET /search/?type=blogmark&amp;amp;page=2&amp;amp;tag=highavailability HTTP/1.1" status_code=404 request_id=25eb296e-e970-4072-b75a-606e11e1db5b remote_addr="10.1.92.174" forwarded_for="114.119.136.88, 172.70.142.28" forwarded_proto="http" via="1.1 vegur" body_bytes_sent=179 referer="-" user_agent="Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)" request_time="4.212" upstream_response_time="4.212" upstream_connect_time="0.000" upstream_header_time="4.212";&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Next step: analyze those log lines.&lt;/p&gt;
&lt;p&gt;I ran this command for a few minutes to gather some logs:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;heroku logs -a simonwillisonblog --tail | grep 'measure#nginx.service' &amp;gt; /tmp/log.txt&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Having collected 488 log lines, the next step was to load them into SQLite.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;sqlite-utils insert&lt;/code&gt; command likes to work with JSON, but I just had raw log lines. I used &lt;code&gt;jq&lt;/code&gt; to convert each line into a &lt;code&gt;{"line": "raw log line"}&lt;/code&gt; JSON object, then piped that as newline-delimited JSON into &lt;code&gt;sqlite-utils insert&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat /tmp/log.txt | \
    jq --raw-input '{line: .}' --compact-output | \
    sqlite-utils insert /tmp/logs.db log - --nl
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;jq --raw-input&lt;/code&gt; accepts input that is just raw lines of text, not yet valid JSON. &lt;code&gt;'{line: .}'&lt;/code&gt; is a tiny &lt;code&gt;jq&lt;/code&gt; program that builds &lt;code&gt;{"line": "raw input"}&lt;/code&gt; objects. &lt;code&gt;--compact-output&lt;/code&gt; causes &lt;code&gt;jq&lt;/code&gt; to output newline-delimited JSON.&lt;/p&gt;
&lt;p&gt;Then &lt;code&gt;sqlite-utils insert /tmp/logs.db log - --nl&lt;/code&gt; reads that newline-delimited JSON into a new SQLite &lt;code&gt;log&lt;/code&gt; table in a &lt;code&gt;logs.db&lt;/code&gt; database file (&lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-newline-delimited-json"&gt;full documentation here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 6th January 2022:&lt;/strong&gt; &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-20"&gt;sqlite-utils 3.20&lt;/a&gt; introduced a new &lt;code&gt;sqlite-utils insert ... --lines&lt;/code&gt; option for importing raw lines, so you can now achieve this without using &lt;code&gt;jq&lt;/code&gt; at all. See 
&lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-unstructured-data-with-lines-and-text"&gt;Inserting unstructured data with --lines and --text&lt;/a&gt; for details.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now I had a SQLite table with a single column, &lt;code&gt;line&lt;/code&gt;. Next step: parse that nasty log format.&lt;/p&gt;
&lt;p&gt;To my surprise I couldn't find an existing Python library for parsing &lt;code&gt;key=value key2="quoted value"&lt;/code&gt; log lines. Instead I had to figure out a regular expression:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;([^\s=]+)=(?:"(.*?)"|(\S+))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's that expression visualized using &lt;a href="https://www.debuggex.com/"&gt;Debuggex&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of the regex visualized with debuggex" src="https://static.simonwillison.net/static/2021/debuggex-log-parser-regex.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I used that regular expression as part of a custom function passed in to the &lt;code&gt;sqlite-utils convert&lt;/code&gt; tool:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils convert /tmp/logs.db log line --import re --multi "$(cat &amp;lt;&amp;lt;EOD
    r = re.compile(r'([^\s=]+)=(?:"(.*?)"|(\S+))')
    pairs = {}
    for key, value1, value2 in r.findall(value):
        pairs[key] = value1 or value2
    return pairs
EOD
)"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(This uses a &lt;code&gt;cat &amp;lt;&amp;lt;EOD&lt;/code&gt; trick to avoid having to figure out how to escape the single and double quotes in the Python code for usage in a zsh shell command.)&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;--multi&lt;/code&gt; here created new columns for each of the key/value pairs seen in that log file.&lt;/p&gt;
&lt;p&gt;One last step: convert the types. The new columns are all of type &lt;code&gt;text&lt;/code&gt; but I want to do sorting and arithmetic on them so I need to convert them to integers and floats. I used &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#transforming-tables"&gt;sqlite-utils transform&lt;/a&gt; for that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils transform /tmp/logs.db log \
    --type 'measure#nginx.service' float \
    --type 'status_code' integer \
    --type 'body_bytes_sent' integer \
    --type 'request_time' float \
    --type 'upstream_response_time' float \
    --type 'upstream_connect_time' float \
    --type 'upstream_header_time' float
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's the &lt;a href="https://lite.datasette.io/?url=https://gist.githubusercontent.com/simonw/3454951e23cab709da42d25520dd78cf/raw/3383a16cd1f423d39c9c6923b6b37a3e74c4f148/logs.db#/logs/log"&gt;resulting log table&lt;/a&gt; (in Datasette Lite).&lt;/p&gt;
&lt;p&gt;&lt;img alt="Datasette showing the log table" src="https://static.simonwillison.net/static/2021/performance-logs.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Once the logs were in Datasette, the problem quickly became apparent when I &lt;a href="https://lite.datasette.io/?url=https://gist.githubusercontent.com/simonw/3454951e23cab709da42d25520dd78cf/raw/3383a16cd1f423d39c9c6923b6b37a3e74c4f148/logs.db#/logs/log?_sort_desc=request_time"&gt;sorted by request_time&lt;/a&gt;: an army of search engine crawlers were hitting deep linked filters in &lt;a href="https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/"&gt;my faceted search engine&lt;/a&gt;, like &lt;code&gt;/search/?tag=geolocation&amp;amp;tag=offlineresources&amp;amp;tag=canvas&amp;amp;tag=javascript&amp;amp;tag=performance&amp;amp;tag=dragndrop&amp;amp;tag=crossdomain&amp;amp;tag=mozilla&amp;amp;tag=video&amp;amp;tag=tracemonkey&amp;amp;year=2009&amp;amp;type=blogmark&lt;/code&gt;. These are expensive pages to generate! They're also very unlikely to be in my Cloudflare cache.&lt;/p&gt;
&lt;p&gt;Could the answer be as simple as a &lt;code&gt;robots.txt&lt;/code&gt; rule blocking access to &lt;code&gt;/search/&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/simonw/simonwillisonblog/commit/4c0de5b9f01bb16fc89c587128a276055b0033bb"&gt;shipped that change&lt;/a&gt; and waited a few hours to see what the impact would be:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Heroku metrics showing a dramatic improvement after the deploy, and especially about 8 hours later" src="https://static.simonwillison.net/static/2021/robots-txt-effect.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It took a while for the crawlers to notice that my &lt;code&gt;robots.txt&lt;/code&gt; had changed, but by 8 hours later my site performance was dramatically improved - I'm now seeing 99th percentile of around 450ms, compared to 25 seconds before I shipped the &lt;code&gt;robots.txt&lt;/code&gt; change!&lt;/p&gt;
&lt;p&gt;With this latest addition, &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; has evolved into a powerful tool for importing, cleaning and re-shaping data - especially when coupled with Datasette in order to explore, analyze and publish the results.&lt;/p&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/vscode/vs-code-regular-expressions"&gt;Search and replace with regular expressions in VS Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/python/codespell"&gt;Check spelling using codespell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/imagemagick/set-a-gif-to-loop"&gt;Set a GIF to loop using ImageMagick&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/sqlite-aggregate-filter-clauses"&gt;SQLite aggregate filter clauses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/imagemagick/compress-animated-gif"&gt;Compressing an animated GIF with ImageMagick mogrify&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-transform"&gt;sqlite-transform&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-transform/releases/tag/1.2.1"&gt;1.2.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-transform/releases"&gt;10 releases total&lt;/a&gt;) - 2021-08-02
&lt;br /&gt;Tool for running transformations on columns in a SQLite database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.14"&gt;3.14&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;82 releases total&lt;/a&gt;) - 2021-08-02
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-json-html"&gt;datasette-json-html&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-json-html/releases/tag/1.0.1"&gt;1.0.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-json-html/releases"&gt;6 releases total&lt;/a&gt;) - 2021-07-31
&lt;br /&gt;Datasette plugin for rendering HTML based on JSON values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-publish-fly"&gt;datasette-publish-fly&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-publish-fly/releases/tag/1.0.2"&gt;1.0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-publish-fly/releases"&gt;5 releases total&lt;/a&gt;) - 2021-07-30
&lt;br /&gt;Datasette plugin for publishing data using Fly&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-science"&gt;data-science&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="performance"/><category term="projects"/><category term="sqlite"/><category term="datasette"/><category term="data-science"/><category term="weeknotes"/><category term="sqlite-utils"/></entry><entry><title>Quoting Brendan Gregg</title><link href="https://simonwillison.net/2021/Jun/8/observability/#atom-tag" rel="alternate"/><published>2021-06-08T19:33:16+00:00</published><updated>2021-06-08T19:33:16+00:00</updated><id>https://simonwillison.net/2021/Jun/8/observability/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://www.brendangregg.com/blog/2021-05-23/what-is-observability.html"&gt;&lt;p&gt;When I was a performance consultant I'd show up to random companies who wanted me to fix their computer performance issues. If they trusted me with a login to their production servers, I could help them a lot quicker. To get that trust I knew which tools looked but didn't touch: Which were observability tools and which were experimental tools. "I'll start with observability tools only" is something I'd say at the start of every engagement.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://www.brendangregg.com/blog/2021-05-23/what-is-observability.html"&gt;Brendan Gregg&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observability"&gt;observability&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/brendan-gregg"&gt;brendan-gregg&lt;/a&gt;&lt;/p&gt;



</summary><category term="performance"/><category term="observability"/><category term="brendan-gregg"/></entry><entry><title>Cleaning Up Your Postgres Database</title><link href="https://simonwillison.net/2021/Feb/3/cleaning-your-postgres-database/#atom-tag" rel="alternate"/><published>2021-02-03T07:32:33+00:00</published><updated>2021-02-03T07:32:33+00:00</updated><id>https://simonwillison.net/2021/Feb/3/cleaning-your-postgres-database/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://info.crunchydata.com/blog/cleaning-up-your-postgres-database"&gt;Cleaning Up Your Postgres Database&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/craigkerstiens/status/1356707553980284928"&gt;@craigkerstiens&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/craig-kerstiens"&gt;craig-kerstiens&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="performance"/><category term="postgresql"/><category term="craig-kerstiens"/></entry></feed>