<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: documentation</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/documentation.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-07-16T13:19:19+00:00</updated><author><name>Simon Willison</name></author><entry><title>Documenting what you're willing to support (and not)</title><link href="https://simonwillison.net/2025/Jul/16/documenting/#atom-tag" rel="alternate"/><published>2025-07-16T13:19:19+00:00</published><updated>2025-07-16T13:19:19+00:00</updated><id>https://simonwillison.net/2025/Jul/16/documenting/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://rachelbythebay.com/w/2025/07/07/support/"&gt;Documenting what you&amp;#x27;re willing to support (and not)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Devious company culture hack from Rachel Kroll:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At some point, I realized that if I wrote a wiki page and documented the things that we were willing to support, I could wait about six months and then it would be like it had always been there. Enough people went through the revolving doors of that place such that six months' worth of employee turnover was sufficient to make it look like a whole other company. All I had to do was write it, wait a bit, then start citing it when needed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can have an unreasonable amount of influence by being the person who &lt;em&gt;writes stuff down&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This reminded me of Charity Majors' &lt;a href="https://simonwillison.net/2018/Dec/2/the-golden-path/"&gt;Golden Path&lt;/a&gt; process.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44543386"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rachel-kroll"&gt;rachel-kroll&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="rachel-kroll"/></entry><entry><title>No docs, no bugs</title><link href="https://simonwillison.net/2025/May/22/no-docs-no-bugs/#atom-tag" rel="alternate"/><published>2025-05-22T01:55:31+00:00</published><updated>2025-05-22T01:55:31+00:00</updated><id>https://simonwillison.net/2025/May/22/no-docs-no-bugs/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;strong&gt;If your library doesn't have any documentation, it can't have any bugs.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Documentation specifies what your code is supposed to do. Your tests specify what it actually does.&lt;/p&gt;
&lt;p&gt;Bugs exist when your test-enforced implementation fails to match the behavior described in your documentation.  Without documentation a bug is just undefined behavior.&lt;/p&gt;
&lt;p&gt;If you aim to follow &lt;a href="https://semver.org/"&gt;semantic versioning&lt;/a&gt; you bump your major version when you release a backwards incompatible change. Such changes cannot exist if your code is not comprehensively documented!&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Inspired by a half-remembered conversation I had with &lt;a href="https://movieos.org/"&gt;Tom Insam&lt;/a&gt; many years ago. &lt;/small&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semantic-versioning"&gt;semantic-versioning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="semantic-versioning"/><category term="testing"/></entry><entry><title>o3-mini is really good at writing internal documentation</title><link href="https://simonwillison.net/2025/Feb/5/o3-mini-documentation/#atom-tag" rel="alternate"/><published>2025-02-05T06:07:40+00:00</published><updated>2025-02-05T06:07:40+00:00</updated><id>https://simonwillison.net/2025/Feb/5/o3-mini-documentation/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://gist.github.com/simonw/4a13c4b10176d7b8e3d1260f5dcc9de3"&gt;o3-mini is really good at writing internal documentation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Today I wanted to refresh my knowledge of how the Datasette permissions system works. I already have &lt;a href="https://docs.datasette.io/en/latest/authentication.html"&gt;extensive hand-written documentation&lt;/a&gt; for that, but I thought it would be interesting to see if I could derive any insights from running an LLM against the codebase.&lt;/p&gt;
&lt;p&gt;o3-mini has an input limit of 200,000 tokens. I used &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and my &lt;a href="https://github.com/simonw/files-to-prompt"&gt;files-to-prompt&lt;/a&gt; tool to generate the documentation like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /tmp
git clone https://github.com/simonw/datasette
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; datasette
files-to-prompt datasette -e py -c &lt;span class="pl-k"&gt;|&lt;/span&gt; \
  llm -m o3-mini -s \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;write extensive documentation for how the permissions system works, as markdown&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;files-to-prompt&lt;/code&gt; command is fed the &lt;a href="https://github.com/simonw/datasette/tree/main/datasette"&gt;datasette&lt;/a&gt; subdirectory, which contains just the source code for the application - omitting tests (in &lt;code&gt;tests/&lt;/code&gt;) and documentation (in &lt;code&gt;docs/&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;-e py&lt;/code&gt; option causes it to only include files with a &lt;code&gt;.py&lt;/code&gt; extension - skipping all of the HTML and JavaScript files in that hierarchy.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;-c&lt;/code&gt; option causes it to output Claude's XML-ish format - a format that works great with other LLMs too.&lt;/p&gt;
&lt;p&gt;You can see the output of that command &lt;a href="https://gist.github.com/simonw/1922544763b08c76f0b904e2ece364ea"&gt;in this Gist&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I pipe that result into LLM, requesting the &lt;code&gt;o3-mini&lt;/code&gt; OpenAI model and passing the following system prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;write extensive documentation for how the permissions system works, as markdown&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Specifically requesting Markdown &lt;a href="https://simonwillison.net/2025/Feb/2/openai-reasoning-models-advice-on-prompting/"&gt;is important&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The prompt used 99,348 input tokens and produced 3,118 output tokens (320 of those were invisible reasoning tokens). That's &lt;a href="https://tools.simonwillison.net/llm-prices"&gt;a cost&lt;/a&gt; of 12.3 cents.&lt;/p&gt;
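&lt;p&gt;A rough sketch of that arithmetic, assuming o3-mini list prices of $1.10 per million input tokens and $4.40 per million output tokens (these prices are an assumption, not stated above) and that reasoning tokens are billed as part of the output count. The function name is hypothetical:&lt;/p&gt;

```python
# Assumed o3-mini list prices in US dollars per million tokens.
# These figures are an assumption for illustration; check current pricing.
INPUT_PRICE_PER_MILLION = 1.10
OUTPUT_PRICE_PER_MILLION = 4.40


def prompt_cost_cents(input_tokens, output_tokens):
    """Return the cost of a single prompt in US cents."""
    dollars = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000
    return dollars * 100


# Token counts from the run described above; the 3,118 output tokens
# already include the 320 reasoning tokens.
cost = prompt_cost_cents(99_348, 3_118)
print(round(cost, 1))  # 12.3
```

&lt;p&gt;Under those assumed prices the input accounts for roughly 10.9 cents and the output for roughly 1.4 cents, so the size of the codebase dominates the bill.&lt;/p&gt;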
&lt;p&gt;Honestly, &lt;a href="https://gist.github.com/simonw/4a13c4b10176d7b8e3d1260f5dcc9de3"&gt;the results&lt;/a&gt; are fantastic. I had to double-check that I hadn't fed in the documentation by mistake.&lt;/p&gt;
&lt;p&gt;(It's possible that the model is picking up additional information about Datasette in its training set, but I've seen similar &lt;a href="https://gist.github.com/simonw/adf64108d65cd5c10ac9fce953ab437e"&gt;high quality results&lt;/a&gt; from other, newer libraries so I don't think that's a significant factor.)&lt;/p&gt;
&lt;p&gt;In this case I already had extensive written documentation of my own, but this was still a useful refresher to help confirm that the code matched my mental model of how everything works.&lt;/p&gt;
&lt;p&gt;Documentation of project internals as a category is notorious for going out of date. Having tricks like this to derive usable how-it-works documentation from existing codebases in just a few seconds and at a cost of a few cents is wildly valuable.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="ai"/><category term="datasette"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="llm-reasoning"/><category term="o3"/><category term="files-to-prompt"/></entry><entry><title>OpenAI reasoning models: Advice on prompting</title><link href="https://simonwillison.net/2025/Feb/2/openai-reasoning-models-advice-on-prompting/#atom-tag" rel="alternate"/><published>2025-02-02T20:56:27+00:00</published><updated>2025-02-02T20:56:27+00:00</updated><id>https://simonwillison.net/2025/Feb/2/openai-reasoning-models-advice-on-prompting/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://platform.openai.com/docs/guides/reasoning#advice-on-prompting"&gt;OpenAI reasoning models: Advice on prompting&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI's documentation for their o1 and o3 "reasoning models" includes some interesting tips on how to best prompt them:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Developer messages are the new system messages:&lt;/strong&gt; Starting with &lt;code&gt;o1-2024-12-17&lt;/code&gt;, reasoning models support &lt;code&gt;developer&lt;/code&gt; messages rather than &lt;code&gt;system&lt;/code&gt; messages, to align with the &lt;a href="https://cdn.openai.com/spec/model-spec-2024-05-08.html#follow-the-chain-of-command"&gt;chain of command behavior described in the model spec&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This appears to be a purely aesthetic change made for consistency with their &lt;a href="https://simonwillison.net/2024/Apr/23/the-instruction-hierarchy/"&gt;instruction hierarchy&lt;/a&gt; concept. As far as I can tell the old &lt;code&gt;system&lt;/code&gt; prompts continue to work exactly as before - you're encouraged to use the new &lt;code&gt;developer&lt;/code&gt; message type but it has no impact on what actually happens.&lt;/p&gt;
&lt;p&gt;Since my &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; tool already bakes in a &lt;code&gt;llm --system "system prompt"&lt;/code&gt; option which works across multiple different models from different providers I'm not going to rush to adopt this new language!&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use delimiters for clarity:&lt;/strong&gt; Use delimiters like markdown, XML tags, and section titles to clearly indicate distinct parts of the input, helping the model interpret different sections appropriately.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Anthropic have been encouraging &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags"&gt;XML-ish delimiters&lt;/a&gt; for a while (I say -ish because there's no requirement that the resulting prompt is valid XML). My &lt;a href="https://github.com/simonw/files-to-prompt"&gt;files-to-prompt&lt;/a&gt; tool has a &lt;code&gt;-c&lt;/code&gt; option which outputs Claude-style XML, and in my experiments this same option works great with o1 and o3 too:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone https://github.com/tursodatabase/limbo
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; limbo/bindings/python

files-to-prompt &lt;span class="pl-c1"&gt;.&lt;/span&gt; -c &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m o3-mini \
  -o reasoning_effort high \
  --system &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Write a detailed README with extensive usage examples&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Limit additional context in retrieval-augmented generation (RAG):&lt;/strong&gt; When providing additional context or documents, include only the most relevant information to prevent the model from overcomplicating its response.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This makes me think that o1/o3 are not good models to implement RAG on at all - with RAG I like to be able to dump as much extra context into the prompt as possible and leave it to the models to figure out what's relevant.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Try zero shot first, then few shot if needed:&lt;/strong&gt; Reasoning models often don't need few-shot examples to produce good results, so try to write prompts without examples first. If you have more complex requirements for your desired output, it may help to include a few examples of inputs and desired outputs in your prompt. Just ensure that the examples align very closely with your prompt instructions, as discrepancies between the two may produce poor results.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Providing examples remains the single most powerful prompting tip I know, so it's interesting to see advice here to only switch to examples if zero-shot doesn't work out.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Be very specific about your end goal:&lt;/strong&gt; In your instructions, try to give very specific parameters for a successful response, and encourage the model to keep reasoning and iterating until it matches your success criteria.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This makes sense: reasoning models "think" until they reach a conclusion, so making the goal as unambiguous as possible leads to better results.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Markdown formatting:&lt;/strong&gt; Starting with &lt;code&gt;o1-2024-12-17&lt;/code&gt;, reasoning models in the API will avoid generating responses with markdown formatting. To signal to the model when you &lt;strong&gt;do&lt;/strong&gt; want markdown formatting in the response, include the string &lt;code&gt;Formatting re-enabled&lt;/code&gt; on the first line of your &lt;code&gt;developer&lt;/code&gt; message.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one was a &lt;em&gt;real shock&lt;/em&gt; to me! I noticed that o3-mini was outputting &lt;code&gt;•&lt;/code&gt; characters instead of Markdown &lt;code&gt;*&lt;/code&gt; bullets and initially thought &lt;a href="https://twitter.com/simonw/status/1886121477822648441"&gt;that was a bug&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I first saw this while running this prompt against &lt;a href="https://github.com/tursodatabase/limbo/tree/main/bindings/python"&gt;limbo/bindings/python&lt;/a&gt; using &lt;a href="https://github.com/simonw/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone https://github.com/tursodatabase/limbo
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; limbo/bindings/python

files-to-prompt &lt;span class="pl-c1"&gt;.&lt;/span&gt; -c &lt;span class="pl-k"&gt;|&lt;/span&gt; llm -m o3-mini \
  -o reasoning_effort high \
  --system &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Write a detailed README with extensive usage examples&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's the &lt;a href="https://gist.github.com/simonw/f8283d68e9bd7ad3f140d52cad6874a7"&gt;full result&lt;/a&gt;, which includes text like this (note the weird bullets):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Features
--------
• High‑performance, in‑process database engine written in Rust  
• SQLite‑compatible SQL interface  
• Standard Python DB‑API 2.0–style connection and cursor objects
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I ran it again with this modified prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Formatting re-enabled. Write a detailed README with extensive usage examples.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And this time got back &lt;a href="https://gist.github.com/simonw/adf64108d65cd5c10ac9fce953ab437e"&gt;proper Markdown, rendered in this Gist&lt;/a&gt;. That did a really good job, and included bulleted lists using this valid Markdown syntax instead:&lt;/p&gt;
&lt;div class="highlight highlight-text-md"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;-&lt;/span&gt; &lt;span class="pl-s"&gt;**&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;make test&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-s"&gt;**&lt;/span&gt;: Run tests using pytest.
&lt;span class="pl-v"&gt;-&lt;/span&gt; &lt;span class="pl-s"&gt;**&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;make lint&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-s"&gt;**&lt;/span&gt;: Run linters (via &lt;span class="pl-s"&gt;[&lt;/span&gt;ruff&lt;span class="pl-s"&gt;]&lt;/span&gt;&lt;span class="pl-s"&gt;(&lt;/span&gt;&lt;span class="pl-corl"&gt;https://github.com/astral-sh/ruff&lt;/span&gt;&lt;span class="pl-s"&gt;)&lt;/span&gt;).
&lt;span class="pl-v"&gt;-&lt;/span&gt; &lt;span class="pl-s"&gt;**&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;make check-requirements&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-s"&gt;**&lt;/span&gt;: Validate that the &lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;requirements.txt&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt; files are in sync with &lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;pyproject.toml&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;.
&lt;span class="pl-v"&gt;-&lt;/span&gt; &lt;span class="pl-s"&gt;**&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;make compile-requirements&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-s"&gt;**&lt;/span&gt;: Compile the &lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-c1"&gt;requirements.txt&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt; files using pip-tools.&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img alt="Py-Limbo. Py-Limbo is a lightweight, in-process, OLTP (Online Transaction Processing) database management system built as a Python extension module on top of Rust. It is designed to be compatible with SQLite in both usage and API, while offering an opportunity to experiment with Rust-backed database functionality. Note: Py-Limbo is a work-in-progress (Alpha stage) project. Some features (e.g. transactions, executemany, fetchmany) are not yet supported. Table of Contents - then a hierarchical nested table of contents." src="https://static.simonwillison.net/static/2025/pylimbo-docs.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;(Using LLMs like this to get me off the ground with under-documented libraries is a trick I use several times a month.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: &lt;a href="https://twitter.com/nikunjhanda/status/1886169547197264226"&gt;OpenAI's Nikunj Handa&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;we agree this is weird! fwiw, it’s a temporary thing we had to do for the existing o-series models. we’ll fix this in future releases so that you can go back to naturally prompting for markdown or no-markdown.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/harjotsgill/status/1886122316767379540"&gt;@harjotsgill&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rag"&gt;rag&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o1"&gt;o1&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/limbo"&gt;limbo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="markdown"/><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="rag"/><category term="o1"/><category term="llm-reasoning"/><category term="o3"/><category term="limbo"/><category term="files-to-prompt"/><category term="system-prompts"/></entry><entry><title>Introducing Limbo: A complete rewrite of SQLite in Rust</title><link href="https://simonwillison.net/2024/Dec/10/introducing-limbo/#atom-tag" rel="alternate"/><published>2024-12-10T19:25:21+00:00</published><updated>2024-12-10T19:25:21+00:00</updated><id>https://simonwillison.net/2024/Dec/10/introducing-limbo/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://turso.tech/blog/introducing-limbo-a-complete-rewrite-of-sqlite-in-rust"&gt;Introducing Limbo: A complete rewrite of SQLite in Rust&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This looks absurdly ambitious:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level,  with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Turso team behind it have been maintaining their &lt;a href="https://github.com/tursodatabase/libsql"&gt;libSQL&lt;/a&gt; fork for two years now, so they're well equipped to take on a challenge of this magnitude.&lt;/p&gt;
&lt;p&gt;SQLite is justifiably famous for its &lt;a href="https://www.sqlite.org/testing.html"&gt;meticulous approach to testing&lt;/a&gt;. Limbo plans to take an entirely different approach based on "Deterministic Simulation Testing" - a modern technique &lt;a href="https://antithesis.com/blog/is_something_bugging_you/"&gt;pioneered by FoundationDB&lt;/a&gt; and now spearheaded by &lt;a href="https://antithesis.com/"&gt;Antithesis&lt;/a&gt;, the company Turso have been working with on their previous testing projects.&lt;/p&gt;
&lt;p&gt;Another bold claim (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We have both added DST facilities to the core of the database, and partnered with Antithesis to achieve a level of reliability in the database that lives up to SQLite’s reputation.&lt;/p&gt;
&lt;p&gt;[...] With DST, &lt;strong&gt;we believe we can achieve an even higher degree of robustness than SQLite&lt;/strong&gt;, since it is easier to simulate unlikely scenarios in a simulator, test years of execution with different event orderings, and upon finding issues, reproduce them 100% reliably.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The two most interesting features that Limbo is planning to offer are first-party WASM support and fully asynchronous I/O:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;SQLite itself has a synchronous interface, meaning driver authors who want asynchronous behavior need to have the extra complication of using helper threads. Because SQLite queries tend to be fast, since no network round trips are involved, a lot of those drivers just settle for a synchronous interface. [...]&lt;/p&gt;
&lt;p&gt;Limbo is designed to be asynchronous from the ground up. It extends &lt;code&gt;sqlite3_step&lt;/code&gt;, the main entry point API to SQLite, to be asynchronous, allowing it to return to the caller if data is not ready to consume immediately.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; provides an &lt;a href="https://docs.datasette.io/en/stable/internals.html#await-db-execute-sql"&gt;async API&lt;/a&gt; for executing SQLite queries which is backed by all manner of complex thread management - I would be very interested in a native asyncio Python library for talking to SQLite database files.&lt;/p&gt;
&lt;p&gt;I successfully tried out Limbo's &lt;a href="https://github.com/tursodatabase/limbo/tree/main/bindings/python"&gt;Python bindings&lt;/a&gt; against a demo SQLite test database using &lt;code&gt;uv&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv run --with pylimbo python
&amp;gt;&amp;gt;&amp;gt; import limbo
&amp;gt;&amp;gt;&amp;gt; conn = limbo.connect("/tmp/demo.db")
&amp;gt;&amp;gt;&amp;gt; cursor = conn.cursor()
&amp;gt;&amp;gt;&amp;gt; print(cursor.execute("select * from foo").fetchall())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It crashed when I tried against a more complex SQLite database that included SQLite FTS tables.&lt;/p&gt;
&lt;p&gt;The Python bindings aren't yet documented, so I piped them through &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and had the new &lt;code&gt;gemini-exp-1206&lt;/code&gt; model write &lt;a href="https://gist.github.com/simonw/bd1822f372c406d17ed24772f8b93eea"&gt;this initial documentation&lt;/a&gt; for me:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;files-to-prompt limbo/bindings/python -c | llm -m gemini-exp-1206 -s 'write extensive usage documentation in markdown, including realistic usage examples'
&lt;/code&gt;&lt;/pre&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=42378843"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/limbo"&gt;limbo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="open-source"/><category term="python"/><category term="sqlite"/><category term="rust"/><category term="ai-assisted-programming"/><category term="llm"/><category term="uv"/><category term="limbo"/><category term="files-to-prompt"/></entry><entry><title>Generating documentation from tests using files-to-prompt and LLM</title><link href="https://simonwillison.net/2024/Nov/5/docs-from-tests/#atom-tag" rel="alternate"/><published>2024-11-05T22:37:20+00:00</published><updated>2024-11-05T22:37:20+00:00</updated><id>https://simonwillison.net/2024/Nov/5/docs-from-tests/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/llms/docs-from-tests"&gt;Generating documentation from tests using files-to-prompt and LLM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I was experimenting with the &lt;a href="https://github.com/bytecodealliance/wasmtime-py"&gt;wasmtime-py&lt;/a&gt; Python library today (for executing WebAssembly programs from inside CPython) and I found the existing &lt;a href="https://bytecodealliance.github.io/wasmtime-py/"&gt;API docs&lt;/a&gt; didn't quite show me what I wanted to know.&lt;/p&gt;
&lt;p&gt;The project has a &lt;a href="https://github.com/bytecodealliance/wasmtime-py/tree/main/tests"&gt;comprehensive test suite&lt;/a&gt; so I tried seeing if I could generate documentation using that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd /tmp
git clone https://github.com/bytecodealliance/wasmtime-py
files-to-prompt -e py wasmtime-py/tests -c | \
  llm -m claude-3.5-sonnet -s \
  'write detailed usage documentation including realistic examples'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;More &lt;a href="https://til.simonwillison.net/llms/docs-from-tests"&gt;notes in my TIL&lt;/a&gt;. You can see the &lt;a href="https://gist.github.com/simonw/351cffbd254af5cbf329377fb95fcc13"&gt;full Claude transcript here&lt;/a&gt; - I think this worked really well!&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-3-5-sonnet"&gt;claude-3-5-sonnet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/files-to-prompt"&gt;files-to-prompt&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="llm"/><category term="claude"/><category term="claude-3-5-sonnet"/><category term="files-to-prompt"/></entry><entry><title>docs.jina.ai - the Jina meta-prompt</title><link href="https://simonwillison.net/2024/Oct/30/jina-meta-prompt/#atom-tag" rel="alternate"/><published>2024-10-30T17:07:42+00:00</published><updated>2024-10-30T17:07:42+00:00</updated><id>https://simonwillison.net/2024/Oct/30/jina-meta-prompt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.jina.ai/"&gt;docs.jina.ai - the Jina meta-prompt&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From &lt;a href="https://twitter.com/jinaai_/status/1851651702635847729"&gt;Jina AI on Twitter&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;curl docs.jina.ai&lt;/code&gt; - This is our &lt;strong&gt;Meta-Prompt&lt;/strong&gt;. It allows LLMs to understand our Reader, Embeddings, Reranker, and Classifier APIs for improved codegen. Using the meta-prompt is straightforward. Just copy the prompt into your preferred LLM interface like ChatGPT, Claude, or whatever works for you, add your instructions, and you're set.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The page is served using content negotiation. If you hit it with &lt;code&gt;curl&lt;/code&gt; you get plain text, but a browser with &lt;code&gt;text/html&lt;/code&gt; in the &lt;code&gt;accept:&lt;/code&gt; header gets an explanation along with a convenient copy to clipboard button.&lt;/p&gt;
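&lt;p&gt;The content negotiation described above can be sketched roughly as follows. This is a minimal illustration, not Jina's actual implementation: the function name is hypothetical, and it ignores &lt;code&gt;q=&lt;/code&gt; weights, which a real server would normally take into account.&lt;/p&gt;

```python
def choose_content_type(accept_header):
    """Pick a response type from the value of a request's Accept header.

    Hypothetical helper for illustration only.
    """
    # Split "text/html,application/xml;q=0.9" into bare media types,
    # dropping parameters such as q-values.
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    # Browsers list text/html explicitly; curl sends "*/*" by default.
    if "text/html" in media_types:
        return "text/html"
    return "text/plain"


print(choose_content_type("*/*"))  # text/plain
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
```

&lt;p&gt;Defaulting to plain text means any client that does not explicitly ask for HTML, including most scripts and LLM tooling, gets the raw prompt.&lt;/p&gt;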
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/jina-docs.jpg" alt="Screenshot of an API documentation page for Jina AI with warning message, access instructions, and code sample. Contains text: Note: This content is specifically designed for LLMs and not intended for human reading. For human-readable content, please visit Jina AI. For LLMs/programmatic access, you can fetch this content directly: curl docs.jina.ai/v2 # or wget docs.jina.ai/v2 # or fetch docs.jina.ai/v2 You only see this as a HTML when you access docs.jina.ai via browser. If you access it via code/program, you will get a text/plain response as below. You are an AI engineer designed to help users use Jina AI Search Foundation API's for their specific use case. # Core principles..." style="max-width:90%;" class="blogmark-image"&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jina"&gt;jina&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="llm"/><category term="jina"/></entry><entry><title>simonw/docs cookiecutter template</title><link href="https://simonwillison.net/2024/Sep/23/docs-cookiecutter/#atom-tag" rel="alternate"/><published>2024-09-23T21:45:15+00:00</published><updated>2024-09-23T21:45:15+00:00</updated><id>https://simonwillison.net/2024/Sep/23/docs-cookiecutter/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/docs"&gt;simonw/docs cookiecutter template&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Over the last few years I’ve settled on the combination of &lt;a href="https://www.sphinx-doc.org/"&gt;Sphinx&lt;/a&gt;, the &lt;a href="https://github.com/pradyunsg/furo"&gt;Furo&lt;/a&gt; theme and the &lt;a href="https://myst-parser.readthedocs.io/en/latest/"&gt;myst-parser&lt;/a&gt; extension (enabling Markdown in place of reStructuredText) as my documentation toolkit of choice, maintained in GitHub and hosted using &lt;a href="https://about.readthedocs.com/"&gt;ReadTheDocs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; and &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; projects are two examples of that stack in action.&lt;/p&gt;
&lt;p&gt;Today I wanted to spin up a new documentation site so I finally took the time to construct a &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; template for my preferred configuration. You can use it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pipx install cookiecutter
cookiecutter gh:simonw/docs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or with &lt;a href="https://docs.astral.sh/uv/"&gt;uv&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uv tool run cookiecutter gh:simonw/docs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Answer a few questions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[1/3] project (): shot-scraper
[2/3] author (): Simon Willison
[3/3] docs_directory (docs):
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it creates a &lt;code&gt;docs/&lt;/code&gt; directory ready for you to start editing docs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd docs
pip install -r requirements.txt
make livehtml
&lt;/code&gt;&lt;/pre&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cookiecutter"&gt;cookiecutter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-docs"&gt;sphinx-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/read-the-docs"&gt;read-the-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="projects"/><category term="python"/><category term="markdown"/><category term="cookiecutter"/><category term="sphinx-docs"/><category term="read-the-docs"/><category term="uv"/></entry><entry><title>Quoting Nicolas Bouliane</title><link href="https://simonwillison.net/2024/Jan/5/nicolas-bouliane/#atom-tag" rel="alternate"/><published>2024-01-05T22:32:16+00:00</published><updated>2024-01-05T22:32:16+00:00</updated><id>https://simonwillison.net/2024/Jan/5/nicolas-bouliane/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://nicolasbouliane.com/blog/duty-to-document"&gt;&lt;p&gt;If you learn something the hard way, share your findings with others. You have blazed a new trail; now you must mark it for your fellow travellers. Sharing knowledge is an unreasonably effective way of helping others.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://nicolasbouliane.com/blog/duty-to-document"&gt;Nicolas Bouliane&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/></entry><entry><title>PostgreSQL Lock Conflicts</title><link href="https://simonwillison.net/2023/Aug/23/postgresql-lock-conflicts/#atom-tag" rel="alternate"/><published>2023-08-23T03:08:54+00:00</published><updated>2023-08-23T03:08:54+00:00</updated><id>https://simonwillison.net/2023/Aug/23/postgresql-lock-conflicts/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pglocks.org/"&gt;PostgreSQL Lock Conflicts&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I absolutely love how extremely specific and niche this documentation site is. It details every single lock that PostgreSQL implements, and shows exactly which commands acquire that lock. That’s everything. I can imagine this becoming absurdly useful at extremely infrequent intervals for advanced PostgreSQL work.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=37229030"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="postgresql"/></entry><entry><title>Coping strategies for the serial project hoarder</title><link href="https://simonwillison.net/2022/Nov/26/productivity/#atom-tag" rel="alternate"/><published>2022-11-26T15:47:02+00:00</published><updated>2022-11-26T15:47:02+00:00</updated><id>https://simonwillison.net/2022/Nov/26/productivity/#atom-tag</id><summary type="html">
    &lt;p&gt;I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled "Massively increase your productivity on personal projects with comprehensive documentation and automated tests".&lt;/p&gt;
&lt;p&gt;The alternative title for the talk was &lt;em&gt;Coping strategies for the serial project hoarder&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I'm maintaining a &lt;em&gt;lot&lt;/em&gt; of different projects at the moment. Somewhat unintuitively, the way I'm handling this is by scaling down techniques that I've seen working for large engineering teams spread out across multiple continents.&lt;/p&gt;
&lt;p&gt;The key trick is to ensure that every project has comprehensive documentation and automated tests. This scales my productivity horizontally, by freeing me up from needing to remember all of the details of all of the different projects I'm working on at the same time.&lt;/p&gt;
&lt;p&gt;You can watch the talk &lt;a href="https://www.youtube.com/watch?v=GLkRK2rJGB0"&gt;on YouTube&lt;/a&gt; (25 minutes). Alternatively, I've included a detailed annotated version of the slides and notes below.&lt;/p&gt;
&lt;div class="resp-container"&gt;
  &lt;iframe allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="allowfullscreen" frameborder="0" height="315" src="https://www.youtube-nocookie.com/embed/GLkRK2rJGB0" width="560"&gt; &lt;/iframe&gt;
&lt;/div&gt;
&lt;!-- cutoff --&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.001.jpeg" alt="Title slide: Massively increase your productivity on personal projects with comprehensive documentation and automated tests - Simon Willison, DjangoCon US 2022" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This was the title I originally submitted to the conference. But I realized a better title was probably...&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.003.jpeg" alt="Same title slide, but the title has been replaced" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Coping strategies for the serial project hoarder&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.004.jpeg" alt="A static frame from a video: a monkey sits on some steps stuffing itself with several pastries. In the longer video the monkey is handed more and more pastries and can't resist trying to hold and eat all of them at once, no matter how many it receives." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;&lt;a href="https://twitter.com/devisridhar/status/1576170527882121217"&gt;This video&lt;/a&gt; is a neat representation of my approach to personal projects: I always have a few on the go, but I can never resist the temptation to add even more.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.005.jpeg" alt="A screenshot of my profile on PyPI - my join date is Oct 26, 2017 and I have 185 pojects listed." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;&lt;a href="https://pypi.org/user/simonw/"&gt;My PyPI profile&lt;/a&gt; (which is only five years old) lists 185 Python packages that I've released. Technically I'm actively maintaining all of them, in that if someone reports a bug I'll push out a fix. Many of them receive new releases at least once a year.&lt;/p&gt;
&lt;p&gt;Aside: I took this screenshot using &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; with a little bit of extra JavaScript to hide a notification bar at the top of the page:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;shot-scraper &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://pypi.org/user/simonw/&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
--javascript &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;    document.body.style.paddingTop = 0;&lt;/span&gt;
&lt;span class="pl-s"&gt;    document.querySelector(&lt;/span&gt;
&lt;span class="pl-s"&gt;        '#sticky-notifications'&lt;/span&gt;
&lt;span class="pl-s"&gt;    ).style.display = 'none';&lt;/span&gt;
&lt;span class="pl-s"&gt;  &lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; --height 1000&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.006.jpeg" alt="A map of the world with the Eventbrite logo overlaid on it. There are pins on San Francisco, Nashville, Mendoza and Madrid." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;How can one individual maintain 185 projects?&lt;/p&gt;
&lt;p&gt;Surprisingly, I'm using techniques that I've scaled down from working at a company with hundreds of engineers.&lt;/p&gt;
&lt;p&gt;I spent seven years at Eventbrite, during which time the engineering team grew to span three different continents. We had major engineering centers in San Francisco, Nashville, Mendoza in Argentina and Madrid in Spain.&lt;/p&gt;
&lt;p&gt;Consider timezones: engineers in Madrid and engineers in San Francisco had almost no overlap in their working hours. Good asynchronous communication was essential.&lt;/p&gt;
&lt;p&gt;Over time, I noticed that the teams that were most effective at this scale were the teams that had a strong culture of documentation and automated testing.&lt;/p&gt;
&lt;p&gt;As I started to work on my own array of smaller personal projects, I found that the same discipline that worked for large teams somehow sped me up, when intuitively I would have expected it to slow me down.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.007.jpeg" alt="The perfect commit: Implementation + tests + documentation and a link to an issue thread" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I wrote an extended description of this in &lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/"&gt;The Perfect Commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've started structuring the majority of my work in terms of what I think of as "the perfect commit" - a commit that combines implementation, tests, documentation and a link to an issue thread.&lt;/p&gt;
&lt;p&gt;As software engineers, it's important to note that our job generally isn't to write new software: it's to make changes to existing software.&lt;/p&gt;
&lt;p&gt;As such, the commit is our unit of work. It's worth us paying attention to how we can make our commits as useful as possible.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.008.jpeg" alt="Screenshot of a commit on GitHub: the title is Async support for prepare_jinja2_environment, closes #1809" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/commit/ddc999ad1296e8c69cffede3e367dda059b8adad"&gt;a recent example&lt;/a&gt; from one of my projects, Datasette.&lt;/p&gt;
&lt;p&gt;It's a single commit which bundles together the implementation, some related documentation improvements and the tests that show it works. And it links back to an issue thread from the commit message.&lt;/p&gt;
&lt;p&gt;Let's talk about each component in turn.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.009.jpeg" alt="Implementation: it should just do one thing (thing here is deliberately vague)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;There's not much to be said about the implementation: your commit should change something!&lt;/p&gt;
&lt;p&gt;It should only change one thing, but what that actually means varies on a case by case basis.&lt;/p&gt;
&lt;p&gt;It should be a single change that can be documented, tested and explained independently of other changes.&lt;/p&gt;
&lt;p&gt;(Being able to cleanly revert it is a useful property too.)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.010.jpeg" alt="Tests: prove that the implementation works. Pass if the new implementation is correct, fail otherwise." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The goal of the tests that accompany a commit is to prove that the new implementation works.&lt;/p&gt;
&lt;p&gt;If you apply the implementation the new tests should pass. If you revert it the tests should fail.&lt;/p&gt;
&lt;p&gt;I often use &lt;code&gt;git stash&lt;/code&gt; to try this out.&lt;/p&gt;
&lt;p&gt;If you tell people they need to write tests for &lt;em&gt;every single change&lt;/em&gt; they'll often push back that this is too much of a burden, and will harm their productivity.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.011.jpeg" alt="Every project should start with a test. assert 1 + 1 == 2 is fine! Adding tests to an existing test suite is SO MUCH less work than starting a new test suite from scratch." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;But I find that the incremental cost of adding a test to an existing test suite keeps getting lower over time.&lt;/p&gt;
&lt;p&gt;The hard bit of testing is getting a testing framework set up in the first place - with a test runner, and fixtures, and objects under test and suchlike.&lt;/p&gt;
&lt;p&gt;Once that's in place, adding new tests becomes really easy.&lt;/p&gt;
&lt;p&gt;So my personal rule is that every new project starts with a test. It doesn't really matter what that test does - what matters is that you can run &lt;code&gt;pytest&lt;/code&gt; to run the tests, and you have an obvious place to start building more of them.&lt;/p&gt;
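&lt;p&gt;A first test along those lines can be as small as this. The file name &lt;code&gt;tests/test_placeholder.py&lt;/code&gt; and the function name are assumptions for illustration; the assertion is the one from the slide.&lt;/p&gt;

```python
# tests/test_placeholder.py (hypothetical file name)


def test_placeholder():
    # Deliberately trivial: the point is that running "pytest" now finds
    # and runs something, and there is an obvious place to add real tests.
    assert 1 + 1 == 2
```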
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.012.jpeg" alt="Cookiecutter repo templates: simonw/python-lib, simonw/click-app, simonw/datasette-plugin" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I maintain three &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; templates to help with this, for the three kinds of projects I most frequently create:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/python-lib"&gt;simonw/python-lib&lt;/a&gt; for Python libraries&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/click-app"&gt;simonw/click-app&lt;/a&gt; for command line tools&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-plugin"&gt;simonw/datasette-plugin&lt;/a&gt; for Datasette plugins&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these templates creates a project with a &lt;code&gt;setup.py&lt;/code&gt; file, a README, a test suite and GitHub Actions workflows to run those tests and ship tagged releases to PyPI.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.013.jpeg" alt="Screenshot of the GitHub page to create a new repsoitory from python-lib-template-repository, which asks for a repository name, a description string and if the new repo should be public or private." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I have a trick for running &lt;code&gt;cookiecutter&lt;/code&gt; as part of creating a brand new repository on GitHub. I described that in &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;Dynamic content for GitHub repository templates using cookiecutter and GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.014.jpeg" alt="Documentation: Same repository as the code! Document changes that impact external developers. Update the docs in the same commit as the change. Catch missing documentation updates in PR / code review" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is a hill that I will die on: your documentation must live in the same repository as your code!&lt;/p&gt;
&lt;p&gt;You often see projects keep their documentation somewhere else, like in a wiki.&lt;/p&gt;
&lt;p&gt;Inevitably it goes out of date. And my experience is that if your documentation is out of date people will lose trust in it, which means they'll stop reading it and stop contributing to it.&lt;/p&gt;
&lt;p&gt;The gold standard of documentation has to be that it's reliably up to date with the code.&lt;/p&gt;
&lt;p&gt;The only way you can do that is if the documentation and code are in the same repository.&lt;/p&gt;
&lt;p&gt;This gives you versioned snapshots of the documentation that exactly match the code at that time.&lt;/p&gt;
&lt;p&gt;More importantly, it means you can enforce it through code review. You can say in a PR "this is great, but don't forget to update this paragraph on this page of the documentation to reflect the change you're making".&lt;/p&gt;
&lt;p&gt;If you do this you can finally get documentation that people learn to trust over time.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.015.jpeg" alt="Bonus trick: documentation unit tests" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Another trick I like to use is something I call documentation unit tests.&lt;/p&gt;
&lt;p&gt;The idea here is to use unit tests to enforce that concepts introspected from your code are at least mentioned in your documentation.&lt;/p&gt;
&lt;p&gt;I wrote more about that in &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.016.jpeg" alt="Screenshot showing pytest running 26 passing tests, each with a name like test_plugin_hook_are_documented[filters_from_request]" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Here's an example. Datasette has &lt;a href="https://github.com/simonw/datasette/blob/0.63.1/tests/test_docs.py#L41-L53"&gt;a test&lt;/a&gt; that scans through each of the Datasette plugin hooks and checks that there is a heading for each one in the documentation.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.017.jpeg" alt="Screenshot of the code linked to above" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The test itself is pretty simple: it uses &lt;code&gt;pytest&lt;/code&gt; parametrization to look through every introspected plugin hook name, and for each one checks that it has a matching heading in the documentation.&lt;/p&gt;
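&lt;p&gt;The general shape of a documentation unit test can be sketched like this. It is not Datasette's actual test: the hook list, the inline docs string and the helper names are stand-ins, and it assumes Markdown-style headings. In a real suite you would introspect the names from your code, read the docs from disk, and probably use &lt;code&gt;pytest.mark.parametrize&lt;/code&gt; so that each name is reported as its own test.&lt;/p&gt;

```python
import re


def documented_headings(docs_text):
    # Collect the first word of every Markdown-style heading ("# name").
    # The heading format is an assumption; adjust it for your docs syntax.
    return set(re.findall(r"^#+\s+(\w+)", docs_text, flags=re.MULTILINE))


def undocumented(names, docs_text):
    # Return the names that have no matching heading in the docs.
    headings = documented_headings(docs_text)
    return [name for name in names if name not in headings]


# Stand-in data for illustration only.
HOOK_NAMES = ["filters_from_request", "render_cell"]
DOCS = "# filters_from_request\n\nText.\n\n## render_cell(value)\n\nMore text.\n"


def test_hooks_are_documented():
    # Fails with the list of missing names if any hook lacks a heading.
    assert undocumented(HOOK_NAMES, DOCS) == []
```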
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="issue-thread"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.018.jpeg" alt="Everything links to an issue thread" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The final component of my perfect commit is this: every commit must link to an issue thread.&lt;/p&gt;
&lt;p&gt;I'll usually have these open in advance, but sometimes I'll open an issue thread just so I can close it with a commit a few seconds later!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.019.jpeg" alt="A screenshot of the issue titled prepare_jinja_enviroment() hook should take datasette argument - it has 11 comments" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/issues/1809"&gt;the issue&lt;/a&gt; for the commit I showed earlier. It has 11 comments, and every single one of those comments is by me.&lt;/p&gt;
&lt;p&gt;I have literally thousands of issues on GitHub that look like this: issue threads that are effectively me talking to myself about the changes that I'm making.&lt;/p&gt;
&lt;p&gt;It turns out this is a fantastic form of additional documentation.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.020.jpeg" alt="What goes in an issue?" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;What goes in an issue?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Background: the reasons for the change. In six months' time you'll want to know why you did this.&lt;/li&gt;
&lt;li&gt;State of play beforehand: embed existing code, link to existing docs. I like to start my issues with "I'm going to change this code right here" - that way if I come back the next day I don't have to repeat that little piece of research.&lt;/li&gt;
&lt;li&gt;Links to things! Documentation, inspiration, clues found on Stack Overflow. The idea is to capture all of the loose information floating around that topic.&lt;/li&gt;
&lt;li&gt;Code snippets illustrating potential designs and false starts.&lt;/li&gt;
&lt;li&gt;Decisions. What did you consider? What did you decide? As programmers we make decisions constantly, all day, about everything. That work doesn't have to be invisible. Writing them down also avoids having to re-litigate them several months later when you've forgotten your original reasoning.&lt;/li&gt;
&lt;li&gt;Screenshots - of everything! Animated screenshots are even better. I even take screenshots of things like the AWS console to remind me what I did there.&lt;/li&gt;
&lt;li&gt;When you close it: a link to the updated documentation and demo.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.021.jpeg" alt="Temporal documentation. It's timestamped and contextual. You don't need to commit to keeping it up-to-date in the future (but you can add more comments if you like)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The reason I love issues is that they're a form of documentation that I think of as &lt;em&gt;temporal documentation&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Regular documentation comes with a big commitment: you have to keep it up to date in the future.&lt;/p&gt;
&lt;p&gt;Issue comments skip that commitment entirely. They're displayed with a timestamp, in the context of the work you were doing at the time.&lt;/p&gt;
&lt;p&gt;No-one will be upset or confused if you fail to keep them updated to match future changes.&lt;/p&gt;
&lt;p&gt;So it's a commitment-free form of documentation, which I for one find incredibly liberating.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.022.jpeg" alt="Issue driven development" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I think of this approach as &lt;em&gt;issue driven development&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Everything you are doing is issue-first, and from that you drive the rest of the development process.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.023.jpeg" alt="Don't remember anything: you can go back to a project in six months and pick up right where you left off" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is how it relates back to maintaining 185 projects at the same time.&lt;/p&gt;
&lt;p&gt;With issue driven development you &lt;em&gt;don't have to remember anything&lt;/em&gt; about any of these projects at all.&lt;/p&gt;
&lt;p&gt;I've had issues where I did a bunch of design work in issue comments, then dropped it, then came back 12 months later and implemented that design - without having to rethink it.&lt;/p&gt;
&lt;p&gt;I've had projects where I forgot that the project existed entirely! But I've found it again, and there's been an open issue, and I've been able to pick up work again.&lt;/p&gt;
&lt;p&gt;It's a way of working where you treat every project as if it's going to be maintained by someone else, and the classic cliche applies: that someone else is you in the future.&lt;/p&gt;
&lt;p&gt;It horizontally scales you and lets you tackle way more interesting problems.&lt;/p&gt;
&lt;p&gt;Programmers always complain when you interrupt them - there's this idea of "flow state" and that interrupting a programmer for a moment costs them half an hour in getting back up to speed.&lt;/p&gt;
&lt;p&gt;This fixes that! It's much easier to get back to what you are doing if you have an issue thread that records where you've got to.&lt;/p&gt;
&lt;p&gt;Issue driven development is my key productivity hack for taking on much more ambitious projects in much larger quantities.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.024.jpeg" alt="Laboratory notebooks - and a picture of a page from one by Leonardo da Vinci" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Another way to think about this is to compare it to laboratory notebooks.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://en.wikipedia.org/wiki/Studies_of_the_Fetus_in_the_Womb"&gt;a page&lt;/a&gt; from one by Leonardo da Vinci.&lt;/p&gt;
&lt;p&gt;Great scientists and great engineers have always kept detailed notes.&lt;/p&gt;
&lt;p&gt;We can use GitHub issues as a really quick and easy way to do the same thing!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.025.jpeg" alt="Issue: Figure out how to deploy Datasette to AWS lambda using function URLs and Mangum" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Another thing I like to use these for is deep research tasks.&lt;/p&gt;
&lt;p&gt;Here's an example, from when I was trying to figure out how to run my Python web application in an AWS Lambda function:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/public-notes/issues/6"&gt;Figure out how to deploy Datasette to AWS Lambda using function URLs and Mangum&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This took me 65 comments over the course of a few days... but by the end of that thread I'd figured out how to do it!&lt;/p&gt;
&lt;p&gt;Here's the follow-up, with another 77 comments, in which I &lt;a href="https://github.com/simonw/public-notes/issues/1"&gt;figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I will never have to figure this out ever again! That's a huge win.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.026.jpeg" alt="simonw/public-notes/issues" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/public-notes"&gt;https://github.com/simonw/public-notes&lt;/a&gt; is a public repository where I keep some of these issue threads, transferred from my private notes repos &lt;a href="https://til.simonwillison.net/github/transfer-issue-private-to-public"&gt;using this trick&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.027.jpeg" alt="Tell people what you did! (It's so easy to skip this step)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The last thing I want to encourage you to do is this: if you do a project, tell people what it is you did!&lt;/p&gt;
&lt;p&gt;This counts for both personal and work projects. It's so easy to skip this step.&lt;/p&gt;
&lt;p&gt;Once you've shipped a feature or built a project, it's so tempting to skip the step of spending half an hour or more writing about the work you have done.&lt;/p&gt;
&lt;p&gt;But you are missing out on &lt;em&gt;so much&lt;/em&gt; of the value of your work if you don't give other people a chance to understand what you did.&lt;/p&gt;
&lt;p&gt;I wrote more about this here: &lt;a href="https://simonwillison.net/2022/Nov/6/what-to-blog-about/"&gt;What to blog about&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.028.jpeg" alt="Release notes (with dates)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;For projects with releases, release notes are a really good way to do this.&lt;/p&gt;
&lt;p&gt;I like using GitHub releases for this - they're quick and easy to write, and I have automation set up for my projects such that creating release notes in GitHub triggers a build and release to PyPI.&lt;/p&gt;
&lt;p&gt;I've done over 1,000 releases in this way. Having them automated is crucial, and having automation makes it really easy to ship releases more often.&lt;/p&gt;
&lt;p&gt;Please make sure your release notes have dates on them. I need to know when your change went out, because if it's only a week old it's unlikely people will have upgraded to it yet, whereas a change from five years ago is probably safe to depend on.&lt;/p&gt;
&lt;p&gt;I wrote more about &lt;a href="https://simonwillison.net/2022/Jan/31/release-notes/"&gt;writing better release notes&lt;/a&gt; here.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.029.jpeg" alt="Expand your definition of done to include writing about what you did" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is a mental trick which works really well for me. "No project of mine is finished until I've told people about it in some way" is a really useful habit to form.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.030.jpeg" alt="Twitter threads (embed images + links + videos)" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Twitter threads are (or were) a great low-effort way to write about a project. Build a quick thread with some links and images, and maybe even a video.&lt;/p&gt;
&lt;p&gt;Get a little unit about your project out into the world, and then you can stop thinking about it.&lt;/p&gt;
&lt;p&gt;(I'm trying to do this &lt;a href="https://simonwillison.net/2022/Nov/5/mastodon/"&gt;on Mastodon now&lt;/a&gt; instead.)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.031.jpeg" alt="Get a blog" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;Even better: get a blog! Having your own corner of the internet to write about the work that you are doing is a small investment that will pay off many times over.&lt;/p&gt;
&lt;p&gt;("Nobody blogs anymore" I said in the talk... Phil Gyford disagrees with that meme so much that he launched &lt;a href="https://ooh.directory/blog/2022/welcome/"&gt;a new blog directory&lt;/a&gt; to show how wrong it is.)&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.032.jpeg" alt="GUILT is the enemy of projects" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;The enemy of projects, especially personal projects, is &lt;em&gt;guilt&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The more projects you have, the more guilty you feel about working on any one of them - because you're not working on the others, and those projects haven't yet achieved their goals.&lt;/p&gt;
&lt;p&gt;You have to overcome guilt if you're going to work on 185 projects at once!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="avoid-user-accounts"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.033.jpeg" alt="Avoid side projects with user accounts. If it has user accounts it's not a side-project, it's an unpaid job." style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;This is the most important tip: avoid side projects with user accounts.&lt;/p&gt;
&lt;p&gt;If you build something that people can sign into, that's not a side-project, it's an unpaid job. It's a very big responsibility: avoid it at all costs!&lt;/p&gt;
&lt;p&gt;Almost all of my projects right now are open source things that people can run on their own machines, because that's about as far away from user accounts as I can get.&lt;/p&gt;
&lt;p&gt;I still have a responsibility for shipping security updates and things like that, but at least I'm not holding onto other people's data for them.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.034.jpeg" alt="If your project is tested and documented, you have nothing to feel guilty about. That's what I tell myself anyway!" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;I feel like if your project is tested and documented, &lt;em&gt;you have nothing to feel guilty about&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;You have put a thing out into the world, and it has tests to show that it works, and it has documentation that explains what it is.&lt;/p&gt;
&lt;p&gt;This means I can step back and say that it's OK for me to work on other things. That thing there is a unit that makes sense to people.&lt;/p&gt;
&lt;p&gt;That's what I tell myself anyway! It's OK to have 185 projects provided they all have documentation and they all have tests.&lt;/p&gt;
&lt;p&gt;Do that and the guilt just disappears. You can live guilt free!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide"&gt;
&lt;img loading="lazy" src="https://static.simonwillison.net/static/2022/djangocon-productivity/productivity.035.jpeg" alt="Thank you - simonwillison.net - twitter.com/simonw / github.com/simonw" style="max-width: 100%;" width="450" height="253" /&gt;&lt;div&gt;
&lt;p&gt;You can follow me on Mastodon at &lt;a href="https://fedi.simonwillison.net/@simon"&gt;@simon@simonwillison.net&lt;/a&gt; or on GitHub at &lt;a href="https://github.com/simonw"&gt;github.com/simonw&lt;/a&gt;. Or subscribe to my blog at &lt;a href="https://simonwillison.net/"&gt;simonwillison.net&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;From the Q&amp;amp;A:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You've tweeted about using GitHub Projects. Could you talk about that?
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects"&gt;GitHub Projects V2&lt;/a&gt; is the perfect TODO list for me, because it lets me bring together issues from different repositories. I use a project called "Everything" on a daily basis (it's my browser default window) - I add issues to it that I plan to work on, including personal TODO list items as well as issues from my various public and private repositories. It's kind of like a cross between Trello and Airtable and I absolutely love it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;How did you move notes from the private to the public repo?
&lt;ul&gt;
&lt;li&gt;GitHub doesn't let you do this. But there's a trick I use involving a &lt;code&gt;temp&lt;/code&gt; repo which I switch between public and private to help transfer notes. More in &lt;a href="https://til.simonwillison.net/github/transfer-issue-private-to-public"&gt;this TIL&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Question about the perfect commit: do you commit your failing tests?
&lt;ul&gt;
&lt;li&gt;I don't: I try to keep the commits that land on my &lt;code&gt;main&lt;/code&gt; branch always passing. I'll sometimes write the failing test before the implementation and then commit them together. For larger projects I'll work in a branch and then squash-merge the final result into a perfect commit to main later on.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;/div&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/djangocon"&gt;djangocon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/productivity"&gt;productivity&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="djangocon"/><category term="documentation"/><category term="productivity"/><category term="my-talks"/><category term="testing"/><category term="annotated-talks"/><category term="github-issues"/></entry><entry><title>The Perfect Commit</title><link href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#atom-tag" rel="alternate"/><published>2022-10-29T20:41:01+00:00</published><updated>2022-10-29T20:41:01+00:00</updated><id>https://simonwillison.net/2022/Oct/29/the-perfect-commit/#atom-tag</id><summary type="html">
    &lt;p&gt;For the last few years I've been trying to center my work around creating what I consider to be the &lt;em&gt;Perfect Commit&lt;/em&gt;. This is a single commit that contains all of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;implementation&lt;/strong&gt;: a single, focused change&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests&lt;/strong&gt; that demonstrate the implementation works&lt;/li&gt;
&lt;li&gt;Updated &lt;strong&gt;documentation&lt;/strong&gt; reflecting the change&lt;/li&gt;
&lt;li&gt;A link to an &lt;strong&gt;issue thread&lt;/strong&gt; providing further context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our job as software engineers generally isn't to write new software from scratch: we spend the majority of our time adding features and fixing bugs in existing software.&lt;/p&gt;
&lt;p&gt;The commit is our principal unit of work. It deserves to be treated thoughtfully and with care.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 26th November 2022&lt;/strong&gt;: My 25 minute talk &lt;a href="https://simonwillison.net/2022/Nov/26/productivity/"&gt;Massively increase your productivity on personal projects with comprehensive documentation and automated tests&lt;/a&gt; describes this approach to software development in detail.&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#implementation"&gt;Implementation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#tests"&gt;Tests&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#documentation"&gt;Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#link-to-an-issue"&gt;A link to an issue&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#issue-over-commit-message"&gt;An issue is more valuable than a commit message&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#not-all-perfect"&gt;Not every commit needs to be "perfect"&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#scrappy-branches"&gt;Write scrappy commits in a branch&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/29/the-perfect-commit/#examples"&gt;Some examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="implementation"&gt;Implementation&lt;/h4&gt;
&lt;p&gt;Each commit should change a single thing.&lt;/p&gt;
&lt;p&gt;The definition of "thing" here is left deliberately vague!&lt;/p&gt;
&lt;p&gt;The goal is to have something that can be easily reviewed, and that can be clearly understood in the future when revisited using tools like &lt;code&gt;git blame&lt;/code&gt; or &lt;a href="https://til.simonwillison.net/git/git-bisect"&gt;git bisect&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I like to keep my commit history linear, as I find that makes it much easier to comprehend later. This further reinforces the value of each commit being a single, focused change.&lt;/p&gt;
&lt;p&gt;Atomic commits are also much easier to cleanly revert if something goes wrong - or to cherry-pick into other branches.&lt;/p&gt;
&lt;p&gt;For things like web applications that can be deployed to production, a commit should be a unit that can be deployed. Aiming to keep the main branch in a deployable state is a good rule of thumb for deciding if a commit is a sensible atomic change or not.&lt;/p&gt;
&lt;h4 id="tests"&gt;Tests&lt;/h4&gt;
&lt;p&gt;The ultimate goal of tests is to &lt;em&gt;increase&lt;/em&gt; your productivity. If your testing practices are slowing you down, you should consider ways to improve them.&lt;/p&gt;
&lt;p&gt;In the longer term, this productivity improvement comes from gaining the freedom to make changes and stay confident that your change hasn't broken something else.&lt;/p&gt;
&lt;p&gt;But tests can help increase productivity in the immediate short term as well.&lt;/p&gt;
&lt;p&gt;How do you know when the change you have made is finished and ready to commit? It's ready when the new tests pass.&lt;/p&gt;
&lt;p&gt;I find this reduces the time I spend second-guessing myself and questioning whether I've done enough and thought through all of the edge cases.&lt;/p&gt;
&lt;p&gt;Without tests, there's a very strong possibility that your change will have broken some other, potentially unrelated feature. Your commit could be held up by hours of tedious manual testing. Or you could &lt;abbr title="You Only Live Once"&gt;YOLO&lt;/abbr&gt; it and learn that you broke something important later!&lt;/p&gt;
&lt;p&gt;Writing tests becomes far less time consuming if you already have good testing practices in place.&lt;/p&gt;
&lt;p&gt;Adding a new test to a project with a lot of existing tests is easy: you can often find an existing test that has 90% of the pattern you need already worked out for you.&lt;/p&gt;
&lt;p&gt;If your project has no tests at all, adding a test for your change will be a lot more work.&lt;/p&gt;
&lt;p&gt;This is why I start every single one of my projects with a passing test. It doesn't matter what this test is - &lt;code&gt;assert 1 + 1 == 2&lt;/code&gt; is fine! The key thing is to get a testing framework in place, such that you can run a command (for me that's usually &lt;code&gt;pytest&lt;/code&gt;) to execute the test suite - and you have an obvious place to add new tests in the future.&lt;/p&gt;
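&lt;p&gt;As a minimal sketch of that starting point, assuming pytest and a hypothetical file called &lt;code&gt;tests/test_placeholder.py&lt;/code&gt; (the file and function names are illustrative, not taken from any particular template), the very first test could be as small as this:&lt;/p&gt;

```python
# tests/test_placeholder.py (hypothetical file name)
# pytest collects functions whose names start with "test_".


def test_placeholder():
    # The assertion itself is trivial: the point is that running
    # `pytest` now finds and executes at least one passing test.
    assert 1 + 1 == 2
```

&lt;p&gt;Running &lt;code&gt;pytest&lt;/code&gt; in the project root should then collect and pass this one test, and the file becomes the obvious place to add real tests later.&lt;/p&gt;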
&lt;p&gt;I use &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;these cookiecutter templates&lt;/a&gt; for almost all of my new projects. They configure a testing framework with a single passing test and GitHub Actions workflows to exercise it all from the very start.&lt;/p&gt;
&lt;p&gt;I'm not a huge advocate of test-first development, where tests are written before the code itself. What I care about is tests-included development, where the final commit bundles the tests and the implementation together. I wrote more about my approach to testing in &lt;a href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/"&gt;How to cheat at unit tests with pytest and Black&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="documentation"&gt;Documentation&lt;/h4&gt;
&lt;p&gt;If your project defines APIs that are meant to be used outside of your project, they need to be documented. In my work these projects are usually one of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Python APIs (modules, functions and classes) that provide code designed to be imported into other projects.&lt;/li&gt;
&lt;li&gt;Web APIs - usually JSON over HTTP these days - that provide functionality to be consumed by other applications.&lt;/li&gt;
&lt;li&gt;Command line interface tools, such as those implemented using &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; or &lt;a href="https://typer.tiangolo.com/"&gt;Typer&lt;/a&gt; or &lt;a href="https://docs.python.org/3/library/argparse.html"&gt;argparse&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is critical that this documentation &lt;strong&gt;lives in the same repository as the code itself&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is important for a number of reasons.&lt;/p&gt;
&lt;p&gt;Documentation is only valuable &lt;strong&gt;if people trust it&lt;/strong&gt;. People will only trust it if they know that it is kept up to date.&lt;/p&gt;
&lt;p&gt;If your docs live in a separate wiki somewhere it's easy for them to get out of date - but more importantly it's hard for anyone to quickly confirm if the documentation is being updated in sync with the code or not.&lt;/p&gt;
&lt;p&gt;Documentation should be &lt;strong&gt;versioned&lt;/strong&gt;. People need to be able to find the docs for the specific version of your software that they are using. Keeping it in the same repository as the code gives you synchronized versioning for free.&lt;/p&gt;
&lt;p&gt;Documentation changes should be &lt;strong&gt;reviewed&lt;/strong&gt; in the same way as your code. If they live in the same repository you can catch changes that need to be reflected in the documentation as part of your code review process.&lt;/p&gt;
&lt;p&gt;And ideally, documentation should be &lt;strong&gt;tested&lt;/strong&gt;. I wrote about my approach to doing this using &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;. Executing example code in the documentation using a testing framework is a great idea too.&lt;/p&gt;
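&lt;p&gt;Here's a simplified sketch of the general idea, not the code from that post: a test that fails whenever something exists in the code without a matching heading in the docs. The setting names and the inline &lt;code&gt;DOCS&lt;/code&gt; string are hypothetical stand-ins; a real test would import the names from the project and read the documentation file from the repository.&lt;/p&gt;

```python
import re

# Hypothetical stand-ins for names defined in the code and for the
# contents of a reStructuredText documentation file.
SETTINGS = ["page_size", "row_limit"]

DOCS = """
page_size
~~~~~~~~~
The default number of rows returned per page.

row_limit
~~~~~~~~~
The maximum number of rows a query can return.
"""


def documented_headings(docs):
    # A heading here is a line containing a single word, followed by
    # a line made up entirely of "~" characters.
    return set(re.findall(r"^(\w+)\n~+$", docs, flags=re.MULTILINE))


def undocumented(settings, docs):
    # Return the settings that have no matching heading, in order.
    headings = documented_headings(docs)
    return [name for name in settings if name not in headings]


def test_settings_are_documented():
    # Passes while every setting has a heading in the docs.
    assert undocumented(SETTINGS, DOCS) == []
```

&lt;p&gt;Adding a new name to &lt;code&gt;SETTINGS&lt;/code&gt; without adding a heading for it makes the test fail, which is the reminder to write that sentence or two of documentation in the same commit.&lt;/p&gt;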
&lt;p&gt;As with tests, writing documentation from scratch is much more work than incrementally modifying existing documentation.&lt;/p&gt;
&lt;p&gt;Many of my commits include documentation that is just a sentence or two. This doesn't take very long to write, but it adds up to something very comprehensive over time.&lt;/p&gt;
&lt;p&gt;How about end-user facing documentation? I'm still figuring that out myself. I created my &lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;shot-scraper tool&lt;/a&gt; to help automate the process of keeping screenshots up-to-date, but I've not yet found personal habits and styles for end-user documentation that I'm confident in.&lt;/p&gt;
&lt;h4 id="link-to-an-issue"&gt;A link to an issue&lt;/h4&gt;
&lt;p&gt;Every perfect commit should include a link to an issue thread that accompanies that change.&lt;/p&gt;
&lt;p&gt;Sometimes I'll even open an issue seconds before writing the commit message, just to give myself something I can link to from the commit itself!&lt;/p&gt;
&lt;p&gt;The reason I like issue threads is that they provide effectively unlimited space for commentary and background for the change that is being made.&lt;/p&gt;
&lt;p&gt;Most of my issue threads are me talking to myself - sometimes with dozens of issue comments, all written by me.&lt;/p&gt;
&lt;p&gt;Things that can go in an issue thread include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Background&lt;/strong&gt;: the reason for the change. I try to include this in the opening comment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State of play&lt;/strong&gt; before the change. I'll often link to the current version of the code and documentation. This is great when I return to an open issue a few days later, as it saves me from having to repeat that initial research.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Links to things&lt;/strong&gt;. So many links! Inspiration for the change, relevant documentation, conversations on Slack or Discord, clues found on StackOverflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code snippets&lt;/strong&gt; illustrating potential designs and false-starts. Use &lt;code&gt;```python ... ```&lt;/code&gt; blocks to get syntax highlighting in your issue comments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decisions&lt;/strong&gt;. What did you consider? What did you decide? As programmers we make hundreds of tiny decisions a day. Write them down! Then you'll never find yourself relitigating them in the future having forgotten your original reasoning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Screenshots&lt;/strong&gt;. What it looked like before, what it looked like after. Animated screenshots are even better! I use &lt;a href="https://www.cockos.com/licecap/"&gt;LICEcap&lt;/a&gt; to generate quick GIF screen captures or QuickTime to capture videos - both of which can be dropped straight into a GitHub issue comment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototypes&lt;/strong&gt;. I'll often paste a few lines of code copied from a Python console session. Sometimes I'll even paste in a block of HTML and CSS, or add a screenshot of a UI prototype.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After I've closed my issues I like to add one last comment that links to the updated documentation and ideally a live demo of the new feature.&lt;/p&gt;
&lt;h4 id="issue-over-commit-message"&gt;An issue is more valuable than a commit message&lt;/h4&gt;
&lt;p&gt;I went through a several-year phase of writing essays in my commit messages, trying to capture as much of the background context and thinking as possible.&lt;/p&gt;
&lt;p&gt;My commit messages grew a lot shorter when I started bundling the updated documentation in the commit - since often much of the material I'd previously included in the commit message was now in that documentation instead.&lt;/p&gt;
&lt;p&gt;As I extended my practice of writing issue threads, I found that they were a better place for most of this context than the commit messages themselves. They supported embedded media, were more discoverable and I could continue to extend them even after the commit had landed.&lt;/p&gt;
&lt;p&gt;Today many of my commit messages are a single line summary and a link to an issue!&lt;/p&gt;
&lt;p&gt;The biggest benefit of lengthy commit messages is that they are guaranteed to survive for as long as the repository itself. If you're going to use issue threads in the way I describe here it is critical that you consider their long term archival value.&lt;/p&gt;
&lt;p&gt;I expect this to be controversial! I'm advocating for abandoning one of the core ideas of Git here - that each repository should incorporate a full, decentralized record of its history that is copied in its entirety when someone clones a repo.&lt;/p&gt;
&lt;p&gt;I understand that philosophy. All I'll say here is that my own experience has been that dropping that requirement has resulted in a net increase in my overall productivity. Other people may reach a different conclusion.&lt;/p&gt;
&lt;p&gt;If this offends you too much, you're welcome to construct an &lt;em&gt;even more perfect commit&lt;/em&gt; that incorporates background information and additional context in an extended commit message as well.&lt;/p&gt;
&lt;p&gt;One of the reasons I like GitHub Issues is that it includes a comprehensive API, which can be used to extract all of that data. I use my &lt;a href="https://github.com/dogsheep/github-to-sqlite"&gt;github-to-sqlite tool&lt;/a&gt; to maintain an ongoing archive of my issues and issue comments as a SQLite database file.&lt;/p&gt;
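&lt;p&gt;To illustrate the kind of archive this makes possible, here's a rough sketch using only Python's standard library. It is not how &lt;code&gt;github-to-sqlite&lt;/code&gt; is implemented. The &lt;code&gt;SAMPLE&lt;/code&gt; data is made up, shaped like a subset of the fields the GitHub REST API returns for issue comments, and the table and column names are assumptions chosen for this example.&lt;/p&gt;

```python
import sqlite3

# Hypothetical sample shaped like a subset of the JSON that GitHub's REST
# API returns for issue comments; a real script would fetch it over HTTP.
SAMPLE = [
    {
        "id": 1,
        "body": "Background: why this change is needed.",
        "created_at": "2022-10-29T20:00:00Z",
        "user": {"login": "example-user"},
    },
    {
        "id": 2,
        "body": "Decision: going with option B.",
        "created_at": "2022-10-29T21:00:00Z",
        "user": {"login": "example-user"},
    },
]


def save_comments(conn, comments):
    # Create the table on first use, then insert or replace by comment
    # id so that re-running the archive does not create duplicate rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issue_comments "
        "(id INTEGER PRIMARY KEY, author TEXT, created_at TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO issue_comments VALUES (?, ?, ?, ?)",
        [
            (c["id"], c["user"]["login"], c["created_at"], c["body"])
            for c in comments
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save_comments(conn, SAMPLE)
save_comments(conn, SAMPLE)  # Safe to repeat: the ids are primary keys
row_count = conn.execute("SELECT count(*) FROM issue_comments").fetchone()[0]
```

&lt;p&gt;Keying the table on the comment &lt;code&gt;id&lt;/code&gt; means the archive can be refreshed on a schedule without special handling for comments it has already seen.&lt;/p&gt;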
&lt;h4 id="not-all-perfect"&gt;Not every commit needs to be "perfect"&lt;/h4&gt;
&lt;p&gt;I find that the vast majority of my work fits into this pattern, but there are exceptions.&lt;/p&gt;
&lt;p&gt;Typo fix for some documentation or a comment? Just ship it, it's fine.&lt;/p&gt;
&lt;p&gt;Bug fix that doesn't deserve documentation? Still bundle the implementation and the test plus a link to an issue, but no need to update the docs - especially if they already describe the expected bug-free behaviour.&lt;/p&gt;
&lt;p&gt;Generally though, I find that aiming for implementation, tests, documentation and an issue link covers almost all of my work. It's a really good default model.&lt;/p&gt;
&lt;h4 id="scrappy-branches"&gt;Write scrappy commits in a branch&lt;/h4&gt;
&lt;p&gt;If I'm writing more exploratory or experimental code it often doesn't make sense to work in this strict way. For those instances I'll usually work in a branch, where I can ship "WIP" commit messages and failing tests with abandon. I'll then squash-merge them into a single perfect commit (sometimes via a self-closed GitHub pull request) to keep my main branch as tidy as possible.&lt;/p&gt;
&lt;h4 id="examples"&gt;Some examples&lt;/h4&gt;
&lt;p&gt;Here are some examples of my commits that follow this pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette/commit/9676b2deb07cff20247ba91dad3e84a4ab0b00d1"&gt;Upgrade Docker images to Python 3.11&lt;/a&gt; for &lt;a href="https://github.com/simonw/datasette/issues/1853"&gt;datasette #1853&lt;/a&gt; - a pretty tiny change, but still includes tests, docs and an issue link.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/sqlite-utils/commit/ab8d4aad0c42f905640981f6f24bc1e37205ae62"&gt;sqlite-utils schema now takes optional tables&lt;/a&gt; for &lt;a href="https://github.com/simonw/sqlite-utils/issues/299"&gt;sqlite-utils #299&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/shot-scraper/commit/5048e21a1ca5accedfeca6ac25a16a38dc240b81"&gt;shot-scraper html command&lt;/a&gt; for &lt;a href="https://github.com/simonw/shot-scraper/issues/96"&gt;shot-scraper #96&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/s3-credentials/commit/c7bb7268c4a124349bb511f7ec3ee3f28f9581ad"&gt;s3-credentials put-objects command&lt;/a&gt; for &lt;a href="https://github.com/simonw/s3-credentials/issues/68"&gt;s3-credentials #68&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-gunicorn/commit/0d561d7a94f76079b1eb7779b3e944c163d2539e"&gt;Initial implementation&lt;/a&gt; for &lt;a href="https://github.com/simonw/datasette-gunicorn/issues/1"&gt;datasette-gunicorn #1&lt;/a&gt; - this was the first commit to this repository, but I still bundled the tests, docs, implementation and a link to an issue.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/code-review"&gt;code-review&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git"&gt;git&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="code-review"/><category term="definitions"/><category term="documentation"/><category term="git"/><category term="github"/><category term="software-engineering"/><category term="testing"/><category term="github-issues"/></entry><entry><title>Automating screenshots for the Datasette documentation using shot-scraper</title><link href="https://simonwillison.net/2022/Oct/14/automating-screenshots/#atom-tag" rel="alternate"/><published>2022-10-14T23:44:03+00:00</published><updated>2022-10-14T23:44:03+00:00</updated><id>https://simonwillison.net/2022/Oct/14/automating-screenshots/#atom-tag</id><summary type="html">
    &lt;p&gt;I released &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; back &lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;in March&lt;/a&gt; as a tool for keeping screenshots in documentation up-to-date.&lt;/p&gt;
&lt;p&gt;It's very easy for feature screenshots in documentation for a web application to drift out-of-date with the latest design of the software itself.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; is a command-line tool that aims to solve this.&lt;/p&gt;
&lt;p&gt;You can use it to take one-off screenshots like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper https://latest.datasette.io/ --height 800
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or you can define multiple screenshots in a single YAML file - let's call this &lt;code&gt;shots.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;index.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/fixtures&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;database.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And run them all at once like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper multi shots.yml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This morning I used &lt;code&gt;shot-scraper&lt;/code&gt; to replace all of the existing screenshots in the &lt;a href="https://docs.datasette.io/en/latest/"&gt;Datasette documentation&lt;/a&gt; with up-to-date, automated equivalents.&lt;/p&gt;
&lt;p&gt;I decided to use this as an opportunity to create a more detailed tutorial for how to use &lt;code&gt;shot-scraper&lt;/code&gt; for this kind of screenshot automation project.&lt;/p&gt;
&lt;h4&gt;Four screenshots to replace&lt;/h4&gt;
&lt;p&gt;Datasette's documentation included four screenshots that I wanted to replace with automated equivalents.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette/blob/0.62/docs/full_text_search.png"&gt;full_text_search.png&lt;/a&gt; illustrates the full-text search feature:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/full_text_search.png" alt="A search for cherry running against the Street_Tree_List table, returning 14,663 rows" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/advanced_export.png"&gt;advanced_export.png&lt;/a&gt; displays Datasette's "advanced export" dialog:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/advanced_export.png" alt="Advanced export dialog, with four links, three checkboxes and an Export CSV button" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette/blob/0.62/docs/binary_data.png"&gt;binary_data.png&lt;/a&gt; displays just a small fragment of a table with binary download links:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette/0.62/docs/binary_data.png" alt="A small screenshot showing binary data download links" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette/blob/0.62/docs/facets.png"&gt;facets.png&lt;/a&gt; demonstrates faceting against a table:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://github.com/simonw/datasette/raw/0.62/docs/facets.png?raw=true" alt="Datasette's facet interface, showing one suggested facet and three facet lists" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'll walk through each screenshot in turn.&lt;/p&gt;
&lt;h4&gt;full_text_search.png&lt;/h4&gt;
&lt;p&gt;I decided to use a different example for the new screenshot, because I don't currently have a live instance for that table running against the most recent Datasette release.&lt;/p&gt;
&lt;p&gt;I went with &lt;a href="https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date&lt;/a&gt; - a search against the UK register of members interests for "hamper" (see &lt;a href="https://simonwillison.net/2018/Apr/25/register-members-interests/"&gt;Exploring the UK Register of Members Interests with SQL and Datasette&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The existing image in the documentation was 960 pixels wide, so I stuck with that and tried a few iterations until I found a height that I liked.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://shot-scraper.datasette.io/en/stable/installation.html"&gt;installed shot-scraper&lt;/a&gt; and ran the following, in my &lt;code&gt;/tmp&lt;/code&gt; directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date' \
  -h 585 \
  -w 960
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This produced a &lt;code&gt;register-of-members-interests-datasettes-com-regmem-items.png&lt;/code&gt; file which looked good when I opened it in Preview.&lt;/p&gt;
&lt;p&gt;I turned that into the following YAML in my &lt;code&gt;shots.yml&lt;/code&gt; file:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;585&lt;/span&gt;
  &lt;span class="pl-ent"&gt;width&lt;/span&gt;: &lt;span class="pl-c1"&gt;960&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;regmem-search.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;shot-scraper multi shots.yml&lt;/code&gt; against that file produced this &lt;code&gt;regmem-search.png&lt;/code&gt; image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/regmem-search.png" alt="A screenshot of that search, with the most recent design for Datasette" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;advanced_export.png&lt;/h4&gt;
&lt;p&gt;This next image isn't a full page screenshot - it's just a small fragment of the page.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; can take partial screenshots based on one or more CSS selectors. Given a CSS selector the tool draws a box around just that element and uses that to take the screenshot - adding optional padding.&lt;/p&gt;
&lt;p&gt;Here's the recipe for the advanced export box - I used the same &lt;code&gt;register-of-members-interests.datasettes.com&lt;/code&gt; example for it as this had enough rows to trigger all of the advanced options to be displayed:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper' \
  -s '#export' \
  -p 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-p 10&lt;/code&gt; here specifies 10px of padding, needed to capture the drop shadow on the box.&lt;/p&gt;
&lt;p&gt;Here's the equivalent YAML:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;#export&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;advanced-export.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the result:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/advanced-export.png" alt="A screenshot of the advanced export box" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;binary_data.png&lt;/h4&gt;
&lt;p&gt;This screenshot required a different trick.&lt;/p&gt;
&lt;p&gt;I wanted to take a screenshot of the table &lt;a href="https://latest.datasette.io/fixtures/binary_data"&gt;on this page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The full table looks like this, with three rows:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/shot-scraper-binary-table.png" alt="A table with three rows - two containing binary data and one that is empty" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I only wanted the first two of these to be shown in the screenshot though.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; has the ability to execute JavaScript on the page before the screenshot is taken. This can be used to remove elements first.&lt;/p&gt;
&lt;p&gt;Here's the JavaScript I came up with to remove all but the first two rows (actually the first three, because the table header counts as a row too):&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Array&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;from&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-smi"&gt;document&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;querySelectorAll&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'tr:nth-child(n+3)'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-s1"&gt;el&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;el&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;parentNode&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;removeChild&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;el&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I did it this way so that if I add any more rows to that test table in the future the code will still remove everything but the first two.&lt;/p&gt;
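&lt;p&gt;The CSS selector &lt;code&gt;tr:nth-child(n+3)&lt;/code&gt; matches every row that is the third or later child of its parent element, so the first two rows inside each parent are left untouched. The header row survives as well if it sits in its own &lt;code&gt;thead&lt;/code&gt;, giving one header plus two content rows.&lt;/p&gt;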
&lt;p&gt;Here's how to run that from the command-line, and then take a 10 pixel padded screenshot of just the table on the page after it has been modified by the JavaScript:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper 'https://latest.datasette.io/fixtures/binary_data' \
  -j 'Array.from(document.querySelectorAll("tr:nth-child(n+3)"), el =&amp;gt; el.parentNode.removeChild(el));' \
  -s table -p 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The YAML I added to &lt;code&gt;shots.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/fixtures/binary_data&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;table&lt;/span&gt;
  &lt;span class="pl-ent"&gt;javascript&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;    Array.from(&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.querySelectorAll('tr:nth-child(n+3)'),&lt;/span&gt;
&lt;span class="pl-s"&gt;      el =&amp;gt; el.parentNode.removeChild(el)&lt;/span&gt;
&lt;span class="pl-s"&gt;    );&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;binary-data.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And the resulting image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/binary-data.png" alt="A screenshot of the binary data table, with just the first two rows" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;facets.png&lt;/h4&gt;
&lt;p&gt;I left the most complex screenshot to last.&lt;/p&gt;
&lt;p&gt;For the faceting screenshot, I wanted to include the "suggested facet" links at the top of the page, a set of active facets and then the first three rows of the following table.&lt;/p&gt;
&lt;p&gt;But... the table has quite a lot of columns. For a neater screenshot I only wanted to include a subset of columns in the final shot.&lt;/p&gt;
&lt;p&gt;Here's the screenshot I ended up taking:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/faceting-details.png" alt="A screenshot of the suggested facets, active facets and first three rows and ten columns of the following table" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And the YAML recipe:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://congress-legislators.datasettes.com/legislators/legislator_terms?_facet=type&amp;amp;_facet=party&amp;amp;_facet=state&amp;amp;_facet_size=10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selectors_all&lt;/span&gt;:
  - &lt;span class="pl-s"&gt;.suggested-facets a&lt;/span&gt;
  - &lt;span class="pl-s"&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;faceting-details.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The key trick I'm using here is that &lt;code&gt;selectors_all&lt;/code&gt; list.&lt;/p&gt;
&lt;p&gt;The usual &lt;code&gt;shot-scraper&lt;/code&gt; selector option finds the first element on the page matching the specified CSS selector and takes a screenshot of that.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;--selector-all&lt;/code&gt; - or the YAML equivalent &lt;code&gt;selectors_all&lt;/code&gt; - instead finds EVERY element that matches any of the specified selectors and draws a bounding box containing all of them.&lt;/p&gt;
&lt;p&gt;I wanted that bounding box to surround a subset of the table cells on the page. I used this CSS selector to indicate that subset:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Here's what GPT-3 says if you &lt;a href="https://simonwillison.net/2022/Jul/9/gpt-3-explain-code/"&gt;ask it to explain&lt;/a&gt; the selector:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Explain this CSS selector:&lt;/p&gt;
&lt;p&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This selector is selecting all table cells in rows that are not the fourth row or greater, and are not in columns that are the 11th column or greater.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(See also &lt;a href="https://til.simonwillison.net/shot-scraper/subset-of-table-columns"&gt;this TIL&lt;/a&gt;.)&lt;/p&gt;
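&lt;p&gt;To make the selector's arithmetic concrete, here is a rough Python model of which cells it keeps. This is a sketch only: the &lt;code&gt;selected_cells&lt;/code&gt; function is hypothetical and is not how &lt;code&gt;shot-scraper&lt;/code&gt; or the browser evaluates CSS. It assumes every row shares a single parent element and every cell is a &lt;code&gt;td&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def selected_cells(n_rows, n_cols):
    """Return (row, column) positions, counted from 1, that the selector keeps."""
    # :nth-child(n+4) matches positions 4, 5, 6... so :not(...) keeps rows 1 to 3
    rows_kept = [r for r in range(1, n_rows + 1) if r not in range(4, n_rows + 1)]
    # :nth-child(n+11) matches positions 11, 12... so :not(...) keeps columns 1 to 10
    cols_kept = [c for c in range(1, n_cols + 1) if c not in range(11, n_cols + 1)]
    return [(r, c) for r in rows_kept for c in cols_kept]


cells = selected_cells(n_rows=50, n_cols=14)
print(len(cells))            # 30, which is 3 rows x 10 columns
print(cells[0], cells[-1])   # (1, 1) (3, 10)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However many rows and columns the table grows to, the kept region stays at three rows by ten columns, which is why the bounding box stays stable.&lt;/p&gt;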
&lt;h4&gt;Automating everything using GitHub Actions&lt;/h4&gt;
&lt;p&gt;Here's the full &lt;code&gt;shots.yml&lt;/code&gt; YAML needed to generate all four of these screenshots:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&amp;amp;_sort_desc=date&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;585&lt;/span&gt;
  &lt;span class="pl-ent"&gt;width&lt;/span&gt;: &lt;span class="pl-c1"&gt;960&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;regmem-search.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://register-of-members-interests.datasettes.com/regmem/items?_search=hamper&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;#export&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;advanced-export.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://congress-legislators.datasettes.com/legislators/legislator_terms?_facet=type&amp;amp;_facet=party&amp;amp;_facet=state&amp;amp;_facet_size=10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selectors_all&lt;/span&gt;:
  - &lt;span class="pl-s"&gt;.suggested-facets a&lt;/span&gt;
  - &lt;span class="pl-s"&gt;tr:not(tr:nth-child(n+4)) td:not(:nth-child(n+11))&lt;/span&gt;
  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;faceting-details.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://latest.datasette.io/fixtures/binary_data&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;table&lt;/span&gt;
  &lt;span class="pl-ent"&gt;javascript&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;    Array.from(&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.querySelectorAll('tr:nth-child(n+3)'),&lt;/span&gt;
&lt;span class="pl-s"&gt;      el =&amp;gt; el.parentNode.removeChild(el)&lt;/span&gt;
&lt;span class="pl-s"&gt;    );&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;padding&lt;/span&gt;: &lt;span class="pl-c1"&gt;10&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;binary-data.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;shot-scraper multi shots.yml&lt;/code&gt; against this file takes all four screenshots.&lt;/p&gt;
&lt;p&gt;But I want this to be fully automated! So I turned to &lt;a href="https://github.com/features/actions"&gt;GitHub Actions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A while ago I created a template repository for setting up GitHub Actions to take screenshots using &lt;code&gt;shot-scraper&lt;/code&gt; and write them back to the same repo. I wrote about that in &lt;a href="https://simonwillison.net/2022/Mar/14/shot-scraper-template/"&gt;Instantly create a GitHub repository to take screenshots of a web page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had previously used that recipe to create my &lt;a href="https://github.com/simonw/datasette-screenshots"&gt;datasette-screenshots&lt;/a&gt; repository - with its own &lt;code&gt;shots.yml&lt;/code&gt; file.&lt;/p&gt;
&lt;p&gt;So I added the new YAML to that existing file, committed the change, waited a minute and the result was all four images stored in that repository!&lt;/p&gt;
&lt;p&gt;My &lt;code&gt;datasette-screenshots&lt;/code&gt; &lt;a href="https://github.com/simonw/datasette-screenshots/blob/main/.github/workflows/shots.yml"&gt;workflow&lt;/a&gt; actually has two key changes from my default template. First, it takes every screenshot twice - once as a retina image and once as a regular image:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Take retina shots&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        shot-scraper multi shots.yml --retina&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Take non-retina shots&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        mkdir -p non-retina&lt;/span&gt;
&lt;span class="pl-s"&gt;        cd non-retina&lt;/span&gt;
&lt;span class="pl-s"&gt;        shot-scraper multi ../shots.yml&lt;/span&gt;
&lt;span class="pl-s"&gt;        cd ..&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This provides me with both a high quality image and a smaller, faster-loading image for each screenshot.&lt;/p&gt;
&lt;p&gt;Secondly, it runs &lt;code&gt;oxipng&lt;/code&gt; to optimize the PNGs before committing them to the repo:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Optimize PNGs&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;        oxipng -o 4 -i 0 --strip safe *.png&lt;/span&gt;
&lt;span class="pl-s"&gt;        oxipng -o 4 -i 0 --strip safe non-retina/*.png&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;a href="https://shot-scraper.datasette.io/en/stable/github-actions.html#optimizing-pngs-using-oxipng"&gt;shot-scraper documentation&lt;/a&gt; describes this pattern in more detail.&lt;/p&gt;
&lt;p&gt;With all of that in place, simply committing a change to the &lt;code&gt;shots.yml&lt;/code&gt; file is enough to generate and store the new screenshots.&lt;/p&gt;
&lt;h4&gt;Linking to the images&lt;/h4&gt;
&lt;p&gt;One last problem to solve: I want to include these images in my documentation, which means I need a way to link to them.&lt;/p&gt;
&lt;p&gt;I decided to use GitHub to host these directly, via the &lt;code&gt;raw.githubusercontent.com&lt;/code&gt; domain - which is fronted by the Fastly CDN.&lt;/p&gt;
&lt;p&gt;I care about up-to-date images, but I also want different versions of the Datasette documentation to reflect the corresponding design in their screenshots - so I needed a way to snapshot those screenshots to a known version.&lt;/p&gt;
&lt;p&gt;Repository tags are one way to do this.&lt;/p&gt;
&lt;p&gt;I tagged the &lt;code&gt;datasette-screenshots&lt;/code&gt; repository with &lt;code&gt;0.62&lt;/code&gt;, since that's the version of Datasette that the screenshots were taken for.&lt;/p&gt;
&lt;p&gt;This gave me the following URLs for the images:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/advanced-export.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/advanced-export.png&lt;/a&gt; (retina)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/regmem-search.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/regmem-search.png&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/binary-data.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/binary-data.png&lt;/a&gt; (retina)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/faceting-details.png"&gt;https://raw.githubusercontent.com/simonw/datasette-screenshots/0.62/non-retina/faceting-details.png&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To save on page loading time I decided to use the non-retina URLs for the two larger images.&lt;/p&gt;
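&lt;p&gt;The tagged URLs above follow a regular pattern, which can be sketched in a few lines of Python. The &lt;code&gt;raw_url&lt;/code&gt; helper is hypothetical and written purely for illustration; the owner, repository, tag and file names are the ones used in this post.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def raw_url(owner, repo, ref, path):
    """Build a raw.githubusercontent.com URL pinned to a tag, branch or commit."""
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(owner, repo, ref, path)


# Print the pinned URL for one retina and one non-retina image
for path in ["advanced-export.png", "non-retina/regmem-search.png"]:
    print(raw_url("simonw", "datasette-screenshots", "0.62", path))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the &lt;code&gt;0.62&lt;/code&gt; tag is part of the path, later commits to the screenshots repository should not change what these URLs serve.&lt;/p&gt;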
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/commit/fdf9891c3f0313af9244778574c7ebaac9c3a438"&gt;the commit&lt;/a&gt; that updated the Datasette documentation to link to these new images (and deleted the old images from the repo).&lt;/p&gt;
&lt;p&gt;You can see the new images in the documentation on these pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/csv_export.html"&gt;https://docs.datasette.io/en/latest/csv_export.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/binary_data.html"&gt;https://docs.datasette.io/en/latest/binary_data.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/facets.html"&gt;https://docs.datasette.io/en/latest/facets.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.datasette.io/en/latest/full_text_search.html"&gt;https://docs.datasette.io/en/latest/full_text_search.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="datasette"/><category term="github-actions"/><category term="shot-scraper"/></entry><entry><title>Software engineering practices</title><link href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#atom-tag" rel="alternate"/><published>2022-10-01T15:56:02+00:00</published><updated>2022-10-01T15:56:02+00:00</updated><id>https://simonwillison.net/2022/Oct/1/software-engineering-practices/#atom-tag</id><summary type="html">
    &lt;p&gt;Gergely Orosz &lt;a href="https://twitter.com/GergelyOrosz/status/1576161504260657152"&gt;started a Twitter conversation&lt;/a&gt; asking about recommended "software engineering practices" for development teams.&lt;/p&gt;
&lt;p&gt;(I really like his rejection of the term "best practices" here: I always feel it's prescriptive and misguiding to announce something as "best".)&lt;/p&gt;
&lt;p&gt;I decided to flesh some of my replies out into a longer post.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#docs-same-repo"&gt;Documentation in the same repo as the code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#create-test-data"&gt;Mechanisms for creating test data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#rock-solid-migrations"&gt;Rock solid database migrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#new-project-templates"&gt;Templates for new projects and components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#auto-formatting"&gt;Automated code formatting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#tested-dev-environments"&gt;Tested, automated process for new development environments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonwillison.net/2022/Oct/1/software-engineering-practices/#automated-previews"&gt;Automated preview environments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="docs-same-repo"&gt;Documentation in the same repo as the code&lt;/h4&gt;
&lt;p&gt;The most important characteristic of internal documentation is trust: do people trust that documentation both exists and is up-to-date?&lt;/p&gt;
&lt;p&gt;If they don't, they won't read it or contribute to it.&lt;/p&gt;
&lt;p&gt;The best trick I know of for improving the trustworthiness of documentation is to put it in the same repository as the code it documents, for a few reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You can enforce documentation updates as part of your code review process. If a PR changes code in a way that requires documentation updates, the reviewer can ask for those updates to be included.&lt;/li&gt;
&lt;li&gt;You get versioned documentation. If you're using an older version of a library you can consult the documentation for that version. If you're using the current main branch you can see documentation for that, without confusion over what corresponds to the most recent "stable" release.&lt;/li&gt;
&lt;li&gt;You can integrate your documentation with your automated tests! I wrote about this in &lt;a href="https://simonwillison.net/2018/Jul/28/documentation-unit-tests/"&gt;Documentation unit tests&lt;/a&gt;, which describes a pattern for introspecting code and then ensuring that the documentation at least has a section header that matches specific concepts, such as plugin hooks or configuration options.&lt;/li&gt;
&lt;/ol&gt;
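&lt;p&gt;Here's a minimal sketch of that third idea, assuming reStructuredText-style documentation where a section header is a title line followed by a line of dashes. The hook names and documentation text are illustrative stand-ins: a real test would introspect the hooks from the code and read the documentation from disk.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Illustrative hook names; a real test would discover these by introspecting the code
HOOK_NAMES = ["prepare_connection", "extra_css_urls", "render_cell"]

# Illustrative documentation source; a real test would read this from the docs directory
DOCS = """
Plugin hooks
============

prepare_connection
------------------

Called when a new database connection is created.

extra_css_urls
--------------

Return a list of extra CSS URLs.
"""


def documented_headers(docs_text):
    """Collect every title line that is underlined with dashes."""
    lines = docs_text.splitlines()
    headers = set()
    for title, underline in zip(lines, lines[1:]):
        if title.strip() and underline and set(underline) == {"-"}:
            headers.add(title.strip())
    return headers


def undocumented(names, docs_text):
    """Return the names that have no matching section header."""
    headers = documented_headers(docs_text)
    return [name for name in names if name not in headers]


print(undocumented(HOOK_NAMES, DOCS))  # ['render_cell']
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a test suite the final line would become an assertion that the list is empty, so adding a hook without documenting it fails the build.&lt;/p&gt;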
&lt;h4 id="create-test-data"&gt;Mechanisms for creating test data&lt;/h4&gt;
&lt;p&gt;When you work on large products, your customers will inevitably find surprising ways to stress or break your system. They might create an event with over a hundred different types of ticket for example, or an issue thread with a thousand comments.&lt;/p&gt;
&lt;p&gt;These can expose performance issues that don't affect the majority of your users, but can still lead to service outages or other problems.&lt;/p&gt;
&lt;p&gt;Your engineers need a way to replicate these situations in their own development environments.&lt;/p&gt;
&lt;p&gt;One way to handle this is to provide tooling to import production data into local environments. This has privacy and security implications - what if a developer's laptop that happens to hold a copy of your largest customer's data gets stolen?&lt;/p&gt;
&lt;p&gt;A better approach is to have a robust system in place for generating test data, that covers a variety of different scenarios.&lt;/p&gt;
&lt;p&gt;You might have a button somewhere that creates an issue thread with a thousand fake comments, with a note referencing the bug that this helps emulate.&lt;/p&gt;
&lt;p&gt;Any time a new edge case shows up, you can add a new recipe to that system. That way engineers can replicate problems locally without needing copies of production data.&lt;/p&gt;
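&lt;p&gt;A recipe like that can be very small. This is a rough sketch, not taken from any real system: the &lt;code&gt;make_issue_thread&lt;/code&gt; function, the field names and the note text are all hypothetical, and a real version would write to your development database rather than return dictionaries.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import random


def make_issue_thread(num_comments=1000, seed=0):
    """Build a fake issue thread as plain dictionaries."""
    rng = random.Random(seed)  # seeded, so every engineer generates identical data
    words = ["crash", "timeout", "retry", "cache", "deploy", "rollback"]
    comments = [
        {
            "id": i,
            "author": "user{}".format(rng.randint(1, 50)),
            "body": " ".join(rng.choice(words) for _ in range(rng.randint(3, 12))),
        }
        for i in range(1, num_comments + 1)
    ]
    return {
        "title": "Thread with {} comments".format(num_comments),
        # Hypothetical note tying this recipe to the bug it helps emulate
        "note": "Emulates the slow-thread bug (placeholder reference)",
        "comments": comments,
    }


thread = make_issue_thread()
print(len(thread["comments"]))  # 1000
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Seeding the random generator keeps the data reproducible, which makes it easier for two engineers to compare what they see.&lt;/p&gt;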
&lt;h4 id="rock-solid-migrations"&gt;Rock solid database migrations&lt;/h4&gt;
&lt;p&gt;The hardest part of large-scale software maintenance is inevitably the bit where you need to change your database schema.&lt;/p&gt;
&lt;p&gt;(I'm confident that one of the biggest reasons NoSQL databases became popular over the last decade was the pain people had associated with relational databases due to schema changes. Of course, NoSQL database schema modifications are still necessary, and often they're even more painful!)&lt;/p&gt;
&lt;p&gt;So you need to invest in a really good, version-controlled mechanism for managing schema changes. And a way to run them in production without downtime.&lt;/p&gt;
&lt;p&gt;If you do not have this your engineers will respond by being fearful of schema changes. Which means they'll come up with increasingly complex hacks to avoid them, which piles on technical debt.&lt;/p&gt;
&lt;p&gt;This is a deep topic. I mostly use Django for large database-backed applications, and Django has the best &lt;a href="https://docs.djangoproject.com/en/4.1/topics/migrations/"&gt;migration system&lt;/a&gt; I've ever personally experienced. If I'm working without Django I try to replicate its approach as closely as possible:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The database knows which migrations have already been applied. This means when you run the "migrate" command it can run just the ones that are still needed - important for managing multiple databases, e.g. production, staging, test and development environments.&lt;/li&gt;
&lt;li&gt;A single command that applies pending migrations, and updates the database rows that record which migrations have been run.&lt;/li&gt;
&lt;li&gt;Optional: rollbacks. Django migrations can be rolled back, which is great for iterating in a development environment, but using that in production is actually quite rare: I'll often ship a new migration that reverses the change rather than using a rollback, partly to keep the record of the mistake in version control.&lt;/li&gt;
&lt;/ul&gt;
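&lt;p&gt;The first two points can be sketched with nothing more than Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. To be clear, this is not Django's implementation: the &lt;code&gt;applied_migrations&lt;/code&gt; table, the &lt;code&gt;migrate&lt;/code&gt; function and the example migrations are all hypothetical names chosen for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import sqlite3

# Hypothetical migrations: (name, SQL) pairs kept in version control, applied in order
MIGRATIONS = [
    ("0001_create_items", "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"),
    ("0002_add_price", "ALTER TABLE items ADD COLUMN price REAL"),
]


def migrate(conn, migrations):
    """Apply any migrations not yet recorded in the database; return their names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # the database already knows about this one
        with conn:  # commits when the block exits without an error
            conn.execute(sql)
            conn.execute("INSERT INTO applied_migrations (name) VALUES (?)", (name,))
        newly_applied.append(name)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn, MIGRATIONS))  # ['0001_create_items', '0002_add_price']
print(migrate(conn, MIGRATIONS))  # []
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running the same command twice is safe because the second run finds nothing pending, which is what makes it practical to point one command at production, staging, test and development databases alike.&lt;/p&gt;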
&lt;p&gt;Even harder is the challenge of making schema changes without any downtime. I'm always interested in reading about new approaches for this - GitHub's &lt;a href="https://github.com/github/gh-ost"&gt;gh-ost&lt;/a&gt; is a neat solution for MySQL.&lt;/p&gt;
&lt;p&gt;An interesting consideration here is that it's rarely possible to have application code and database schema changes go out at the exact same instant in time. As a result, to avoid downtime you need to design every schema change with this in mind. The process needs to be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Design a new schema change that can be applied without changing the application code that uses it.&lt;/li&gt;
&lt;li&gt;Ship that change to production, upgrading your database while keeping the old code working.&lt;/li&gt;
&lt;li&gt;Now ship new application code that uses the new schema.&lt;/li&gt;
&lt;li&gt;Ship a new schema change that cleans up any remaining work - dropping columns that are no longer used, for example.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This process is a pain. It's difficult to get right. The only way to get good at it is to practice it a lot over time.&lt;/p&gt;
&lt;p&gt;My rule is this: &lt;strong&gt;schema changes should be boring and common&lt;/strong&gt;, as opposed to being exciting and rare.&lt;/p&gt;
&lt;h4 id="new-project-templates"&gt;Templates for new projects and components&lt;/h4&gt;
&lt;p&gt;If you're working with microservices, your team will inevitably need to build new ones.&lt;/p&gt;
&lt;p&gt;If you're working in a monorepo, you'll still have elements of your codebase with similar structures - components and feature implementations of some sort.&lt;/p&gt;
&lt;p&gt;Be sure to have really good templates in place for creating these "the right way" - with the right directory structure, a README and a test suite with a single, dumb passing test.&lt;/p&gt;
&lt;p&gt;I like to use the Python &lt;a href="https://cookiecutter.readthedocs.io/"&gt;cookiecutter&lt;/a&gt; tool for this. I've also used GitHub template repositories, and I even have a neat trick for &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;combining the two&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These templates need to be maintained and kept up-to-date. The best way to do that is to make sure they are being used - every time a new project is created is a chance to revise the template and make sure it still reflects the recommended way to do things.&lt;/p&gt;
&lt;h4 id="auto-formatting"&gt;Automated code formatting&lt;/h4&gt;
&lt;p&gt;This one's easy. Pick a code formatting tool for your language - like &lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt; for Python or &lt;a href="https://prettier.io/"&gt;Prettier&lt;/a&gt; for JavaScript (I'm so jealous of how Go has &lt;a href="https://pkg.go.dev/cmd/gofmt"&gt;gofmt&lt;/a&gt; built in) - and run its "check" mode in your CI flow.&lt;/p&gt;
&lt;p&gt;Don't argue with its defaults, just commit to them.&lt;/p&gt;
&lt;p&gt;This saves an incredible amount of time in two places:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As an individual, you get back all of that mental energy you used to spend thinking about the best way to format your code and can spend it on something more interesting.&lt;/li&gt;
&lt;li&gt;As a team, your code reviews can entirely skip the pedantic arguments about code formatting. Huge productivity win!&lt;/li&gt;
&lt;/ul&gt;
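&lt;p&gt;A CI "check" step can be as small as the following sketch. It assumes Black is installed in the CI environment; the helper names &lt;code&gt;build_check_command&lt;/code&gt; and &lt;code&gt;run_format_check&lt;/code&gt; are hypothetical. Black's &lt;code&gt;--check&lt;/code&gt; flag leaves files untouched and returns a non-zero exit code if anything would be reformatted, which is what makes the CI job fail.&lt;/p&gt;

```python
import subprocess
import sys


def build_check_command(paths):
    """Build the command that runs Black in check-only mode."""
    # "python -m black" runs Black with the same interpreter as this script.
    return [sys.executable, "-m", "black", "--check", *paths]


def run_format_check(paths):
    """Run the check and return Black's exit code (0 means nothing to change)."""
    result = subprocess.run(build_check_command(paths))
    return result.returncode
```

&lt;p&gt;A CI job could then call &lt;code&gt;sys.exit(run_format_check(["."]))&lt;/code&gt; so that the job's status mirrors Black's exit code.&lt;/p&gt;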
&lt;h4 id="tested-dev-environments"&gt;Tested, automated process for new development environments&lt;/h4&gt;
&lt;p&gt;The most painful part of any software project is inevitably setting up the initial development environment.&lt;/p&gt;
&lt;p&gt;The moment your team grows beyond a couple of people, you should invest in making this work better.&lt;/p&gt;
&lt;p&gt;At the very least, you need a documented process for creating a new environment - and it has to be known-to-work, so any time someone is onboarded using it they should be encouraged to fix any problems in the documentation or accompanying scripts as they encounter them.&lt;/p&gt;
&lt;p&gt;Much better is an automated process: a single script that gets everything up and running. Tools like Docker have made this a LOT easier over the past decade.&lt;/p&gt;
&lt;p&gt;I'm increasingly convinced that the best-in-class solution here is cloud-based development environments. The ability to click a button on a web page and have a fresh, working development environment running a few seconds later is a game-changer for large development teams.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.gitpod.io/"&gt;Gitpod&lt;/a&gt; and &lt;a href="https://github.com/features/codespaces"&gt;Codespaces&lt;/a&gt; are two of the most promising tools I've tried in this space.&lt;/p&gt;
&lt;p&gt;I've seen developers lose hours a week to issues with their development environment. Eliminating that across a large team is the equivalent of hiring several new full-time engineers!&lt;/p&gt;
&lt;h4 id="automated-previews"&gt;Automated preview environments&lt;/h4&gt;
&lt;p&gt;Reviewing a pull request is a lot easier if you can actually try out the changes.&lt;/p&gt;
&lt;p&gt;The best way to do this is with automated preview environments, directly linked to from the PR itself.&lt;/p&gt;
&lt;p&gt;These are getting increasingly easy to offer. &lt;a href="https://vercel.com/features/previews"&gt;Vercel&lt;/a&gt;, &lt;a href="https://www.netlify.com/products/deploy-previews/"&gt;Netlify&lt;/a&gt;, &lt;a href="https://render.com/docs/pull-request-previews"&gt;Render&lt;/a&gt; and &lt;a href="https://devcenter.heroku.com/articles/github-integration-review-apps"&gt;Heroku&lt;/a&gt; all have features that can do this. Building a custom system on top of something like &lt;a href="https://cloud.google.com/run"&gt;Google Cloud Run&lt;/a&gt; or &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt; is also possible with a bit of work.&lt;/p&gt;
&lt;p&gt;This is another one of those things that requires some up-front investment but will pay for itself many times over through increased productivity and higher-quality reviews.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/testing"&gt;testing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/version-control"&gt;version-control&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/zero-downtime"&gt;zero-downtime&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/technical-debt"&gt;technical-debt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gergely-orosz"&gt;gergely-orosz&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="software-engineering"/><category term="testing"/><category term="version-control"/><category term="zero-downtime"/><category term="github-actions"/><category term="technical-debt"/><category term="gergely-orosz"/></entry><entry><title>How I’m a Productive Programmer With a Memory of a Fruit Fly</title><link href="https://simonwillison.net/2022/Sep/19/docsets/#atom-tag" rel="alternate"/><published>2022-09-19T16:19:02+00:00</published><updated>2022-09-19T16:19:02+00:00</updated><id>https://simonwillison.net/2022/Sep/19/docsets/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hynek.me/articles/productive-fruit-fly-programmer/"&gt;How I’m a Productive Programmer With a Memory of a Fruit Fly&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hynek Schlawack describes the value he gets from searchable offline developer documentation, and advocates for the Documentation Sets format which bundles docs, metadata and a SQLite search index. Hynek’s doc2dash command can convert documentation generated by tools like Sphinx into a docset that’s compatible with several offline documentation browser applications.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/hynek/status/1571875599064481798"&gt;@hynek&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-docs"&gt;sphinx-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hynek-schlawack"&gt;hynek-schlawack&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="sqlite"/><category term="sphinx-docs"/><category term="hynek-schlawack"/></entry><entry><title>Quoting Ken Williams</title><link href="https://simonwillison.net/2022/Aug/4/ken-williams/#atom-tag" rel="alternate"/><published>2022-08-04T15:50:56+00:00</published><updated>2022-08-04T15:50:56+00:00</updated><id>https://simonwillison.net/2022/Aug/4/ken-williams/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/hackergrrl/art-of-readme/blob/master/README.md#no-readme-no-abstraction"&gt;&lt;p&gt;Your documentation is complete when someone can use your module without ever having to look at its code. This is very important. This makes it possible for you to separate your module's documented interface from its internal implementation (guts). This is good because it means that you are free to change the module's internals as long as the interface remains the same.&lt;/p&gt;
&lt;p&gt;Remember: the documentation, not the code, defines what a module does.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/hackergrrl/art-of-readme/blob/master/README.md#no-readme-no-abstraction"&gt;Ken Williams&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/></entry><entry><title>Cleaning data with sqlite-utils and Datasette</title><link href="https://simonwillison.net/2022/Jul/31/cleaning-data/#atom-tag" rel="alternate"/><published>2022-07-31T19:57:51+00:00</published><updated>2022-07-31T19:57:51+00:00</updated><id>https://simonwillison.net/2022/Jul/31/cleaning-data/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/tutorials/clean-data"&gt;Cleaning data with sqlite-utils and Datasette&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I wrote a new tutorial for the Datasette website, showing how to use sqlite-utils to import a CSV file, clean up the resulting schema, fix date formats and extract some of the columns into a separate table. It’s accompanied by a ten minute video originally recorded for the HYTRADBOI conference.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1553821212257632256"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tutorials"&gt;tutorials&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="tutorials"/><category term="datasette"/><category term="sqlite-utils"/></entry><entry><title>Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation</title><link href="https://simonwillison.net/2022/May/26/weeknotes-building-datasette-cloud/#atom-tag" rel="alternate"/><published>2022-05-26T04:35:11+00:00</published><updated>2022-05-26T04:35:11+00:00</updated><id>https://simonwillison.net/2022/May/26/weeknotes-building-datasette-cloud/#atom-tag</id><summary type="html">
    &lt;p&gt;Hosting provider Fly released &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt; this week. I got an early preview and I've been working with it for a few days - it's a &lt;em&gt;fascinating&lt;/em&gt; new piece of technology. I'm using it to get my hosting service for Datasette ready for wider release.&lt;/p&gt;
&lt;h4&gt;Datasette Cloud&lt;/h4&gt;
&lt;p&gt;Datasette Cloud is the name I've given my forthcoming hosted SaaS version of Datasette. I'm building it for two reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;This is an obvious step towards building a sustainable business model for my open source project. It's a reasonably well-trodden path at this point: plenty of projects have demonstrated that offering paid hosting for an open source project can build a valuable business. &lt;a href="https://gitlab.com/"&gt;GitLab&lt;/a&gt; are an especially good example of this model.&lt;/li&gt;
&lt;li&gt;There are plenty of people who could benefit from Datasette, but the friction involved in hosting it prevents them from taking advantage of the software. I've tried to make it &lt;a href="https://docs.datasette.io/en/stable/deploying.html"&gt;as easy to host&lt;/a&gt; as possible, but without a SaaS hosted version I'm failing to deliver value to the people that I most want the software to help.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;My previous alpha was built directly on Docker, running everything on a single large VPS. Obviously it needed to scale beyond one machine, and I started experimenting with Kubernetes to make this happen.&lt;/p&gt;
&lt;p&gt;I also want to allow users to run their own plugins, without risk of malicious code causing problems for other accounts. Docker and Kubernetes containers don't offer the isolation that I need to feel comfortable doing this, so I started researching &lt;a href="https://firecracker-microvm.github.io/"&gt;Firecracker&lt;/a&gt; - constructed by AWS to power Lambda and Fargate, so very much designed with potentially malicious code in mind.&lt;/p&gt;
&lt;p&gt;Spinning up Firecracker on a Kubernetes cluster is no small lift!&lt;/p&gt;
&lt;p&gt;And then I heard about &lt;a href="https://fly.io/blog/fly-machines/"&gt;Fly Machines&lt;/a&gt;. And it looks like it's exactly what I need to get this project to the next milestone.&lt;/p&gt;
&lt;h4&gt;Fly Machines&lt;/h4&gt;
&lt;p&gt;Fly's core offering allows you to run Docker containers in regions around the world, compiled (automatically by Fly) to Firecracker containers with geo-load-balancing so users automatically get routed to an instance running near them.&lt;/p&gt;
&lt;p&gt;Their new Fly Machines product gives you a new way to run containers there: you get full control over when containers are created, updated, started, stopped and destroyed. It's the exact level of control I need to build Datasette Cloud.&lt;/p&gt;
&lt;p&gt;It also implements scale-to-zero: you can stop a container, and Fly will automatically start it back up again for you (generally in less than a second) when fresh traffic comes in.&lt;/p&gt;
&lt;p&gt;(I had built my own version of this for my Datasette Cloud alpha, but the spin up time took more like 10s and involved showing the user a custom progress bar to help them see what was going on.)&lt;/p&gt;
&lt;p&gt;Being able to programmatically start and stop Firecracker containers was exactly what I'd been trying to piece together using Kubernetes - and the ability to control which global region they go in (with the potential for &lt;a href="https://tip.litestream.io/guides/read-replica/"&gt;Litestream replication&lt;/a&gt; between regions in the future) is a feature I hadn't expected to be able to offer for years.&lt;/p&gt;
&lt;p&gt;So I spent most of this week on a proof of concept. I've successfully demonstrated that the Fly Machines product has almost exactly the features that I need to ship Datasette Cloud on Fly Machines - and I've confirmed that the gaps I need to fill are on Fly's near-term roadmap.&lt;/p&gt;
&lt;p&gt;I don't have anything to demonstrate publicly just yet, but I do have &lt;a href="https://til.simonwillison.net/fly"&gt;several new TILs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If this sounds interesting to you or your organization and you'd like to try it out, drop me an email at &lt;code&gt;swillison&lt;/code&gt; @ Google's email service.&lt;/p&gt;
&lt;h4 id="furo-theme"&gt;The Furo theme for Sphinx&lt;/h4&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt; automated screenshot tool's README had got a little too long, so I decided to upgrade it to &lt;a href="https://shot-scraper.datasette.io/"&gt;a full documentation website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I chose to use &lt;a href="https://myst-parser.readthedocs.io/"&gt;MyST&lt;/a&gt; and &lt;a href="https://www.sphinx-doc.org/"&gt;Sphinx&lt;/a&gt; for this, hosted on &lt;a href="https://www.readthedocs.org/"&gt;Read The Docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;MyST adds Markdown syntax to Sphinx, which is easier to remember (and for people to contribute to) than reStructuredText.&lt;/p&gt;
&lt;p&gt;After putting the site live, Adam Johnson &lt;a href="https://twitter.com/AdamChainz/status/1527666472193081345"&gt;suggested&lt;/a&gt; I take a look at the &lt;a href="https://github.com/pradyunsg/furo"&gt;Furo theme&lt;/a&gt;. I'd previously found Sphinx themes hard to navigate because they had so much differing functionality, but a personal recommendation turned out to be exactly what I needed.&lt;/p&gt;
&lt;p&gt;Furo is really nice - it fixed a slight rendering complaint I had about nested lists in the theme I was using, and since it doesn't use web fonts it dropped the bytes transferred for a page of documentation by more than half!&lt;/p&gt;
&lt;p&gt;I switched &lt;code&gt;shot-scraper&lt;/code&gt; over to Furo, and liked it so much that I switched over &lt;a href="https://docs.datasette.io/en/latest/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/en/latest/"&gt;sqlite-utils&lt;/a&gt; too.&lt;/p&gt;
&lt;p&gt;Here's what the &lt;code&gt;shot-scraper&lt;/code&gt; documentation looks like now:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/shot-scraper-docs.png" alt="A screenshot of the shot-scraper documentation, showing the table of contents" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Screenshot taken using &lt;code&gt;shot-scraper&lt;/code&gt; itself, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper \
  https://shot-scraper.datasette.io/en/latest/ \
  --retina --height 1200
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Full details of those theme migrations (including more comparative screenshots) can be found in these issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/shot-scraper/issues/77"&gt;shot-scraper: Switch to Furo theme #77&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/datasette/issues/1746"&gt;datasette: Switch documentation theme to Furo #1746&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/sqlite-utils/issues/435"&gt;sqlite-utils:  Switch to Furo documentation theme #435&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-unsafe-expose-env"&gt;datasette-unsafe-expose-env&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-unsafe-expose-env/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2022-05-25
&lt;br /&gt;Datasette plugin to expose some environment variables at /-/env for debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/0.14.1"&gt;0.14.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;16 releases total&lt;/a&gt;) - 2022-05-22
&lt;br /&gt;A command-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/google-calendar-to-sqlite"&gt;google-calendar-to-sqlite&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/google-calendar-to-sqlite/releases/tag/0.1a0"&gt;0.1a0&lt;/a&gt; - 2022-05-21
&lt;br /&gt;Create a SQLite database containing your data from Google Calendar&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-upload-dbs"&gt;datasette-upload-dbs&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-upload-dbs/releases/tag/0.1.1"&gt;0.1.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-upload-dbs/releases"&gt;2 releases total&lt;/a&gt;) - 2022-05-17
&lt;br /&gt;Upload SQLite database files to Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-insert"&gt;datasette-insert&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-insert/releases/tag/0.7"&gt;0.7&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-insert/releases"&gt;7 releases total&lt;/a&gt;) - 2022-05-16
&lt;br /&gt;Datasette plugin for inserting and updating data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github-actions/job-summaries"&gt;GitHub Actions job summaries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github-actions/oxipng"&gt;Optimizing PNGs in GitHub Actions using Oxipng&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/fly-docker-registry"&gt;Using the Fly Docker registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/fly-logs-to-s3"&gt;Writing Fly logs to S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/fly/wildcard-dns-ssl"&gt;Wildcard DNS and SSL on Fly&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fly"&gt;fly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/firecracker"&gt;firecracker&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="projects"/><category term="datasette"/><category term="weeknotes"/><category term="datasette-cloud"/><category term="fly"/><category term="firecracker"/></entry><entry><title>GOV.UK Guidance: Documenting APIs</title><link href="https://simonwillison.net/2022/May/21/govuk-guidance-documenting-apis/#atom-tag" rel="alternate"/><published>2022-05-21T23:31:20+00:00</published><updated>2022-05-21T23:31:20+00:00</updated><id>https://simonwillison.net/2022/May/21/govuk-guidance-documenting-apis/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.gov.uk/guidance/how-to-document-apis"&gt;GOV.UK Guidance: Documenting APIs&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.”

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/jamietanna/status/1528067067773075464"&gt;@jamietanna&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gov-uk"&gt;gov-uk&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="gov-uk"/></entry><entry><title>jq language description</title><link href="https://simonwillison.net/2022/Apr/26/jq-language-description/#atom-tag" rel="alternate"/><published>2022-04-26T19:04:09+00:00</published><updated>2022-04-26T19:04:09+00:00</updated><id>https://simonwillison.net/2022/Apr/26/jq-language-description/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stedolan/jq/wiki/jq-Language-Description"&gt;jq language description&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I love jq but I’ve always found it difficult to remember how to use it, and the manual hasn’t helped me as much as I would hope. It turns out the jq wiki on GitHub offers an alternative, more detailed description of the language which fits the way my brain works a lot better.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=31166956"&gt;psacawa on Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/programming-languages"&gt;programming-languages&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jq"&gt;jq&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="programming-languages"/><category term="jq"/></entry><entry><title>Deno by example</title><link href="https://simonwillison.net/2022/Mar/17/deno-by-example/#atom-tag" rel="alternate"/><published>2022-03-17T01:02:00+00:00</published><updated>2022-03-17T01:02:00+00:00</updated><id>https://simonwillison.net/2022/Mar/17/deno-by-example/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://examples.deno.land/"&gt;Deno by example&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Interesting approach to documentation: a big list of annotated examples illustrating the Deno way of solving a bunch of common problems.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://blog.jim-nielsen.com/2022/deno-is-webby-pt-2/"&gt;Jim Nielsen: Deno is Webby (pt. 2)&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deno"&gt;deno&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="deno"/></entry><entry><title>Instantly create a GitHub repository to take screenshots of a web page</title><link href="https://simonwillison.net/2022/Mar/14/shot-scraper-template/#atom-tag" rel="alternate"/><published>2022-03-14T16:52:27+00:00</published><updated>2022-03-14T16:52:27+00:00</updated><id>https://simonwillison.net/2022/Mar/14/shot-scraper-template/#atom-tag</id><summary type="html">
    &lt;p&gt;I just released &lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper-template"&gt;shot-scraper-template&lt;/a&gt;&lt;/strong&gt;, a GitHub repository template that helps you start taking automated screenshots of a web page by filling out a form.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt; is my command line tool for &lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;taking screenshots&lt;/a&gt; of web pages and &lt;a href="https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/"&gt;scraping data from them&lt;/a&gt; using JavaScript.&lt;/p&gt;
&lt;p&gt;One of its uses is to help create and maintain screenshots for documentation, making it easy to update them to include changes to the design of the underlying pages.&lt;/p&gt;
&lt;p&gt;To make this as easy as possible, I've created a GitHub repository template that automates the process of setting up &lt;code&gt;shot-scraper&lt;/code&gt; to run against a URL.&lt;/p&gt;
&lt;p&gt;To try it out, start here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/shot-scraper-template/generate"&gt;https://github.com/simonw/shot-scraper-template/generate&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/create-new-repository.png" alt="Screenshot of the 'create new repository from shot-scraper-template' page, which asks for a repository name and a description. The URL for the page you want to take screenshots of goes in the description." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Pick a name for your new repository and paste the URL of the page you want to screenshot into the description field.&lt;/p&gt;
&lt;p&gt;Then click "Create repository from template".&lt;/p&gt;
&lt;p&gt;That's it! Your new repository will be created, a GitHub Actions automation script will run for a few seconds and your new screenshot will be added to the repository as a file called &lt;code&gt;shot.png&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here's an example repository I created using the template: &lt;a href="https://github.com/simonw/simonwillison-net-shot"&gt;simonw/simonwillison-net-shot&lt;/a&gt; - and here's the &lt;code&gt;shot.png&lt;/code&gt; file from that repo:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://raw.githubusercontent.com/simonw/simonwillison-net-shot/main/shot.png" alt="A screenshot of simonwillison.net" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can re-take the screenshot any time you want by clicking the "Run workflow" button in the Actions tab:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/take-screenshots.png" alt="Click Actions, Take screenshots, Run workflow and then Run workflow" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Your repository will have a file in it called &lt;code&gt;shots.yml&lt;/code&gt; that initially looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://simonwillison.net/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;shot.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can edit that file to change the settings that apply to your screenshot, or to add further URLs to take shots of, like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://simonwillison.net/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;shot.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://www.example.com/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;example.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;800&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Further options are available here, as described in &lt;a href="https://github.com/simonw/shot-scraper#taking-multiple-screenshots"&gt;the shot-scraper README&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;How this works&lt;/h4&gt;
&lt;p&gt;This entire system is based around a single GitHub Actions workflow, in &lt;a href="https://github.com/simonw/shot-scraper-template/blob/main/.github/workflows/shots.yml"&gt;.github/workflows/shots.yml&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's an annotated copy of that workflow showing how it all works.&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Take screenshots&lt;/span&gt;

&lt;span class="pl-ent"&gt;on&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;push&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;workflow_dispatch&lt;/span&gt;:&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The workflow triggers when a change is made to the repository (including edits to the &lt;code&gt;shots.yml&lt;/code&gt; file) or when the user manually clicks "Run workflow".&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;jobs&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;shot-scraper&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;if&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ github.repository != 'simonw/shot-scraper-template' }}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is the trick that makes everything else work, which I picked up &lt;a href="https://simonwillison.net/2021/Aug/28/dynamic-github-repository-templates/"&gt;from Bruno Rocha last year&lt;/a&gt;. It ensures that this workflow job only runs on copies of the template, not on the initial template repository itself.&lt;/p&gt;
&lt;p&gt;This is necessary because a later step creates a &lt;code&gt;shots.yml&lt;/code&gt; file in the repository, if one doesn't yet exist, based on the URL the user provided in the repository description.&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v2&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Set up Python 3.10&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/setup-python@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;python-version&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;3.10&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Configure pip caching&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/pip&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}&lt;/span&gt;
        &lt;span class="pl-ent"&gt;restore-keys&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;          ${{ runner.os }}-pip-&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is boilerplate that I use in most of my GitHub Actions workflows: it sets up Python 3.10, and also &lt;a href="https://github.com/actions/cache"&gt;configures a cache&lt;/a&gt; such that Python requirements in a &lt;code&gt;requirements.txt&lt;/code&gt; file persist from one invocation to another without having to be re-downloaded from PyPI.&lt;/p&gt;
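&lt;p&gt;To illustrate why the cache key includes a hash of &lt;code&gt;requirements.txt&lt;/code&gt;, here is a simplified Python sketch. It is not the exact algorithm used by &lt;code&gt;hashFiles()&lt;/code&gt; or &lt;code&gt;actions/cache&lt;/code&gt;; the function names are hypothetical, and the list of existing keys is assumed to be ordered newest first. The idea is that the key changes whenever the requirements change, while the &lt;code&gt;restore-keys&lt;/code&gt; prefix still allows an older cache to be reused as a starting point.&lt;/p&gt;

```python
import hashlib


def cache_key(os_name, requirements_text):
    """Build a key that changes whenever the requirements file changes."""
    digest = hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()
    return f"{os_name}-pip-{digest}"


def pick_cache(existing_keys, key, restore_prefix):
    """Prefer an exact key match, then fall back to any key with the prefix.

    existing_keys is assumed to be ordered newest first (a simplification).
    """
    if key in existing_keys:
        return key
    for candidate in existing_keys:
        if candidate.startswith(restore_prefix):
            return candidate
    return None


if __name__ == "__main__":
    old_key = cache_key("Linux", "shot-scraper\n")
    new_key = cache_key("Linux", "shot-scraper\ndatasette\n")
    # The requirements changed, so there is no exact match,
    # but the prefix fallback still finds the older cache.
    print(pick_cache([old_key], new_key, "Linux-pip-") == old_key)
```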
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Cache Playwright browsers&lt;/span&gt;
      &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/cache@v2&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;path&lt;/span&gt;: &lt;span class="pl-s"&gt;~/.cache/ms-playwright/&lt;/span&gt;
        &lt;span class="pl-ent"&gt;key&lt;/span&gt;: &lt;span class="pl-s"&gt;${{ runner.os }}-browsers&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; uses Microsoft's open source &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt; browser automation tool. Playwright works by installing its own full Chromium browser. This step configures a cache for that browser, so that future runs of the Action don't need to download another copy.&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        pip install -r requirements.txt&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Install Playwright dependencies&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        shot-scraper install&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;pip install&lt;/code&gt; line here installs the &lt;code&gt;shot-scraper&lt;/code&gt; CLI tool, which is written in Python.&lt;/p&gt;
&lt;p&gt;That &lt;code&gt;shot-scraper install&lt;/code&gt; line then triggers the Playwright mechanism to download and install the browser. This will do nothing if the browser has already been cached.&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/github-script@v6&lt;/span&gt;
      &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Create shots.yml if missing on first run&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;script&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;          const fs = require('fs');&lt;/span&gt;
&lt;span class="pl-s"&gt;          if (!fs.existsSync('shots.yml')) {&lt;/span&gt;
&lt;span class="pl-s"&gt;              const desc = context.payload.repository.description;&lt;/span&gt;
&lt;span class="pl-s"&gt;              let line = '';&lt;/span&gt;
&lt;span class="pl-s"&gt;              if (desc &amp;amp;&amp;amp; (desc.startsWith('http://') || desc.startsWith('https://'))) {&lt;/span&gt;
&lt;span class="pl-s"&gt;                  line = `- url: ${desc}` + '\n  output: shot.png\n  height: 800';&lt;/span&gt;
&lt;span class="pl-s"&gt;              } else {&lt;/span&gt;
&lt;span class="pl-s"&gt;                  line = '# - url: https://www.example.com/\n#   output: shot.png\n#   height: 800';&lt;/span&gt;
&lt;span class="pl-s"&gt;              }&lt;/span&gt;
&lt;span class="pl-s"&gt;              fs.writeFileSync('shots.yml', line + '\n');&lt;/span&gt;
&lt;span class="pl-s"&gt;          }&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is the other key piece of magic. It uses GitHub's &lt;a href="https://github.com/actions/github-script"&gt;github-script&lt;/a&gt; action, which provides a Node.js environment with a &lt;code&gt;context&lt;/code&gt; object containing details about the Actions run.&lt;/p&gt;
&lt;p&gt;It starts by reading the repository description from &lt;code&gt;context.payload.repository.description&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then it creates a &lt;code&gt;shots.yml&lt;/code&gt; file based on that description - but only if the file does not exist already.&lt;/p&gt;
&lt;p&gt;If the repository description is missing or does not start with &lt;code&gt;http://&lt;/code&gt; or &lt;code&gt;https://&lt;/code&gt;, it writes a commented-out configuration instead, which looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; - url: https://www.example.com/&lt;/span&gt;
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt;   output: shot.png&lt;/span&gt;
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt;   height: 800&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
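&lt;p&gt;For readers who find Python easier to follow than the inline JavaScript, here is a rough equivalent of that step. It is a sketch only: the function names &lt;code&gt;initial_shots_yml&lt;/code&gt; and &lt;code&gt;create_if_missing&lt;/code&gt; are made up for illustration, and the description is passed in as an argument instead of being read from the Actions &lt;code&gt;context&lt;/code&gt; object.&lt;/p&gt;

```python
from pathlib import Path


def initial_shots_yml(description):
    """Return the text for a new shots.yml (hypothetical helper).

    A description that looks like a URL becomes a live entry; anything
    else (including None) produces the commented-out placeholder.
    """
    if description and description.startswith(("http://", "https://")):
        body = "- url: {}\n  output: shot.png\n  height: 800".format(description)
    else:
        body = (
            "# - url: https://www.example.com/\n"
            "#   output: shot.png\n"
            "#   height: 800"
        )
    return body + "\n"


def create_if_missing(path, description):
    """Write shots.yml only when it does not exist yet; report whether it was written."""
    target = Path(path)
    if target.exists():
        return False
    target.write_text(initial_shots_yml(description))
    return True
```

&lt;p&gt;Calling &lt;code&gt;create_if_missing("shots.yml", description)&lt;/code&gt; a second time leaves the existing file alone, which matches the "only on first run" behaviour of the workflow step.&lt;/p&gt;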
&lt;p&gt;The next step is to take the screenshots:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Take shots&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;        shot-scraper multi shots.yml&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;shot-scraper multi&lt;/code&gt; is &lt;a href="https://github.com/simonw/shot-scraper#taking-multiple-screenshots"&gt;documented here&lt;/a&gt; - it runs through the YAML file and takes each of the screenshots configured there in turn.&lt;/p&gt;
&lt;p&gt;The final step is to commit and push the new &lt;code&gt;shots.yml&lt;/code&gt; and &lt;code&gt;shot.png&lt;/code&gt; files to the repository:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;    - &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Commit and push&lt;/span&gt;
      &lt;span class="pl-ent"&gt;run&lt;/span&gt;: &lt;span class="pl-s"&gt;|-&lt;/span&gt;
&lt;span class="pl-s"&gt;        git config user.name "Automated"&lt;/span&gt;
&lt;span class="pl-s"&gt;        git config user.email "actions@users.noreply.github.com"&lt;/span&gt;
&lt;span class="pl-s"&gt;        git add -A&lt;/span&gt;
&lt;span class="pl-s"&gt;        timestamp=$(date -u)&lt;/span&gt;
&lt;span class="pl-s"&gt;        git commit -m "${timestamp}" || exit 0&lt;/span&gt;
&lt;span class="pl-s"&gt;        git pull --rebase&lt;/span&gt;
&lt;span class="pl-s"&gt;        git push&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This uses a pattern I describe &lt;a href="https://til.simonwillison.net/github-actions/commit-if-file-changed"&gt;in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;GitHub Actions as a platform&lt;/h4&gt;
&lt;p&gt;I &lt;a href="https://twitter.com/simonw/status/1502838806113763329"&gt;tweeted this&lt;/a&gt; the other day, shortly before I came up with the idea for the &lt;code&gt;shot-scraper-template&lt;/code&gt; repository.&lt;/p&gt;
&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="en" dir="ltr"&gt;Genuinely think GitHub Actions might be my favourite serverless platform right now&lt;/p&gt;- Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1502838806113763329?ref_src=twsrc%5Etfw"&gt;March 13, 2022&lt;/a&gt;&lt;/blockquote&gt;
&lt;p&gt;This project demonstrates why. The number of complex moving parts involved in &lt;code&gt;shot-scraper-template&lt;/code&gt; is pretty bewildering, but the end result is a free tool that anyone can use to start taking automated screenshots.&lt;/p&gt;
&lt;p&gt;And it doesn't cost me anything to provide the tool either!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="projects"/><category term="github-actions"/><category term="shot-scraper"/></entry><entry><title>Weeknotes: Distracted by Playwright</title><link href="https://simonwillison.net/2022/Mar/12/weeknotes-playwright/#atom-tag" rel="alternate"/><published>2022-03-12T00:30:26+00:00</published><updated>2022-03-12T00:30:26+00:00</updated><id>https://simonwillison.net/2022/Mar/12/weeknotes-playwright/#atom-tag</id><summary type="html">
    &lt;p&gt;My goal for this week was to unblock progress on Datasette by finally finishing the dash encoding implementation I described last week. I was getting close, and then I got very distracted by &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Dash encoding v2&lt;/h4&gt;
&lt;p&gt;In &lt;a href="https://simonwillison.net/2022/Mar/5/dash-encoding/"&gt;Why I invented “dash encoding”, a new encoding scheme for URL paths&lt;/a&gt; I described a new mechanism I had invented for handling the gnarly problem of including table names with &lt;code&gt;/&lt;/code&gt; characters in the URL path on Datasette. The very short version: you can't use URL encoding in a path, because common proxies (including Apache and Nginx) will decode it before it gets to your application.&lt;/p&gt;
&lt;p&gt;Thanks to feedback on that post I actually changed my design: I'm now using a variant of percent encoding that uses &lt;code&gt;-&lt;/code&gt; instead of &lt;code&gt;%&lt;/code&gt;. More &lt;a href="https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259"&gt;details in the issue&lt;/a&gt; - and I'll write this up fully once I've finished landing the change.&lt;/p&gt;
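&lt;p&gt;To make the idea concrete, here is a minimal sketch of percent encoding with a dash as the escape character. The function names and the set of characters left unescaped (ASCII letters, digits and underscore) are assumptions made for illustration; this is not Datasette's actual implementation, and the linked issue has the real details.&lt;/p&gt;

```python
from urllib.parse import unquote

# Assumed safe set: everything else, including "-" itself, gets escaped
SAFE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_")


def dash_encode(text):
    """Escape each unsafe UTF-8 byte as a dash followed by two hex digits."""
    return "".join(
        chr(byte) if chr(byte) in SAFE else "-{:02X}".format(byte)
        for byte in text.encode("utf-8")
    )


def dash_decode(text):
    """Reverse dash_encode by swapping dashes for % and percent-decoding."""
    return unquote(text.replace("-", "%"))


print(dash_encode("foo/bar-baz"))
print(dash_decode("foo-2Fbar-2Dbaz"))
```

&lt;p&gt;Because the dash itself is always escaped, every dash in the encoded form marks an escape sequence, which is what makes the decode step unambiguous.&lt;/p&gt;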
&lt;h4&gt;shot-scraper and Playwright&lt;/h4&gt;
&lt;p&gt;I thoroughly &lt;a href="https://xkcd.com/356/"&gt;nerd-sniped&lt;/a&gt; myself with this one. I started investigating possibilities for automatically generating screenshots for documentation, and realized that &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt; made this substantially easier than it has been in the past.&lt;/p&gt;
&lt;p&gt;The result was &lt;strong&gt;&lt;a href="https://simonwillison.net/2022/Mar/10/shot-scraper/"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt; - a new command-line utility for taking screenshots of web pages, or portions of web pages - and for running through a set of screenshots defined in a YAML file.&lt;/p&gt;
&lt;p&gt;I still can't quite believe how quickly this came together.&lt;/p&gt;
&lt;p&gt;Every now and then a tool comes along which adds a fundamental new set of capabilities to your toolbox, and can be multiplied against other tools to open up a huge range of possibilities.&lt;/p&gt;
&lt;p&gt;Playwright feels like one of those tools.&lt;/p&gt;
&lt;p&gt;A quick &lt;code&gt;pip install playwright&lt;/code&gt; is all it takes to start writing robust browser automation tools, using dedicated standalone headless instances of multiple browsers that are installed for you using &lt;code&gt;playwright install&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It's easy to run in CI - getting it working in GitHub Actions was trivial.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; is my first project built on Playwright, but there will definitely be more.&lt;/p&gt;
&lt;h4&gt;shot-scraper accessibility&lt;/h4&gt;
&lt;p&gt;I started &lt;a href="https://twitter.com/simonw/status/1502044953836503048"&gt;a Twitter conversation&lt;/a&gt; asking for ways to write automated tests that exercise screen readers - not just running audit rules, but actually simulating what happens when a screen reader user attempts to navigate through a specific flow within an application.&lt;/p&gt;
&lt;p&gt;The most interesting answer I had was &lt;a href="https://twitter.com/bmustillrose/status/1502066504401141767"&gt;from Ben Mustill-Rose&lt;/a&gt;, who built a system for automating tests against an Android screen reader while working on BBC iPlayer - &lt;a href="https://youtu.be/-vEHOiIggss?t=253"&gt;demo here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;@fardarter &lt;a href="https://twitter.com/fardarter/status/1502045993667280905"&gt;pointed me&lt;/a&gt; back to Playwright again, which turns out to have an &lt;a href="https://playwright.dev/python/docs/api/class-accessibility"&gt;Accessibility snapshot&lt;/a&gt; mechanism that can dump out the current state of the Chromium accessibility tree.&lt;/p&gt;
&lt;p&gt;I couldn't resist &lt;a href="https://github.com/simonw/shot-scraper/issues/22"&gt;adding that to shot-scraper&lt;/a&gt; - so now you can run the following to see the accessibility tree for a web page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;~ % shot-scraper accessibility https://datasette.io
{
    "role": "WebArea",
    "name": "Datasette: An open source multi-tool for exploring and publishing data",
    "children": [
        {
            "role": "link",
            "name": "Uses"
        },
        {
            "role": "link",
            "name": "Documentation"
        },
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/simonw/431e9075441463236850bab042b9d20d"&gt;Full output here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As a really fun bonus trick: since the output is JSON, you can pipe it into &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data"&gt;sqlite-utils insert&lt;/a&gt; to get a SQLite database:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper accessibility https://datasette.io \
    | jq .children | sqlite-utils insert \
    /tmp/accessibility.db nodes - --alter
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then open it in &lt;a href="https://datasette.io/desktop"&gt;Datasette Desktop&lt;/a&gt; and start faceting by role and heading level!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/datasette-desktop-accessibility.jpg" alt="Datasette Desktop browsing the nodes table - it has text, link, heading, button and textbox roles and four different heading levels." style="max-width:100%;" /&gt;&lt;/p&gt;
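&lt;p&gt;If you only want a quick count of roles without going through SQLite, the same JSON can be walked directly in Python. This is a sketch under assumptions: the &lt;code&gt;count_roles&lt;/code&gt; helper and the small &lt;code&gt;tree&lt;/code&gt; sample are hypothetical, and the sample only mimics the shape of the output shown above (nested dictionaries with &lt;code&gt;role&lt;/code&gt; and optional &lt;code&gt;children&lt;/code&gt; keys). Unlike the &lt;code&gt;jq .children&lt;/code&gt; pipeline, it also counts the root node and any nested children.&lt;/p&gt;

```python
from collections import Counter


def count_roles(node):
    """Count how often each role appears anywhere in an accessibility tree."""
    counts = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        counts[current.get("role", "unknown")] += 1
        # Leaf nodes have no "children" key, so default to an empty list
        stack.extend(current.get("children", []))
    return counts


# Hypothetical sample in the same shape as the command's JSON output
tree = {
    "role": "WebArea",
    "name": "Example page",
    "children": [
        {"role": "link", "name": "Uses"},
        {"role": "link", "name": "Documentation"},
        {"role": "heading", "name": "Example", "level": 1},
    ],
}

for role, total in sorted(count_roles(tree).items()):
    print(role, total)
```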
&lt;h4&gt;sqlite-utils documentation improvements&lt;/h4&gt;
&lt;p&gt;I complained on Twitter that the way type information was displayed in the Sphinx &lt;a href="https://sqlite-utils.datasette.io/en/stable/reference.html"&gt;sqlite-utils API reference documentation&lt;/a&gt; was ugly:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/docs-ugly.png" alt="Really long ugly type signatures" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Adam Johnson &lt;a href="https://twitter.com/AdamChainz/status/1502311047612575745"&gt;pointed me&lt;/a&gt; to the &lt;code&gt;autodoc_typehints = "description"&lt;/code&gt; option which fixes this. I spent a while tidying up the documentation to work better with this, mainly by adding a whole bunch of &lt;code&gt;:param name: description&lt;/code&gt; tags that I had previously omitted. That work happened in &lt;a href="https://github.com/simonw/sqlite-utils/issues/413"&gt;this issue&lt;/a&gt;. I think it looks &lt;em&gt;much&lt;/em&gt; better now:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/docs-pretty.png" alt="Type signatures are much easier to read now, and there's a detailed list of parameters with descriptions." style="max-width:100%;" /&gt;&lt;/p&gt;
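&lt;p&gt;As a rough illustration of the pattern, the setting lives in the Sphinx &lt;code&gt;conf.py&lt;/code&gt; and the &lt;code&gt;:param&lt;/code&gt; tags live in each docstring. The &lt;code&gt;insert_rows&lt;/code&gt; function below is a made-up example, not part of sqlite-utils; the two pieces are shown in one block for brevity even though they would normally sit in separate files.&lt;/p&gt;

```python
# conf.py (Sphinx configuration): show type hints in the parameter
# descriptions instead of in the function signature.
autodoc_typehints = "description"


# Hypothetical library code, for illustration only
def insert_rows(table: str, rows: list, replace: bool = False) -> int:
    """
    Pretend to insert rows into a table and return how many were given.

    :param table: Name of the table to insert into
    :param rows: List of dictionaries, one per row
    :param replace: Accepted for illustration only; this sketch ignores it
    """
    return len(rows)
```

&lt;p&gt;With the option enabled, autodoc pairs each annotation with the matching &lt;code&gt;:param&lt;/code&gt; entry, which is why filling in the previously omitted tags matters.&lt;/p&gt;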
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/image-diff"&gt;image-diff&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/image-diff/releases/tag/0.2.1"&gt;0.2.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/image-diff/releases"&gt;3 releases total&lt;/a&gt;) - 2022-03-11
&lt;br /&gt;CLI tool for comparing images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.25.1"&gt;3.25.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;98 releases total&lt;/a&gt;) - 2022-03-11
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/0.4"&gt;0.4&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;5 releases total&lt;/a&gt;) - 2022-03-10
&lt;br /&gt;Automated website screenshots using GitHub Actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/django-sql-dashboard"&gt;django-sql-dashboard&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/django-sql-dashboard/releases/tag/1.0.2"&gt;1.0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/django-sql-dashboard/releases"&gt;34 releases total&lt;/a&gt;) - 2022-03-08
&lt;br /&gt;Django app for building dashboards using raw SQL queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/geojson-to-sqlite"&gt;geojson-to-sqlite&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/geojson-to-sqlite/releases/tag/1.0"&gt;1.0&lt;/a&gt; - (&lt;a href="https://github.com/simonw/geojson-to-sqlite/releases"&gt;8 releases total&lt;/a&gt;) - 2022-03-04
&lt;br /&gt;CLI tool for converting GeoJSON files to SQLite (with SpatiaLite)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/xml-analyser"&gt;xml-analyser&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/xml-analyser/releases/tag/1.3"&gt;1.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/xml-analyser/releases"&gt;4 releases total&lt;/a&gt;) - 2022-03-01
&lt;br /&gt;Simple command line tool for quickly analysing the structure of an arbitrary XML file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-dateutil"&gt;datasette-dateutil&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-dateutil/releases/tag/0.3"&gt;0.3&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-dateutil/releases"&gt;4 releases total&lt;/a&gt;) - 2022-03-01
&lt;br /&gt;dateutil functions for Datasette&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/datasette/crawling-datasette-with-datasette"&gt;Crawling Datasette with Datasette&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/homebrew/latest-sqlite"&gt;Running the latest SQLite in Datasette using Homebrew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/macos/python-installer-macos"&gt;Installing Python on macOS with the official Python installer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/gis/natural-earth-in-spatialite-and-datasette"&gt;Natural Earth in SpatiaLite and Datasette&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/pytest/coverage-with-context"&gt;pytest coverage with context&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/accessibility"&gt;accessibility&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-docs"&gt;sphinx-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/playwright"&gt;playwright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="accessibility"/><category term="documentation"/><category term="datasette"/><category term="weeknotes"/><category term="sphinx-docs"/><category term="playwright"/><category term="shot-scraper"/></entry><entry><title>shot-scraper: automated screenshots for documentation, built on Playwright</title><link href="https://simonwillison.net/2022/Mar/10/shot-scraper/#atom-tag" rel="alternate"/><published>2022-03-10T00:13:30+00:00</published><updated>2022-03-10T00:13:30+00:00</updated><id>https://simonwillison.net/2022/Mar/10/shot-scraper/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt; is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool - hence the name - which I picked as a complement to my &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;git scraping&lt;/a&gt; and &lt;a href="https://simonwillison.net/2022/Feb/2/help-scraping/"&gt;help scraping&lt;/a&gt; techniques.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 13th March 2022:&lt;/strong&gt; The new &lt;code&gt;shot-scraper javascript&lt;/code&gt; command can now be used to &lt;a href="https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/"&gt;scrape web pages from the command line&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update 14th October 2022:&lt;/strong&gt; &lt;a href="https://simonwillison.net/2022/Oct/14/automating-screenshots/"&gt;Automating screenshots for the Datasette documentation using shot-scraper&lt;/a&gt; offers a tutorial introduction to using the tool.&lt;/p&gt;
&lt;h4&gt;The problem&lt;/h4&gt;
&lt;p&gt;I like to include screenshots in documentation. I recently &lt;a href="https://simonwillison.net/2022/Feb/27/datasette-tutorials/"&gt;started writing end-user tutorials&lt;/a&gt; for Datasette, which are particularly image heavy (&lt;a href="https://datasette.io/tutorials/explore"&gt;for example&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;As software changes over time, screenshots get out-of-date. I don't like the idea of stale screenshots, but I also don't want to have to manually recreate them every time I make the tiniest tweak to the visual appearance of my software.&lt;/p&gt;
&lt;h4&gt;Introducing shot-scraper&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;shot-scraper&lt;/code&gt; is a tool for automating this process. You can install it using &lt;code&gt;pip&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install shot-scraper
shot-scraper install
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That second &lt;code&gt;shot-scraper install&lt;/code&gt; line will install the browser it needs to do its job - more on that later.&lt;/p&gt;
&lt;p&gt;You can use it in two ways. To take a one-off screenshot, you can run it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper https://simonwillison.net/ -o simonwillison.png
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or if you want to take a set of screenshots in a repeatable way, you can define them in a YAML file that looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://simonwillison.net/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;simonwillison.png&lt;/span&gt;
- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://www.example.com/&lt;/span&gt;
  &lt;span class="pl-ent"&gt;width&lt;/span&gt;: &lt;span class="pl-c1"&gt;400&lt;/span&gt;
  &lt;span class="pl-ent"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;400&lt;/span&gt;
  &lt;span class="pl-ent"&gt;quality&lt;/span&gt;: &lt;span class="pl-c1"&gt;80&lt;/span&gt;
  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;example.jpg&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And then use &lt;code&gt;shot-scraper multi&lt;/code&gt; to execute every screenshot in one go:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;% shot-scraper multi shots.yml 
Screenshot of 'https://simonwillison.net/' written to 'simonwillison.png'
Screenshot of 'https://www.example.com/' written to 'example.jpg'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href="https://shot-scraper.datasette.io/en/stable/screenshots.html"&gt;The documentation&lt;/a&gt; describes all of the available options you can use when taking a screenshot.&lt;/p&gt;
&lt;p&gt;Each option can be provided to the &lt;code&gt;shot-scraper&lt;/code&gt; one-off tool, or can be embedded in the YAML file for use with &lt;code&gt;shot-scraper multi&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;JavaScript and CSS selectors&lt;/h4&gt;
&lt;p&gt;The default behaviour for &lt;code&gt;shot-scraper&lt;/code&gt; is to take a full page screenshot, using a browser width of 1280px.&lt;/p&gt;
&lt;p&gt;For documentation screenshots you probably don't want the whole page though - you likely want to create an image of one specific part of the interface.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--selector&lt;/code&gt; option allows you to specify an area of the page by CSS selector. The resulting image will consist just of that part of the page.&lt;/p&gt;
&lt;p&gt;What if you want to modify the page in addition to selecting a specific area?&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--javascript&lt;/code&gt; option lets you pass in a block of JavaScript code which will be injected into the page and executed after the page has loaded, but before the screenshot is taken.&lt;/p&gt;
&lt;p&gt;The combination of these two options - also available as &lt;code&gt;javascript:&lt;/code&gt; and &lt;code&gt;selector:&lt;/code&gt; keys in the YAML file - should be flexible enough to cover the custom screenshot case for documentation.&lt;/p&gt;
&lt;h4 id="a-complex-example"&gt;A complex example&lt;/h4&gt;
&lt;p&gt;To prove to myself that the tool works, I decided to try replicating this screenshot from &lt;a href="https://datasette.io/tutorials/explore"&gt;my tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I made the original using &lt;a href="https://cleanshot.com/"&gt;CleanShot X&lt;/a&gt;, manually adding the two pink arrows:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/select-facets-original.jpg" alt="A screenshot of a portion of the table interface in Datasette, with a menu open and two pink arrows pointing to menu items" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This is pretty tricky!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It's not &lt;a href="https://congress-legislators.datasettes.com/legislators/executive_terms?start__startswith=18&amp;amp;type=prez"&gt;this whole page&lt;/a&gt;, just a subset of the page&lt;/li&gt;
&lt;li&gt;The cog menu for one of the columns is open, which means the cog icon needs to be clicked before taking the screenshot&lt;/li&gt;
&lt;li&gt;There are two pink arrows superimposed on the image&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I decided to use just one arrow for the moment, which should hopefully result in a clearer image.&lt;/p&gt;
&lt;p&gt;I started by &lt;a href="https://github.com/simonw/shot-scraper/issues/9#issuecomment-1063314278"&gt;creating my own pink arrow SVG&lt;/a&gt; using Figma:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/pink-arrow.png" alt="A big pink arrow, with a drop shadow" style="width: 200px; max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I then fiddled around in the Firefox developer console for quite a while, working out the JavaScript needed to trim the page down to the bit I wanted, open the menu and position the arrow.&lt;/p&gt;
&lt;p&gt;With the JavaScript figured out, I pasted it into a YAML file called &lt;code&gt;shot.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;- &lt;span class="pl-ent"&gt;url&lt;/span&gt;: &lt;span class="pl-s"&gt;https://congress-legislators.datasettes.com/legislators/executive_terms?start__startswith=18&amp;amp;type=prez&lt;/span&gt;
  &lt;span class="pl-ent"&gt;javascript&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;    new Promise(resolve =&amp;gt; {&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Run in a promise so we can sleep 1s at the end&lt;/span&gt;
&lt;span class="pl-s"&gt;      function remove(el) { el.parentNode.removeChild(el);}&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Remove header and footer&lt;/span&gt;
&lt;span class="pl-s"&gt;      remove(document.querySelector('header'));&lt;/span&gt;
&lt;span class="pl-s"&gt;      remove(document.querySelector('footer'));&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Remove most of the children of .content&lt;/span&gt;
&lt;span class="pl-s"&gt;      Array.from(document.querySelectorAll('.content &amp;gt; *:not(.table-wrapper,.suggested-facets)')).map(remove)&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Bit of breathing room for the screenshot&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.body.style.marginTop = '10px';&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Add a bit of padding to .content&lt;/span&gt;
&lt;span class="pl-s"&gt;      var content = document.querySelector('.content');&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.style.width = '820px';&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.style.padding = '10px';&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Open the menu - it's an SVG so we need to use dispatchEvent here&lt;/span&gt;
&lt;span class="pl-s"&gt;      document.querySelector('th.col-executive_id svg').dispatchEvent(new Event('click'));&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Remove all but table header and first 11 rows&lt;/span&gt;
&lt;span class="pl-s"&gt;      Array.from(document.querySelectorAll('tr')).slice(12).map(remove);&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Add a pink SVG arrow&lt;/span&gt;
&lt;span class="pl-s"&gt;      let div = document.createElement('div');&lt;/span&gt;
&lt;span class="pl-s"&gt;      div.innerHTML = `&amp;lt;svg width="104" height="60" fill="none" xmlns="http://www.w3.org/2000/svg"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;g filter="url(#a)"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;          &amp;lt;path fill-rule="evenodd" clip-rule="evenodd" d="m76.7 1 2 2 .2-.1.1.4 20 20a3.5 3.5 0 0 1 0 5l-20 20-.1.4-.3-.1-1.9 2a3.5 3.5 0 0 1-5.4-4.4l3.2-14.4H4v-12h70.6L71.3 5.4A3.5 3.5 0 0 1 76.7 1Z" fill="#FF31A0"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;/g&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;defs&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;          &amp;lt;filter id="a" x="0" y="0" width="104" height="59.5" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB"&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feFlood flood-opacity="0" result="BackgroundImageFix"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feColorMatrix in="SourceAlpha" values="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 0" result="hardAlpha"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feOffset dy="4"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feGaussianBlur stdDeviation="2"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feComposite in2="hardAlpha" operator="out"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feColorMatrix values="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.25 0"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feBlend in2="BackgroundImageFix" result="effect1_dropShadow_2_26"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;              &amp;lt;feBlend in="SourceGraphic" in2="effect1_dropShadow_2_26" result="shape"/&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;          &amp;lt;/filter&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;        &amp;lt;/defs&amp;gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;      &amp;lt;/svg&amp;gt;`;&lt;/span&gt;
&lt;span class="pl-s"&gt;      let svg = div.firstChild;&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.appendChild(svg);&lt;/span&gt;
&lt;span class="pl-s"&gt;      content.style.position = 'relative';&lt;/span&gt;
&lt;span class="pl-s"&gt;      svg.style.position = 'absolute';&lt;/span&gt;
&lt;span class="pl-s"&gt;      // Give the menu time to finish fading in&lt;/span&gt;
&lt;span class="pl-s"&gt;      setTimeout(() =&amp;gt; {&lt;/span&gt;
&lt;span class="pl-s"&gt;        // Position arrow pointing to the 'facet by this' menu item&lt;/span&gt;
&lt;span class="pl-s"&gt;        var pos = document.querySelector('.dropdown-facet').getBoundingClientRect();&lt;/span&gt;
&lt;span class="pl-s"&gt;        svg.style.left = (pos.left - pos.width) + 'px';&lt;/span&gt;
&lt;span class="pl-s"&gt;        svg.style.top = (pos.top - 20) + 'px';&lt;/span&gt;
&lt;span class="pl-s"&gt;        resolve();&lt;/span&gt;
&lt;span class="pl-s"&gt;      }, 1000);&lt;/span&gt;
&lt;span class="pl-s"&gt;    });&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;  &lt;span class="pl-ent"&gt;output&lt;/span&gt;: &lt;span class="pl-s"&gt;annotated-screenshot.png&lt;/span&gt;
  &lt;span class="pl-ent"&gt;selector&lt;/span&gt;: &lt;span class="pl-s"&gt;.content&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And ran this command to generate the screenshot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;shot-scraper multi shot.yml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The generated &lt;code&gt;annotated-screenshot.png&lt;/code&gt; image looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/annotated-screenshot.png" alt="A screenshot of the table with the menu open and a single pink arrow pointing to the 'facet by this' menu item" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I'm pretty happy with this! I think it works very well as a proof of concept for the process.&lt;/p&gt;
&lt;h4 id="how-it-works-playwright"&gt;How it works: Playwright&lt;/h4&gt;
&lt;p&gt;I built the &lt;a href="https://github.com/simonw/shot-scraper/tree/44995cd45ca6c56d34c5c3d131217f7b9170f6f7"&gt;first prototype&lt;/a&gt; of &lt;code&gt;shot-scraper&lt;/code&gt; using Puppeteer, because I had &lt;a href="https://simonwillison.net/2020/Sep/3/weeknotes-airtable-screenshots-dogsheep/"&gt;used that before&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I noticed that the &lt;a href="https://www.npmjs.com/package/puppeteer-cli"&gt;puppeteer-cli&lt;/a&gt; package I was using hadn't had an update in two years, which reminded me to check out Playwright.&lt;/p&gt;
&lt;p&gt;I've been looking for an excuse to learn &lt;a href="https://playwright.dev/"&gt;Playwright&lt;/a&gt; for a while now, and this project turned out to be ideal.&lt;/p&gt;
&lt;p&gt;Playwright is Microsoft's open source browser automation framework. They promote it as a testing tool, but it has plenty of applications outside of testing - screenshot automation and screen scraping being two of the most obvious.&lt;/p&gt;
&lt;p&gt;Playwright is comprehensive: it downloads its own custom browser builds, and can run tests across multiple different rendering engines.&lt;/p&gt;
&lt;p&gt;The second prototype used the &lt;a href="https://github.com/simonw/shot-scraper/tree/b3318b2f27ca1526d5a9f06de50cf9900dd4d8d0"&gt;Playwright CLI utility&lt;/a&gt; instead, &lt;a href="https://github.com/simonw/shot-scraper/blob/b3318b2f27ca1526d5a9f06de50cf9900dd4d8d0/shot_scraper/cli.py#L39-L50"&gt;executed via npx&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-en"&gt;run&lt;/span&gt;(
    [
        &lt;span class="pl-s"&gt;"npx"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"playwright"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"screenshot"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"--full-page"&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;url&lt;/span&gt;,
        &lt;span class="pl-s1"&gt;output&lt;/span&gt;,
    ],
    &lt;span class="pl-s1"&gt;capture_output&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;,
)&lt;/pre&gt;
&lt;p&gt;This could take a full page screenshot, but that CLI tool wasn't flexible enough to take screenshots of specific elements. So I needed to switch to the Playwright programmatic API.&lt;/p&gt;
&lt;p&gt;I started out trying to get Python to generate and pass JavaScript to the Node.js library... and then I spotted the official &lt;a href="https://playwright.dev/python/docs/intro"&gt;Playwright for Python&lt;/a&gt; package.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install playwright
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It's amazing! It has the exact same functionality as the JavaScript library - the same classes, the same methods. Everything just works, in both languages.&lt;/p&gt;
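&lt;p&gt;As a rough sketch of what an element-level screenshot can look like with the Python package's synchronous API: this is not the actual &lt;code&gt;shot-scraper&lt;/code&gt; code, and the &lt;code&gt;screenshot_element&lt;/code&gt; function name and the choice of Chromium are assumptions made for illustration.&lt;/p&gt;

```python
def screenshot_element(url, selector, output):
    """Save a screenshot of the first element matching selector (hypothetical helper)."""
    # Imported inside the function so this module still loads
    # on machines where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # .first avoids a strict-mode error if several elements match;
        # Locator.screenshot() captures only that element.
        page.locator(selector).first.screenshot(path=output)
        browser.close()
    return output
```

&lt;p&gt;Calling it needs the browser binaries to be present, which &lt;code&gt;playwright install&lt;/code&gt; takes care of after the &lt;code&gt;pip install&lt;/code&gt; step.&lt;/p&gt;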
&lt;p&gt;I was curious how they pulled this off, so I dug inside the &lt;code&gt;playwright&lt;/code&gt; Python package in my &lt;code&gt;site-packages&lt;/code&gt; folder... and found it bundles a full Node.js binary executable and uses it to bridge the two worlds! What a wild hack.&lt;/p&gt;
&lt;p&gt;Thanks to Playwright, the entire implementation of &lt;code&gt;shot-scraper&lt;/code&gt; is currently just &lt;a href="https://github.com/simonw/shot-scraper/blob/0.3/shot_scraper/cli.py"&gt;181 lines of Python code&lt;/a&gt; - it's all glue code tying together a &lt;a href="https://click.palletsprojects.com/"&gt;Click&lt;/a&gt; CLI interface with some code that calls Playwright to do the actual work.&lt;/p&gt;
&lt;p&gt;I couldn't be more impressed with Playwright. I'll definitely be using it for other projects - for one thing, I think I'll finally be able to add automated tests to my &lt;a href="https://datasette.io/desktop"&gt;Datasette Desktop&lt;/a&gt; Electron application.&lt;/p&gt;
&lt;h4&gt;Hooking shot-scraper up to GitHub Actions&lt;/h4&gt;
&lt;p&gt;I built &lt;code&gt;shot-scraper&lt;/code&gt; very much with GitHub Actions in mind.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/shot-scraper-demo"&gt;shot-scraper-demo&lt;/a&gt; repository is my first live demo of the tool.&lt;/p&gt;
&lt;p&gt;Once a day, it runs &lt;a href="https://github.com/simonw/shot-scraper-demo/blob/3fdd9d3e79f95d9d396aeefd5bf65e85a7700ef4/.github/workflows/shots.yml"&gt;this shots.yml&lt;/a&gt; file, generates two screenshots and commits them back to the repository.&lt;/p&gt;
&lt;p&gt;One of them is the tutorial screenshot described above.&lt;/p&gt;
&lt;p&gt;The other is a screenshot of the list of "recently spotted owls" from &lt;a href="https://www.owlsnearme.com/?place=127871"&gt;this page&lt;/a&gt; on &lt;a href="https://www.owlsnearme.com/"&gt;owlsnearme.com&lt;/a&gt;. I wanted a page that would change on an occasional basis, to demonstrate GitHub's neat image diffing interface.&lt;/p&gt;
&lt;p&gt;I may need to change that demo though! That page includes "spotted 5 hours ago" text, which means that there's almost always a tiny pixel difference, &lt;a href="https://github.com/simonw/shot-scraper-demo/commit/bc86510f49b6f8d6728c9f1880b999c83361dd5a#diff-897c3444fbbb2033cbba5840da4994d01c3f396e0cdf4b0613d7f410db9887e0"&gt;like this one&lt;/a&gt; (use the "swipe" comparison tool to watch 6 hours ago change to 7 hours ago under the top left photo).&lt;/p&gt;
&lt;p&gt;Storing image files that change frequently in a free repository on GitHub feels rude to me, so please use this tool cautiously there!&lt;/p&gt;
&lt;h4&gt;What's next?&lt;/h4&gt;
&lt;p&gt;I had ambitious plans to add utilities to the tool that would &lt;a href="https://github.com/simonw/shot-scraper/issues/9"&gt;help with annotations&lt;/a&gt;, such as adding pink arrows and drawing circles around different elements on the page.&lt;/p&gt;
&lt;p&gt;I've shelved those plans for the moment: as the demo above shows, the JavaScript hook is good enough. I may revisit this later once common patterns have started to emerge.&lt;/p&gt;
&lt;p&gt;So really, my next step is to start using this tool for my own projects - to generate screenshots for my documentation.&lt;/p&gt;
&lt;p&gt;I'm also very interested to see what kinds of things other people use this for.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cli"&gt;cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git-scraping"&gt;git-scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/puppeteer"&gt;puppeteer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/playwright"&gt;playwright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cli"/><category term="documentation"/><category term="projects"/><category term="scraping"/><category term="github-actions"/><category term="git-scraping"/><category term="puppeteer"/><category term="playwright"/><category term="shot-scraper"/></entry><entry><title>Weeknotes: Datasette Tutorials</title><link href="https://simonwillison.net/2022/Feb/27/datasette-tutorials/#atom-tag" rel="alternate"/><published>2022-02-27T17:35:14+00:00</published><updated>2022-02-27T17:35:14+00:00</updated><id>https://simonwillison.net/2022/Feb/27/datasette-tutorials/#atom-tag</id><summary type="html">
    &lt;p&gt;I published &lt;a href="https://datasette.io/tutorials"&gt;two new tutorials&lt;/a&gt; for Datasette this week, both aimed at end-users of the web application.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/tutorials/explore"&gt;Exploring a database with Datasette&lt;/a&gt;&lt;/strong&gt; shows how to use Datasette as an exploratory data analysis tool, using facets and filters to get a good feeling for a new database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/tutorials/learn-sql"&gt;Learn SQL with Datasette&lt;/a&gt;&lt;/strong&gt; introduces Datasette's SQL query interface and uses it to teach basic SQL along with a few more advanced tricks.&lt;/p&gt;
&lt;p&gt;Datasette already has &lt;a href="https://docs.datasette.io/"&gt;a lot of documentation&lt;/a&gt;, but so far it's all been written to serve people who are administering or customizing Datasette instances. The user interface itself has been mostly undocumented.&lt;/p&gt;
&lt;p&gt;Daniele Procida's &lt;a href="https://diataxis.fr"&gt;Diátaxis documentation framework&lt;/a&gt; describes four categories of documentation: tutorials, how-to guides, technical reference and explanation. Datasette is heavy on the last two but light on the first.&lt;/p&gt;
&lt;p&gt;These &lt;a href="https://datasette.io/tutorials"&gt;new tutorials&lt;/a&gt; are my initial attempt at redressing the balance. I adapted them from a workshop I presented on Friday at the &lt;a href="https://headlineclub.org/2022/01/28/foia-fest-returns-for-10th-annual-conference/"&gt;FOIA Fest&lt;/a&gt; data journalism conference in (virtual) Chicago.&lt;/p&gt;
&lt;p&gt;Writing documentation for end-users has been an interesting experience! I chose to lean heavily into screenshots, live examples and exercises. I'm &lt;a href="https://github.com/simonw/datasette/discussions/1643"&gt;eager for feedback&lt;/a&gt; from people to help me understand if what I've done is working, and I'm keen for suggestions on how to improve them and what to write next.&lt;/p&gt;
&lt;p&gt;The example database I used for the tutorial is pretty fun: &lt;a href="https://congress-legislators.datasettes.com"&gt;https://congress-legislators.datasettes.com&lt;/a&gt; - a database of USA senators, congresspeople, presidents and vice presidents built using CC0 data from the absolutely brilliant &lt;a href="https://github.com/unitedstates/congress-legislators"&gt;unitedstates/congress-legislators&lt;/a&gt; repository (which just accepted &lt;a href="https://github.com/unitedstates/congress-legislators/pull/819"&gt;my first PR&lt;/a&gt;!)&lt;/p&gt;
&lt;p&gt;This was my first attempt at writing end-user facing documentation for a personal project, and it turned out to have the same effect as writing developer documentation: the moment you try to describe how to use a feature, the usability flaws in how that feature works become strikingly evident!&lt;/p&gt;
&lt;p&gt;I'm convinced that writing comprehensive documentation is a massively underrated technique for better software design.&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-render-markdown"&gt;datasette-render-markdown&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-render-markdown/releases/tag/2.1"&gt;2.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-render-markdown/releases"&gt;9 releases total&lt;/a&gt;) - 2022-02-26
&lt;br /&gt;Datasette plugin for rendering Markdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-redirect-forbidden"&gt;datasette-redirect-forbidden&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-redirect-forbidden/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2022-02-23
&lt;br /&gt;Redirect forbidden requests to a login page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-diffable"&gt;sqlite-diffable&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-diffable/releases/tag/0.2.1"&gt;0.2.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-diffable/releases"&gt;3 releases total&lt;/a&gt;) - 2022-02-21
&lt;br /&gt;Tools for dumping/loading a SQLite database to diffable directory structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/google-drive-to-sqlite"&gt;google-drive-to-sqlite&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/google-drive-to-sqlite/releases/tag/0.4"&gt;0.4&lt;/a&gt; - (&lt;a href="https://github.com/simonw/google-drive-to-sqlite/releases"&gt;6 releases total&lt;/a&gt;) - 2022-02-20
&lt;br /&gt;Create a SQLite database containing metadata from Google Drive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.24"&gt;3.24&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;96 releases total&lt;/a&gt;) - 2022-02-16
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sqlite/substr-instr"&gt;Combining substr and instr to extract text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/googlecloud/google-oauth-cli-application"&gt;Google OAuth for a CLI application&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/googlecloud/recursive-fetch-google-drive"&gt;Recursively fetching metadata for all files in a Google Drive folder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/graphql/graphql-with-curl"&gt;Using curl to run GraphQL queries from the command line&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="datasette"/><category term="weeknotes"/></entry><entry><title>Making world-class docs takes effort</title><link href="https://simonwillison.net/2021/Sep/6/making-world-class-docs-takes-effort/#atom-tag" rel="alternate"/><published>2021-09-06T18:58:27+00:00</published><updated>2021-09-06T18:58:27+00:00</updated><id>https://simonwillison.net/2021/Sep/6/making-world-class-docs-takes-effort/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://daniel.haxx.se/blog/2021/09/04/making-world-class-docs-takes-effort/"&gt;Making world-class docs takes effort&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Curl maintainer Daniel Stenberg writes about his principles for good documentation. I agree with all of these: he emphasizes keeping docs in the repo, avoiding the temptation to exclusively generate them from code, featuring examples and ensuring every API you provide has documentation. Daniel describes an approach similar to the documentation unit tests I’ve been using for my own projects: he has scripts which scan the curl documentation to ensure not only that everything is documented but that each documentation area contains the same sections in the same order.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=28414308"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/curl"&gt;curl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/daniel-stenberg"&gt;daniel-stenberg&lt;/a&gt;&lt;/p&gt;



</summary><category term="curl"/><category term="documentation"/><category term="daniel-stenberg"/></entry><entry><title>The Diátaxis documentation framework</title><link href="https://simonwillison.net/2021/Aug/21/diataxis/#atom-tag" rel="alternate"/><published>2021-08-21T22:59:53+00:00</published><updated>2021-08-21T22:59:53+00:00</updated><id>https://simonwillison.net/2021/Aug/21/diataxis/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://diataxis.fr/"&gt;The Diátaxis documentation framework&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Daniele Procida’s model of four types of technical documentation—tutorials, how-to guides, technical reference and explanation—now has a name: Diátaxis.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/diataxis"&gt;diataxis&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="diataxis"/></entry><entry><title>Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes</title><link href="https://simonwillison.net/2021/Aug/14/datasette-on-codespaces/#atom-tag" rel="alternate"/><published>2021-08-14T04:57:12+00:00</published><updated>2021-08-14T04:57:12+00:00</updated><id>https://simonwillison.net/2021/Aug/14/datasette-on-codespaces/#atom-tag</id><summary type="html">
    &lt;p&gt;This week I &lt;a href="https://datasette.substack.com/p/everything-new-in-datasette-since"&gt;broke my streak&lt;/a&gt; of &lt;em&gt;not&lt;/em&gt; sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.&lt;/p&gt;
&lt;h4&gt;API reference documentation for sqlite-utils using Sphinx&lt;/h4&gt;
&lt;p&gt;I've never been a big fan of Javadoc-style API documentation: I usually find that documentation structured around classes and methods fails to show me how to actually use those classes to solve real-world problems. I've tended to avoid it for my own projects.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils Python library&lt;/a&gt; has a ton of functionality, but it mainly boils down to two classes: &lt;code&gt;Database&lt;/code&gt; and &lt;code&gt;Table&lt;/code&gt;. Since it already has pretty comprehensive narrative documentation explaining the different problems it can solve, I decided to try experimenting with the &lt;a href="https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html"&gt;Sphinx autodoc&lt;/a&gt; module to produce some classic &lt;a href="https://sqlite-utils.datasette.io/en/stable/reference.html"&gt;API reference documentation&lt;/a&gt; for it:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of the new API reference documentation" src="https://static.simonwillison.net/static/2021/sqlite-utils-api-doc.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Since autodoc works from docstrings, this was also a great excuse to add more comprehensive docstrings and type hints to the library. This helps tools like Jupyter notebooks and VS Code display more useful inline help.&lt;/p&gt;
&lt;p&gt;This proved to be time well spent! Here's what &lt;code&gt;sqlite-utils&lt;/code&gt; looks like in VS Code now:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of VS Code showing inline help for the enable_fts() method" src="https://static.simonwillison.net/static/2021/vs-code-hints.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Running &lt;code&gt;mypy&lt;/code&gt; against the type hints also helped me identify and fix a couple of obscure edge-case bugs in the existing methods, detailed in &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-15-1"&gt;the 3.15.1 release notes&lt;/a&gt;. It's taken me a few years but I'm finally starting to come round to Python's optional typing as being worth the additional effort!&lt;/p&gt;
&lt;p&gt;Figuring out how to use autodoc in Sphinx, and then how to get the documentation to build correctly on &lt;a href="https://readthedocs.org/"&gt;Read The Docs&lt;/a&gt; took some effort. I wrote up what I learned in &lt;a href="https://til.simonwillison.net/sphinx/sphinx-autodoc"&gt;this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Datasette on GitHub Codespaces&lt;/h4&gt;
&lt;p&gt;GitHub released their new &lt;a href="https://github.com/features/codespaces"&gt;Codespaces&lt;/a&gt; online development environments to general availability this week and I'm really excited about it. I ran a team at Eventbrite for a while that was responsible for development environment tooling, and it really was shocking how much time and money was lost to broken local development environments, even with a significant amount of engineering effort applied to the problem.&lt;/p&gt;
&lt;p&gt;Codespaces promises a fresh, working development environment on-demand any time you need it. That's a very exciting premise! Their detailed write-up of how they convinced GitHub's own internal engineers to move to it is full of &lt;a href="https://github.blog/2021-08-11-githubs-engineering-team-moved-codespaces/"&gt;intriguing details&lt;/a&gt; - getting an existing application working with it is no small feat, but the pay-off looks very promising indeed.&lt;/p&gt;
&lt;p&gt;So... I decided to try and get Datasette running on it. It works really well!&lt;/p&gt;
&lt;p&gt;You can run Datasette in any Codespace environment using the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the terminal. Three-bar-menu-icon, View, Terminal does the trick.&lt;/li&gt;
&lt;li&gt;In the terminal run &lt;code&gt;pip install datasette datasette-x-forwarded-host&lt;/code&gt; (more on this in a moment).&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;datasette&lt;/code&gt; - Codespaces will automatically set up port forwarding and give you a link to "Open in Browser" - click the link and you're done!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can &lt;code&gt;pip install sqlite-utils&lt;/code&gt; and then use &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-csv-or-tsv-data"&gt;sqlite-utils insert&lt;/a&gt; to create SQLite databases to use with Datasette.&lt;/p&gt;
&lt;p&gt;There was one catch: the first time I ran Datasette, clicking on any of the internal links within the web application took me to &lt;code&gt;http://localhost/&lt;/code&gt; pages that broke with a 404.&lt;/p&gt;
&lt;p&gt;It turns out the Codespaces proxy sends a &lt;code&gt;host: localhost&lt;/code&gt; header - which Datasette then uses to incorrectly construct internal URLs.&lt;/p&gt;
&lt;p&gt;So I wrote a tiny ASGI plugin, &lt;a href="https://datasette.io/plugins/datasette-x-forwarded-host"&gt;datasette-x-forwarded-host&lt;/a&gt;, which takes the incoming &lt;code&gt;X-Forwarded-Host&lt;/code&gt; provided by Codespaces and uses that as the &lt;code&gt;Host&lt;/code&gt; header within Datasette itself. After that everything worked fine.&lt;/p&gt;
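&lt;p&gt;The general idea can be sketched as a small piece of ASGI middleware. This is a minimal illustration rather than the plugin's actual source: the &lt;code&gt;use_forwarded_host&lt;/code&gt; name, the &lt;code&gt;demo&lt;/code&gt; harness and the example hostname are all made up here.&lt;/p&gt;

```python
import asyncio


def use_forwarded_host(app):
    """Wrap an ASGI app so Host is replaced by X-Forwarded-Host when present."""

    async def wrapped(scope, receive, send):
        if scope.get("type") == "http":
            headers = list(scope.get("headers") or [])
            # ASGI header names arrive as lowercased bytes
            forwarded = [value for name, value in headers if name == b"x-forwarded-host"]
            if forwarded:
                headers = [(name, value) for name, value in headers if name != b"host"]
                headers.append((b"host", forwarded[0]))
                # Copy the scope rather than mutating the caller's dictionary
                scope = dict(scope, headers=headers)
        await app(scope, receive, send)

    return wrapped


async def demo():
    """Run a fake request through the middleware and report the Host the app saw."""
    seen = {}

    async def app(scope, receive, send):
        seen["host"] = dict(scope["headers"])[b"host"]

    scope = {
        "type": "http",
        "headers": [
            (b"host", b"localhost"),
            (b"x-forwarded-host", b"my-codespace.example.com"),
        ],
    }
    await use_forwarded_host(app)(scope, None, None)
    return seen["host"]


print(asyncio.run(demo()))
```

&lt;p&gt;The wrapped application only ever sees the forwarded hostname, so any URLs it builds from the &lt;code&gt;Host&lt;/code&gt; header point back at the proxy rather than at &lt;code&gt;localhost&lt;/code&gt;.&lt;/p&gt;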
&lt;h4&gt;sqlite-utils insert --flatten&lt;/h4&gt;
&lt;p&gt;Early this week I finally figured out &lt;a href="https://cloud.google.com/run/docs/logging"&gt;Cloud Run logging&lt;/a&gt;. It's actually really good! In doing so, I worked out &lt;a href="https://til.simonwillison.net/cloudrun/tailing-cloud-run-request-logs"&gt;a convoluted recipe&lt;/a&gt; for tailing the JSON logs locally and piping them into a SQLite database so that I could analyze them with Datasette.&lt;/p&gt;
&lt;p&gt;Part of the reason it was convoluted is that Cloud Run logs feature nested JSON, but &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data"&gt;sqlite-utils insert&lt;/a&gt; only works against an array of flat JSON objects. I had to use &lt;a href="https://til.simonwillison.net/jq/flatten-nested-json-objects-jq"&gt;this jq monstrosity&lt;/a&gt; to flatten the nested JSON into key/value pairs.&lt;/p&gt;
&lt;p&gt;Since I've had to solve this problem a few times now I decided to improve &lt;code&gt;sqlite-utils&lt;/code&gt; to have it do the work instead. You can now use the new &lt;code&gt;--flatten&lt;/code&gt; option like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sqlite-utils insert logs.db logs log.json --flatten
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a schema that flattens nested objects into a &lt;code&gt;topkey_nextkey&lt;/code&gt; structure, like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;CREATE TABLE [logs] (
   [httpRequest_latency] &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
   [httpRequest_requestMethod] &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
   [httpRequest_requestSize] &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
   [httpRequest_status] &lt;span class="pl-k"&gt;INTEGER&lt;/span&gt;,
   [insertId] &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
   [labels_service] &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;
);&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Full &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#flattening-nested-json-objects"&gt;documentation for --flatten&lt;/a&gt;.&lt;/p&gt;
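&lt;p&gt;The flattening itself is easy to picture in Python. The following is a sketch of the idea, not the code that ships in &lt;code&gt;sqlite-utils&lt;/code&gt;; the &lt;code&gt;flatten&lt;/code&gt; function and the sample record are invented for illustration.&lt;/p&gt;

```python
def flatten(row, prefix=""):
    """Collapse nested dictionaries into one level with underscore-joined keys."""
    flat = {}
    for key, value in row.items():
        name = prefix + "_" + key if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the parent key along as the prefix
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Made-up record shaped like the schema above
record = {
    "httpRequest": {"latency": "0.1s", "requestMethod": "GET", "status": 200},
    "insertId": "abc123",
    "labels": {"service": "datasette"},
}
flat_record = flatten(record)
print(sorted(flat_record))
```

&lt;p&gt;Each nested key ends up as a top-level column name such as &lt;code&gt;httpRequest_status&lt;/code&gt;, which is the shape a flat SQLite table needs.&lt;/p&gt;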
&lt;h4&gt;Datasette column metadata&lt;/h4&gt;
&lt;p&gt;I've been wanting to add this for a while: Datasette's main branch now includes an implementation of &lt;a href="https://docs.datasette.io/en/latest/metadata.html#column-descriptions"&gt;column descriptions metadata&lt;/a&gt; for Datasette tables. This is best illustrated by a screenshot (of &lt;a href="https://latest.datasette.io/fixtures/roadside_attractions"&gt;this live demo&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot showing column metadata displayed both at the top of the Datasette table page and in the context menu that shows up for a column" src="https://static.simonwillison.net/static/2021/column-metadata.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can add the following to &lt;code&gt;metadata.yml&lt;/code&gt; (or &lt;code&gt;.json&lt;/code&gt;) to specify descriptions for the columns of a given table:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;databases&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;fixtures&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;roadside_attractions&lt;/span&gt;:
      &lt;span class="pl-ent"&gt;columns&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;The name of the attraction&lt;/span&gt;
        &lt;span class="pl-ent"&gt;address&lt;/span&gt;: &lt;span class="pl-s"&gt;The street address for the attraction&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Column descriptions will be shown in a &lt;code&gt;&amp;lt;dl&amp;gt;&lt;/code&gt; at the top of the page, and will also be added to the menu that appears when you click on the cog icon at the top of a column.&lt;/p&gt;
&lt;h4 id="column-metadata"&gt;Getting closer to query column metadata, too&lt;/h4&gt;
&lt;p&gt;Datasette lets you execute arbitrary SQL queries, like this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt;
  &lt;span class="pl-c1"&gt;roadside_attractions&lt;/span&gt;.&lt;span class="pl-c1"&gt;name&lt;/span&gt;,
  &lt;span class="pl-c1"&gt;roadside_attractions&lt;/span&gt;.&lt;span class="pl-c1"&gt;address&lt;/span&gt;,
  &lt;span class="pl-c1"&gt;attraction_characteristic&lt;/span&gt;.&lt;span class="pl-c1"&gt;name&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt;
  roadside_attraction_characteristics
  &lt;span class="pl-k"&gt;join&lt;/span&gt; roadside_attractions &lt;span class="pl-k"&gt;on&lt;/span&gt; &lt;span class="pl-c1"&gt;roadside_attractions&lt;/span&gt;.&lt;span class="pl-c1"&gt;pk&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;roadside_attraction_characteristics&lt;/span&gt;.&lt;span class="pl-c1"&gt;attraction_id&lt;/span&gt;
  &lt;span class="pl-k"&gt;join&lt;/span&gt; attraction_characteristic &lt;span class="pl-k"&gt;on&lt;/span&gt; &lt;span class="pl-c1"&gt;attraction_characteristic&lt;/span&gt;.&lt;span class="pl-c1"&gt;pk&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;roadside_attraction_characteristics&lt;/span&gt;.&lt;span class="pl-c1"&gt;characteristic_id&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can &lt;a href="https://latest.datasette.io/fixtures?sql=select%0D%0A++roadside_attractions.name%2C%0D%0A++roadside_attractions.address%2C%0D%0A++attraction_characteristic.name%0D%0Afrom%0D%0A++roadside_attraction_characteristics%0D%0A++join+roadside_attractions+on+roadside_attractions.pk+%3D+roadside_attraction_characteristics.attraction_id%0D%0A++join+attraction_characteristic+on+attraction_characteristic.pk+%3D+roadside_attraction_characteristics.characteristic_id"&gt;try that here&lt;/a&gt;. It returns the following:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;address&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;The Mystery Spot&lt;/td&gt;
&lt;td&gt;465 Mystery Spot Road, Santa Cruz, CA 95065&lt;/td&gt;
&lt;td&gt;Paranormal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Winchester Mystery House&lt;/td&gt;
&lt;td&gt;525 South Winchester Boulevard, San Jose, CA 95128&lt;/td&gt;
&lt;td&gt;Paranormal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bigfoot Discovery Museum&lt;/td&gt;
&lt;td&gt;5497 Highway 9, Felton, CA 95018&lt;/td&gt;
&lt;td&gt;Paranormal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Burlingame Museum of PEZ Memorabilia&lt;/td&gt;
&lt;td&gt;214 California Drive, Burlingame, CA 94010&lt;/td&gt;
&lt;td&gt;Museum&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bigfoot Discovery Museum&lt;/td&gt;
&lt;td&gt;5497 Highway 9, Felton, CA 95018&lt;/td&gt;
&lt;td&gt;Museum&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The columns it returns have names... but I've long wanted to do more with these results. If I could work out &lt;em&gt;which&lt;/em&gt; source column each of those output columns came from, there are a bunch of interesting things I could do, most notably:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the output column is a known foreign key relationship, I could turn it into a hyperlink (as seen on &lt;a href="https://latest.datasette.io/fixtures/roadside_attraction_characteristics"&gt;this table page&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;If the original table column has the new column metadata, I could display that as additional documentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The challenge is: given an arbitrary SQL query, how can I figure out what the resulting columns are going to be and how to tie those back to the original tables?&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a href="https://sqlite.org/forum/forumpost/482abd2e0f119555?t=h"&gt;a hint&lt;/a&gt; from the SQLite forum I'm getting &lt;a href="https://github.com/simonw/datasette/issues/1293"&gt;tantalizingly close&lt;/a&gt; to a solution.&lt;/p&gt;
&lt;p&gt;The trick is to horribly abuse SQLite's &lt;code&gt;explain&lt;/code&gt; output. Here's &lt;a href="https://latest.datasette.io/fixtures?sql=explain+select%0D%0A++roadside_attractions.name%2C%0D%0A++roadside_attractions.address%2C%0D%0A++attraction_characteristic.name%0D%0Afrom%0D%0A++roadside_attraction_characteristics%0D%0A++join+roadside_attractions+on+roadside_attractions.pk+%3D+roadside_attraction_characteristics.attraction_id%0D%0A++join+attraction_characteristic+on+attraction_characteristic.pk+%3D+roadside_attraction_characteristics.characteristic_id"&gt;what it looks like&lt;/a&gt; for the example query above:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;addr&lt;/th&gt;
&lt;th&gt;opcode&lt;/th&gt;
&lt;th&gt;p1&lt;/th&gt;
&lt;th&gt;p2&lt;/th&gt;
&lt;th&gt;p3&lt;/th&gt;
&lt;th&gt;p4&lt;/th&gt;
&lt;th&gt;p5&lt;/th&gt;
&lt;th&gt;comment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Init&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;OpenRead&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;OpenRead&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;OpenRead&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Rewind&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;SeekRowid&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;SeekRowid&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;ResultRow&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Next&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Halt&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Transaction&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Goto&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The magic is on line 12 (the row where &lt;code&gt;addr&lt;/code&gt; is 12): &lt;code&gt;ResultRow 3 3&lt;/code&gt; means "return a result that spans three columns, starting at register 3" - so that's registers 3, 4 and 5. Those three registers are populated by the &lt;code&gt;Column&lt;/code&gt; operations on lines 9, 10 and 11 (the register they write into is in the &lt;code&gt;p3&lt;/code&gt; column). Each &lt;code&gt;Column&lt;/code&gt; operation specifies the table (as &lt;code&gt;p1&lt;/code&gt;) and the column index within that table (&lt;code&gt;p2&lt;/code&gt;). And those table references map back to the &lt;code&gt;OpenRead&lt;/code&gt; lines at the start, where &lt;code&gt;p1&lt;/code&gt; is that table reference (referred to by &lt;code&gt;Column&lt;/code&gt;) and &lt;code&gt;p2&lt;/code&gt; is the root page of the table within the schema.&lt;/p&gt;
&lt;p&gt;Running &lt;code&gt;select rootpage, name from sqlite_master where rootpage in (45, 46, 47)&lt;/code&gt; produces &lt;a href="https://latest.datasette.io/fixtures?sql=select+rootpage%2C+name+from+sqlite_master+where+rootpage+in+%2845%2C+46%2C+47%29"&gt;the following&lt;/a&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;rootpage&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;roadside_attractions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;attraction_characteristic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;roadside_attraction_characteristics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Tie all of this together, and it may be possible to use &lt;code&gt;explain&lt;/code&gt; to derive the original tables and columns for each of the outputs of an arbitrary query!&lt;/p&gt;
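&lt;p&gt;Here is a minimal sketch of that mapping, assuming Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module. The function name &lt;code&gt;guess_source_columns&lt;/code&gt; is hypothetical, and it only understands the three opcodes discussed above (&lt;code&gt;OpenRead&lt;/code&gt;, &lt;code&gt;Column&lt;/code&gt; and &lt;code&gt;ResultRow&lt;/code&gt;), so treat it as an illustration of the idea rather than a working solution:&lt;/p&gt;

```python
import sqlite3


def guess_source_columns(conn, sql):
    """Naively map each output column of sql to a (table, column) pair.

    Only understands the OpenRead, Column and ResultRow opcodes, so any
    output column produced some other way comes back as None.
    """
    cursor_to_rootpage = {}  # OpenRead: p1 = table reference, p2 = root page
    register_to_source = {}  # Column: p3 = register, (p1, p2) = table, column index
    result_registers = []
    for row in conn.execute("explain " + sql):
        opcode, p1, p2, p3 = row[1:5]
        if opcode == "OpenRead":
            cursor_to_rootpage[p1] = p2
        elif opcode == "Column":
            register_to_source[p3] = (p1, p2)
        elif opcode == "ResultRow" and not result_registers:
            # p1 is the first register, p2 is how many registers are returned
            result_registers = list(range(p1, p1 + p2))

    rootpage_to_table = {
        rootpage: name
        for rootpage, name in conn.execute(
            "select rootpage, name from sqlite_master where type = 'table'"
        )
    }

    sources = []
    for register in result_registers:
        cursor, column_index = register_to_source.get(register, (None, None))
        table = rootpage_to_table.get(cursor_to_rootpage.get(cursor))
        if table is None:
            sources.append(None)
            continue
        column_names = [
            r[0]
            for r in conn.execute(
                "select name from pragma_table_info(?) order by cid", [table]
            )
        ]
        if column_index in range(len(column_names)):
            sources.append((table, column_names[column_index]))
        else:
            sources.append(None)
    return sources
```

&lt;p&gt;Calling &lt;code&gt;guess_source_columns(conn, sql)&lt;/code&gt; returns one entry per output column: a &lt;code&gt;(table, column)&lt;/code&gt; pair when the register was filled directly by a &lt;code&gt;Column&lt;/code&gt; operation, or &lt;code&gt;None&lt;/code&gt; when it was not. Anything that routes values through other opcodes is beyond what this sketch can decode.&lt;/p&gt;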
&lt;p&gt;I was almost ready to declare victory, until I tried running it against a query with an &lt;code&gt;order by column&lt;/code&gt; at the end... and the results no longer matched up.&lt;/p&gt;
&lt;p&gt;You can follow &lt;a href="https://github.com/simonw/datasette/issues/1293#issuecomment-898524057"&gt;my ongoing investigation here&lt;/a&gt; - the short version is that I think I'm going to have to learn to decode a whole bunch more opcodes before I can get this to work.&lt;/p&gt;
&lt;p&gt;This is also a very risky way of attacking this problem. The SQLite &lt;a href="https://www.sqlite.org/opcode.html"&gt;documentation for the bytecode engine&lt;/a&gt; includes the following warning:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This document describes SQLite internals. The information provided here is not needed for routine application development using SQLite. This document is intended for people who want to delve more deeply into the internal operation of SQLite.&lt;/p&gt;
&lt;p&gt;The bytecode engine is not an API of SQLite. Details about the bytecode engine change from one release of SQLite to the next. Applications that use SQLite should not depend on any of the details found in this document.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So it's pretty clear that this is a highly unsupported way of working with SQLite!&lt;/p&gt;
&lt;p&gt;I'm still tempted to try it though. This feature is very much a nice-to-have: if it breaks and the additional column context stops displaying it's not a critical bug - and hopefully I'll be able to ship a Datasette update that takes into account those breaking SQLite changes relatively shortly afterwards.&lt;/p&gt;
&lt;p&gt;If I can find another, more supported way to solve this I'll jump on it!&lt;/p&gt;
&lt;p&gt;In the meantime, I did use this technique to solve a simpler problem. Datasette extracts &lt;code&gt;:named&lt;/code&gt; parameters from arbitrary SQL queries and turns them &lt;a href="https://latest.datasette.io/fixtures/neighborhood_search?_show_sql=1"&gt;into form fields&lt;/a&gt; - but since it uses a simple regular expression for this it could be confused by things like a literal &lt;code&gt;00:04:05&lt;/code&gt; time string contained in a SQL query.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;explain&lt;/code&gt; output for that query includes the following:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;addr&lt;/th&gt;
&lt;th&gt;opcode&lt;/th&gt;
&lt;th&gt;p1&lt;/th&gt;
&lt;th&gt;p2&lt;/th&gt;
&lt;th&gt;p3&lt;/th&gt;
&lt;th&gt;p4&lt;/th&gt;
&lt;th&gt;p5&lt;/th&gt;
&lt;th&gt;comment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;:text&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So I wrote some code which uses &lt;code&gt;explain&lt;/code&gt; to extract just the &lt;code&gt;p4&lt;/code&gt; operands from &lt;code&gt;Variable&lt;/code&gt; rows and treats those as the extracted parameters! This feels a lot safer than the more complex &lt;code&gt;ResultRow&lt;/code&gt;/&lt;code&gt;Column&lt;/code&gt; logic - and it also falls back to the regular expression if it runs into any SQL errors. More &lt;a href="https://github.com/simonw/datasette/issues/1421"&gt;in the issue&lt;/a&gt;.&lt;/p&gt;
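&lt;p&gt;The general shape of that approach can be sketched as follows. This is not Datasette's actual code: the function name &lt;code&gt;named_parameters&lt;/code&gt; and the regular expression are assumptions made for illustration, using Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module. Because the bytecode is not a stable API, the sketch also falls back to the regular expression if a &lt;code&gt;Variable&lt;/code&gt; row carries no name in &lt;code&gt;p4&lt;/code&gt;:&lt;/p&gt;

```python
import re
import sqlite3

# Hypothetical pattern for this sketch: a colon followed by word characters
NAMED_PARAMETER_RE = re.compile(r":(\w+)")


def named_parameters(conn, sql):
    """Return the :named parameters in sql, preferring explain over the regex."""
    # The regex may over-match (for example inside '00:04:05'), but every
    # :name style parameter is among its matches, so bind them all to None
    candidates = list(dict.fromkeys(NAMED_PARAMETER_RE.findall(sql)))
    try:
        rows = conn.execute(
            "explain " + sql, {name: None for name in candidates}
        ).fetchall()
    except sqlite3.Error:
        # Any SQL error: fall back to whatever the regex found
        return candidates
    # Column 1 of each explain row is the opcode, column 5 is the p4 operand
    p4_values = [row[5] for row in rows if row[1] == "Variable"]
    if not all(p4_values):
        # A Variable row without a name in p4: the bytecode is not what
        # this sketch expects, so trust the regex instead
        return candidates
    return list(dict.fromkeys(name.lstrip(":") for name in p4_values))
```

&lt;p&gt;With this sketch, a query that contains only a literal such as &lt;code&gt;'00:04:05'&lt;/code&gt; produces no &lt;code&gt;Variable&lt;/code&gt; rows, so no parameters are reported for it, while a query that fails to compile still gets the regular expression's best guess.&lt;/p&gt;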
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/cloudrun/tailing-cloud-run-request-logs"&gt;Tailing Google Cloud Run request logs and importing them into SQLite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/python/find-local-variables-in-exception-traceback"&gt;Find local variables in the traceback for an exception&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/sphinx/sphinx-autodoc"&gt;Adding Sphinx autodoc to a project, and configuring Read The Docs to build it&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-x-forwarded-host"&gt;datasette-x-forwarded-host&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-x-forwarded-host/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2021-08-12
&lt;br /&gt;Treat the X-Forwarded-Host header as the Host header&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.15.1"&gt;3.15.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;84 releases total&lt;/a&gt;) - 2021-08-10
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-query-links"&gt;datasette-query-links&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-query-links/releases/tag/0.1.2"&gt;0.1.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-query-links/releases"&gt;3 releases total&lt;/a&gt;) - 2021-08-09
&lt;br /&gt;Turn SELECT queries returned by a query into links to execute them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.59a1"&gt;0.59a1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;96 releases total&lt;/a&gt;) - 2021-08-09
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-pyinstrument"&gt;datasette-pyinstrument&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-pyinstrument/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2021-08-08
&lt;br /&gt;Use pyinstrument to analyze Datasette page performance&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mypy"&gt;mypy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-codespaces"&gt;github-codespaces&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="documentation"/><category term="github"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="weeknotes"/><category term="sqlite-utils"/><category term="mypy"/><category term="github-codespaces"/></entry><entry><title>Adding Sphinx autodoc to a project, and configuring Read The Docs to build it</title><link href="https://simonwillison.net/2021/Aug/11/sphinx-autodoc/#atom-tag" rel="alternate"/><published>2021-08-11T01:21:28+00:00</published><updated>2021-08-11T01:21:28+00:00</updated><id>https://simonwillison.net/2021/Aug/11/sphinx-autodoc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/sphinx/sphinx-autodoc"&gt;Adding Sphinx autodoc to a project, and configuring Read The Docs to build it&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
My TIL notes from figuring out how to use sphinx-autodoc for the sqlite-utils reference documentation today.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-docs"&gt;sphinx-docs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/read-the-docs"&gt;read-the-docs&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="sqlite-utils"/><category term="sphinx-docs"/><category term="read-the-docs"/></entry><entry><title>sqlite-utils API reference</title><link href="https://simonwillison.net/2021/Aug/11/sqlite-utils-api-reference/#atom-tag" rel="alternate"/><published>2021-08-11T01:03:33+00:00</published><updated>2021-08-11T01:03:33+00:00</updated><id>https://simonwillison.net/2021/Aug/11/sqlite-utils-api-reference/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sqlite-utils.datasette.io/en/stable/reference.html"&gt;sqlite-utils API reference&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-15-1"&gt;sqlite-utils 3.15.1 release notes&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/documentation"&gt;documentation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-docs"&gt;sphinx-docs&lt;/a&gt;&lt;/p&gt;



</summary><category term="documentation"/><category term="python"/><category term="sqlite-utils"/><category term="sphinx-docs"/></entry></feed>