Simon Willison’s Weblog

Subscribe
Atom feed for documentation

50 items tagged “documentation”

2024

Introducing Limbo: A complete rewrite of SQLite in Rust (via) This looks absurdly ambitious:

Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level, with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture.

The Turso team behind it have been maintaining their libSQL fork for two years now, so they're well equipped to take on a challenge of this magnitude.

SQLite is justifiably famous for its meticulous approach to testing. Limbo plans to take an entirely different approach based on "Deterministic Simulation Testing" - a modern technique pioneered by FoundationDB and now spearheaded by Antithesis, the company Turso have been working with on their previous testing projects.

Another bold claim (emphasis mine):

We have both added DST facilities to the core of the database, and partnered with Antithesis to achieve a level of reliability in the database that lives up to SQLite’s reputation.

[...] With DST, we believe we can achieve an even higher degree of robustness than SQLite, since it is easier to simulate unlikely scenarios in a simulator, test years of execution with different event orderings, and upon finding issues, reproduce them 100% reliably.

The two most interesting features that Limbo is planning to offer are first-party WASM support and fully asynchronous I/O:

SQLite itself has a synchronous interface, meaning driver authors who want asynchronous behavior need to have the extra complication of using helper threads. Because SQLite queries tend to be fast, since no network round trips are involved, a lot of those drivers just settle for a synchronous interface. [...]

Limbo is designed to be asynchronous from the ground up. It extends sqlite3_step, the main entry point API to SQLite, to be asynchronous, allowing it to return to the caller if data is not ready to consume immediately.

Datasette provides an async API for executing SQLite queries which is backed by all manner of complex thread management - I would be very interested in a native asyncio Python library for talking to SQLite database files.

I successfully tried out Limbo's Python bindings against a demo SQLite test database using uv like this:

uv run --with pylimbo python
>>> import limbo
>>> conn = limbo.connect("/tmp/demo.db")
>>> cursor = conn.cursor()
>>> print(cursor.execute("select * from foo").fetchall())

It crashed when I tried against a more complex SQLite database that included SQLite FTS tables.

The Python bindings aren't yet documented, so I piped them through LLM and had the new google-exp-1206 model write this initial documentation for me:

files-to-prompt limbo/bindings/python -c | llm -m gemini-exp-1206 -s 'write extensive usage documentation in markdown, including realistic usage examples'

# 10th December 2024, 7:25 pm / rust, sqlite, uv, open-source, python, llm, ai-assisted-programming, documentation

Generating documentation from tests using files-to-prompt and LLM. I was experimenting with the wasmtime-py Python library today (for executing WebAssembly programs from inside CPython) and I found the existing API docs didn't quite show me what I wanted to know.

The project has a comprehensive test suite so I tried seeing if I could generate documentation using that:

cd /tmp
git clone https://github.com/bytecodealliance/wasmtime-py
files-to-prompt -e py wasmtime-py/tests -c | \
  llm -m claude-3.5-sonnet -s \
  'write detailed usage documentation including realistic examples'

More notes in my TIL. You can see the full Claude transcript here - I think this worked really well!

# 5th November 2024, 10:37 pm / llm, webassembly, generative-ai, ai, llms, claude, claude-3-5-sonnet, ai-assisted-programming, documentation

docs.jina.ai—the Jina meta-prompt. From Jina AI on Twitter:

curl docs.jina.ai - This is our Meta-Prompt. It allows LLMs to understand our Reader, Embeddings, Reranker, and Classifier APIs for improved codegen. Using the meta-prompt is straightforward. Just copy the prompt into your preferred LLM interface like ChatGPT, Claude, or whatever works for you, add your instructions, and you're set.

The page is served using content negotiation. If you hit it with curl you get plain text, but a browser with text/html in the accept: header gets an explanation along with a convenient copy to clipboard button.

Screenshot of an API documentation page for Jina AI with warning message, access instructions, and code sample. Contains text: Note: This content is specifically designed for LLMs and not intended for human reading. For human-readable content, please visit Jina AI. For LLMs/programmatic access, you can fetch this content directly: curl docs.jina.ai/v2 # or wget docs.jina.ai/v2 # or fetch docs.jina.ai/v2 You only see this as a HTML when you access docs.jina.ai via browser. If you access it via code/program, you will get a text/plain response as below. You are an AI engineer designed to help users use Jina AI Search Foundation API's for their specific use case. # Core principles...

# 30th October 2024, 5:07 pm / llm, jina, generative-ai, ai, documentation, llms

simonw/docs cookiecutter template. Over the last few years I’ve settled on the combination of Sphinx, the Furo theme and the myst-parser extension (enabling Markdown in place of reStructuredText) as my documentation toolkit of choice, maintained in GitHub and hosted using ReadTheDocs.

My LLM and shot-scraper projects are two examples of that stack in action.

Today I wanted to spin up a new documentation site so I finally took the time to construct a cookiecutter template for my preferred configuration. You can use it like this:

pipx install cookiecutter
cookiecutter gh:simonw/docs

Or with uv:

uv tool run cookiecutter gh:simonw/docs

Answer a few questions:

[1/3] project (): shot-scraper
[2/3] author (): Simon Willison
[3/3] docs_directory (docs):

And it creates a docs/ directory ready for you to start editing docs:

cd docs
pip install -r requirements.txt
make livehtml

# 23rd September 2024, 9:45 pm / uv, markdown, sphinx-docs, cookiecutter, read-the-docs, python, projects, documentation

If you learn something the hard way, share your findings with others. You have blazed a new trail; now you must mark it for your fellow travellers. Sharing knowledge is an unreasonably effective way of helping others.

Nicolas Bouliane

# 5th January 2024, 10:32 pm / documentation

2023

PostgreSQL Lock Conflicts (via) I absolutely love how extremely specific and niche this documentation site is. It details every single lock that PostgreSQL implements, and shows exactly which commands acquire that lock. That’s everything. I can imagine this becoming absurdly useful at extremely infrequent intervals for advanced PostgreSQL work.

# 23rd August 2023, 3:08 am / postgresql, documentation

2022

Coping strategies for the serial project hoarder

Visit Coping strategies for the serial project hoarder

I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled “Massively increase your productivity on personal projects with comprehensive documentation and automated tests”.

[... 3,864 words]

Automating screenshots for the Datasette documentation using shot-scraper

Visit Automating screenshots for the Datasette documentation using shot-scraper

I released shot-scraper back in March as a tool for keeping screenshots in documentation up-to-date.

[... 1,810 words]

Software engineering practices

Gergely Orosz started a Twitter conversation asking about recommended “software engineering practices” for development teams.

[... 1,557 words]

How I’m a Productive Programmer With a Memory of a Fruit Fly (via) Hynek Schlawack describes the value he gets from searchable offline developer documentation, and advocates for the Documentation Sets format which bundles docs, metadata and a SQLite search index. Hynek’s doc2dash command can convert documentation generated by tools like Sphinx into a docset that’s compatible with several offline documentation browser applications.

# 19th September 2022, 4:19 pm / sqlite, hynek-schlawack, documentation, sphinx-docs

Your documentation is complete when someone can use your module without ever having to look at its code. This is very important. This makes it possible for you to separate your module's documented interface from its internal implementation (guts). This is good because it means that you are free to change the module's internals as long as the interface remains the same.

Remember: the documentation, not the code, defines what a module does.

Ken Williams

# 4th August 2022, 3:50 pm / documentation

Cleaning data with sqlite-utils and Datasette (via) I wrote a new tutorial for the Datasette website, showing how to use sqlite-utils to import a CSV file, clean up the resulting schema, fix date formats and extract some of the columns into a separate table. It’s accompanied by a ten minute video originally recorded for the HYTRADBOI conference.

# 31st July 2022, 7:57 pm / tutorials, sqlite-utils, datasette, documentation

Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation

Visit Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation

Hosting provider Fly released Fly Machines this week. I got an early preview and I’ve been working with it for a few days—it’s a fascinating new piece of technology. I’m using it to get my hosting service for Datasette ready for wider release.

[... 1,005 words]

GOV.UK Guidance: Documenting APIs (via) Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.”

# 21st May 2022, 11:31 pm / documentation, gov-uk

jq language description (via) I love jq but I’ve always found it difficult to remember how to use it, and the manual hasn’t helped me as much as I would hope. It turns out the jq wiki on GitHub offers an alternative, more detailed description of the language which fits the way my brain works a lot better.

# 26th April 2022, 7:04 pm / jq, documentation, programming-languages

Deno by example (via) Interesting approach to documentation: a big list of annotated examples illustrating the Deno way of solving a bunch of common problems.

# 17th March 2022, 1:02 am / deno, documentation

Instantly create a GitHub repository to take screenshots of a web page

Visit Instantly create a GitHub repository to take screenshots of a web page

I just released shot-scraper-template, a GitHub repository template that helps you start taking automated screenshots of a web page by filling out a form.

[... 1,177 words]

Weeknotes: Distracted by Playwright

Visit Weeknotes: Distracted by Playwright

My goal for this week was to unblock progress on Datasette by finally finishing the dash encoding implementation I described last week. I was getting close, and then I got very distracted by Playwright.

[... 892 words]

shot-scraper: automated screenshots for documentation, built on Playwright

Visit shot-scraper: automated screenshots for documentation, built on Playwright

shot-scraper is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool—hence the name—which I picked as a complement to my git scraping and help scraping techniques.

[... 1,802 words]

Weeknotes: Datasette Tutorials

I published two new tutorials for Datasette this week, both focused at end-users of the web application.

[... 479 words]

2021

Making world-class docs takes effort (via) Curl maintainer Daniel Stenberg writes about his principles for good documentation. I agree with all of these: he emphasizes keeping docs in the repo, avoiding the temptation to exclusively generate them from code, featuring examples and ensuring every API you provide has documentation. Daniel describes an approach similar to the documentation unit tests I’ve been using for my own projects: he has scripts which scan the curl documentation to ensure not only that everything is documented but that each documentation area contains the same sections in the same order.

# 6th September 2021, 6:58 pm / curl, documentation, daniel-stenberg

The Diátaxis documentation framework. Daniele Procida’s model of four types of technical documentation—tutorials, how-to guides, technical reference and explanation—now has a name: Diátaxis.

# 21st August 2021, 10:59 pm / diataxis, documentation

Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

Visit Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

This week I broke my streak of not sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.

[... 2,164 words]

Adding Sphinx autodoc to a project, and configuring Read The Docs to build it. My TIL notes from figuring out how to use sphinx-autodoc for the sqlite-utils reference documentation today.

# 11th August 2021, 1:21 am / sqlite-utils, documentation, sphinx-docs, read-the-docs

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks.

# 11th August 2021, 1:03 am / sqlite-utils, documentation, python, sphinx-docs

2020

Design Docs at Google. Useful description of the format used for software design docs at Google—informal documents of between 3 and 20 pages that outline the proposed design of a new project, discuss trade-offs that were considered and solicit feedback before the code starts to be written.

# 7th August 2020, 4:31 pm / google, documentation

The unofficial Google Cloud Run FAQ. This is really useful: a no-fluff, content rich explanation of Google Cloud Run hosted as a GitHub repo that actively accepts pull requests from the community. It’s maintained by Ahmet Alp Balkan, a Cloud Run engineer who states “Googlers: If you find this repo useful, you should recognize the work internally, as I actively fight for alternative forms of content like this”. One of the hardest parts of working with AWS and GCP is digging through the marketing materials to figure out what the product actually does, so the more alternative forms of documentation like this the better.

# 22nd July 2020, 5:20 pm / cloudrun, google, documentation

How to find what you want in the Django documentation (via) Useful guide by Matthew Segal to navigating the Django documentation, and tips for reading documentation in general. The Django docs have a great reputation so it’s easy to forget how intimidating they can be for newcomers: Matthew emphasizes that docs are rarely meant to be read in full: the trick is learning how to quickly search them for the things you need to understand right now.

# 3rd July 2020, 3:04 pm / documentation, django

2019

Documentation needs to include and be structured around its four different functions: tutorials, how-to guides, explanation and technical reference. Each of them requires a distinct mode of writing. People working with software need these four different kinds of documentation at different times, in different circumstances - so software usually needs them all.

Daniele Procida

# 3rd August 2019, 8:29 am / documentation, diataxis

2018

The subset of reStructuredText worth committing to memory

reStructuredText is the standard for documentation in the Python world.

[... 1,186 words]