Simon Willison’s Weblog

Subscribe

5 items tagged “datastructures”

2024

Zed Decoded: Rope & SumTree (via) Text editors like Zed need in-memory data structures that are optimized for handling large strings where text can be inserted or deleted at any point without needing to copy the whole string.

Ropes are a classic, widely used data structure for this.

Zed have their own implementation of ropes in Rust, but it's backed by something even more interesting: a SumTree, described here as a thread-safe, snapshot-friendly, copy-on-write B+ tree where each leaf node contains multiple items and a Summary for each Item, and internal tree nodes contain a Summary of the items in its subtree.

These summaries allow for some very fast traversal tree operations, such as turning an offset in the file into a line and row coordinate and vice-versa. The summary itself can be anything, so each application of SumTree in Zed collects different summary information.

Uses in Zed include tracking highlight regions, code folding state, git blame information, project file trees and more - over 20 different classes and counting.

Zed co-founder Nathan Sobo calls SumTree "the soul of Zed".

Also notable: this detailed article is accompanied by an hour long video with a four-way conversation between Zed maintainers providing a tour of these data structures in the Zed codebase. # 28th April 2024, 3:25 pm

2020

Reducing search indexing latency to one second. Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists. # 26th June 2020, 5:06 pm

2013

Why does Python not have any data structures that store data in sorted order?

The bisect module provides functions for achieving this using a python list as the underlying data structure: http://docs.python.org/2/library...

[... 39 words]

What data structures are used to implement the DOM tree?

You may enjoy this post from Hixie back in 2002 which illustrates how different browsers deal with incorrectly nested HTML. IE6 used to create a tree that wasn’t actually a tree! http://ln.hixie.ch/?start=103791...

[... 49 words]

2007

Speeding up dateutil: Python’s heapq module turns minutes into seconds. Neat case study in data structure optimisation. # 22nd December 2007, 1:07 pm