Feed Sign in with OpenID OpenID

Simon Willison’s Weblog

Storing trees in a database

SitePoint: Storing Hierarchical Data in a Database, by Gijs Van Tulder. The article first shows how the easy way of storing hierarchies in a database, using parent fields and a recursive PHP function to iterate up the tree. It then goes on to talk about a far more interesting alternative called “Modified Preorder Tree Traversal” where trees are first “flattened” in to a heap-like structure, then each node is stored with a pair of numbers representing that node’s position in the tree. I’d seen this somewhere before but Gijs Van Tulder’s explanation is far clearer, and comes with some good examples showing how this unconventional storage method can retrieve all of the eventual children of a node in a single query. He also talks about ways of updating the tree structure when new items are added.

This is Storing trees in a database by Simon Willison, posted on 19th June 2003.

View blog reactions

Next: Quick testing of alt attributes

Previous: Thunderbird supports extensions

8 comments

  1. Have you seen Joe Celko's Nested Set Model for hierarchies? It's great, especially if you do more SELECTs than INSERTs.

    Jim Biancolo - 20th June 2003 02:20 - #

  2. Another heads up for Joe Celko... required reading for anyone doing anything reasonably complicated. Have a look at http://istherelifeafter.com/joecelko.html for my notes and a compilation of celkos nested set work

    Tim Parkin - 20th June 2003 09:30 - #

  3. A reluctant thumbs down for Joe's model I'm afraid. I've tried implementing it and then had to spend a considerable amount of time teaching the developers how to use it.

    The hierarchical model may not be as pure, but it is simple and easy to understand. It also has the benefit of being supported by a language extension in Oracle (CONNECT BY .. START WITH)

    Finally, have you tried to re-organise a nested set? As Jim said, its fine when you have a static hierarchy, but with even a moderate level of inserts and updates it becomes hard to manage.

    Andy Todd - 20th June 2003 09:53 - #

  4. nested sets are very 'write heavy' for want of a better phrase. However, as most queries against hierarchies are reads then this generally isn't a problem. If the problem is biased towards updating then it probably isn't the right solution. There should be no problem with developers using it as you would only expose a high level interface to the model. Then it wouldn't matter if it were nested or hierarchial.

    Tim Parkin - 20th June 2003 14:17 - #

  5. As noted, I think both "nested set" and "adjacency list" approaches have their place (SELECT-heavy and INSERT-heavy respectively). It's valuable to have a good understanding of both approaches, IMO. I have had applications that have had to both SELECT and INSERT speedily, and the problem was complicated by needing to do many SELECTs to process a single transaction, which translated into *way* too much recursive traversal. In this situation what worked for me was to store the data in the DB using the "adjacency list" model, while generating an in-memory "nested set" representation for speedy determination of parent/child relationships.

    Jim Biancolo - 20th June 2003 14:45 - #

  6. There is a really excellent booklet with some unique info, called "Tree Processing in SQL", by Rozenshtein, Abramovich, and Alexandrova. It is available through http://sqlforum.com/.

    This booklet has some unique techniques for optimized processing of tress in SQL, using the parent/child data arrangement.

    The optimized techniques in the booklet require the use of stored procedures for traversing the tree, but it is otherwise the common parent/child data arrangement so it can be accessed directly and efficiently with INSERTs, SELECTS, etc., as well.

    Definitely worth checking out if you are working with trees / graphs large enough to make you think about the alternate storage arrangements.

    Jay Fienberg - 20th June 2003 21:31 - #

  7. There are two Nested Sets implemenations in PEAR, see http://pear.php.net/package-info.php?pacid=104 and http://pear.php.net/package-info.php?pacid=187.

    Sebastian Bergmann - 21st June 2003 14:12 - #

  8. http://www.codeproject.com/cs/database/Trees_in_SQ L_databases.asp

    dev@ - 26th October 2004 23:28 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2003/06/19/storingTrees

A django site