Discovering Berkeley DB
I’m working on a project at the moment which involves exporting a whole bunch of data out of an existing system. The system is written in Perl and uses Berkeley DB files for most of its storage.
I’d never done anything with Berkeley DB before, but luckily Python has a module which seems to do all of the hard work for me:
>>> import bsddb
>>> db = bsddb.btopen('xpand.db')
>>> db.keys()[0:10]
[':archives:index.html', ':art:test.html', ...
>>> db[':art:test.html']
'template;front.tp\x01\x01'
>>>
The Berkeley DB libraries are maintained by Sleepycat Software. Unfortunately, their site is completely saturated with marketing jargon: “Our customers rely on Berkeley DB for fast, scalable, reliable and cost-effective data management for their mission-critical applications.” Great, but what does it do exactly?
Some digging around turned up the real information: the Berkeley DB Tutorial and Reference Guide, which contains pretty much everything you could possibly want to know about the technology. It turns out that at a basic level Berkeley DB is just a very high performance, reliable way of persisting dictionary-style data structures: anything where a piece of data can be stored and looked up using a unique key. The key and the value can each be up to 4 gigabytes in length and can consist of anything that can be crammed into a string of bytes, so what you do with it is completely up to you. The only operations available are “store this value under this key”, “check if this key exists” and “retrieve the value for this key”, so conceptually it’s pretty simple. The complicated stuff all happens under the hood.
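Those three operations can be sketched in a few lines of Python. This is only an illustration, and it makes one loud assumption: it uses the standard library's dbm.dumb module as a stand-in, because that module ships with current Python releases while bsddb does not. It is not Berkeley DB and has none of its performance or reliability characteristics; it just exposes the same dictionary-style interface over a file on disk. The file name "example" and the demo() helper are made up for the example.

```python
import dbm.dumb
import os
import tempfile


def demo(path):
    """Store, check and retrieve one record in a key/value file.

    Assumption: dbm.dumb stands in for Berkeley DB here. Keys and
    values are plain byte strings, as they are in the real thing.
    """
    db = dbm.dumb.open(path, "c")  # "c" creates the file if needed
    try:
        # "store this value under this key"
        db[b":art:test.html"] = b"template;front.tp\x01\x01"

        # "check if this key exists"
        exists = b":art:test.html" in db
        missing = b":no:such:key" in db

        # "retrieve the value for this key"
        value = db[b":art:test.html"]
    finally:
        db.close()
    return exists, missing, value


# Run the demo against a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    exists, missing, value = demo(os.path.join(tmp, "example"))
```

With the bsddb module from the session above, the body of demo() should look much the same, since the object returned by bsddb.btopen() is also used like a dictionary.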
It seems like a great alternative to a full-on relational database for simple applications, although I’m slightly confused by the license, which allows free use for open source products but requires a license for commercial applications. Does that mean that if I use the bsddb Python module in a commercial app I need to get a license from Sleepycat?
Be aware of one trap with Berkeley DB: it has a notorious history of changing the on-disk storage format between small version upgrades, to the point that data written with one version has not always been readable after a point-release upgrade. They may have stabilised lately, in view of the public feedback, but this has been a problem for a few years.
That being said, this package is a great one for many data storage needs (and I cannot help you with the license question, having never found myself in that position and so having not thought about it).
Malcolm Tredinnick - 26th November 2003 03:13 - #
anders - 26th November 2003 03:49 - #
Just a note to check out the Berkeley DB XML development weblog. It contains a lot of FAQs and the like that you may find useful if you want to go for an XML-based approach.
Ben Meadowcroft - 26th November 2003 07:36 - #
Sterling Hughes - 26th November 2003 09:46 - #
Mark - 26th November 2003 11:44 - #
A.M. Kuchling - 26th November 2003 12:20 - #
My experience would confirm Andrew's statement. I contacted SleepyCat some time ago re: their license and in their response (wish I could find that e-mail!) they indicated that they would not require a commercial license for commercial software that was "completely written in an open-source language such as Perl or Python" (a rough paraphrase from memory).
Of course, one could e-mail them independently and get confirmation straight from the cat's mouth. ;-)
Graham Fawcett - 28th November 2003 04:54 - #
jens - 4th December 2003 17:34 - #
I'm glad you found SleepyCat's Berkeley DB Tutorial and Reference Guide both useful and legible ...
and that's just the first paragraph on the first method!
In my estimation, this 'manual' tome stands at the pinnacle of abstract expressionism in geek literature, opaque as it is obtuse, and brimming with impenetrable jargon. It's easy to see how the tech crew and the marketing crew could be best friends ;)
That said, and thank you for the opportunity to say it (I feel better now), there is a decent (Java-centric) getting-started tutorial at http://today.java.net/pub/a/today/2004/08/24/sleepy.html, but all I really want to know, in plain English even a fool like me can understand, is how do I implement multiprocess lock protection on an ordered Btree that uses cursors across a range of keys ...
mrG - 7th October 2005 20:03 - #
zyxtberk - 8th February 2006 04:24 - #
dbusertobe - 21st May 2006 03:21 - #
Perl/Python Licensing - I found this under Sleepycat licensing.
Do I have to pay for a Berkeley DB license to use it in my Perl or Python scripts?
No, you may use the Berkeley DB open source license at no cost. The Berkeley DB open source license requires that software that uses Berkeley DB be freely redistributable. In the case of Perl or Python, that software is Perl or Python, and not your scripts. Any scripts you write are your property, including scripts that make use of Berkeley DB. None of the Perl, Python or Berkeley DB licenses place any restrictions on what you may do with them.
John Resler - 8th June 2006 11:39 - #