Simon Willison’s Weblog

60 items tagged “databases”

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads. # 28th October 2009, 9:02 am

MySQL Connector/Python. A pure Python implementation of the MySQL client/server protocol, meaning you can talk to a MySQL server from Python without needing to first install the MySQL client libraries (which often requires compiling from source). # 2nd October 2009, 2:16 pm

Why we migrated from MySQL to MongoDB. Includes some useful information on MongoDB’s limitations—for example, running many different collections can waste disk space and repairing large datasets or bulk deleting many rows can block and lock the database for the duration of the operation. # 27th July 2009, 10:49 am

Keyspace. Yet Another Key-Value Store—this one focuses on high availability, with one server in the cluster serving as master (and handling all writes), and the paxos algorithm handling replication and ensuring a new master can be elected should the existing master become unavailable. Clients can chose to make dirty reads against replicated servers or clean reads by talking directly to the master. Underlying storage is BerkeleyDB, and the authors claim 100,000 writes/second. Released under the AGPL. # 16th July 2009, 10:30 am

Using Mongo for Real-Time Analytics. MongoDB supports an “upsert” query, which when combined with the $inc operator can cause counter fields to be incremented if they exist and created otherwise. This makes it a great fit for real-time analytics applications (one increment per page view), something that regular relational databases aren’t particularly good at. # 30th June 2009, 7:28 pm

PostgreSQL Development Priorities. The top two for 8.4 are “Simple built-in replication” and “Upgrade-in-place”, Josh Berkus is seeking feedback on priorities for future work on 8.5. # 28th May 2009, 8:08 pm

ericflo’s django-tokyo-sessions. A Django sessions backend using Tokyo Cabinet, via Tokyo Tyrant and the PyTyrant library. A fast key/value store is a much better solution for sessions than a relational database. # 7th May 2009, 7:30 am

Drop ACID and think about data. I’ve been very impressed with the quality and speed with which the PyCon 2009 videos have been published. Here’s Bob Ippolito on distributed databases and key/value stores. # 17th April 2009, 5:13 pm

Introducing Digg’s IDDB Infrastructure. IDDB is Digg’s new infrastructure component for sharding data across multiple databases, with support for both MySQL and memcachedb. “The DiggBar and URL minifying service is powered by a 16 machine IDDB cluster, which includes 8 write masters in the index and 8 MySQL storage nodes.” # 3rd April 2009, 8:42 pm

Southerly Breezes. Andrew Godwin is slowly assimilating the best ideas from other Django migration systems in to South—the latest additions include ORM Freezing from Migratory and automatic change detection. Exciting stuff. # 15th March 2009, 1:17 pm

[Drizzle] won’t be a get-out-of-jail-free card for very write-heavy applications but I bet it will do wonders for heavily replicated, heavily federated, read-heavy architectures (you know, normal stuff).

Richard Crowley # 8th March 2009, 6:05 pm

What happened to Hot Standby? Hot Standby (the ability to have read-only replication slaves) has been dropped from PostgreSQL 8.4 and is now scheduled for 8.5. “Making hard decisions to postpone features which aren’t quite ready is how PostgreSQL makes sure that our DBMS is ”bulletproof“ and that we release close to on-time every year”. # 8th March 2009, 9:28 am

Database Sharding at Netlog, with MySQL and PHP. Detailed MySQL sharding case study from Netlog, who serve five billion page requests a month using thousands of shards across more than 80 database servers. # 2nd March 2009, 10:22 am

How FriendFeed uses MySQL to store schema-less data. The pain of altering/ adding indexes to tables with 250 million rows was killing their ability to try out new features, so they’ve moved to storing pickled Python objects and manually creating the indexes they need as denormalised two column tables. These can be created and dropped much more easily, and are continually populated by an off-line index building process. # 27th February 2009, 2:33 pm

DB2 support for Django is coming. From IBM, under the Apache 2.0 License. I’m not sure if this makes it hard to bundle it with the rest of Django, which uses the BSD license. # 18th February 2009, 10:58 pm

Tokyo Tyrant Tutorial. Buried at the bottom of the Tokyo Tyrant protocol documentation, this is the best resource I’ve seen yet for getting up and running with the database server (including setting up replication). # 14th February 2009, 11:29 am

Tokyo Cabinet: Beyond Key-Value Store. Useful overview of Yet Another Scalable Key Value Store. Interesting points: multiple backends (hash table, B-Tree, in memory, on disk), a “table” engine which enables more advanced queries, a network server that supports HTTP, memcached or its own binary protocol and the ability to extend the engine with Lua scripts. # 14th February 2009, 11:17 am

Rules of Database App Aging. Peter Harkins: All fields become optional, all relationships become many-to-many, chatter always expands. This is why document oriented databases such as CouchDB are looking more and more attractive. # 18th January 2009, 9:09 am

Spock Proxy. A MySQL Proxy fork (no Lua) that concentrates solely on sharding, by parsing incoming SQL statements and redirecting them across multiple databases. There are some limitations on the SQL that can be handled (no nested queries, joins across a maximum of two tables) but generally it looks pretty impressive. # 11th December 2008, 9:49 am

A Brief Tour of Graphd. The secret sauce behind Freebase—a custom written graph server that models everything as a typed, versioned relationship and can churn through over 3,000 simple queries a second on a single AMD64 core. # 29th September 2008, 11:32 am

Hugely informative thread on multi-db support in Django. I brain-dumped some ideas for a Django multi-database connection API on the developer list, and got a ton of smart push-back from people who’ve been there and have the scars to prove it. # 10th September 2008, 11:31 pm

Drizzle, Clouds, “What If?”. Exciting news in the world of MySQL: Drizzle is a new project to produce a massively stripped down version of the database server—InnoDB/UTF8 only, no permissions, views, stored procedures or triggers, simplified field types, optimised for the common subset of functionality used by web apps. MySQL’s Firefox? # 23rd July 2008, 12:30 am

Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions [...] However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases.

Tom Lane # 7th July 2008, 2:08 pm

Sun To Acquire MySQL. Sun also employ Josh Berkus, one of the lead developers of PostgreSQL. # 16th January 2008, 1:55 pm

What You Need To Know About Amazon SimpleDB. Amazon have finally launched the database component of their web service suite. It fits a bunch of current trends: key/value pairs, schemaless, built on top of Erlang. “Eventual consistency” is an interesting characteristic. # 14th December 2007, 11:21 am

PostgreSQL 8.3 beta 4 release notes. In addition to the huge speed improvements, 8.3 adds support for XML, UUID and ENUM data types and brings full text (tsearch2) in to the core database engine. # 12th December 2007, 12:43 am

Django Evolution. Really smart take on the problem of updating database tables to reflect changes to Django models. Code that automatically modifies your database tables can be pretty scary, but Evolution seems to hit the right balance. # 23rd November 2007, 11:49 pm

CouchDB “Joins”. Different approaches to indexing a blog post and its associated comments in the non-relational CouchDB. # 25th October 2007, 8:27 am

CouchDB: Thinking beyond the RDBMS. CouchDB is a fascinating project—an Erlang powered non-relational database with a JSON API that lets you define “views” (really computed tables) based on JavaScript functions that execute using map/reduce. Damien Katz, the main developer currently works for MySQL and used to work on Lotus Notes. # 3rd September 2007, 9:48 am

SELECT * FROM everything, or why databases are awesome. I’m beginning to think that for scalable applications the thinner your ORM is the better—if you even use one at all. # 22nd June 2007, 12:40 am