Simon Willison’s Weblog

Items in Feb, 2010


Notes from a production MongoDB deployment. Notes from running MongoDB for 8 months in production, with 664 million documents spread across 72 GB master and slave servers located in two different data centers. # 28th February 2010, 11:05 pm

Node.js, redis, and resque (via) Paul Gross has been experimenting with Node.js proxies for allowing web applications to be upgraded without missing any requests. Here he places all incoming HTTP requests in a redis queue, then has his backend Rails servers consume requests from the queue and push the responses back on to a queue for Node to deliver. When the backend application is upgraded, requests remain in the queue and users see a few seconds of delay before their request is handled. It’s not production ready yet (POST requests aren’t handled, for example) but it’s a very interesting approach. # 28th February 2010, 11:02 pm
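
The queueing idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Paul's code: an in-process `queue.Queue` stands in for the redis list, threads stand in for the Node proxy and the Rails backends, and the names `front_end`, `backend` and `pending` are made up for the example.

```python
import queue
import threading

requests = queue.Queue()  # stand-in for the redis request queue
pending = {}              # request id -> queue the response is delivered on


def front_end(request_id, path, timeout=5.0):
    """Enqueue a request, then block until some backend answers it."""
    reply = queue.Queue(maxsize=1)
    pending[request_id] = reply
    requests.put((request_id, path))
    try:
        return reply.get(timeout=timeout)
    finally:
        del pending[request_id]


def backend(version):
    """Consume queued requests until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        request_id, path = item
        reply = pending.get(request_id)
        if reply is not None:  # the client may have given up already
            reply.put(f"{version} handled {path}")


result = []
client = threading.Thread(target=lambda: result.append(front_end(1, "/home")))
client.start()  # no backend is running yet, so the request waits in the queue

worker = threading.Thread(target=backend, args=("v2",))
worker.start()  # the "upgraded" backend comes up and drains the queue
client.join()

requests.put(None)  # tell the backend to stop
worker.join()
print(result)
```

The client only ever sees a delay: its request sits in the queue until a backend is available to consume it.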

Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully. # 28th February 2010, 10:55 pm
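
Here is a rough sketch of the generation trick as I understand it from the description above. It is not Johnny Cache's actual code: a plain dict stands in for memcached, and the class and method names are invented for the example.

```python
import hashlib


class GenerationalQueryCache:
    """Toy generational cache; a dict stands in for memcached."""

    def __init__(self):
        self.store = {}        # cache key -> cached rows, never expired
        self.generations = {}  # table name -> current generation number

    def _key(self, sql, tables):
        # The key embeds the current generation of every table the query reads.
        gens = ",".join(
            f"{table}:{self.generations.get(table, 0)}" for table in sorted(tables)
        )
        return hashlib.sha1(f"{gens}|{sql}".encode()).hexdigest()

    def select(self, sql, tables, run_query):
        key = self._key(sql, tables)
        if key not in self.store:
            self.store[key] = run_query(sql)
        return self.store[key]

    def table_written(self, table):
        # Bumping the generation orphans every key built from the old one.
        self.generations[table] = self.generations.get(table, 0) + 1


calls = []


def fake_db(sql):
    """Pretend database that records every query it actually runs."""
    calls.append(sql)
    return [("row", len(calls))]


cache = GenerationalQueryCache()
sql = "SELECT * FROM blog_entry"
first = cache.select(sql, ["blog_entry"], fake_db)   # miss: hits the database
second = cache.select(sql, ["blog_entry"], fake_db)  # hit: served from the cache
cache.table_written("blog_entry")                    # any write bumps the generation
third = cache.select(sql, ["blog_entry"], fake_db)   # miss again under the new key
```

Nothing is ever deleted: old entries just become unreachable once the generation changes, which is why a constantly updated table gets almost no benefit.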

Unit Testing Achievements. A plugin for Python’s nose test runner that adds achievements—“Night Shift: Make a failing suite pass between 12am and 5am.” # 28th February 2010, 3:56 pm

ClearMaps: A Mapping Framework for Data Visualization. An open source library for map visualisations using ActionScript, with an Adobe AIR based encoding tool for translating data from shapefiles into vector data suitable for use with the library. # 28th February 2010, 3:52 pm

kriszyp’s node-promise. Another elegant approach to managing asynchronous flows in Node, including running things both in parallel and serial. # 28th February 2010, 3:50 pm

Django Advent: Scaling Django. Mike Malone’s advice on scaling Django applications, including taking advantage of new features in 1.2. # 26th February 2010, 7:22 pm

PiCloud. An interesting twist on cloud computing for Python. “import cloud; cloud.call(my_function, arguments)” serialises my_function and its arguments, pushes it up to one of their EC2 servers and hands you back a job ID which you can poll (or block on) for a response. They suggest using it for long running tasks such as web crawling or image processing. # 26th February 2010, 6:25 pm

Internet Explorer Cookie Internals (FAQ). Grr... IE 6, 7 and 8 don’t support the max-age cookie argument, forcing you to use an explicit expiry date instead. This appears to affect the cache busting cookie pattern, where you set a cookie to expire in 30 seconds for any user who posts content and use the presence of that cookie to skip caches and/or send their queries to a master instead of slave database. If you have to use expires, users with incorrect system clocks may get inconsistent results. Anyone know of a workaround? # 26th February 2010, 12:25 pm
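
For reference, this is roughly what sending both attributes looks like. It is a sketch, not a fix: the cookie name `skip_cache` and the helper function are invented for the example. Browsers that understand Max-Age should prefer it over Expires (RFC 6265 specifies that precedence), so only the IE versions above fall back to the absolute date and stay exposed to the clock problem.

```python
import time
from email.utils import formatdate


def cache_bust_cookie(now, lifetime=30, name="skip_cache"):
    """Build a Set-Cookie value carrying both Max-Age and Expires."""
    # Expires is an absolute date computed from the server's clock.
    expires = formatdate(now + lifetime, usegmt=True)
    return f"{name}=1; Max-Age={lifetime}; Expires={expires}; Path=/"


# A fixed timestamp (the Unix epoch) keeps the example reproducible.
header = cache_bust_cookie(now=0)
print("Set-Cookie: " + header)

# In a real response you would pass the current time instead.
live_header = cache_bust_cookie(now=time.time())
```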

Using Bash’s History Effectively. The HISTIGNORE environment variable is particularly useful, allowing you to suppress certain commands by specifying a pattern. This article has a tip for causing a command to be omitted from the history if you prefix it with a space. # 25th February 2010, 12:21 pm

Hg Init. Joel Spolsky’s guide to Mercurial. If you’re still using a non-distributed version control system like Subversion, this is a great introduction to the new world order. # 25th February 2010, 10:31 am

“Do” it fast! Tim Caswell’s Do library has been upgraded for compatibility with Node v0.1.30, and now has a clever Do.convert() method which wraps Node’s low-level APIs with the Do library’s “continuable” abstraction. # 22nd February 2010, 7:02 pm

node-v0.1.30 (via) A very significant new release of Node.js: the Twisted/Dojo-style Promise abstraction has been removed entirely, causing backwards incompatible changes to a bunch of core APIs. This means the pseudo-blocking Promise.wait() method is gone too, making it even harder to accidentally block your event loop. Instead, user-level libraries are encouraged to add Promise-style abstractions. I’m pleased to see Node sticking to the low-level stuff. # 22nd February 2010, 7 pm

Ryan Tomayko on GitHub’s development process. In the comments: a fascinating insight into how GitHub’s “developers work on whatever is most interesting to them” process manages to achieve really good results. # 22nd February 2010, 9:18 am

jacobian’s django-deployment-workshop. Notes and resources from Jacob’s 3 hour Django deployment workshop at PyCon, including example configuration files for Apache2 + mod_wsgi, nginx, PostgreSQL and pgpool. # 19th February 2010, 2:28 pm

Making Facebook 2x Faster. Facebook have a system called BigPipe which allows them to progressively send their pages to the browser as the server-side processing completes to optimise client loading time. Anyone reverse engineered this yet to figure out how they actually do it? # 19th February 2010, 9:14 am

Search Engine Time Machine. Detailed explanation of how ElasticSearch provides high availability, through clever sharding and replication strategies and configurable gateways for long-term persistent storage. # 17th February 2010, 10:32 pm

The Case For An Older Woman. OkCupid’s fascinating statistics blog uses cleverly plotted aggregate data from the dating site to illustrate the difference in age tastes between the genders (men try to date younger women) and show why that might not be the best strategy. An infographics tour de force. # 17th February 2010, 10:20 pm

do. A library for Node that adds a higher level abstraction for dealing with chained and parallel callbacks. # 17th February 2010, 5:43 pm

How To Node. New blog about Node.js, with a superb series of tutorials aimed at both experienced and new JavaScript developers. The stuff on managing callbacks (including running them in both series and parallel) is pretty eye-opening. # 17th February 2010, 5:42 pm

Werewolf: How a parlour game became a tech phenomenon. The legendary “everyone’s a villager” game from Foo Camp ’08 gets a write-up in Wired. # 17th February 2010, 5:30 pm

A new global visual language for the BBC’s digital services. Detailed explanation of the BBC’s new “visual language” for their digital properties. # 17th February 2010, 12:34 pm

Django Advent. I can’t believe I haven’t linked to this already—Django Advent is “a series of articles about upcoming releases of the Django web framework”. Seven have been posted so far, covering topics from 1.2 including multi-db, messages, object permissions and natural keys. # 16th February 2010, 4:06 pm

A Collection Of Redis Use Cases. Lots of interesting case studies here, collated by Mathias Meyer. Redis clearly shines for anything involving statistics or high volumes of small writes. # 16th February 2010, 3:04 pm

The magic of sub-editors. A neat illustration of how sub-editors work their magic, using the original article with strikes through the parts that were edited out. # 16th February 2010, 10:44 am

Some questions about the “blocking” of HTML5

Some background reading. I was planning to fill in answers as they arrive, but I screwed up the moderation of the comments and got flooded with detailed responses—I strongly recommend reading the comments.

[... 136 words]

The Widening HTML5 Chasm. Simon St. Laurent’s commentary on the HTML5/Adobe situation. The most interesting piece I’ve read on it so far. # 15th February 2010, 9:51 pm

No part of HTML5 is, or was ever, “blocked” in the W3C HTML Working Group -- not HTML5, not Canvas 2D Graphics, not Microdata, not Video -- not by me, not by Adobe. Neither Adobe nor I oppose, are fighting, are trying to stop, slow down, hinder, oppose, or harm HTML5, Canvas 2D Graphics, Microdata, video in HTML, or any of the other significant features in HTML5. Claims otherwise are false. Any other disclaimers needed?

Larry Masinter # 15th February 2010, 9:31 pm

At this point all I could honestly tell you from the point of view of the editor of several of the HTML5 documents being held up is that the W3C have said they won’t publish without the objections being resolved, and that the objection is from Adobe. I can’t even tell what I could do to resolve the objection. It seems to be entirely a process-based objection.

Ian Hickson # 15th February 2010, 7:38 pm

A day on Chatroulette, the web’s weirdest new outpost. By the Guardian’s Bobbie Johnson. # 15th February 2010, 11:15 am