Simon Willison’s Weblog

Blogmarks in Jun, 2009



Using Mongo for Real-Time Analytics. MongoDB supports an “upsert” query, which when combined with the $inc operator can cause counter fields to be incremented if they exist and created otherwise. This makes it a great fit for real-time analytics applications (one increment per page view), something that regular relational databases aren’t particularly good at. # 30th June 2009, 7:28 pm
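
The upsert-plus-$inc behaviour can be sketched in plain Python. This is an in-memory imitation, not MongoDB's API: the `upsert_inc` helper and the `page_views` dict are hypothetical names invented for illustration. Against a real server, the equivalent PyMongo call would be along the lines of `collection.update_one({"path": path}, {"$inc": {"views": 1}}, upsert=True)`.

```python
# In-memory imitation of a MongoDB upsert combined with $inc.
# Not MongoDB's API: upsert_inc and page_views are hypothetical names.

def upsert_inc(collection, key, field, amount=1):
    """Increment collection[key][field], creating the document and field if missing."""
    doc = collection.setdefault(key, {})       # "upsert": create the document if absent
    doc[field] = doc.get(field, 0) + amount    # "$inc": a missing field starts from 0
    return doc


page_views = {}
for path in ["/", "/about/", "/"]:
    upsert_inc(page_views, path, "views")      # one increment per page view
```

The caller never has to check whether a counter exists first, which is what makes the pattern convenient for per-page-view counting.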

MongoDB. Lots of discussions about this at EuroPython today—it’s a document database, very similar to CouchDB but significantly faster and suggested for production use. Best of all, trying it out on OS X is as easy as extracting the tarball and running “bin/mongod --dbpath /tmp/test-mongo-db run”. # 30th June 2009, 7:13 pm

Firefox 3.5 for developers. It’s out today, and the feature list is huge. Highlights include HTML 5 drag ’n’ drop, audio and video elements, offline resources, downloadable fonts, text-shadow, CSS transforms with -moz-transform, localStorage, geolocation, web workers, trackpad swipe events, native JSON, cross-site HTTP requests, text API for canvas, defer attribute for the script element and TraceMonkey for better JS performance! # 30th June 2009, 6:08 pm

cache-money. A “write-through caching library for ActiveRecord”, maintained by Nick Kallen from Twitter. Queries hit memcached first, and caches are automatically kept up-to-date when objects are created, updated and deleted. Only some queries are supported—joins and comparisons won’t hit the cache, for example. # 28th June 2009, 3:17 pm
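
As a rough illustration of the write-through idea (in Python rather than Ruby, and not cache-money's actual API), the sketch below keeps a cache in step with a backing store on every create, update and delete. `WriteThroughStore` and both dicts are hypothetical stand-ins for ActiveRecord and memcached.

```python
class WriteThroughStore:
    """Toy write-through cache: every write updates the database and the cache together."""

    def __init__(self):
        self.db = {}        # stand-in for the relational database
        self.cache = {}     # stand-in for memcached
        self.db_reads = 0   # counts reads that fell through to the database

    def save(self, key, value):
        self.db[key] = value
        self.cache[key] = value        # write through to the cache

    def delete(self, key):
        self.db.pop(key, None)
        self.cache.pop(key, None)      # stop the cache serving deleted rows

    def get(self, key):
        if key in self.cache:          # lookups by key hit the cache first
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = WriteThroughStore()
store.save(1, {"name": "alice"})
store.save(1, {"name": "alice b"})   # the update refreshes the cached copy
user = store.get(1)                  # served from the cache, no database read
store.delete(1)
```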

Twitter, an Evolving Architecture. The most detailed write-up of Twitter’s current architecture I’ve seen, explaining the four layers of cache (all memcached) used by the Twitter API. # 28th June 2009, 3:09 pm

BashReduce. Map/Reduce in Bash is no longer a joke project (if it ever was)—Richard Crowley is extending it and using it for analysis at OpenDNS. # 28th June 2009, 3:03 pm

What’s New In Python 3.1. Lots of stuff, but the best bits are an ordered dictionary type (congrats, Armin), a Counter class for counting unique items in an iterable (I do this on an almost daily basis) and a bunch of performance improvements including a rewrite of the Python 3.0 IO system in C. # 28th June 2009, 3:02 pm
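
A quick look at the two new `collections` types, using made-up sample data:

```python
from collections import Counter, OrderedDict

# Counter tallies hashable items from any iterable.
words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)
top = counts.most_common(1)   # the single most frequent word and its count

# OrderedDict remembers the order in which keys were inserted.
od = OrderedDict()
od["first"] = 1
od["second"] = 2
od["third"] = 3
keys = list(od)
```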

The Resource Expert Droid. Like the HTML Validator but for your server’s HTTP headers—extremely useful. # 25th June 2009, 10:06 am

Four crowdsourcing lessons from the Guardian’s (spectacular) expenses-scandal experiment. Michael Andersen from the Nieman Journalism Lab interviewed me about the MP expenses crowdsourcing site. # 24th June 2009, 3:31 pm

Test-Driven Heresy. Tim Bray advocates TDD for maintenance development, but argues that it may not be as useful during the exploratory, greenfield development phase of a project. # 24th June 2009, 11:03 am

To Sprite Or Not To Sprite. CSS sprite images are decompressed to full bitmaps by browsers before they are rendered, so sprite files with large numbers of pixels will dramatically increase the memory footprint of your site. # 24th June 2009, 10:33 am
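
A back-of-the-envelope calculation shows why pixel count matters more than file size. The figure of 4 bytes per pixel (32-bit RGBA) is an assumption, not something stated in the linked article, and the sprite dimensions are hypothetical.

```python
BYTES_PER_PIXEL = 4  # assumption: bitmaps are held in memory as 32-bit RGBA


def decoded_size_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Approximate memory needed for a fully decompressed image."""
    return width * height * bytes_per_pixel


# Hypothetical 2000x2000 sprite sheet. The file on disk may be small if it is
# mostly empty space, but the decoded bitmap size depends only on dimensions.
sprite_bytes = decoded_size_bytes(2000, 2000)
sprite_mib = sprite_bytes / (1024 * 1024)
```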

Google asked people in Times Square: “What is a browser?”. Stuff like this makes me despair for creating a secure web. What chance do people have of surfing safely if they don’t understand browsers, web sites, operating systems, DNS, URLs, SSL, certificates... # 20th June 2009, 1:25 am

The breakneck race to build an application to crowdsource MPs’ expenses. Charles Arthur wrote up a very nice piece on the development effort behind the Guardian’s crowdsourcing expenses app. # 19th June 2009, 10:16 pm

Towards a Standard for Django Session Messages. I completely agree that Django’s user.message_set (which I helped design) is unfit for purpose, but I don’t think sessions are the right solution for messages sent to users. A signed cookie containing either the full message or a key referencing the message body on the server is a much more generally useful solution as it avoids the need for a round trip to a persistent store entirely. # 19th June 2009, 9:57 pm
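
A minimal sketch of the signed-cookie idea using only the standard library. This is not Django's API: `sign`, `unsign` and `SECRET_KEY` are hypothetical names, and a production version would also want expiry and key rotation.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical; a real site would keep this private


def sign(message, key=SECRET_KEY):
    """Return 'payload.signature', where the signature is an HMAC-SHA256 of the payload."""
    payload = base64.urlsafe_b64encode(message.encode("utf-8")).decode("ascii")
    sig = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return payload + "." + sig


def unsign(cookie, key=SECRET_KEY):
    """Return the original message, or None if the cookie is malformed or tampered with."""
    payload, sep, sig = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, to avoid leaking timing information.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
        return None
    return base64.urlsafe_b64decode(payload.encode("ascii")).decode("utf-8")


cookie = sign("Your changes were saved")
message = unsign(cookie)   # verified without touching any server-side store
```

Because the message travels inside the cookie and is verified with the server's key, displaying it needs no lookup in a session table or other persistent store.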

Unimpressed by NodeIterator. John Resig, one of the most talented API designers I’ve ever come across, posts some well earned criticism of the document.createNodeIterator DOM traversal API. # 19th June 2009, 9:53 pm

Investigate your MP’s expenses. Launched today, this is the project that has been keeping me ultra-busy for the past week—we’re crowdsourcing the analysis of the 700,000+ scanned MP expenses documents released this morning. It’s the Guardian’s first live Django-powered application, and also the first time we’ve hosted something on EC2. # 18th June 2009, 11:16 pm

C64 Twitter client. Awesome. # 17th June 2009, 9:14 am

Jython 2.5.0 Final is out! It’s been a long time coming—congratulations to the team. # 16th June 2009, 11:21 pm

SWFUpload jQuery Plugin. Nice looking plugin around an invisible Flash shim that provides multiple file uploads and client-side progress indicators. # 16th June 2009, 11:46 am

Opera Unite. Opera’s big announcement: a developer preview (“labs release”) of their new web-server-in-your-browser feature, Unite. Includes an Opera-hosted proxy to help break through your firewall. The web server can be customised using server-side JavaScript running in an Opera Widget. # 16th June 2009, 11 am

Mr. Penumbra’s Twenty-Four-Hour Book Store. Enormously entertaining short story about data visualisation and creepy San Francisco bookshops by Robin Sloan. # 12th June 2009, 6:07 pm

Dealing with election results data. Alf Eaton loaded the Guardian’s European election results spreadsheet into Google’s new Fusion Tables tool. # 12th June 2009, 6:06 pm

The GIF Pronunciation Page. It’s jiff. Here’s evidence. # 11th June 2009, 10:50 pm

Cryptographic Right Answers. Best practice recommendations for cryptography: “While some people argue that you should never use cryptographic primitives directly and that trying to teach people cryptography just makes them more likely to shoot themselves in their proverbial feet, I come from a proud academic background and am sufficiently optimistic about humankind that I think it’s a good idea to spread some knowledge around.” # 11th June 2009, 10:16 pm

Exactly how well did the BNP do where you live? Guardian journalists spent a day and a half calling round different local authorities to get a proper breakdown of the European election results (which are only officially published in aggregate) and published the results as a spreadsheet on the Datablog. # 11th June 2009, 11:37 am

Exclusive: The Future of Facebook Usernames. I have to admit I was planning to just let Facebook get on with it, assuming that the OpenID provider part would show up of its own accord—but maybe I should write a thoughtful and persuasive essay about it after all. # 11th June 2009, 9:46 am

Styling buttons to look like links. Nat has a neat trick for styling submit buttons to look like regular links—so there’s absolutely no excuse for using a “delete” link when you should be using a POST request. # 10th June 2009, 2:11 am

Augmenting photos—with OSM! “You climbed up a mountain and took a photo ... but it’s 2009! Why doesn’t it have all kind of magic over the top of it.”—Marmota matches your landscape photos to height field data, then overlays data from OpenStreetMap mapped to the contours of the photograph. # 9th June 2009, 11:34 am

The Twitpocalypse is Near: Will Your Twitter Client Survive? Tweet IDs will shortly tick over past the maximum signed 32-bit integer (2,147,483,647), potentially breaking applications. I learnt this lesson when the same thing happened to Flickr photo IDs: never store numeric IDs from external systems as integers, always use strings. # 9th June 2009, 10:52 am
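
The limit is easy to demonstrate. In this sketch `fits_in_int32` is a hypothetical helper and the tweet ID is a made-up value one past the boundary; Python's own integers do not overflow, so `struct` stands in for a fixed-width 32-bit column or field.

```python
import struct

MAX_INT32 = 2**31 - 1   # largest value a signed 32-bit integer can hold


def fits_in_int32(n):
    """Return True if n can be packed into a signed 32-bit integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


# Made-up ID one past the limit, kept as an opaque string as the post advises.
tweet_id = "2147483648"
overflows = not fits_in_int32(int(tweet_id))
```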

The Straight Choice | The election leaflet project. Nice crowdsourcing app by Richard Pope, Francis Irving and Julian Todd—UK political leaflets are hard to keep tabs on due to the way they are distributed over small geographical areas, so this site encourages you to take photos of leaflets delivered to your home and tag them with postcode, party and key topics. # 8th June 2009, 4:23 pm