Simon Willison’s Weblog

Blogmarks in Jan, 2009

All you ever wanted to know about writing bloom filters. This helped me understand a key use case for bloom filters: reducing the impact of the “worst case search is when there are no matching results so everything gets scanned” problem. # 30th January 2009, 8:26 am
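
To make that use case concrete, here is a minimal sketch of a bloom filter sitting in front of an expensive lookup. It is not taken from the linked article: the `BloomFilter` class, the `lookup` helper, the `slow_store` dictionary and the sizes chosen are all hypothetical, and the salted SHA-256 hashing is just one simple way to derive several bit positions.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function (an assumption,
        # chosen for simplicity rather than speed).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def lookup(key, bloom, slow_store):
    """Skip the expensive search when the filter says the key is definitely absent."""
    if key not in bloom:
        return None  # guaranteed miss, so no scan is needed
    return slow_store.get(key)  # may still miss: false positives are possible
```

The filter only ever answers "definitely not present" or "possibly present", so the worst-case full scan is reserved for keys that have a chance of matching.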

Ehy IE8, I Can Has Some Clickjacking Protection? (via) IE8 has built-in protection against clickjacking, but it’s opt-in (with a custom HTTP header) and IE only. It turns out the usual defence against clickjacking (using framebusting JavaScript) doesn’t work in IE as it can be worked around with a security="restricted" attribute on an iframe. # 29th January 2009, 1:39 pm
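
The opt-in header IE8 introduced is `X-Frame-Options`, with the values `DENY` and `SAMEORIGIN`. As a rough sketch of what opting in might look like from Python, here is a small WSGI middleware that adds the header to every response. The `add_frame_options` and `hello_app` names are hypothetical and not from the linked post.

```python
def add_frame_options(app, policy="DENY"):
    """Wrap a WSGI app so every response carries an X-Frame-Options header."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing value so the header appears exactly once.
            headers = [(k, v) for k, v in headers if k.lower() != "x-frame-options"]
            headers.append(("X-Frame-Options", policy))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def hello_app(environ, start_response):
    """Tiny example application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

Wrapping is a one-liner, `application = add_frame_options(hello_app, policy="SAMEORIGIN")`, and browsers that do not understand the header simply ignore it.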

Infrastructure for Modern Web Sites. Leonard’s thoughts on what the next generation of web frameworks should aim to provide. # 29th January 2009, 1:36 pm

Changeset 9793: SMTP testing documentation. I didn’t know this trick: running “python -m smtpd -n -c DebuggingServer localhost:1025” will start up a simple SMTP server which dumps received e-mails to the terminal instead of forwarding them on. # 29th January 2009, 1:35 pm
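
To see the debugging server do something, you need a client that sends it a message. The sketch below uses the standard library's `smtplib` and `email.message`; the helper names and the example addresses are made up for illustration, and it assumes the server from the command above is already listening on localhost:1025. Note that the `smtpd` module was removed in Python 3.12, so the command itself needs an older interpreter.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text e-mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_to_debug_server(msg, host="localhost", port=1025):
    """Deliver msg to the local debugging server, which prints it to its terminal."""
    # Assumes the DebuggingServer is running; otherwise this raises ConnectionRefusedError.
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Calling `send_to_debug_server(build_message("from@example.com", "to@example.com", "Test", "Hello"))` should cause the full message, headers included, to appear in the terminal running the server.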

Giant, Python Powered Robots. “They each weigh about 11 tonnes and have a 17 meter reach. The control system is written in Python, with small sections of C which run in hard-real-time to guarantee safety.” # 28th January 2009, 10:35 am

Sharding Counters on Google App Engine. “While the datastore for App Engine scales to support a huge number of entities it is important to note that you can only expect to update any single entity, or entity-group, about five times a second”. This article explains a technique for sharding writes across multiple counters in detail, including a way to keep a memcache counter updated at the same time for faster reads. # 27th January 2009, 8:27 pm
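
The idea is easier to see in miniature. This is an in-memory sketch only, not the article's code and not the App Engine API: a plain list stands in for the per-shard datastore entities and an attribute stands in for the memcache value. The `ShardedCounter` name and the default of 20 shards are assumptions.

```python
import random


class ShardedCounter:
    """Spread writes across several shards; sum the shards to read the total."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards  # stand-in for one datastore entity per shard
        self.cached_total = None        # stand-in for the memcache copy

    def increment(self, amount=1):
        # Picking a random shard means concurrent writers rarely contend on one entity.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount
        # Keep the cached total in step so reads do not have to sum every shard.
        if self.cached_total is not None:
            self.cached_total += amount

    def total(self):
        if self.cached_total is None:
            # Cache miss: fall back to summing all shards, then remember the result.
            self.cached_total = sum(self.shards)
        return self.cached_total
```

With N shards the sustainable write rate is roughly N times the per-entity limit, at the cost of a more expensive read whenever the cached total is missing.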

A Year Later, AOL Is Contemplating A Bebo Sale. Ignore the headline and read the actual article: Mike Butcher’s description of how Bebo targeted old-fashioned TV advertising agencies while other social networks ignored them completely is riveting. # 27th January 2009, 6:13 pm

Switching from scripting languages to Objective C and iPhone: useful libraries. Matt Biddulph collects together some very useful libraries for developers just getting started with Objective-C (though I’m not too keen on the title). # 27th January 2009, 5:50 pm

Dive into Python 3. Mark Pilgrim’s seminal work taught me Python nearly eight years ago. Now he’s updating it to cover Python 3. It’s just a table of contents at the moment, but the chapter on “Packaging Python libraries” has me very excited. # 26th January 2009, 6:10 pm

Coding Horror: A Scripter at Heart. Sigh. I cannot believe that the false distinction between “scripting” and “programming” is still being discussed. # 26th January 2009, 6:06 pm

OCR and Neural Nets in JavaScript. John dissects the brilliant Greasemonkey script that solves simple captchas using the canvas element and HTML5’s getImageData API. # 25th January 2009, 12 am

OWASP: XSS Prevention Cheat Sheet. Comprehensive set of rules for avoiding XSS: there’s a bit more to it than just escaping all output variables, since you have to take markup context into account. # 24th January 2009, 11:58 pm
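
A quick illustration of why context matters: the same untrusted string needs different treatment depending on where it lands. This is a simplified sketch using only the Python standard library, not a replacement for the cheat sheet's rules. The function names are hypothetical, and the attribute helper assumes the attribute value is always wrapped in quotes.

```python
import html
import json
from urllib.parse import quote


def for_html_body(value):
    """Escape &, < and > for text placed between tags."""
    return html.escape(value, quote=False)


def for_html_attribute(value):
    """Also escape both quote characters; assumes a quoted attribute value."""
    return html.escape(value, quote=True)


def for_url_parameter(value):
    """Percent-encode everything that is not an unreserved URL character."""
    return quote(value, safe="")


def for_script_string(value):
    """Emit a JavaScript string literal that cannot close the surrounding script tag."""
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
```

HTML-escaping alone would leave a value like `</script>` harmless in the page body but not inside a URL parameter or a script block, which is why each context gets its own function.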

Neil Fraser: Differential Synchronization. Paper describing a robust method for “keeping two or more copies of the same document synchronized with each other in real-time”, over a variable network connection using clever diff algorithms. # 24th January 2009, 11:57 pm

google-mobwrite. Neil Fraser’s terrifyingly clever differential synchronization algorithm (for SubEthaEdit-style collaboration over the web) is now available as an open source Python and JavaScript library. # 24th January 2009, 11:55 pm

Crowbar. Headless Gecko/XULRunner which exposes a web service API for screen scraping using a real browser DOM—just pass it the URL of a page and the URL of a screen scraping JavaScript script (a bit like a Greasemonkey user script) and get back RDF/XML. # 24th January 2009, 11:52 pm

FluidDB domain names available early (and free) for Twitter users. It’s interesting how Twitter has revitalised the concept of usernames as first class identifiers. FluidDB hasn’t even launched yet, but it’s allowing people to reserve their Twitter username within the FluidDB system just by following @fluidDB. # 24th January 2009, 11:44 pm

EuroDjangoCon. 4th-6th of May 2009, in Prague. Talk submissions are open now, and registration starts on the 6th of February. # 24th January 2009, 6:54 pm

AJAX APIs Playground. Ferociously useful collection of executable and editable example code for all(?) of Google’s JavaScript APIs, including Google Maps and the increasingly interesting Visualization API. # 22nd January 2009, 6:38 pm

New PylonsHQ Site Launches. The new site uses CouchDB instead of a relational database, and the code for the site is open source so you can see how it all works. # 22nd January 2009, 6:33 pm

Learning to Think Like A Programmer. Outstanding advice aimed mainly at journalists, but important to anyone who collects information for a living and might want it to be automatically processed at some point in the future. # 22nd January 2009, 6:06 pm

jQuery 1.3.1 Released. Bug fix release for 1.3, mainly addressing browser compatibility issues. Of interest: jQuery no longer ships a packed version (where JS is used to further decompress a string), as their tests show that this reduces performance due to the overhead of the extra decompression. They still provide a YUI Compressor minified version. # 22nd January 2009, 10:41 am

Train Crash Leads LA Times to Create Django Database on Deadline. A story from last September. I didn’t know the LA Times used Django. UPDATE: Yes I did, I introduced their panel about it at DjangoCon. Sorry, mind like a sieve sometimes. # 21st January 2009, 5:19 pm

What is django.contrib? I’d add that including a package in django.contrib is a promise that the core development team will ensure that package is updated to work with future versions of Django. # 20th January 2009, 10:58 am

Apple shows us DRM’s true colors. The EFF reviews the various places that Apple still applies DRM (including locking iPhones to carriers, licensing authentication chips for iPod accessory vendors, preventing OS X from loading on generic PCs) and concludes that “the majority of these DRM efforts do not have even an arguable relation to ’piracy.’” # 18th January 2009, 10:16 am

Rules of Database App Aging. Peter Harkins: All fields become optional, all relationships become many-to-many, chatter always expands. This is why document-oriented databases such as CouchDB are looking more and more attractive. # 18th January 2009, 9:09 am

Eviction, or the Coming Datapocalypse. Jason Scott on AOL’s closure of Hometown, their hosting service. In related news, Lycos just announced they are closing Tripod, which has been providing free hosting for 13 years. # 18th January 2009, 8:48 am

Load Windows ICO files. Apparently PIL has trouble with the most recent versions of the Windows .ico format (Vista now embeds PNG images in them). This clever function deals with the differences and gives back a PIL Image object. # 17th January 2009, 9:48 pm

Project Voldemort. Yet Another “big, distributed, persistent, fault-tolerant hash table”—this time from LinkedIn, released under the Apache 2.0 license. The approach to consistency is interesting—instead of using distributed transactions, they use versioning and “resolve inconsistencies at read time”. It also uses consistent hashing (as seen in libketama) to select servers. The design document has lots more information. # 17th January 2009, 7:45 pm
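
For anyone who hasn't met consistent hashing before, here is a minimal sketch of the general technique. It is not Voldemort's or libketama's implementation: the `HashRing` class, the choice of SHA-256 and the 100 virtual points per server are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to servers so that adding or removing a server moves few keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server, to even out the load
        self._ring = []           # sorted list of (point, server) tuples
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(point, s) for point, s in self._ring if s != server]

    def get(self, key):
        if not self._ring:
            raise KeyError("ring is empty")
        # (h,) sorts before any (h, server), so this finds the first point >= h.
        index = bisect.bisect_left(self._ring, (self._hash(key),))
        # Wrap around to the start of the ring when the key hashes past the last point.
        return self._ring[index % len(self._ring)][1]
```

The payoff is that removing a server only reassigns the keys that server owned; with a naive `hash(key) % len(servers)` scheme, almost every key would move.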

Raising Octopus from Eggs (via) I love that forums like this exist. # 17th January 2009, 2:59 pm

US economic data spreadsheets from the Guardian. At the Guardian we’ve just released a bunch of economic data about the US painstakingly collected by Simon Rogers, our top data journalist, as Google Docs spreadsheets. Get your data here. # 16th January 2009, 6:17 pm