Simon Willison’s Weblog


Items in Jan, 2009


All you ever wanted to know about writing bloom filters. This helped me understand a key use case for bloom filters: reducing the impact of the “worst case search is when there are no matching results so everything gets scanned” problem.

# 30th January 2009, 8:26 am / bloom-filters, search, jonathan-ellis
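To make that use case concrete, here is a minimal Python sketch of a Bloom filter. The class name, the default sizes and the trick of deriving several hash functions by salting SHA-256 with an index are my own illustrative choices, not taken from the linked article. The property that matters is that a negative answer is definitive. A search for a term the filter has never seen can therefore skip the full scan.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash functions.

    Illustrative sketch; the sizes and hashing scheme are assumptions.
    might_contain() can return false positives but never false negatives,
    so a False answer means the expensive lookup can be skipped entirely.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if every one of the item's k bits has been set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    index = BloomFilter()
    for word in ["python", "django", "couchdb"]:
        index.add(word)
    print(index.might_contain("django"))  # True: added items are always reported
```

In a search system you would check `might_contain(term)` first and only run the scan when it returns True.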

Ehy IE8, I Can Has Some Clickjacking Protection? (via) IE8 has built-in protection against clickjacking, but it’s opt-in (enabled with a custom X-FRAME-OPTIONS HTTP header) and IE-only. It turns out the usual defence against clickjacking (framebusting JavaScript) doesn’t work in IE, because an attacker can disable it by loading the target page in an iframe with a security=“restricted” attribute.

# 29th January 2009, 1:39 pm / clickjacking, ie, ie8, http, security, javascript, iframes
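Opting in just means sending the header with every response you don’t want framed. Here is a minimal sketch as a plain WSGI app; the app and its body are hypothetical. `DENY` forbids framing entirely, while `SAMEORIGIN` allows framing only by pages from the same origin.

```python
def app(environ, start_response):
    # Hypothetical example app. X-Frame-Options: DENY asks browsers that honour
    # the header (it was introduced by IE8) not to render this page in any frame.
    # Use "SAMEORIGIN" instead to allow framing by pages on the same origin.
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("X-Frame-Options", "DENY"),
    ]
    start_response("200 OK", headers)
    return [b"<p>This page refuses to be framed.</p>"]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` will serve it. In a larger application the same header would more naturally be added by middleware.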

Infrastructure for Modern Web Sites. Leonard’s thoughts on what the next generation of web frameworks should aim to provide.

# 29th January 2009, 1:36 pm / frameworks, leonardlin, django, rails, sysadmin, infrastructure

Changeset 9793: SMTP testing documentation. I didn’t know this trick: running “python -m smtpd -n -c DebuggingServer localhost:1025” will start up a simple SMTP server which dumps received e-mails to the terminal instead of forwarding them on.

# 29th January 2009, 1:35 pm / email, smtp, python, testing, django
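With that server running, any code that sends mail to localhost:1025 will have its messages printed rather than delivered. A minimal sketch using the standard library’s `smtplib` follows; the helper names and addresses are hypothetical. Note that the `smtpd` module was removed in Python 3.12, so the command above needs an older Python.

```python
import smtplib
from email.message import EmailMessage


def build_test_message() -> EmailMessage:
    # Hypothetical addresses: the debugging server accepts anything and delivers nothing.
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Test message"
    msg.set_content("If this shows up in the DebuggingServer terminal, sending works.")
    return msg


def send_to_debugging_server(msg: EmailMessage, host: str = "localhost", port: int = 1025) -> None:
    # Assumes "python -m smtpd -n -c DebuggingServer localhost:1025" is already running.
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Calling `send_to_debugging_server(build_test_message())` should make the full message, headers included, appear in the terminal running the debugging server.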

Giant, Python Powered Robots. “They each weigh about 11 tonnes and have a 17 meter reach. The control system is written in Python, with small sections of C which run in hard-real-time to guarantee safety.”

# 28th January 2009, 10:35 am / python, robots, awesome, realtime

Sharding Counters on Google App Engine. “While the datastore for App Engine scales to support a huge number of entities it is important to note that you can only expect to update any single entity, or entity-group, about five times a second”. This article explains a technique for sharding writes across multiple counters in detail, including a way to keep a memcache counter updated at the same time for faster reads.

# 27th January 2009, 8:27 pm / memcache, caching, sharding, googleappengine, google, appengine, python, scaling
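The shape of the technique can be sketched in a few lines of Python. This is a simulation only: plain dicts stand in for the datastore and memcache, so none of it is the App Engine API, and a real implementation would also need each shard update to be transactional, which is omitted here. Writes go to a randomly chosen shard. Reads sum all the shards, and a cached total is bumped alongside each write so most reads never touch the shards.

```python
import random


class ShardedCounter:
    """In-memory simulation of a sharded counter (not the App Engine API).

    Assumption: `datastore` and `cache` are plain dicts standing in for the
    datastore entities and memcache. Each increment touches one random shard,
    so writes are spread across num_shards entities instead of just one.
    """

    def __init__(self, name: str, num_shards: int = 20) -> None:
        self.name = name
        self.num_shards = num_shards
        self.datastore: dict[str, int] = {}  # stand-in for shard entities
        self.cache: dict[str, int] = {}      # stand-in for memcache

    def _shard_key(self, index: int) -> str:
        return f"{self.name}-shard-{index}"

    def increment(self) -> None:
        key = self._shard_key(random.randrange(self.num_shards))
        self.datastore[key] = self.datastore.get(key, 0) + 1
        # Keep the cached total in step; like memcache incr, only if it exists.
        if self.name in self.cache:
            self.cache[self.name] += 1

    def get_count(self) -> int:
        if self.name in self.cache:
            return self.cache[self.name]
        # Cache miss: sum every shard, then cache the total for later reads.
        total = sum(self.datastore.get(self._shard_key(i), 0) for i in range(self.num_shards))
        self.cache[self.name] = total
        return total
```

Raising `num_shards` spreads write contention more thinly at the cost of more work on an uncached read.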

A Year Later, AOL Is Contemplating A Bebo Sale. Ignore the headline and read the actual article: Mike Butcher’s description of how Bebo targeted old-fashioned TV advertising agencies while other social networks ignored them completely is riveting.

# 27th January 2009, 6:13 pm / advertising, bebo, aol, mike-butcher, agencies

Switching from scripting languages to Objective C and iPhone: useful libraries. Matt Biddulph collects together some very useful libraries for developers just getting started with Objective-C (though I’m not too keen on the title).

# 27th January 2009, 5:50 pm / matt-biddulph, objectivec, programming, iphone

Dive into Python 3. Mark Pilgrim’s seminal work taught me Python nearly eight years ago. Now he’s updating it to cover Python 3. It’s just a table of contents at the moment, but the chapter on “Packaging Python libraries” has me very excited.

# 26th January 2009, 6:10 pm / python, python3, mark-pilgrim, diveintopython, packaging

Coding Horror: A Scripter at Heart. Sigh. I cannot believe that the false distinction between “scripting” and “programming” is still being discussed.

# 26th January 2009, 6:06 pm / scripting, programming, ffs, flamewars, jeff-atwood

Tagging is like a salt water fish that lots of people thought was pretty and started trying to stick in fresh water tanks. I don’t think it thrives everywhere people have tried to stick it and not everyone who’s tried to clone tagging has gotten all the important parts right.

Les Orchard

# 25th January 2009, 10:29 am / tagging, analogy, les-orchard

OCR and Neural Nets in JavaScript. John dissects the brilliant Greasemonkey script that solves simple captchas using the canvas element and HTML5’s getImageData API.

# 25th January 2009, 12 am / getimagedata, captcha, ocr, javascript, greasemonkey, canvas, john-resig
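`getImageData` returns the pixels as one flat array of RGBA values, four per pixel, in row-major order. A common first step in this kind of OCR is reducing that array to a black-and-white bitmap, though this is not necessarily the script’s exact approach. Here it is sketched in Python for illustration; the function name and the fixed brightness threshold are assumptions.

```python
def to_binary_bitmap(rgba: list[int], width: int, height: int, threshold: int = 128) -> list[list[int]]:
    """Turn a flat RGBA array (the layout of canvas ImageData.data) into rows of 0/1.

    Pixel (x, y) starts at index 4 * (y * width + x). A pixel counts as ink (1)
    when its average R, G, B brightness is below `threshold`; alpha is ignored.
    The threshold value is an illustrative assumption.
    """
    if len(rgba) != 4 * width * height:
        raise ValueError("expected 4 values (R, G, B, A) per pixel")
    bitmap = []
    for y in range(height):
        row = []
        for x in range(width):
            i = 4 * (y * width + x)
            r, g, b = rgba[i], rgba[i + 1], rgba[i + 2]
            row.append(1 if (r + g + b) / 3 < threshold else 0)
        bitmap.append(row)
    return bitmap
```

The resulting grid of 0s and 1s is the kind of input that can then be segmented into characters and fed to a classifier such as a neural net.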

OWASP: XSS Prevention Cheat Sheet. Comprehensive set of rules for avoiding XSS. There’s a bit more to it than just escaping all output variables, since you have to take markup context into account.

# 24th January 2009, 11:58 pm / markup, xss, security, owasp
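To illustrate why context matters, here is a rough Python sketch of three different escaping rules for three places a value can land. The helper names are my own and the cheat sheet covers more contexts than this. HTML element content needs entity escaping, a URL query parameter needs percent-encoding, and a JavaScript string inside a `<script>` block needs a valid string literal that also can’t contain `</script>`.

```python
import html
import json
from urllib.parse import quote


def for_html_body(value: str) -> str:
    # Escapes & < > " ' so the value can't open a tag or break out of a quoted attribute.
    return html.escape(value, quote=True)


def for_url_param(value: str) -> str:
    # Percent-encode everything outside the unreserved set before building a query string.
    return quote(value, safe="")


def for_js_string(value: str) -> str:
    # json.dumps produces a valid JS string literal (non-ASCII is escaped by default).
    # Also escape <, > and & so a value like "</script>" can't end the enclosing block.
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
```

Applied to a hostile value such as `'<script>alert("x")</script>'`, each helper yields a different string, which is exactly why a single "escape everything" function isn’t enough.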

Neil Fraser: Differential Synchronization. Paper describing a robust method for “keeping two or more copies of the same document synchronized with each other in real-time”, over a variable network connection using clever diff algorithms.

# 24th January 2009, 11:57 pm / neal-fraser, paper, algorithms, diff
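The heart of the method is a loop: diff your text against a “shadow” copy of what the other side last saw, send those edits, update the shadow, and patch the other side. Below is a heavily simplified Python sketch of one half of that cycle using the standard library’s `difflib`. `Peer` and `sync` are hypothetical names. The paper’s fuzzy patching, which lets concurrent edits survive, is replaced by exact offset-based patching, so this sketch is only correct when the receiving side hasn’t changed since the last sync.

```python
import difflib

Edit = tuple[str, int, int, str]  # (opcode, start, end, replacement) against the base text


def diff(base: str, text: str) -> list[Edit]:
    """Describe how to turn `base` into `text` as non-overlapping edits."""
    matcher = difflib.SequenceMatcher(a=base, b=text, autojunk=False)
    return [
        (tag, i1, i2, text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def patch(base: str, edits: list[Edit]) -> str:
    """Apply edits right to left so earlier offsets remain valid."""
    for _tag, start, end, replacement in reversed(edits):
        base = base[:start] + replacement + base[end:]
    return base


class Peer:
    def __init__(self, text: str) -> None:
        self.text = text
        self.shadow = text  # last version both sides agreed on


def sync(sender: Peer, receiver: Peer) -> None:
    """One half of the cycle: diff sender against its shadow, ship edits, patch receiver."""
    edits = diff(sender.shadow, sender.text)
    sender.shadow = sender.text
    receiver.shadow = patch(receiver.shadow, edits)
    # Simplification: exact patching only gives the right result if receiver.text
    # still equals its shadow. The paper uses fuzzy patching at this step.
    receiver.text = patch(receiver.text, edits)
```

Running `sync(client, server)` and then `sync(server, client)` alternately is the basic rhythm; the paper adds the fuzzy patch step plus versioning to cope with lost or delayed packets.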

google-mobwrite. Neil Fraser’s terrifyingly clever differential synchronization algorithm (for SubEthaEdit-style collaboration over the web) is now available as an open source Python and JavaScript library.

# 24th January 2009, 11:55 pm / neal-fraser, mobwrite, google, open-source, python, javascript, collaboration, subethaedit

Crowbar. Headless Gecko/XULRunner which exposes a web service API for screen scraping using a real browser DOM—just pass it the URL of a page and the URL of a screen scraping JavaScript script (a bit like a Greasemonkey user script) and get back RDF/XML.

# 24th January 2009, 11:52 pm / rdf, xml, screenscraping, gecko, xulrunner, mozilla, dom, greasemonkey, webservice, crowbar

FluidDB domain names available early (and free) for Twitter users. It’s interesting how Twitter has revitalised the concept of usernames as first class identifiers. FluidDB hasn’t even launched yet, but it’s allowing people to reserve their Twitter username within the FluidDB system just by following @fluidDB.

# 24th January 2009, 11:44 pm / fluiddb, twitter, usernames, terry-jones, identity

EuroDjangoCon. 4th-6th of May 2009, in Prague. Talk submissions are open now, and registration starts on the 6th of February.

# 24th January 2009, 6:54 pm / eurodjangocon, django, djangocon, conferences, events, prague, python

OpenStreetMap is growing rapidly across all of Africa. Mapping is spreading through local mappers, mappers on vacation, foreign nationals, and remote mapping using satellite imagery. A recent comparison judged that OSM had the most comprehensive coverage of Africa among web mapping services, especially in cities.

Mikel Maron

# 23rd January 2009, 5:13 pm / mikel-maron, openstreetmap, africa, mapping

CSRF is not a security issue for the Web. A well-designed Web service should be capable of receiving requests directed by any host, by design, with appropriate authentication where needed. If browsers create a security issue because they allow scripts to automatically direct requests with stored security credentials onto third-party sites, without any user intervention/configuration, then the obvious fix is within the browser.

Roy Fielding

# 23rd January 2009, 8:14 am / royfielding, csrf, security, credentials, browsers

AJAX APIs Playground. Ferociously useful collection of executable and editable example code for all(?) of Google’s JavaScript APIs, including Google Maps and the increasingly interesting Visualization API.

# 22nd January 2009, 6:38 pm / google, javascript, ajax, google-maps, googlevisualization

New PylonsHQ Site Launches. The new site uses CouchDB instead of a relational database, and the code for the site is open source so you can see how it all works.

# 22nd January 2009, 6:33 pm / couchdb, pylons, python, ben-bangert, nonrelational

Learning to Think Like A Programmer. Outstanding advice aimed mainly at journalists, but important to anyone who collects information for a living and might want it to be automatically processed at some point in the future.

# 22nd January 2009, 6:06 pm / tom-armitage, programming, journalism, data-journalism

jQuery 1.3.1 Released. Bug fix for 1.3, mainly browser compatibility issues. Of interest: jQuery no longer ships a packed version (where the code is delivered as a compressed string that JavaScript has to decompress at runtime), as their tests show that the overhead of that extra decompression step reduces performance. They still provide a YUI Compressor minified version.

# 22nd January 2009, 10:41 am / minification, jquery, javascript, performance

There's no such thing as a good day to bury bad news any more, the Internet has seen to that.

Tom Steinberg

# 22nd January 2009, 10:20 am / internet, news, tom-steinberg, mysociety

Train Crash Leads LA Times to Create Django Database on Deadline. A story from last September. I didn’t know the LA Times used Django. UPDATE: Yes I did, I introduced their panel about it at DjangoCon. Sorry, mind like a sieve sometimes.

# 21st January 2009, 5:19 pm / latimes, data-journalism, django, newspapers, python

What is django.contrib? I’d add that including a package in django.contrib is a promise that the core development team will ensure that package is updated to work with future versions of Django.

# 20th January 2009, 10:58 am / django, python, djangocontrib, jacob-kaplan-moss

Apple shows us DRM’s true colors. The EFF reviews the various places that Apple still applies DRM (including locking iPhones to carriers, licensing authentication chips for iPod accessory vendors, preventing OS X from loading on generic PCs) and concludes that “the majority of these DRM efforts do not have even an arguable relation to ‘piracy.’”

# 18th January 2009, 10:16 am / drm, eff, apple, osx, iphone, ipod, piracy

Rules of Database App Aging. Peter Harkins: All fields become optional, all relationships become many-to-many, chatter always expands. This is why document oriented databases such as CouchDB are looking more and more attractive.

# 18th January 2009, 9:09 am / databases, peter-harkins, couchdb

Eviction, or the Coming Datapocalypse. Jason Scott on AOL’s closure of Hometown, their hosting service. In related news, Lycos just announced they are closing Tripod, which has been providing free hosting for 13 years.

# 18th January 2009, 8:48 am / jason-scott, eviction, archive, tripod, lycos, aol, hometown