Simon Willison’s Weblog

January 2009

Jan. 24, 2009

FluidDB domain names available early (and free) for Twitter users. It’s interesting how Twitter has revitalised the concept of usernames as first class identifiers. FluidDB hasn’t even launched yet, but it’s allowing people to reserve their Twitter username within the FluidDB system just by following @fluidDB.

# 11:44 pm / fluiddb, twitter, usernames, terryjones, identity

Crowbar. Headless Gecko/XULRunner which exposes a web service API for screen scraping using a real browser DOM—just pass it the URL of a page and the URL of a screen scraping JavaScript script (a bit like a Greasemonkey user script) and get back RDF/XML.

# 11:52 pm / rdf, xml, screenscraping, gecko, xulrunner, mozilla, dom, greasemonkey, webservice, crowbar

google-mobwrite. Neil Fraser’s terrifyingly clever differential synchronization algorithm (for SubEthaEdit-style collaboration over the web) is now available as an open source Python and JavaScript library.

# 11:55 pm / neal-fraser, mobwrite, google, open-source, python, javascript, collaboration, subethaedit

Neil Fraser: Differential Synchronization. Paper describing a robust method for “keeping two or more copies of the same document synchronized with each other in real-time”, over a variable network connection using clever diff algorithms.
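To give a feel for the diff-and-patch step at the heart of that description, here is a minimal sketch using Python's standard `difflib`. The function names `make_edits` and `apply_edits` are made up for illustration and are not part of MobWrite. It is also a deliberate simplification: edits are applied by exact offsets, so it is only correct when the receiving copy still matches the shadow the diff was computed against. The paper's method relies on fuzzy, best-effort patching to cope with concurrent changes, which this does not attempt.

```python
import difflib


def make_edits(shadow, text):
    """Diff the last-synced copy (shadow) against the current text.

    Returns a list of (start, end, replacement) tuples, where start and
    end are offsets into the shadow.
    """
    matcher = difflib.SequenceMatcher(None, shadow, text, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((i1, i2, text[j1:j2]))
    return edits


def apply_edits(base, edits):
    """Apply edits to a copy that is assumed to be identical to the shadow."""
    pieces = []
    cursor = 0
    for start, end, replacement in edits:
        pieces.append(base[cursor:start])  # unchanged text before the edit
        pieces.append(replacement)         # inserted or replacing text
        cursor = end                       # skip anything deleted or replaced
    pieces.append(base[cursor:])
    return "".join(pieces)


# One half of a sync cycle: the client sends only its edits, and the
# other side replays them against its own copy of the shadow.
client_shadow = "The cat sat on the mat"
client_text = "The black cat sat on the mat today"
edits = make_edits(client_shadow, client_text)
server_copy = apply_edits(client_shadow, edits)
```

Only the edits cross the network, which is what keeps the approach workable over a slow or unreliable connection.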

# 11:57 pm / neal-fraser, paper, algorithms, diff

OWASP: XSS Prevention Cheat Sheet. Comprehensive set of rules for avoiding XSS. There’s a bit more to it than just escaping all output variables, since you have to take markup context into account.
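As a rough illustration of why context matters, the sketch below escapes the same untrusted value differently depending on where it will land. The helper names are invented for this example, and the cheat sheet covers more contexts and rules than these four, so treat it as a sketch of the idea and not a substitute for a template engine with context-aware autoescaping.

```python
import html
import json
from urllib.parse import quote


def escape_html_body(value):
    """For text placed between tags: escape &, < and >."""
    return html.escape(value, quote=False)


def escape_html_attribute(value):
    """For text placed inside a quoted attribute: also escape quotes."""
    return html.escape(value, quote=True)


def escape_js_string(value):
    """For text placed inside a <script> block as a string literal.

    json.dumps produces a quoted literal; the extra replacements stop the
    value from closing the script element or opening an HTML comment.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def escape_url_param(value):
    """For text placed in a query string value: percent-encode everything."""
    return quote(value, safe="")
```

The same input, such as `</script>`, is harmless in one context and dangerous in another, which is why a single "escape everything" function is not enough.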

# 11:58 pm / markup, xss, security, owasp

Jan. 25, 2009

OCR and Neural Nets in JavaScript. John Resig dissects the brilliant Greasemonkey script that solves simple captchas using the canvas element and HTML5’s getImageData API.

# 12 am / getimagedata, captcha, ocr, javascript, greasemonkey, canvas, john-resig

Tagging is like a salt water fish that lots of people thought was pretty and started trying to stick in fresh water tanks. I don’t think it thrives everywhere people have tried to stick it and not everyone who’s tried to clone tagging has gotten all the important parts right.

Les Orchard

# 10:29 am / tagging, analogy, les-orchard

Jan. 26, 2009

Coding Horror: A Scripter at Heart. Sigh. I cannot believe that the false distinction between “scripting” and “programming” is still being discussed.

# 6:06 pm / scripting, programming, ffs, flamewars, jeff-atwood

Dive into Python 3. Mark Pilgrim’s seminal work taught me Python nearly eight years ago. Now he’s updating it to cover Python 3. It’s just a table of contents at the moment, but the chapter on “Packaging Python libraries” has me very excited.

# 6:10 pm / python, python3, mark-pilgrim, diveintopython, packaging

Jan. 27, 2009

Switching from scripting languages to Objective C and iPhone: useful libraries. Matt Biddulph collects together some very useful libraries for developers just getting started with Objective-C (though I’m not too keen on the title).

# 5:50 pm / matt-biddulph, objectivec, programming, iphone

A Year Later, AOL Is Contemplating A Bebo Sale. Ignore the headline and read the actual article: Mike Butcher’s description of how Bebo targeted old-fashioned TV advertising agencies while other social networks ignored them completely is riveting.

# 6:13 pm / advertising, bebo, aol, mike-butcher, agencies

Sharding Counters on Google App Engine. “While the datastore for App Engine scales to support a huge number of entities it is important to note that you can only expect to update any single entity, or entity-group, about five times a second”. This article explains a technique for sharding writes across multiple counters in detail, including a way to keep a memcache counter updated at the same time for faster reads.
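The idea can be sketched without any App Engine APIs at all. In the in-memory model below, a list stands in for the shard entities and a dict stands in for memcache. The class name, the shard count of 20 and the `"total"` cache key are assumptions made for illustration, not details taken from the article.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across several entities."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards  # one slot per hypothetical shard entity
        self.cache = {}                 # stands in for memcache

    def increment(self):
        # Pick a shard at random so concurrent writers rarely hit the
        # same entity, spreading the per-entity write limit across shards.
        index = random.randrange(len(self.shards))
        self.shards[index] += 1
        # Keep the cached total in step if it is currently populated.
        if "total" in self.cache:
            self.cache["total"] += 1

    def total(self):
        # Fast path: read the cached value. Slow path: sum every shard
        # and repopulate the cache, as after a cache eviction.
        if "total" not in self.cache:
            self.cache["total"] = sum(self.shards)
        return self.cache["total"]
```

With roughly five writes per second per entity, twenty shards would allow on the order of a hundred increments per second, at the cost of a slower read whenever the cached total is missing.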

# 8:27 pm / memcache, caching, sharding, googleappengine, google, appengine, python, scaling

Jan. 28, 2009

Giant, Python Powered Robots. “They each weigh about 11 tonnes and have a 17 meter reach. The control system is written in Python, with small sections of C which run in hard-real-time to guarantee safety.”

# 10:35 am / python, robots, awesome, realtime

Jan. 29, 2009

Changeset 9793: SMTP testing documentation. I didn’t know this trick: running “python -m smtpd -n -c DebuggingServer localhost:1025” will start up a simple SMTP server which dumps received e-mails to the terminal instead of forwarding them on.
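With that debugging server running, anything sent to port 1025 shows up in its terminal. A minimal way to try it from Python is sketched below; the function names and addresses are placeholders. Note that the `smtpd` module was removed from the standard library in Python 3.12, so the command above needs an older interpreter.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, subject, body):
    """Build a plain-text message suitable for a quick delivery test."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_to_debug_server(message, host="localhost", port=1025):
    """Send the message to the local debugging server, which prints it."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)


# Example usage, once the debugging server is listening:
#   send_to_debug_server(
#       build_test_message("me@example.com", "you@example.com", "Hi", "Test body")
#   )
```

Pointing a Django project at the same server is a matter of setting `EMAIL_HOST = "localhost"` and `EMAIL_PORT = 1025` in its settings.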

# 1:35 pm / email, smtp, python, testing, django

Infrastructure for Modern Web Sites. Leonard Lin’s thoughts on what the next generation of web frameworks should aim to provide.

# 1:36 pm / frameworks, leonardlin, django, rails, sysadmin, infrastructure

Ehy IE8, I Can Has Some Clickjacking Protection? (via) IE8 has built-in protection against clickjacking, but it’s opt-in (with a custom HTTP header) and IE only. It turns out the usual defence against clickjacking (using framebusting JavaScript) doesn’t work in IE as it can be worked around with a security=“restricted” attribute on an iframe.
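The entry does not name the opt-in header; the one IE8 shipped with is `X-Frame-Options`, which accepts `DENY` or `SAMEORIGIN`. A minimal sketch of sending it from any WSGI application follows. The `deny_framing` wrapper is a made-up name for this example, not part of any framework.

```python
def deny_framing(app):
    """Wrap a WSGI app so every response asks browsers not to frame it."""

    def middleware(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Drop any existing value so the header is sent exactly once.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "x-frame-options"
            ]
            headers.append(("X-Frame-Options", "DENY"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return middleware
```

Because the header is opt-in, pages that do not send it get no protection, which is the limitation the linked post complains about.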

# 1:39 pm / clickjacking, ie, ie8, http, security, javascript, iframes

Jan. 30, 2009

All you ever wanted to know about writing bloom filters. This helped me understand a key use case for bloom filters: reducing the impact of the “worst case search is when there are no matching results so everything gets scanned” problem.
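That use case is easy to see in a small sketch. The filter below answers "definitely not present" or "possibly present", so a lookup can skip the expensive scan whenever the answer is no. The sizes, the SHA-256 based hashing scheme and the `search` helper are arbitrary choices for illustration, not taken from the linked article.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def search(key, bloom, records):
    """Scan records only when the filter says the key might be there."""
    if not bloom.might_contain(key):
        return None  # skip the full scan entirely
    for record in records:
        if record == key:
            return record
    return None  # false positive: the scan ran and found nothing
```

False positives still trigger a wasted scan, but they can be made rare by giving the filter more bits per stored item.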

# 8:26 am / bloom-filters, search, jonathan-ellis
