Simon Willison’s Weblog

April 2009

April 10, 2009

Experiences deploying a large-scale infrastructure in Amazon EC2. “At OpenX we recently completed a large-scale deployment of one of our server farms to Amazon EC2. Here are some lessons learned from that experience.”

# 9:43 am / openx, amazonec2, ec2, amazon, scaling, griggheorghiu

Digg Search: Now With 99.987% Less Suck. Really nice implementation of faceted search, still using Lucene and Solr under the hood.

# 10:17 pm / search, lucene, solr, digg, fulltext, facets

April 11, 2009

Using Scala with Google App Engine. Scala works, but I haven’t seen confirmation on actors yet (which are likely to break due to their dependency on threads).

# 3:28 pm / scala, java, appengine, threads, google

rev=canonical bookmarklet and designing shorter URLs

I’ve watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible for will last for as long as possible, and I think it’s very unlikely that all of these new services will still be around in twenty years’ time. Last month I suggested that the Internet Archive start mirroring redirect databases, and last week I was pleased to hear that Archiveteam, a different organisation, had already started crawling.

[... 920 words]

April 12, 2009

Revving up. Jeremy Keith advocates adding the rev=canonical attribute to regular A elements, as well as or instead of hiding it in the head of the document, following the microformats design principle that invisible metadata is less valuable than augmenting visible links. I’ve updated my shorten bookmarklet to handle this case.

# 12:29 pm / jeremy-keith, microformats, revcanonical, metadata

A rev=“canonical” HTTP Header. Chris Shiflett proposes optionally exposing rev=canonical information in an HTTP header, thus allowing sites to discover shorter URLs using just a HEAD request and removing the need to parse HTML. The pingback specification also uses this shortcut.

# 12:33 pm / pingback, chris-shiflett, head, http, headers, revcanonical
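
A minimal sketch of what the consuming side of such a header might look like. The entry above doesn't give the header's syntax, so this assumes the shorter URL is exposed in the standard Link header form (`<url>; rev=canonical`). The function name and the example URLs are hypothetical, and the parsing is deliberately simplified (it would not cope with commas inside quoted parameter values).

```python
import re


def find_rev_canonical(link_header):
    """Return the first URL tagged rev=canonical in a Link header value, or None.

    Assumes entries of the form: <url>; param=value; param="value"
    """
    # Match each <url> together with the ;-separated parameters that follow it.
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)', link_header):
        url, params = match.group(1), match.group(2)
        for param in params.split(';'):
            name, _, value = param.strip().partition('=')
            if name.lower() == 'rev' and value.strip('"').lower() == 'canonical':
                return url
    return None


# Hypothetical header value, as it might come back from a HEAD request.
header = ('<http://example.com/s/abc>; rev=canonical, '
          '<http://example.com/feed>; rel="alternate"')
print(find_rev_canonical(header))
```

The header value itself would come from the response to a HEAD request, so the client never has to download or parse the HTML body.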

Running Rhino and Helma NG on Google App Engine. Helma NG is a JavaScript web app framework, which now works on App Engine out of the box.

# 12:52 pm / appengine, helmang, helma, javascript, google, rhino

Tweenbots: Cute Beats Smart. How do you build a robot that can get from one end of Washington Square Park to the other without your help? Give it a cute smile and a sign explaining where it’s going and rely on strangers to point it in the right direction along the way.

# 1:47 pm / tweenbots, cute, robots

The App Store has an inscrutable, time-consuming, whim-dependent approval process. The App Store newsgroup postings are full of angry claims that this is a bug, but I bet it's a feature. If you can't get an app approved until it's working perfectly, and you have to wait a week or two -- or more -- between approval rounds, you're much more likely to put a lot more effort in up front to get it right.

Marc Hedlund

# 1:49 pm / apple, appstore, iphone, marchedlund

We’re using the same trick on flic.kr to avoid having to maintain a look up database, though we’re using base 58.

Kellan Elliott-McCrea

# 4 pm / base58, flickr, kellan-elliott-mccrea, revcanonical, urls
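
The trick in Kellan's quote can be sketched as follows: encode the numeric ID directly into the short URL so that no lookup table is needed. The alphabet below is an assumption (digits and letters with the easily confused 0, O, I and l removed), not something stated in the quote, and the function names are illustrative.

```python
# Assumed 58-character alphabet: no 0, O, I or l, which are easy to misread.
ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
BASE = len(ALPHABET)


def encode(number):
    """Encode a non-negative integer ID as a base 58 string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were collected least significant first.
    return ''.join(reversed(digits))


def decode(text):
    """Decode a base 58 string back to the integer ID."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number


photo_id = 3456789  # hypothetical ID
short = encode(photo_id)
print(short, decode(short) == photo_id)
```

Because the short code is just the ID in a different base, the server can decode it and redirect without storing anything extra.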

17-year-old claims responsibility for Twitter worm. It was a textbook XSS attack—the URL on the user profile wasn’t properly escaped, allowing an attacker to insert a script element linking out to externally hosted JavaScript which then used Ajax to steal any logged-in user’s anti-CSRF token and use it to self-replicate into their profile.

# 7:22 pm / xss, twitter, security, csrf, worms

April 13, 2009

django-shorturls. Jacob took my self-admittedly shonky shorter URL code and turned it into a proper reusable Django application.

# 9:31 am / jacob-kaplan-moss, python, django, revcanonical, djangoshorturls

I like rev=“canonical”. Les Orchard summarises the current debate over what colour to paint the rev=“canonical” bikeshed.

# 10:41 am / revcanonical, les-orchard, urls

favikon.com. Small, easy to use online favicon generator.

# 12:09 pm / favikon, favicon

How to cause moral outrage from the entire Internet in ten lines of code. Looks legit—the author claims to have sparked this weekend’s #amazonfail moral outrage (where Amazon were accused of removing Gay and Lesbian books from their best seller rankings) by exploiting a CSRF hole in Amazon’s “report as inappropriate” feature to trigger automatic takedowns. EDIT: His claim is disputed elsewhere (see comments)

# 7:48 pm / csrf, amazon, amazonfail, prdisaster, security

tinyarchive.org. Blaine Cook’s archive of 301 and 302 redirects—needs to be automatically updated by a crawler for it to be really useful though.

# 9:57 pm / tinyarchive, urls, archive, tinyurls

April 14, 2009

Amazon Says Listing Problem Was an Error, Not a Hack (via) “A friend within the company told him that someone working on Amazon’s French site mistagged a number of keyword categories, including the ‘Gay and Lesbian’ category, as pornographic, using what’s known internally as the Browse Nodes tool. Soon the mistake affected Amazon sites worldwide.”

# 8:32 am / security, amazon, amazonfail, csrf

Visualising Sorting Algorithms. Aldo Cortesi dislikes animations of sorting algorithms, so he designed a beautiful technique for statically visualising them instead (using Python and Cairo to generate the images).

# 8:55 am / aldo-cortesi, python, cairo, sorting, algorithms, visualisation

You guys are moving on this stuff too fast! Welcome to 2002, when lots of us had more spare time than employment and we deployed new crap like this on our blogs and sites daily.

Les Orchard

# 8:57 am / les-orchard, nostalgia, revcanonical

Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems (via) The Google Online Security Blog reminds us that simply HTML-escaping everything isn’t enough—the type of escaping needed depends on the current markup context, for example variables inside JavaScript blocks should be escaped differently. Google’s open source Ctemplate library uses an HTML parser to keep track of the current context and apply the correct escaping function automatically.

# 9:26 am / html, google, ctemplate, django, escaping, open-source, security, xss
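
To illustrate the point about context, here is a rough sketch using only the Python standard library. It is not how Ctemplate works (Ctemplate picks the escaping function automatically by parsing the template). Here the caller chooses by hand, and the two helper names are made up for the example.

```python
import html
import json


def escape_for_html(value):
    """Escape a string for use in HTML text or a quoted attribute value."""
    return html.escape(value, quote=True)


def escape_for_js(value):
    """Escape a string for use as a string literal inside a <script> block.

    json.dumps quotes the value and escapes quotes, backslashes and newlines.
    The extra replacements stop "</script>" from closing the block early.
    """
    return (json.dumps(value)
            .replace('<', '\\u003c')
            .replace('>', '\\u003e')
            .replace('&', '\\u0026'))


payload = '</script><script>alert(1)</script>'

# Same value, two contexts, two different escaping functions.
print('<p>%s</p>' % escape_for_html(payload))
print('<script>var name = %s;</script>' % escape_for_js(payload))
```

HTML-escaping alone leaves backslashes and newlines untouched, which is one reason it is not sufficient inside a JavaScript string.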

Counting the ways that rev=“canonical” hurts the Web. Mark Nottingham complains about misapplied trust (a page can falsely claim to be the canonical URL for another page), the easy confusion between rev and rel and the lack of discussion with relevant communities.

# 2:11 pm / mark-nottingham, revcanonical, standards, urls

London’s abandoned Underground Stations on Google Street View. “The network is littered with buildings that belonged to stations that closed their doors to the public because routes were changed and diverted, or because there was just too little traffic to make them viable. Here are some of the remnants of disused Underground stations that you can see on Google’s Street View of London.”

# 2:51 pm / google, martinbelam, streetview, underground, london

We did some studies and found that the attribute was almost never used, and most of the time, when it was used, it was a typo where someone meant to write rel="" but wrote rev="". To be precise, the most commonly used value was rev="made", which is equivalent to rel="author" and thus was not a convincing use case. The second most common value was rev="stylesheet", which is meaningless and obviously meant to be rel="stylesheet".

Ian Hickson

# 4:34 pm / hixie, html5, ian-hickson, markup, rev, revcanonical

April 15, 2009

10 Cool Things We’ll Be Able To Do Once IE6 Is Dead. Highlights include child and attribute selectors, 24bit PNGs and max-width and min-width. Simple pleasures, but I can hardly wait.

# 2:17 pm / ie6, maxwidth, browsers, minwidth, css, pngs, selectors, standards, brothercake

April 16, 2009

(Yet) Another DiggBar Update. Digg are responding in exactly the right way in my opinion—the DiggBar will start returning 301 redirects for anonymous users, while users who are logged in to Digg can opt-out of the feature if they want to (usage statistics show that most Digg users are fine with the feature).

# 12:50 am / digg, diggbar, urls, redirects

Developing Django apps with zc.buildout. Jacob went ahead and actually documented one of Python’s myriad of packaging options.

# 9:50 am / jacob-kaplan-moss, zcbuildout, python, packaging

April 17, 2009

Cross Browser Base64 Encoded Images Embedded in HTML (via) Scarily clever. View the PHP source to see what’s going on—most browsers get image tags that use data URIs starting with data:image/png;base64, but IE gets served a Content-type:message/rfc822 header and a MIME formatted multipart/related document, as used by e-mail clients to embed inline image attachments.

# 4:12 pm / hedgerwang, base64, browsers, ie, internet-explorer, mime, php
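
For the data URI half of the technique, a minimal sketch in Python (the linked example is PHP, and this is not its code). The sample bytes are just the PNG file signature standing in for a real image, so the result will not render. The function name is illustrative.

```python
import base64


def png_data_uri(png_bytes):
    """Build a data URI that embeds PNG bytes directly in the markup."""
    encoded = base64.b64encode(png_bytes).decode('ascii')
    return 'data:image/png;base64,' + encoded


# Stand-in for real image data: the 8-byte PNG signature only.
sample = b'\x89PNG\r\n\x1a\n'
uri = png_data_uri(sample)
print('<img src="%s" alt="">' % uri)
```

The MIME multipart/related fallback served to IE is a separate mechanism and is not covered by this sketch.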

Installing CouchDB from source on OS X. So far I’ve just been playing with it in an Ubuntu virtual machine.

# 4:22 pm / osx, couchdb, building, ubuntu

Drop ACID and think about data. I’ve been very impressed with the quality and speed with which the PyCon 2009 videos have been published. Here’s Bob Ippolito on distributed databases and key/value stores.

# 5:13 pm / bobippolito, python, acid, databases, data, pycon, pycon2009

Paul Buchheit: Make your site faster and cheaper to operate in one easy step. Paul promotes gzip encoding using nginx as a proxy, and mentions that FriendFeed use a “custom, epoll-based python server” as their application server. Does that mean that they’re serving their real-time comet feeds directly from Python?

# 5:19 pm / paul-buchheit, python, friendfeed, nginx, comet, epoll, gzip
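
A quick way to see why gzip encoding pays off for HTML, sketched with Python's standard library rather than nginx. The markup here is invented and highly repetitive, so the saving it shows is larger than a real page would typically get.

```python
import gzip

# Hypothetical, very repetitive markup standing in for a feed page.
body = ('<li class="entry">Hello, FriendFeed</li>\n' * 500).encode('utf-8')
compressed = gzip.compress(body)

print('original bytes:  ', len(body))
print('compressed bytes:', len(compressed))
```

In the setup described above, the proxy would do this work on the fly for clients that send an Accept-Encoding: gzip request header.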
