Simon Willison’s Weblog

Items in Nov, 2009



Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data.

Facebook Data Team # 30th November 2009, 11:30 am

Haystack 1.0 Final Released. I’ve used Haystack on a number of projects recently, and it has proved itself as a completely painless way of adding full-text search (using Solr or Whoosh—I haven’t tried the Xapian backend yet) to a Django ORM powered project in just a few minutes. Congratulations, Daniel + contributors. # 30th November 2009, 8:07 am

CCD. Joe Gregorio on the increasingly ubiquitous and disruptive nature of CCDs. If everything has a camera attached to it, what problems can we solve (and what new problems do we introduce)? # 29th November 2009, 9:08 am

Djangopeople JSON parser. Awesome: Andy McKay has compensated for the lack of an official DjangoPeople API by creating a screen-scraped JSONP API and hosting it on App Engine. As far as I’m concerned this is an officially supported feature: I’ll make sure future site changes don’t break it, and when I do add an API I’ll try to keep it compatible and help Andy set up redirects. # 28th November 2009, 11:29 am
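
Andy’s scraper isn’t described in detail, so what follows is only a generic sketch of what any JSONP endpoint does: it wraps a JSON payload in a call to a caller-supplied function, so a page on another domain can load it with a plain script tag. `jsonp_response` and the `username` field are hypothetical, and the callback-name check is an added precaution, not something the linked project is known to do.

```python
import json
import re

# Accept only dotted JavaScript identifiers (e.g. "handle" or "app.onData") as
# callback names, so a crafted ?callback= value cannot inject extra script.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp_response(data, callback=None):
    """Return (content_type, body) for a plain JSON or a JSONP response."""
    body = json.dumps(data)
    if callback is None:
        return "application/json", body
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name: %r" % callback)
    # JSONP: the JSON document becomes the argument of a function call
    return "application/javascript", "%s(%s)" % (callback, body)


content_type, body = jsonp_response({"username": "simon"}, callback="handle")
print(body)  # handle({"username": "simon"})
```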

Perl: Love it, or hate it, but don’t ignore it. Phillip Smith calls me out for omitting Perl from my list of Node.js event loop alternatives (I only mentioned Twisted and EventMachine). No conspiracy here; I’m just not connected enough to the Perl community to know what the popular event loop libraries are. To Perl’s credit, Perlbal was the first piece of software I saw that showed me how a single-threaded, event-loop-based system could massively outperform a threaded alternative. # 27th November 2009, 7:51 am

CouchDB View Cookbook for SQL Jockeys. This demystified CouchDB views for me. From “CouchDB: The Definitive Guide”, the free online manual. # 26th November 2009, 1:20 pm
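
Real CouchDB views are map functions (usually written in JavaScript) that call `emit(key, value)`; CouchDB keeps the emitted rows sorted by key and lets you query them by key. As a rough mental model only, here is a tiny in-memory Python emulation of the most basic cookbook recipe, the view equivalent of a `SELECT ... WHERE author = ?` lookup. `build_view`, `query_view`, `posts_by_author` and the sample documents are hypothetical, and the sketch ignores reduce functions, range queries and incremental indexing.

```python
def build_view(docs, map_fn):
    """Run map_fn over every document and return the emitted rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    # View rows are ordered by key, with ties broken by document id
    return sorted(rows, key=lambda row: (row[0], row[1]))


def query_view(rows, key):
    """Return (doc_id, value) for every row whose key equals `key` (like ?key=...)."""
    return [(doc_id, value) for row_key, doc_id, value in rows if row_key == key]


def posts_by_author(doc):
    # Hypothetical map function: emit (author, title) for documents of type "post"
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


docs = [
    {"_id": "a", "type": "post", "author": "simon", "title": "Node.js"},
    {"_id": "b", "type": "post", "author": "andy", "title": "JSONP"},
    {"_id": "c", "type": "comment", "author": "simon", "text": "Nice"},
]
rows = build_view(docs, posts_by_author)
print(query_view(rows, "simon"))  # [('a', 'Node.js')]
```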

flXHR. I was looking for something like this recently, glad to see it exists. flXHR is a drop-in replacement for regular XMLHttpRequest which uses an invisible Flash shim to allow cross-domain calls to be made, taking advantage of the Flash crossdomain.xml security model. # 26th November 2009, 12:52 pm

LABjs: new hotness for script loading. Created in collaboration with Steve Souders, LABjs is a JavaScript loading library which makes it easy to have scripts download in parallel while still ensuring that they execute sequentially where required to satisfy dependencies. It’s not clear when you would choose this over concatenating all scripts together into a single file. # 26th November 2009, 12:28 pm
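
LABjs itself is a JavaScript library with its own API; the Python sketch below only illustrates the underlying idea: start every download at once, but execute each script strictly in the order it was listed, as soon as it and everything before it have arrived. `load_scripts`, `fetch` and `execute` are hypothetical stand-ins for the loader, the network and the JavaScript engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_scripts(urls, fetch, execute, max_workers=4):
    """Download scripts in parallel, but execute them in the order given."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map submits every download immediately, yet yields results in
        # the order of `urls`, so a slow first script holds back later ones
        for url, source in zip(urls, pool.map(fetch, urls)):
            execute(url, source)


# Hypothetical latencies: the first script is the slowest to download
delays = {"jquery.js": 0.03, "plugin.js": 0.01, "app.js": 0.0}


def fake_fetch(url):
    time.sleep(delays[url])  # simulate network latency
    return "/* source of %s */" % url


executed = []
load_scripts(list(delays), fake_fetch, lambda url, source: executed.append(url))
print(executed)  # ['jquery.js', 'plugin.js', 'app.js']
```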

Request Routing With URI Templates in Node.JS. I quite like this approach (though the implementation is a bit “this”-heavy for my taste). JavaScript has no equivalent to Python’s raw strings, so regular expression based routing à la Django ends up being a bit uglier in JavaScript. URI template syntax is more appealing. # 24th November 2009, 9:06 am
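
The linked Node.js code isn’t reproduced here. As a hedged Python sketch of the same approach, the router below compiles simple `{name}` templates into regular expressions; it handles only that basic form, not the rest of the URI Template syntax. `compile_template` and `Router` are hypothetical names.

```python
import re


def compile_template(template):
    """Turn a URI template like '/blog/{year}/{slug}' into a compiled regex."""
    pattern = ""
    # re.split with a capturing group alternates literal text and variable names
    for i, part in enumerate(re.split(r"\{(\w+)\}", template)):
        if i % 2:
            pattern += "(?P<%s>[^/]+)" % part  # a variable matches one path segment
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


class Router:
    def __init__(self):
        self.routes = []

    def add(self, template, handler):
        self.routes.append((compile_template(template), handler))

    def dispatch(self, path):
        # The first template that matches the whole path wins
        for regex, handler in self.routes:
            match = regex.fullmatch(path)
            if match:
                return handler(**match.groupdict())
        return None


router = Router()
router.add("/blog/{year}/{slug}", lambda year, slug: "entry %s from %s" % (slug, year))
print(router.dispatch("/blog/2009/node-js"))  # entry node-js from 2009
```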

Woof—simply exchange files (via) Ultra-simple file sharing for local networks: run “woof filename” to start a local web server which will serve up that file, just once, and then terminate. It can also serve up an entire directory as a compressed archive. Written in Python as a single script which you can drop into your ~/bin. “woof -s” serves the script itself, so you can easily pass it to someone who has a file you want. # 24th November 2009, 8:44 am
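
Woof’s own code isn’t shown here; this is a minimal Python 3 sketch of its core trick using only the standard library: a tiny HTTP server that answers a single request with the file. `make_one_shot_server` is a hypothetical name, and unlike Woof it ignores directories, archives and any stray request (such as a favicon fetch) that might arrive first.

```python
import http.server
import os


def make_one_shot_server(path, port=8080):
    """Build an HTTP server whose handler sends the file at `path` to whoever asks."""

    class OneShotHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            with open(path, "rb") as f:
                data = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            # Suggest the original file name to the downloading browser
            self.send_header("Content-Disposition",
                             'attachment; filename="%s"' % os.path.basename(path))
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return http.server.HTTPServer(("", port), OneShotHandler)
```

Calling `handle_request()` once on the returned server processes exactly one request and returns, and `server_close()` then releases the port; that single call is what gives the serve-once-then-exit behaviour.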

Hacker News thread on Negative Cashback. Is it common practice for online stores with affiliate referral schemes to artificially inflate their prices if they’re going to have to pay out a referral bonus? # 23rd November 2009, 9:44 pm

Negative Cashback from Bing Cashback (via) Some online stores show you a higher price if you click through from Bing—and set a cookie that continues to show you the higher price for the next three months. It’s unclear if this is Bing’s fault—comments on Hacker News report that Google Shopping sometimes suffers from the same problem (POST UPDATED: I originally blamed Bing for this). # 23rd November 2009, 9:24 pm

django-batch-select (via) A smart attempt at solving select_related for many-to-many relationships in Django. Add a custom manager to your model and call e.g. Entry.objects.all()[:10].batch_select("tags") to execute two queries: one pulling back the first ten entries and another using an “IN” query against the tags table to pull back all of the tags for those entries in one go. # 23rd November 2009, 4:19 pm
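
This isn’t django-batch-select’s code; it’s a plain-SQL sketch of the same two-query pattern using Python’s built-in `sqlite3` module. The `entry`, `tag` and `entry_tags` tables and the `entries_with_tags` helper are hypothetical stand-ins for what the Django ORM would generate.

```python
import sqlite3
from collections import defaultdict


def entries_with_tags(conn, limit=10):
    """Fetch the first `limit` entries plus all of their tags in two queries."""
    # Query 1: the entries themselves
    entries = conn.execute(
        "SELECT id, title FROM entry ORDER BY id LIMIT ?", (limit,)).fetchall()
    ids = [entry_id for entry_id, _ in entries]
    tags = defaultdict(list)
    if ids:
        # Query 2: a single IN (...) lookup for every entry's tags at once
        placeholders = ",".join("?" * len(ids))
        rows = conn.execute(
            "SELECT et.entry_id, t.name FROM entry_tags et "
            "JOIN tag t ON t.id = et.tag_id "
            "WHERE et.entry_id IN (%s) ORDER BY t.name" % placeholders, ids)
        for entry_id, name in rows:
            tags[entry_id].append(name)
    return [(title, tags[entry_id]) for entry_id, title in entries]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entry_tags (entry_id INTEGER, tag_id INTEGER);
INSERT INTO entry VALUES (1, 'Node.js'), (2, 'Haystack');
INSERT INTO tag VALUES (1, 'javascript'), (2, 'django'), (3, 'search');
INSERT INTO entry_tags VALUES (1, 1), (2, 2), (2, 3);
""")
print(entries_with_tags(conn))
# [('Node.js', ['javascript']), ('Haystack', ['django', 'search'])]
```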

Node.js is genuinely exciting

I gave a talk on Friday at Full Frontal, a new one day JavaScript conference in my home town of Brighton. I ended up throwing away my intended topic (JSONP, APIs and cross-domain security) three days before the event in favour of a technology which first crossed my radar less than two weeks ago.

[... 2009 words]

jQSlickWrap. Clever jQuery plugin which allows text to wrap around irregularly shaped images, by processing the image with canvas and rewriting it as a sequence of floated horizontal bars of different widths. It’s a modern variant of the ragged float trick first introduced by Eric Meyer. # 23rd November 2009, 7:44 am
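
jQSlickWrap’s actual code isn’t reproduced here. The sketch below is a rough Python approximation of the idea, assuming the image floats left and its alpha channel has already been read into a list of pixel rows (the plugin gets these from canvas): each band of rows becomes one floated bar as wide as the rightmost visible pixel in that band. `float_bars` and `bars_html` are hypothetical names.

```python
def float_bars(alpha_rows, band_height, threshold=0):
    """Return one bar width per band: the rightmost visible pixel + 1."""
    widths = []
    for top in range(0, len(alpha_rows), band_height):
        right = 0
        for row in alpha_rows[top:top + band_height]:
            for x, alpha in enumerate(row):
                if alpha > threshold:  # pixel counts as part of the shape
                    right = max(right, x + 1)
        widths.append(right)
    return widths


def bars_html(widths, band_height):
    # Each bar floats left and clears the previous one, so the bars stack
    # vertically and the text flows around their ragged right edges
    return "\n".join(
        '<div style="float:left;clear:left;width:%dpx;height:%dpx"></div>'
        % (width, band_height) for width in widths)


# A 4x4 triangle: 255 = opaque, 0 = transparent
triangle = [
    [255, 0, 0, 0],
    [255, 255, 0, 0],
    [255, 255, 255, 0],
    [255, 255, 255, 255],
]
print(float_bars(triangle, band_height=2))  # [2, 4]
```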

IE 6 and 7 hit by hack attack code. IE6 and 7 have what looks like a buffer overflow vulnerability caused by a strange intersection of CSS, innerHTML and large JavaScript arrays. No exploits in the wild yet but it’s only a matter of time. # 22nd November 2009, 3:38 pm

Major IE8 flaw makes ‘safe’ sites unsafe. IE8 has an XSS protection feature which rewrites potentially harmful code in HTML pages. I think it looks for suspicious input in query strings which appears to have been output directly on the page. Unfortunately it turns out there’s a flaw in the feature that can allow attackers to rewrite safe pages to introduce XSS flaws. Google are serving all of their pages with the X-XSS-Protection: 0 header. Until the fix is released, that’s probably a good idea. # 22nd November 2009, 3:34 pm
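
Following Google’s lead only requires adding that header to every response. A minimal sketch, assuming a Python WSGI application; `disable_ie8_xss_filter` and `hello_app` are hypothetical names, and a Django project could do the same thing in a middleware class.

```python
def disable_ie8_xss_filter(app):
    """Wrap a WSGI app so every response carries X-XSS-Protection: 0."""

    def wrapped_app(environ, start_response):
        def start_response_with_header(status, headers, exc_info=None):
            # Drop any existing value, then explicitly turn the IE8 filter off
            headers = [(name, value) for name, value in headers
                       if name.lower() != "x-xss-protection"]
            headers.append(("X-XSS-Protection", "0"))
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_header)

    return wrapped_app


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


application = disable_ie8_xss_filter(hello_app)
```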

Programmers don’t use launch-fast-and-iterate out of laziness. They use it because it yields the best results. By obstructing that process, Apple is making them do bad work, and programmers hate that as much as Apple would.

Paul Graham # 19th November 2009, 10:13 pm

Chromium OS User Experience. The 2 minute UI concept video is probably the best way to understand the ideas behind Google’s Chrome OS. # 19th November 2009, 10:12 pm

Authority, historically, gets bestowed on the gatekeepers of information, such as Britannica, universities, newspapers, etc. Everything that can be digitized will be digitized, and will then be available over the internet, which is disruptive, not only to business models, but to authority.

Joe Gregorio # 19th November 2009, 6:53 pm

Simple CouchDB multi-master clustering via Nginx. An impressive combination. CouchDB can be easily set up in a multi-master configuration, where writes to one master are replicated to the other and vice versa. This makes setting up a reliable CouchDB cluster as simple as putting two such servers behind a single nginx proxy. # 19th November 2009, 4:37 pm
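
The linked post’s exact configuration isn’t repeated here. As a hedged sketch, assuming CouchDB’s `_replicate` endpoint with continuous replication (available from CouchDB 0.10), the two-way setup amounts to one replication request per direction; `replication_requests`, the host URLs and the `blog` database are hypothetical. Nginx would then proxy to both servers, for example via an `upstream` block.

```python
import json
import urllib.request


def replication_requests(node_a, node_b, db):
    """Build the two POST /_replicate requests for bidirectional replication."""
    requests = []
    for source, target in ((node_a, node_b), (node_b, node_a)):
        body = json.dumps({
            "source": db,                      # the database local to `source`
            "target": "%s/%s" % (target, db),  # the same database on the other node
            "continuous": True,                # keep pushing new writes as they happen
        }).encode("utf-8")
        requests.append(urllib.request.Request(
            source + "/_replicate", data=body,
            headers={"Content-Type": "application/json"}, method="POST"))
    return requests


for req in replication_requests("http://10.0.0.1:5984", "http://10.0.0.2:5984", "blog"):
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually start the replication
```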

The OS Opportunity. John Gruber repeats his argument that PC makers should create their own OSes, and points out that compatibility concerns are less important than they’ve ever been because “the Web provides us with a core set of software and APIs that work everywhere”. # 19th November 2009, 8:02 am

Announcing Kong: A server description and deployment testing tool. An ultra-simple website monitoring tool written in Django which makes it easy to manage a list of Twill scripts for testing different sites. It was developed at the Lawrence Journal-World; Eric showed me a demo of this a year or so ago and I’ve been hoping they would open source it. # 18th November 2009, 12:47 pm

Debugging in Python. The missing manual for Python’s powerful pdb debugger. # 18th November 2009, 12:34 pm

It’s clear that, even those who are privileged by access and wealth and the ability to amplify their own voices have anticipated that we’ll all be disenfranchised by the private companies that own and control our networks of communication. And yet, most of our effort and ambition in the technology industry are not going towards building for the open web.

Anil Dash # 18th November 2009, 9:38 am

Deep Tracing of Internet Explorer. dynaTrace Ajax looks like an awesome tool. For once, Internet Explorer has a development tool that other browsers can be jealous of. # 18th November 2009, 8:06 am

Me and Belle de Jour—‘Could it be Brooke?’ (via) Lovely piece of internet detective work and UK blogging history. Darren from LinkMachineGo figured out Belle de Jour’s identity right back at the start, based on his knowledge of the early UK blogging scene. Not only did he keep the secret, but he set up a clever honeypot in the form of an innocuous page containing terms that tied her identities together. When the page started getting hits from an Associated Newspapers (Daily Mail) IP address a few weeks ago, he tipped Belle off via Twitter. # 18th November 2009, 12:18 am

About 80 per cent of public sector data mentions a place. Making Ordnance Survey data more freely available will encourage more effective exploitation of public data by businesses, individuals and community organisations.

Stephen Timms, Minister for Digital Britain # 17th November 2009, 6:10 pm

Re-mapping the future for Ordnance Survey—making public data public. “The Prime Minister and Communities Secretary John Denham will today announce that the public will have more access to Ordnance Survey maps from next year, as part of a Government drive to open up data to improve transparency.” # 17th November 2009, 6:09 pm

node.js at JSConf.eu (PDF). node.js creator Ryan Dahl’s presentation at this year’s JSConf.eu. The principal philosophy is that I/O in web applications should be asynchronous, for everything: no blocking for database calls, no blocking for filesystem access. JavaScript is a mainstream programming language with a culture of callback APIs (thanks to the DOM) and is hence ideally suited to building asynchronous frameworks. # 17th November 2009, 6:07 pm
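
Node itself uses JavaScript callbacks; the sketch below swaps in Python’s `asyncio` to illustrate the same principle: while one I/O call is waiting, the event loop gets on with other work instead of blocking. `fake_io` and `handle_request` are hypothetical, with `asyncio.sleep` standing in for a database or filesystem call.

```python
import asyncio
import time


async def fake_io(name, seconds):
    # asyncio.sleep simulates a non-blocking database or filesystem call:
    # while this coroutine waits, the event loop is free to run others
    await asyncio.sleep(seconds)
    return name


async def handle_request():
    # Start both "I/O calls" at once rather than one after the other
    return await asyncio.gather(fake_io("database", 0.2), fake_io("filesystem", 0.2))


start = time.perf_counter()
results = asyncio.run(handle_request())
elapsed = time.perf_counter() - start
# The two waits overlap, so elapsed is close to 0.2s rather than 0.4s
print(results, "%.2fs" % elapsed)
```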