Simon Willison’s Weblog


Giving away the index

4th May 2005

My final year project is due in two weeks, and I’m going to be running on silent for most of them. I have, however, upgraded to Tiger, and playing with Spotlight has given me plenty to think about.

Giving away the index

The great benefit of having an electronic version of a book you own in dead-tree format to hand is that you can search it. Publishers generally don’t hand out free digital copies because, well, they want you to buy the books, not freely distribute electronic copies.

The thing is, you don’t need a digital copy of a book to be able to search it; you just need a full-text index of it (if you don’t understand what this means, go and read Tim Bray’s series On Search). An index isn’t enough to reconstruct the book, but it is enough to answer questions like "on what pages of Eric Meyer on CSS are float layouts discussed?"
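To make the idea concrete, here is a minimal sketch of a page-level full-text index in Python. It is an illustration under stated assumptions, not how Spotlight or any publisher actually does it: the page texts are invented, the function names `build_index` and `search` are hypothetical, and tokenisation is a crude lowercase word split. The index records only which pages each word appears on, so word order within a page is thrown away.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages):
    """Map each lowercased word to the sorted list of page numbers it appears on.

    `pages` is a dict of {page_number: page_text}.
    """
    index = defaultdict(set)
    for number, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def search(index, query):
    """Return the sorted page numbers that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical page contents, invented purely for illustration.
pages = {
    12: "Float layouts let columns sit side by side.",
    13: "Clearing a float restores normal flow.",
    40: "Table layouts are an older approach.",
}

index = build_index(pages)
print(search(index, "float layouts"))  # pages containing both words
```

A publisher would only need to ship something like `index`, never `pages`. A real index would probably also want stemming and word positions for phrase search, and storing positions would make it easier to piece the text back together, so there is a trade-off to think about.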

Imagine if technical publishers made binary full-text index files of their titles available as free downloads in some kind of open standard format. Readers could query them using Spotlight or similar technologies and search the titles they own, all without needing to rely on centralised, artificially limited services such as Amazon’s Search Inside the Book.

O’Reilly, I’m looking at you.

Full-text phishing

On a darker note, one thing about Spotlight that has given me pause is the immense ease with which it can uncover passwords saved amongst my email. Lost password reminders, new account details, invitations to sign up for services: they’re all hidden away in my mail archive. Spotlight makes it trivial to dig them back up again, and offers the APIs for applications to do so as well. Combine this with a piece of spyware or a trojan horse and you’ve got the ultimate vector for phishing attacks.

This problem isn’t limited to Macs either; Google and MSN’s Desktop Search engines could be used for much the same purpose, and full-text search is bound to end up built in to Windows sooner or later. For the moment, the safest thing to do is either delete those pesky emails or move them to a folder that is excluded from Spotlight’s index. Somehow I doubt many people will think to take such precautions.

And with that off my chest, it’s time to get back to my dissertation.

This is Giving away the index by Simon Willison, posted on 4th May 2005.
