Giving away the index
4th May 2005
My final year project is due in two weeks, and I’m going to be running on silent for most of them. I have, however, upgraded to Tiger, and playing with Spotlight has given me plenty to think about.
Giving away the index
The great benefit of having an electronic copy to hand of a book you already own in dead-tree format is that you can search it. Publishers generally don’t hand out free digital copies because, well, they want you to buy their books, not pass electronic copies around for free.
The thing is, you don’t need a digital copy of a book to be able to search it; you just need a full-text index of it (if you don’t understand what this means, go and read Tim Bray’s series On Search). An index isn’t enough to reconstruct the book, but it is enough to answer questions like "on what pages of Eric Meyer on CSS are float layouts discussed?"
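To make the distinction concrete, here is a minimal sketch of a page-level inverted index in Python. The page numbers and text are invented for illustration, and the tokenizer is deliberately naive (lower-cased alphanumeric runs, no stemming), so this is not how any particular publisher or Spotlight itself builds an index. The index records which pages each word appears on but throws away word order, which is why it can answer the float-layout question without being enough to reconstruct the book.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lower-case the text and keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the sorted list of page numbers it appears on.

    Word order and page text are discarded; only word -> pages survives.
    """
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_number)
    return {word: sorted(nums) for word, nums in index.items()}


def search(index, query):
    """Return the pages containing every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical page text standing in for a real book.
pages = {
    101: "A float removes a box from the normal flow; this layout trick builds columns.",
    102: "Positioning schemes: static, relative and absolute.",
    147: "Building a float layout with a fixed-width sidebar.",
}
index = build_index(pages)
print(search(index, "float layout"))  # → [101, 147]
```

Because only word-to-page mappings survive, a reader holding this index can find the right pages but cannot read them; the paper copy is still needed.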
Imagine if technical publishers made binary full-text index files of their titles available as free downloads in an open, standard format. Readers could query them using Spotlight or similar technologies, gaining the ability to search the titles they own without relying on centralised, artificially limited services such as Amazon’s Search Inside the Book.
O’Reilly, I’m looking at you.
Full-text phishing
On a darker note, one thing about Spotlight that has given me pause is the immense ease with which it can uncover passwords saved amongst my email. Lost password reminders, new account details, invitations to sign up for services: they’re all hidden away in my mail archive. Spotlight makes it trivial to dig them back up again, and offers APIs that let applications do so as well. Combine this with a piece of spyware or a trojan horse and you’ve got the ultimate vector for phishing attacks.
This problem isn’t limited to Macs either; Google and MSN’s Desktop Search engines could be used for much the same purpose, and full-text search is bound to end up built into Windows sooner or later. For the moment, the safest thing to do is either delete those pesky emails or move them to a folder that is excluded from Spotlight’s index. Somehow I doubt many people will think to take such precautions.
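To get a feel for how little effort this takes, and to hunt down those emails yourself before deleting or moving them, here is a rough audit sketch in Python. It assumes your mail has been exported to a standard mbox file, and the list of suspicious phrases is a hypothetical starting point, not anything Spotlight itself uses. It only checks subjects and plain-text bodies, so HTML-only messages will slip through.

```python
import mailbox
import re

# Hypothetical phrases that tend to appear in credential emails; extend as needed.
SUSPICIOUS = re.compile(
    r"password|username|login details|activate your account", re.IGNORECASE
)


def message_text(msg):
    """Concatenate the decoded text/plain parts of a message; other parts are skipped."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def find_credential_mail(mbox_path):
    """Return (key, subject) for each message whose subject or body looks sensitive."""
    hits = []
    box = mailbox.mbox(mbox_path)
    try:
        for key, msg in box.items():
            subject = str(msg.get("Subject", ""))
            if SUSPICIOUS.search(subject) or SUSPICIOUS.search(message_text(msg)):
                hits.append((key, subject))
    finally:
        box.close()
    return hits
```

The resulting list is a set of messages to review by hand; anything worth keeping can then go into a folder excluded from the index.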
And with that off my chest, it’s time to get back to my dissertation.