Giving away the index
My final year project is due in two weeks, and I’m going to be running on silent for most of them. I have, however, upgraded to Tiger and playing with Spotlight has given me plenty to think about.
The great benefit of having an electronic version of a book you own in dead-tree format to hand is that you can search it. Publishers generally don’t hand out free digital copies because, well, they want you to buy the books, not freely distribute electronic copies.
The thing is, you don’t need a digital copy of a book to be able to search it; you just need a full-text index of it (if you don’t understand what this means, go and read Tim Bray’s series On Search). An index isn’t enough to reconstruct the book, but it is enough to answer questions like "on what pages of Eric Meyer on CSS are float layouts discussed?"
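To make the idea concrete, here is a minimal sketch of a page-level full-text index, assuming the book is available to the publisher as a mapping of page numbers to page text. The names `build_page_index` and `pages_matching` and the sample pages are made up for illustration; real index formats are considerably more sophisticated. Because this version only records which pages a word appears on, not the order of the words, it can answer "which pages?" questions without giving away the text itself.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_page_index(pages):
    """Map each lowercased word to the sorted list of pages it appears on.

    `pages` is a hypothetical dict of {page_number: page_text}.
    """
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def pages_matching(index, query):
    """Return the pages that contain every word in the query (a simple AND search)."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Made-up sample pages, not real book content.
sample_pages = {
    41: "Floats take an element out of the normal flow.",
    42: "A float layout uses floated columns and a cleared footer.",
    97: "Positioning is an alternative to a float based layout.",
}

index = build_page_index(sample_pages)
print(pages_matching(index, "float layout"))  # [42, 97]
```

Note that the sketch does no stemming, so "floats" on page 41 does not match a query for "float"; a production index would normally handle that.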
Imagine if technical publishers made binary full-text index files of their titles available for download, for free, in some kind of open standard format. Readers could query them using Spotlight or similar technologies and gain the ability to search the titles they own, all without needing to rely on centralised, artificially limited services such as Amazon’s Search Inside the Book.
O’Reilly, I’m looking at you.
On a darker note, one thing about Spotlight that has given me pause is the immense ease with which it can uncover passwords saved amongst my email. Lost password reminders, new account details, invitations to sign up for services: they’re all hidden away in my mail archive. Spotlight makes it trivial to dig them back up again, and offers the APIs for applications to do so as well. Combine this with a piece of spyware or a trojan horse and you’ve got the ultimate vector for phishing attacks.
This problem isn’t limited to Macs either; Google and MSN’s Desktop Search engines could be used for much the same purpose, and full-text search is bound to end up built in to Windows sooner or later. For the moment, the safest thing to do is either delete those pesky emails or move them to a folder that is excluded from Spotlight’s index. Somehow I doubt many people will think to take such precautions.
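If you do want to take that precaution, something like the following rough sketch could help find the messages worth deleting or moving. It assumes the mail archive has been exported to an mbox file, and the keyword list is my own guess at what counts as sensitive; the helper names `message_text` and `flag_sensitive` are hypothetical. It only reports subject lines, never the message bodies.

```python
import re
from email.message import Message

# Assumed keyword list; adjust to taste.
PATTERN = re.compile(r"\b(password|passphrase|login details)\b", re.IGNORECASE)


def message_text(msg):
    """Return the subject plus any text/plain parts of a message as one string."""
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if not payload:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name: fall back to UTF-8.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def flag_sensitive(messages):
    """Return the subjects of messages that mention any of the keywords."""
    flagged = []
    for msg in messages:
        if PATTERN.search(message_text(msg)):
            flagged.append(str(msg.get("Subject", "(no subject)")))
    return flagged


def make_message(subject, body):
    """Build a small in-memory message for the demonstration below."""
    msg = Message()
    msg["Subject"] = subject
    msg.set_payload(body)
    return msg


# Made-up sample messages.
sample = [
    make_message("Your new account", "Your password is hunter2"),
    make_message("Lunch?", "Noon on Friday"),
]
print(flag_sensitive(sample))  # ['Your new account']
```

To run it against a real archive, pass `mailbox.mbox("/path/to/archive.mbox")` (from the standard library's `mailbox` module, with a path of your own) to `flag_sensitive` in place of `sample`.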
And with that off my chest, it’s time to get back to my dissertation.