Tim Bray on search
I love it when bloggers stick to their word. The other day, while describing a quick Perl hack that really impressed a major client a few years ago, Tim Bray mentioned the following:
Then I turned on Microsoft’s search engine, at that time called Index Server, now I believe called Index Services, which is a pretty nice tool (we don’t have the equivalent in the Open Source world, more on that another time).
And sure enough, he’s just posted the first in a series of essays on full-text search. Go read it: it’s really interesting stuff. Tim’s conclusion is:
What we need is for Apache to come out-of-the-box with a built-in search capability that you just push a button and it works, and it’s fast, and doesn’t need much care and feeding, and it’s internationalized, and it has the right API for when you want to get fancy.
Until that happens, I will happily recommend MySQL’s built-in full-text search indexing for quickly adding a relatively powerful search facility to a site. I use it on this blog, and my only real criticism is that it insists on search words of at least four letters, which is less than ideal when most of your entries include TLAs (three-letter acronyms). Hopefully they’ll provide a way around this limitation in a future release.
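To make the four-letter limitation concrete, here is a minimal Python sketch of what a minimum word length does to a query. It is only a simplified model for illustration, not MySQL's actual implementation: the `indexable_terms` function, the `MIN_WORD_LEN` constant and the tokenizing regular expression are all hypothetical names and choices of mine.

```python
import re

# Assumed threshold, mirroring the four-letter minimum described above.
MIN_WORD_LEN = 4


def indexable_terms(query, min_len=MIN_WORD_LEN):
    """Return the lowercased query words that are long enough to be searched.

    Hypothetical helper: words shorter than min_len are silently dropped,
    which is roughly how a minimum word length behaves from the user's side.
    """
    words = re.findall(r"[A-Za-z0-9]+", query)
    return [word.lower() for word in words if len(word) >= min_len]


if __name__ == "__main__":
    # Every three-letter acronym falls below the threshold and is ignored.
    print(indexable_terms("XML and CSS parsing"))  # ['parsing']
```

A search consisting of nothing but a three-letter acronym ends up with no usable terms at all under this model, which is why the restriction hurts on a blog full of TLAs.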