Tim Bray on search
I love it when bloggers stick to their word. The other day, while describing a quick Perl hack that really impressed a major client a few years ago, Tim Bray mentioned the following:
Then I turned on Microsoft’s search engine, at that time called Index Server, now I believe called Index Services, which is a pretty nice tool (we don’t have the equivalent in the Open Source world, more on that another time).
And sure enough, he’s just posted the first in a series of essays on full-text search. Go read it: it’s really interesting stuff. Tim’s conclusion is:
What we need is for Apache to come out-of-the-box with a built-in search capability that you just push a button and it works, and it’s fast, and doesn’t need much care and feeding, and it’s internationalized, and it has the right API for when you want to get fancy.
Until that happens, I will happily recommend MySQL’s built-in fulltext search indexing for quickly adding a relatively powerful search facility to a site. I use it on this blog, and my only real criticism is that it insists on search words of at least four letters, which is less than ideal when most of your entries include TLAs. Hopefully they’ll provide a way around this limitation in a future release.
in mysql 4.0 and later, you can set the 'ft_min_word_len' variable in your my.cnf file. (as of 4.0.10, you can also set 'ft_stopword_file' to have it load the stopword list from an external file.) you'll have to rebuild the indexes after changing the setting. i believe the plan is to add support for setting these sorts of things per-index sometime post 5.0. (it requires the text-based table definition files that will be part of 5.1.)
more information about tuning full-text indexes can be found in the mysql reference manual.
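As a rough sketch of those steps, assuming a MyISAM table called `entries` (a made-up name) and a minimum word length of 3. The my.cnf change appears as a comment because it lives in the config file rather than in SQL, and the exact option syntax may differ between 4.0 releases:

```sql
-- In my.cnf, under the [mysqld] section, then restart the server:
--   ft_min_word_len=3

-- Check that the running server picked up the new value.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Rebuild the full-text indexes on the (hypothetical) entries table
-- so that existing rows are re-indexed with the new minimum length.
REPAIR TABLE entries QUICK;
```

Until the rebuild has run, words shorter than the old limit are still missing from the index, so searches for them will keep coming back empty.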
jim winstead - 16th June 2003 16:50 - #
Scott Johnson - 16th June 2003 17:45 - #
Matt - 16th June 2003 19:29 - #
with 4.0 or later, you can use
MATCH (field) AGAINST ('base*' IN BOOLEAN MODE)
to do partial-word matching. support for stemming is on the todo list.
the full documentation for mysql's full-text searching is in the manual too, of course.
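For context, a minimal sketch of how that clause might sit in a full query. The `entries` table and its columns are assumptions for illustration, not anything from this site's actual schema:

```sql
-- Hypothetical table. Full-text indexes in MySQL 4.0 require MyISAM;
-- older 4.0 releases spell the table option TYPE=MyISAM instead of ENGINE.
CREATE TABLE entries (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  body  TEXT NOT NULL,
  FULLTEXT (title, body)
) ENGINE=MyISAM;

-- Boolean-mode search: the trailing * matches any word that
-- starts with 'base' (for example 'baseline' or 'basement').
SELECT id, title
FROM entries
WHERE MATCH (title, body) AGAINST ('base*' IN BOOLEAN MODE);
```

The column list in MATCH() is kept identical to the one in the FULLTEXT definition so the query can use that index.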
for stuff that isn't in a database already, mnogosearch is an excellent search engine, although the documentation is a bit rough.
(oh, and the disclaimer i should have added before is that i work for mysql ab, but am not speaking for them here.)
jim winstead - 16th June 2003 19:52 - #
The manual says . So you can use it to search for 'appl*' and expect 'apples', 'applied' and 'application' to show up in the results, but you can't use '*apple*' or any other form of 'apple' with MySQL 4.0 fulltext search to find a 'pineapple' entry.
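To make the difference concrete, here is a small sketch against a hypothetical `entries` table with a FULLTEXT index on (title, body). The LIKE query is a plain substring fallback, not part of the fulltext feature, and it cannot use the full-text index:

```sql
-- Prefix search via the fulltext index: can find 'apples', 'applied',
-- 'application', but not 'pineapple'.
SELECT id, title
FROM entries
WHERE MATCH (title, body) AGAINST ('appl*' IN BOOLEAN MODE);

-- Substring fallback: also finds 'pineapple', at the cost of
-- scanning the rows instead of consulting the fulltext index.
SELECT id, title
FROM entries
WHERE body LIKE '%apple%';
```

On a small table the LIKE scan is probably tolerable. On a large one it gets slow quickly, which is the trade-off to weigh before reaching for it.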
Andronicus Riyono - 9th June 2004 08:18 - #