Simon Willison’s Weblog

Subscribe

Items tagged search in 2009

Filters: Year: 2009 × search × Sorted by date


Haystack 1.0 Final Released. I’ve used Haystack on a number of projects recently, and it has proved itself as a completely painless way of adding full-text search (using Solr or Whoosh—I haven’t tried the Xapian backend yet) to a Django ORM powered project in just a few minutes. Congratulations, Daniel + contributors. # 30th November 2009, 8:07 am

Large Problems in Django, Mostly Solved: Search. Eric Holscher shows how Haystack uses a number of common Django patterns (object registration, pluggable backends, QuerySet-style chaining and class-based views) to great effect in creating a powerful search application for Django. Makes me wonder if more of those patterns should be promoted to first class concepts within Django. # 3rd November 2009, 10:42 am

So’s your facet: Faceted global search for Mozilla Thunderbird. Yes! This is the kind of innovation I’ve been hoping would show up in e-mail clients for years. Faceting is a really natural fit for e-mail. # 4th September 2009, 10:29 am

Collection: Search Patterns. Peter Morville’s enormous collection of screenshots of search engine interfaces. # 30th July 2009, 12:35 pm

MongoDB—Capped Collections. Collections with a size limit that automatically expire older entries are interesting—useful for things like a “recent searches on this site” feature. # 7th June 2009, 12:50 pm

Haystack (via) A brand new modular search plugin for Django, by Daniel Lindsley. The interface is modelled after the Django ORM (complete with declarative classes for defining your search schema) and it ships with backends for both Solr and pure-python Whoosh, with more on the way. Excellent documentation. # 17th April 2009, 9:53 pm

Digg Search: Now With 99.987% Less Suck. Really nice implementation of faceted search, still using Lucene and Solr under the hood. # 10th April 2009, 10:17 pm

Sphinx 0.9.9-rc2 is out. Interesting new feature: the Sphinx search server now supports the MySQL binary protocol, so you can talk to it using a regular MySQL client library and fire off search queries using SELECT syntax and the new SphinxQL query language. # 8th April 2009, 1:59 pm

Guardian + Lucene = Similar Articles + Categorisation. Alf Eaton loaded 13,000 Guardian articles tagged Science in to Solr and Lucene and is using Solr’s MoreLikeThisHandler to find related articles and automatically apply Guardian tags to Nature News articles. # 11th March 2009, 12:53 pm

How search.twitter.com uses Varnish. Includes examples of the configuration options they use. # 2nd March 2009, 5:08 pm

django-springsteen and Distributed Search. Will Larson’s Django search library currently just talks to Yahoo! BOSS, but is designed to be extensible for other external search services. Interestingly, it uses threads to fire off several HTTP requests in parallel from within the Django view. # 25th February 2009, 10:28 pm

Xapian performance comparision with Whoosh. Whoosh appears to be around four times slower than Xapian for indexing and empty cache searches, but Xapian with a full cache blows Whoosh out of the water (5408 searches/second compared to 26.3). Considering how fast Xapian is, that’s still a pretty impressive result for the pure-Python Whoosh. # 14th February 2009, 1:15 pm

Whoosh. A brand new, pure-python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine. # 12th February 2009, 12:49 pm

Introduction to Information Retrieval (via) This looks excellent—a modern guide to implementing search engines written by some of the engineers behind Yahoo! Search. The full text is available online, but it looks like it’s well worth investing in the dead tree edition. # 9th February 2009, 8:54 pm

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves. # 5th February 2009, 11:06 pm

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin which keeps search related metadata out of your regular models and includes management commands for re-indexing and generating Solr schema.xml files. # 4th February 2009, 12:22 pm

All you ever wanted to know about writing bloom filters. This helped me understand a key use case for bloom filters: reducing the impact of the “worst case search is when there are no matching results so everything gets scanned” problem. # 30th January 2009, 8:26 am