Simon Willison’s Weblog

Items tagged search in Feb

Filters: Month: Feb × search ×


Discussion about Altavista on Hacker News. Fascinating thread on Hacker News where Bryant Durrell, a former Director from Altavista provides some insider thoughts on how they lost against Google. # 16th February 2019, 6:57 pm

elasticsearch: Percolator. Another fascinating elasticsearch feature: Percolator lets you register searches with your elasticsearch cluster, then pass in a document and have the matching query IDs returned. It’s an upside down search engine. I’m sure there are some very neat things you could build with this, I just haven’t figured out what they are just yet. # 8th February 2011, 11:16 pm

Search Engine Time Machine. Detailed explanation of how ElasticSearch provides high availability, through clever sharding and replication strategies and configurable gateways for long-term persistent storage. # 17th February 2010, 10:32 pm

ElasticSearch: Your Data, Your Search. A neat example of how ElasticSearch’s schemaless indexes and native JSON support make it ridiculously easy to index different types of data and run queries across them. # 12th February 2010, 3:22 pm

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect, basic faceted queries are already supported. # 11th February 2010, 6:33 pm

django-springsteen and Distributed Search. Will Larson’s Django search library currently just talks to Yahoo! BOSS, but is designed to be extensible for other external search services. Interestingly, it uses threads to fire off several HTTP requests in parallel from within the Django view. # 25th February 2009, 10:28 pm

Xapian performance comparision with Whoosh. Whoosh appears to be around four times slower than Xapian for indexing and empty cache searches, but Xapian with a full cache blows Whoosh out of the water (5408 searches/second compared to 26.3). Considering how fast Xapian is, that’s still a pretty impressive result for the pure-Python Whoosh. # 14th February 2009, 1:15 pm

Whoosh. A brand new, pure-python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine. # 12th February 2009, 12:49 pm

Introduction to Information Retrieval (via) This looks excellent—a modern guide to implementing search engines written by some of the engineers behind Yahoo! Search. The full text is available online, but it looks like it’s well worth investing in the dead tree edition. # 9th February 2009, 8:54 pm

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves. # 5th February 2009, 11:06 pm

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin which keeps search related metadata out of your regular models and includes management commands for re-indexing and generating Solr schema.xml files. # 4th February 2009, 12:22 pm

Google cruft

New Google feature: Google Movies. Displays aggregated movie reviews (like Rotten Tomatoes), looks up local movie times based on your zip code saved in Google Local (more evidence of the fabled Google cookie), and even handles recommendations.

[... 120 words]