Simon Willison’s Weblog

Subscribe

12 items tagged “lucene”

2020

Reducing search indexing latency to one second. Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists.

# 26th June 2020, 5:06 pm / data-structures, lucene, scaling, search, twitter

2010

Who are major competitors to Solr?

ElasticSearch is a really interesting one—it’s the same underlying search library (Lucene) and the same integration model (an HTTP interface) but takes quite a different approach. It hasn’t been around for a long time but it looks very impressive: http://www.elasticsearch.com/

[... 95 words]

How do Solr, Lucene, Sphinx and Searchify compare?

Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.

[... 109 words]

Which major companies are using Solr for search?

The Guardian newspaper uses Solr for its Open Platform Content API. http://www.guardian.co.uk/open-p...

[... 27 words]

[UPDATE] Spatial Search in Apache Lucene and Solr. Spacial search is finally coming (back) to Solr—trunk now supports sorting and boosting by distance.

# 20th July 2010, 6:28 pm / lucene, search, solr, recovered, spatialsearch

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect, basic faceted queries are already supported.

# 11th February 2010, 6:33 pm / elasticsearch, http, java, json, lucene, rest, scaling, search, sharding, solr

2009

Digg Search: Now With 99.987% Less Suck. Really nice implementation of faceted search, still using Lucene and Solr under the hood.

# 10th April 2009, 10:17 pm / digg, facets, fulltext, lucene, search, solr

Guardian + Lucene = Similar Articles + Categorisation. Alf Eaton loaded 13,000 Guardian articles tagged Science in to Solr and Lucene and is using Solr’s MoreLikeThisHandler to find related articles and automatically apply Guardian tags to Nature News articles.

# 11th March 2009, 12:53 pm / alf-eaton, fulltext, guardian, lucene, morelikethishandler, naturenews, openplatform, search, solr

Whoosh. A brand new, pure-python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine.

# 12th February 2009, 12:49 pm / fulltextindexing, lucene, open-source, python, search, whoosh

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin which keeps search related metadata out of your regular models and includes management commands for re-indexing and generating Solr schema.xml files.

# 4th February 2009, 12:22 pm / django, lucene, python, search, solr

2008

pysolr. Python wrapper for Solr, the search web service wrapper for Lucene. One thing I’m not clear on: do you need to configure Solr with the fields you’ll be indexing in advance, or can Solr create new fields on the fly to match the data you send it?

# 9th January 2008, 8:50 pm / apache, lucene, pysolr, python, search, solr

2007

Apache Solr 1.1. Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML.

# 13th January 2007, 1:16 am / json, lucene, python, ruby, search, solr, webservice, xml