Simon Willison’s Weblog


22 items tagged “solr”


Who is the author of the Apache Solr project?

Solr has a lot of contributors, but I believe the project was originally created by Yonik Seeley while working at CNET.

[... 37 words]

How can I install apache solr?

Solr is a powerful full-text search engine wrapped up in a REST-ish web API. It needs to run inside a Java servlet container—but the default download package includes Jetty, which is definitely good enough for trying out the software (and I’d argue easily good enough to run in production).

[... 128 words]
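With Solr running, "REST-ish web API" means a search is just an HTTP GET. The following is a minimal Python sketch. It assumes the stock example setup listening on http://localhost:8983/solr with its /select handler. `select_url` and `search` are hypothetical helper names, and `wt=json` asks for a JSON response instead of the default XML.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL: the stock example Jetty setup listens on port 8983.
SOLR_URL = "http://localhost:8983/solr"

def select_url(query, rows=10, base=SOLR_URL):
    """Build a URL for Solr's /select handler that asks for a JSON response."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base}/select?{urlencode(params)}"

def search(query, rows=10):
    """Run the query against a live Solr instance and return the matching docs."""
    with urlopen(select_url(query, rows)) as resp:
        return json.load(resp)["response"]["docs"]
```

`select_url("title:solr", rows=5)` returns `http://localhost:8983/solr/select?q=title%3Asolr&wt=json&rows=5`. `search()` only works against a running instance.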


Indexing JSON in Solr 3.1. The next release of Solr will support indexing documents provided as JSON—Solr currently requires incoming documents to be formatted as XML.

# 10th December 2010, 9:46 am / json, search, solr, xml, recovered
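Here is a sketch of what that might look like from Python. It assumes the JSON update syntax from the 3.1 release (an "add" command wrapping a "doc") and an update handler mounted at /update/json, as in the example solrconfig.xml. Check both against the version you're running. The helper name and the document fields are made up for illustration.

```python
import json
from urllib.request import Request

# Assumption: the example solrconfig.xml maps the JSON update handler to /update/json.
UPDATE_URL = "http://localhost:8983/solr/update/json"

def json_add_request(doc, commit=True):
    """Build (but don't send) a POST that adds a single document to Solr."""
    body = {"add": {"doc": doc}}
    if commit:
        body["commit"] = {}  # ask Solr to commit so the doc becomes searchable
    return Request(
        UPDATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = json_add_request({"id": "doc1", "title": "Indexing JSON in Solr"})
# urllib.request.urlopen(req) would send it to a running Solr instance.
```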

What is the best way to hire Solr developers?

Do you really need to hire a Solr specialist? It shouldn’t take a competent developer more than a few days to get familiar with Solr—the HTTP API is extremely easy to work with in my experience. You can always hire in a consultant from one of the companies that provide commercial Solr support for a few days to help your developers get up to scratch.

[... 82 words]

Who are major competitors to Solr?

ElasticSearch is a really interesting one—it’s the same underlying search library (Lucene) and the same integration model (an HTTP interface) but takes quite a different approach. It hasn’t been around for a long time but it looks very impressive:

[... 95 words]

How do Solr, Lucene, Sphinx and Searchify compare?

Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.

[... 109 words]
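To make "full text index" a bit more concrete: the core data structure is an inverted index that maps each term to the documents containing it. The toy Python version below only illustrates that idea. It is not Lucene's API or storage format. Lucene adds text analysis, relevance scoring and on-disk segments on top.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Lucene is a Java library",
    2: "Solr wraps Lucene in an HTTP API",
    3: "Sphinx is a standalone search server",
}
index = build_index(docs)
print(sorted(search(index, "lucene")))       # [1, 2]
print(sorted(search(index, "lucene http")))  # [2]
```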

Which major companies are using Solr for search?

The Guardian newspaper uses Solr for its Open Platform Content API.

[... 27 words]

Which Solr app for Django is better: Haystack or django-solr-search (solango)?

I’d go with Haystack—while it supports multiple backends, I get the feeling Solr is the principal backend it was developed for. It’s extremely well documented in my opinion, and the SearchQuerySet API it gives you makes running low-level queries really easy if the higher-level class-based views it provides don’t do quite what you want.

[... 109 words]

[UPDATE] Spatial Search in Apache Lucene and Solr. Spatial search is finally coming (back) to Solr—trunk now supports sorting and boosting by distance.

# 20th July 2010, 6:28 pm / lucene, search, solr, recovered, spatialsearch
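Here's roughly what a distance-sorted query looks like as request parameters. The sketch is based on the geofilt/geodist syntax that shipped with Solr 3.1, and trunk at the time of this post may have differed. It assumes a hypothetical LatLonType field called `location`. Boosting by distance works through function queries rather than the sort parameter and isn't shown.

```python
from urllib.parse import urlencode

def nearby_params(lat, lon, km, field="location"):
    """Query params that filter to docs within `km` of a point, nearest first.

    `field` is a hypothetical LatLonType field name in your schema.
    """
    return {
        "q": "*:*",
        "fq": "{!geofilt}",       # keep only docs inside the radius
        "sfield": field,          # the field holding each doc's lat,lon
        "pt": f"{lat},{lon}",     # the centre point
        "d": km,                  # radius in kilometres
        "sort": "geodist() asc",  # nearest documents first
        "wt": "json",
    }

query_string = urlencode(nearby_params(51.5074, -0.1278, 5))
```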

What’s powering the Content API? The new Guardian Content API runs on Solr, scaled using EC2 and Solr replication and with a Scala web service layer sitting between Solr and the API’s end users.

# 24th May 2010, 2:08 pm / apis, contentapi, ec2, guardian, openplatform, scala, scaling, solr, recovered

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect; basic faceted queries are already supported.

# 11th February 2010, 6:33 pm / elasticsearch, http, java, json, lucene, rest, scaling, search, sharding, solr
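Here is what schema-less means in practice: with Elastic Search you PUT a JSON document straight to a URL, and it works out field mappings from the data. The following is a minimal Python sketch. It assumes a local node on the default port 9200 and the early /index/type/id URL layout (later releases dropped mapping types). The `blog`/`post` names are hypothetical.

```python
import json
from urllib.request import Request

def put_document(index, doc_type, doc_id, doc, base="http://localhost:9200"):
    """Build (but don't send) a PUT that stores `doc` under /index/type/id.

    No schema is declared first: fields are picked up from the JSON itself.
    """
    return Request(
        f"{base}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = put_document("blog", "post", 1, {"title": "Solr has competition!", "tags": ["search"]})
# urllib.request.urlopen(req) would send it to a running node.
```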

World Government Data. Launched last week, this is the Guardian’s meta-search engine for searching and browsing through data from four different government data sites (with more sites planned). Under the hood it’s Django, Solr, Haystack and the Scrapy crawling library. The application was built by Ben Firshman during an internship over Christmas.

# 27th January 2010, 12:27 pm / ben-firshman, data, datagovuk, django, guardian, haystack, projects, python, scrapy, solr

The Seven Deadly Sins of Solr. Useful advice on managing and deploying Solr.

# 24th January 2010, 1:30 pm / lucidimagination, search, solr


Haystack 1.0 Final Released. I’ve used Haystack on a number of projects recently, and it has proved itself as a completely painless way of adding full-text search (using Solr or Whoosh—I haven’t tried the Xapian backend yet) to a Django ORM powered project in just a few minutes. Congratulations, Daniel + contributors.

# 30th November 2009, 8:07 am / daniel-lindsley, django, haystack, python, search, solr, whoosh

NASA NEBULA Services (via) NASA’s new NEBULA cloud computing platform appears to be built entirely on open source infrastructure, including Python, Django, Fabric, Eucalyptus, RabbitMQ, Trac and Solr.

# 28th July 2009, 12:10 pm / cloud-computing, django, eucalyptus, fabric, nasa, nebula, open-source, python, rabbitmq, solr, trac

Installing Django, Solr, Varnish and Supervisord with Buildout. Useful, detailed instructions... but I still think this stuff is Way Too Difficult at the moment. I’m a big fan of the idea of sites that are assembled from multiple smaller web services talking HTTP to each other, but ensuring all the moving parts stay running is massively more painful than just running Apache and MySQL.

# 7th June 2009, 1:54 pm / apache, bertrandmathieu, buildout, django, mysql, operations, python, rest, solr, supervisord, sysadmin, varnish

Haystack (via) A brand new modular search plugin for Django, by Daniel Lindsley. The interface is modelled after the Django ORM (complete with declarative classes for defining your search schema) and it ships with backends for both Solr and pure-python Whoosh, with more on the way. Excellent documentation.

# 17th April 2009, 9:53 pm / daniel-lindsley, django, haystack, orm, python, search, solr, whoosh

Digg Search: Now With 99.987% Less Suck. Really nice implementation of faceted search, still using Lucene and Solr under the hood.

# 10th April 2009, 10:17 pm / digg, facets, fulltext, lucene, search, solr
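For reference, the Solr side of faceted search is a couple of extra request parameters plus a slightly odd response format. By default, the JSON writer returns each field's facet counts as a flat list of alternating terms and counts. The small Python sketch below pairs them up. The `topic` field and the sample response are hypothetical, and this isn't a description of how Digg built theirs.

```python
def facet_pairs(response, field):
    """Turn Solr's flat [term, count, term, count, ...] facet list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))

# Request side: facet=true&facet.field=topic (the "topic" field is hypothetical).
params = {"q": "election", "facet": "true", "facet.field": "topic", "wt": "json"}

# A hand-written example of the response shape, not real data.
sample = {"facet_counts": {"facet_fields": {"topic": ["politics", 12, "sport", 3]}}}
print(facet_pairs(sample, "topic"))  # [('politics', 12), ('sport', 3)]
```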

Guardian + Lucene = Similar Articles + Categorisation. Alf Eaton loaded 13,000 Guardian articles tagged Science into Solr and Lucene and is using Solr’s MoreLikeThisHandler to find related articles and automatically apply Guardian tags to Nature News articles.

# 11th March 2009, 12:53 pm / alf-eaton, fulltext, guardian, lucene, morelikethishandler, naturenews, openplatform, search, solr
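Here is a sketch of the kind of request involved. It assumes MoreLikeThisHandler is registered at /mlt in solrconfig.xml and that documents have `title` and `body` fields. Both are assumptions, since I don't know how Alf's setup was configured. The handler takes a seed document via `q` and mines the `mlt.fl` fields for distinctive terms to search with. `more_like_this_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: MoreLikeThisHandler is registered at /mlt in solrconfig.xml.
MLT_URL = "http://localhost:8983/solr/mlt"

def more_like_this_url(doc_id, fields=("title", "body"), rows=5):
    """Ask Solr for documents similar to the one with the given id."""
    params = {
        "q": f"id:{doc_id}",             # the seed document
        "mlt.fl": ",".join(fields),      # fields to mine for "interesting" terms
        "mlt.mintf": 1,                  # min term frequency within the seed doc
        "mlt.mindf": 2,                  # min number of docs a term must appear in
        "mlt.interestingTerms": "list",  # also return the terms that were used
        "rows": rows,
        "wt": "json",
    }
    return f"{MLT_URL}?{urlencode(params)}"

url = more_like_this_url("article-42")
```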

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin (which keeps search-related metadata out of your regular models) and includes management commands for re-indexing and generating Solr schema.xml files.

# 4th February 2009, 12:22 pm / django, lucene, python, search, solr
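The following illustrates the schema-generation idea. It is not solango's code, just a hypothetical sketch of the same approach: Python's ElementTree renders `<field>` elements from a simple name-to-type mapping. `string` and `text` are field types defined in Solr's example schema.xml. The helper name is made up.

```python
import xml.etree.ElementTree as ET

def fields_xml(fields):
    """Render <field> elements for a Solr schema.xml <fields> section.

    `fields` maps field names to Solr field type names, e.g. {"title": "text"}.
    """
    root = ET.Element("fields")
    for name, solr_type in fields.items():
        ET.SubElement(root, "field", name=name, type=solr_type,
                      indexed="true", stored="true")
    return ET.tostring(root, encoding="unicode")

snippet = fields_xml({"id": "string", "title": "text"})
print(snippet)
```

A real generator would also need to emit the `<uniqueKey>` element and per-field options such as `required` or `multiValued`.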


pysolr. Python wrapper for Solr, the search web service wrapper for Lucene. One thing I’m not clear on: do you need to configure Solr with the fields you’ll be indexing in advance, or can Solr create new fields on the fly to match the data you send it?

# 9th January 2008, 8:50 pm / apache, lucene, pysolr, python, search, solr
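On that question: as far as I know, Solr expects fields to be declared in schema.xml. However, `<dynamicField>` patterns such as `*_s` or `*_i` let it accept any field name matching the pattern. The following is a hypothetical Python sketch that renames keys with suffixes like these before sending them. It assumes your schema declares matching `*_b`, `*_i`, `*_f` and `*_t` dynamic fields (the example schema ships with similar ones).

```python
# Hypothetical mapping from Python types to dynamic-field suffixes; it assumes
# schema.xml declares matching *_b, *_i, *_f and *_t dynamicField patterns.
SUFFIXES = [
    (bool, "_b"),   # bool must come before int: bool is a subclass of int
    (int, "_i"),
    (float, "_f"),
    (str, "_t"),    # treat plain strings as tokenized full-text fields
]

def to_dynamic_fields(doc):
    """Rename keys so each one matches a dynamicField pattern in the schema."""
    out = {}
    for key, value in doc.items():
        if key == "id":
            out[key] = value  # the unique key stays a normally declared field
            continue
        for py_type, suffix in SUFFIXES:
            if isinstance(value, py_type):
                out[key + suffix] = value
                break
        else:
            raise TypeError(f"no dynamic field suffix for {key!r}")
    return out

doc = to_dynamic_fields({"id": "1", "title": "pysolr", "views": 3, "public": True})
```

With pysolr, you'd then pass the renamed dicts to `Solr.add()` as usual.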


Apache Solr 1.1. Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML.

# 13th January 2007, 1:16 am / json, lucene, python, ruby, search, solr, webservice, xml