Simon Willison’s Weblog

Subscribe

74 items tagged “search”

2009

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves. # 5th February 2009, 11:06 pm

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin which keeps search related metadata out of your regular models and includes management commands for re-indexing and generating Solr schema.xml files. # 4th February 2009, 12:22 pm

All you ever wanted to know about writing bloom filters. This helped me understand a key use case for bloom filters: reducing the impact of the “worst case search is when there are no matching results so everything gets scanned” problem. # 30th January 2009, 8:26 am

2008

How-to: Full-text search in Google App Engine. Use search.SearchableModel instead of db.Model—it’s pretty rough at the moment which is probably why it’s still undocumented. # 27th June 2008, 8:25 am

Google AJAX Search API: Flash and Server Side Access. Over a year after Google shot down their SOAP Search API, they’ve quietly released a JSON based one under the guise of supporting “Flash and other non JavaScript environments”. Comes with the strange requirement that an HTTP referer be sent with every request; the API key is optional. # 22nd April 2008, 7:16 pm

In-Depth django-sphinx Tutorial. Another neat Django extension from the guys at Curse: easy integration with the sphinx full text search engine. # 5th March 2008, 12:03 am

pysolr. Python wrapper for Solr, the search web service wrapper for Lucene. One thing I’m not clear on: do you need to configure Solr with the fields you’ll be indexing in advance, or can Solr create new fields on the fly to match the data you send it? # 9th January 2008, 8:50 pm

2007

Opera 9.5 alpha, Kestrel, released. “With history search, Opera creates a full-text index of each and every page you visit, and when you go to the address bar, you can simply start entering words you know have been on pages you’ve visited before, and items matching your search show up.” I just tried this; it’s magic. I’m switching back to Opera from Camino. # 16th September 2007, 8:34 pm

django-sphinx (via) More code from Curse Gaming; this time a really nice API for adding Sphinx full-text search to a Django model. # 9th September 2007, 12:35 am

Grub. Jimmy Wales just announced at OSCON that Wikia have acquired Grub from LookSmart, and will be releasing it as open source. # 27th July 2007, 5:24 pm

Apache Solr 1.1. Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML. # 13th January 2007, 1:16 am

2005

Giving away the index

My final year project is due in two weeks, and I’m going to be running on silent for most of them. I have, however, upgraded to Tiger and playing with Spotlight has given me plenty to think about.

[... 414 words]

Google cruft

New Google feature: Google Movies. Displays aggregated movie reviews (like Rotten Tomatoes), looks up local movie times based on your zip code saved in Google Local (more evidence of the fabled Google cookie), and even handles recommendations.

[... 120 words]

2004