Simon Willison’s Weblog

11 items tagged “searchengines”

Why is site search so bad on most websites?

It’s not so much that site search is bad, it’s that your expectations have been raised enormously high by the incredible quality of search provided by search engines like Google.

[... 125 words]

Do comments really count for SEO link building?

Most sensible commenting systems will put rel=nofollow on links to discourage comment spam, which significantly reduces any SEO benefit from those links.

[... 35 words]
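
As a minimal sketch of what that looks like in practice, here is one way a commenting system might render a commenter-supplied link with rel=nofollow. The function name `render_comment_link` is hypothetical, not taken from any particular commenting system.

```python
from html import escape


def render_comment_link(url: str, text: str) -> str:
    """Render a commenter-supplied link with rel="nofollow" (hypothetical helper).

    Both the URL and the link text are HTML-escaped so that
    user input cannot break out of the attribute or the tag.
    """
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True),
        escape(text),
    )


print(render_comment_link("http://example.com/?a=1&b=2", "my site"))
```

A real system would also want to validate the URL scheme before rendering it; that is left out here to keep the sketch short.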

Is there such a thing as a specific multi-site search?

Yes—Blekko does exactly this, with its “slashtags” feature:

[... 27 words]

Is there a free/open-source software source code search engine?

If you want to search through actual code in open source projects, GitHub search is fantastic—e.g. here’s a search for all Ruby code that mentions oauth

[... 71 words]

How can you build a search engine for a website built in PHP/MySQL?

There are a bunch of options.

[... 310 words]

What are the best events search engines?

Since I co-founded one I’m certainly not qualified to express an opinion on which ones are best, but here are a few of my favourites:

[... 233 words]

What kind of publicly available search software is able to be purchased or used freely as part of a website, and how good is it?

There are plenty of good open source options—Solr is currently my favourite. It’s extremely powerful but you do need to do some programming on top of it—I use Django and Haystack to build the search UI on most of my projects.

[... 115 words]

Who are major competitors to Solr?

ElasticSearch is a really interesting one: it’s built on the same underlying search library (Lucene) and uses the same integration model (an HTTP interface), but takes quite a different approach. It hasn’t been around for long, but it looks very impressive:

[... 95 words]
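
To illustrate the shared "HTTP interface" integration model, here is a rough sketch that builds a search URL for each engine. The host names, the core name `mycore` and the index name `myindex` are assumptions for illustration, and the exact paths can differ between versions and deployments, so treat this as a starting point rather than a reference.

```python
from urllib.parse import urlencode


def solr_search_url(base: str, query: str) -> str:
    """Build a Solr select URL; `base` is an assumed core URL."""
    return "{}/select?{}".format(base, urlencode({"q": query, "wt": "json"}))


def elasticsearch_search_url(base: str, query: str) -> str:
    """Build an Elasticsearch URI search URL; `base` is an assumed index URL."""
    return "{}/_search?{}".format(base, urlencode({"q": query}))


# Hypothetical local installs on their default ports.
print(solr_search_url("http://localhost:8983/solr/mycore", "title:django"))
print(elasticsearch_search_url("http://localhost:9200/myindex", "title:django"))
```

In both cases the application just issues an HTTP GET and parses the JSON that comes back, which is why either engine can sit behind the same kind of search UI code.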

How do Solr, Lucene, Sphinx and Searchify compare?

Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.

[... 109 words]

Official Google Webmaster Blog: A proposal for making AJAX crawlable. It’s horrible! The Google crawler would map url#!state to url?_escaped_fragment_=state, then expect your site to provide rendered HTML that reflects that state (they even go as far as to suggest running a headless browser within your web server to do this). Just stick to progressive enhancement instead, it’s far less hideous. It looks like the proposal may have originated with the GWT team. # 8th October 2009, 5:52 pm
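
For the curious, the URL mapping described above can be sketched in a few lines. This is only an illustration of the url#!state to url?_escaped_fragment_=state rewrite; the function name is hypothetical and the escaping rule (percent-encoding everything in the state) is my assumption, not a quote from the proposal.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url: str) -> str:
    """Map url#!state to url?_escaped_fragment_=state (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Ordinary fragments (or no fragment) are left untouched.
        return url
    state = parts.fragment[1:]
    # Assumption: percent-encode the whole state value.
    param = "_escaped_fragment_=" + quote(state, safe="")
    # Preserve any query string that was already on the URL.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("http://example.com/page#!state"))
```

The server would then have to answer that rewritten URL with fully rendered HTML, which is exactly the part that makes the proposal so unpleasant.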

Specify your canonical. You can now use a link rel=“canonical” to tell Google that a page has a canonical URL elsewhere. I’ve run into this problem a bunch of times (on some sites it really does make sense to have the same content shown in two different places), and this seems like a neat solution that could apply to much more than just metadata for external search engines. # 14th February 2009, 11:28 am
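
A minimal sketch of emitting that tag from server-side code, assuming you already know the preferred URL for the page. The helper name `canonical_link_tag` is hypothetical.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for the page's <head>."""
    # Escape the URL so characters like & and " are safe inside the attribute.
    return '<link rel="canonical" href="{}">'.format(
        escape(canonical_url, quote=True)
    )


print(canonical_link_tag("http://example.com/articles/42/"))
```

Every duplicate view of the content would include the same tag, pointing at the one URL you want treated as the original.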