Simon Willison on search-engines

12 posts tagged “search-engines”

2024

Google is the only search engine that works on Reddit now thanks to AI deal (via) This is depressing. As of around June 25th reddit.com/robots.txt contains this:

User-agent: *
Disallow: /

Along with a link to Reddit's Public Content Policy.

Is this a direct result of Google's deal to license Reddit content for AI training, rumored at $60 million? That's not been confirmed but it looks likely, especially since accessing that robots.txt using the Google Rich Results testing tool (hence proxied via their IP) appears to return a different file, via this comment, my copy here.

# 24th July 2024, 6:29 pm / google, reddit, search-engines, seo, ai, llms

2013

Why is site search so bad on most websites?

It’s not so much that site search is bad, it’s that your expectations have been raised enormously high by the incredible quality of search provided by search engines like Google.

[... 125 words]

5:28 pm / 3rd May 2013 / search, search-engines, quora

Do comments really count for SEO link building?

Most sensible commenting systems will put rel=nofollow on links to discourage comment spam, which will have a significant effect on SEO.

[... 35 words]

9:48 am / 1st January 2013 / search-engines, seo, quora

2012

Is there such thing as a specific multi-site search?

Yes—Blekko does exactly this, with its “slashtags” feature: http://help.blekko.com/index.php...

[... 27 words]

6:15 pm / 23rd June 2012 / search-engines, web-development, quora

Is there a free/open-source software source code search engine?

If you want to search through actual code in open source projects, GitHub search is fantastic https://github.com/search—e.g. here’s a search for all Ruby code that mentions oauth https://github.com/search?q=oaut...

[... 71 words]

3:59 pm / 20th June 2012 / open-source, search-engines, quora

How can you build a search engine for a website built in PHP/MySQL?

There are a bunch of options.

[... 310 words]

6:39 pm / 11th February 2012 / mysql, php, search-engines, sphinx-search, quora

What are the best events search engines?

Since I co-founded one I’m certainly not qualified to express an opinion on which ones are best, but here are a few of my favourites:

[... 233 words]

11:15 am / 13th January 2012 / events, marketing, search, search-engines, quora

What kind of publicly available search software is able to be purchased or used freely as part of a website, and how good is it?

There are plenty of good open source options—Solr is currently my favourite. It’s extremely powerful but you do need to do some programming on top of it—I use Django and Haystack to build the search UI on most of my projects.

[... 115 words]

1:45 pm / 6th January 2012 / search, search-engines, web-development, quora

2010

Who are major competitors to Solr?

ElasticSearch is a really interesting one—it’s the same underlying search library (Lucene) and the same integration model (an HTTP interface) but takes quite a different approach. It hasn’t been around for a long time but it looks very impressive: http://www.elasticsearch.com/

[... 95 words]

6:01 pm / 2nd September 2010 / apache, lucene, search, search-engines, solr, sphinx-search, xapian, quora

How do Solr, Lucene, Sphinx and Searchify compare?

Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.

[... 109 words]

2:14 pm / 26th August 2010 / databases, lucene, search, search-engines, solr, sphinx-search, web-development, quora

2009

Official Google Webmaster Blog: A proposal for making AJAX crawlable. It's horrible! The Google crawler would map url#!state to url?_escaped_fragment_=state, then expect your site to provide rendered HTML that reflects that state (they even go as far as to suggest running a headless browser within your web server to do this). Just stick to progressive enhancement instead, it's far less hideous. It looks like the proposal may have originated with the GWT team.

# 8th October 2009, 5:52 pm / ajax, crawling, google, gwt, javascript, progressive-enhancement, search-engines, seo

Specify your canonical. You can now use a link rel=“canonical” to tell Google that a page has a canonical URL elsewhere. I’ve run in to this problem a bunch of times—in some sites it really does make sense to have the same content shown in two different places—and this seems like a neat solution that could apply to much more than just metadata for external search engines.

# 14th February 2009, 11:28 am / canonical, google, metadata, relcanonical, search-engines, seo, urls