Greasemonkey etiquette
In Meme tracking with Greasemonkey, Jon Udell introduces a userscript which grabs the number of references from del.icio.us and bloglines and adds that information to the top of every page you visit. To be fair to Jon, the version he has released defaults to doing this only for pages on Infoworld.com, but modifying it to run on every web page is trivial.
The obvious downside of this kind of script is the amount of additional web traffic it generates. Every page you load in your browser triggers an extra HTTP request to both del.icio.us and bloglines. Multiply that by several hundred users and those sites will be serving thousands of extra requests every minute.
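To put rough numbers on that, here is a back-of-the-envelope estimate in JavaScript, the language Greasemonkey scripts are written in. The inputs (300 users, five page views a minute each) are illustrative assumptions, not measured figures.

```javascript
// Back-of-the-envelope load estimate for a "look up every page" userscript.
// All inputs are illustrative assumptions, not measured figures.
function requestsPerMinute(users, pagesPerUserPerMinute, servicesPerPage) {
  // Each page view triggers one extra HTTP request per queried service.
  return users * pagesPerUserPerMinute * servicesPerPage;
}

// Hypothetical figures: 300 users, each loading 5 pages a minute,
// with the script querying 2 services (del.icio.us and bloglines).
const perMinute = requestsPerMinute(300, 5, 2);
console.log(perMinute); // 3000
```

Even with modest assumptions the total lands in the thousands of requests per minute, all of it generated by people who never visited either site.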
Can this kind of thing scale? The Google toolbar retrieves the PageRank for every page you view, and the Alexa toolbar (like thousands of spyware applications) requests information for every page viewed as well. The difference is that those toolbars query servers run by their own developers, who pay their own bandwidth bills.
There are also very serious privacy implications involved in this kind of activity. Right now, Joshua Schachter's del.icio.us access logs are collecting a detailed record of Jon Udell's browsing history, along with that of anyone else using the script.
This is a frustrating quandary, because the technique used in Jon's script can be extended in almost limitless ways. Sadly, in a world where bandwidth and server resources are limited, such scripts must be approached with caution.
I'm having that exact problem with my project: a Firefox toolbar that fetches a reputation rating for the site being browsed.
The asynchronous call doesn't really hurt the browser, but given enough users it would certainly impact the server.
My solution is to distribute the servers: in my system the 'server' is simply a PHP script which serves a Web Service and provides ratings aggregated from its own table and from other trusted servers.
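The comment doesn't say how ratings from several servers are combined. One plausible approach, assumed here purely for illustration, is a mean weighted by how many ratings each server holds. The response shape ({ average, count }) and the name combineRatings are hypothetical, not part of Chris's system.

```javascript
// Hypothetical response shape: each rating server reports the mean rating
// it holds for a site and how many ratings that mean is based on.
function combineRatings(sources) {
  let total = 0;
  let count = 0;
  for (const { average, count: n } of sources) {
    if (n > 0) {
      total += average * n; // weight each server's mean by its sample size
      count += n;
    }
  }
  // No ratings anywhere: report "unknown" rather than a misleading zero.
  return { average: count > 0 ? total / count : null, count };
}

const local = { average: 4, count: 2 };  // this server's own table
const friend = { average: 1, count: 1 }; // a trusted server
console.log(combineRatings([local, friend])); // { average: 3, count: 3 }
```

Weighting by count stops a trusted server with a single rating from counting as much as one holding hundreds.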
Installation of the server is just a single FTP copy, which will be facilitated by the toolbar. Users who have PHP-enabled webspace can therefore host their own rating/review servers, which their friends (or anyone) can subscribe to.
The social network grows through email recommendation: user A hosts a server, and an option within his toolbar lets him email his friend, user B, to recommend my rating system. User B then downloads the Firefox toolbar (a .xpi) and clicks a link in the email to configure the toolbar to source ratings from user A's server. Hence the social network grows, hopefully!
Chris Beach - 11th April 2005 18:06 - #
A set of caching reverse proxies fronting del.icio.us would be one way of moving some of the load off. I wonder what Joshua would think about people throwing up caching proxies like this? He's been extremely liberal with issues of this nature in the past.
For example, I've just set up a reverse proxy (not caching yet) on my home machine:
http://delicious.naeblis.cx/
Most cable/DSL connections use very little of their upstream bandwidth. It would be interesting to see if we could create some kind of caching-proxy mesh for community tools like del.icio.us.
Being able to support various types of caching intermediaries is one of the major design goals of HTTP after all.
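The heart of what a caching proxy adds is a time-to-live cache: repeat requests for the same URL are answered locally until the stored copy expires. The sketch below shows only that logic in JavaScript; the class and function names are made up, the origin fetch is a synchronous stand-in for a real HTTP request, and the 60-second TTL is an arbitrary assumption.

```javascript
// A minimal time-to-live cache: responses are reused until they expire.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, handy for testing
    this.entries = new Map(); // key -> { value, expires }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expires) {
      this.entries.delete(key); // stale: force a fresh fetch from the origin
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expires: this.now() + this.ttlMs });
  }
}

// Serve from cache when possible; only call fetchFromOrigin on a miss.
function cachedLookup(cache, url, fetchFromOrigin) {
  const hit = cache.get(url);
  if (hit !== undefined) return hit;
  const fresh = fetchFromOrigin(url);
  cache.set(url, fresh);
  return fresh;
}

// Usage with a fake clock and a fake origin that counts its requests.
let clock = 0;
let originHits = 0;
const cache = new TtlCache(60000, () => clock);
const origin = (url) => { originHits += 1; return "page for " + url; };

cachedLookup(cache, "http://del.icio.us/url/?url=a", origin);
cachedLookup(cache, "http://del.icio.us/url/?url=a", origin); // served from cache
clock = 61000;
cachedLookup(cache, "http://del.icio.us/url/?url=a", origin); // expired, refetched
console.log(originHits); // 2
```

Three lookups reach the origin only twice; with many users sharing one proxy, the saving grows with every repeated URL. The trade-off is freshness: counts can be up to one TTL out of date.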
Ryan Tomayko - 11th April 2005 18:17 - #
Figured I'd throw out the Apache configuration needed to enable the reverse proxy:
You may also need to enable mod_proxy.
Ryan Tomayko - 11th April 2005 18:21 - #
Mark Wubben - 11th April 2005 18:43 - #
alan taylor - 11th April 2005 18:54 - #
Aaron - 11th April 2005 20:35 - #
pete - 12th April 2005 00:30 - #
Aaron - 12th April 2005 05:08 - #
pete - 12th April 2005 05:41 - #
Mark Wubben - 12th April 2005 11:50 - #
Mark, you can specify a user agent string in user scripts. Perhaps we should default it to something like <Firefox string> + "; Greasemonkey 0.2.6"?
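A sketch of that suggestion, assuming (as the comment says) that a User-Agent passed in the headers of Greasemonkey's GM_xmlhttpRequest is sent along with the request. The helper name buildLookupRequest and the sample browser string are hypothetical; the function only builds the details object, so it can be checked outside the browser.

```javascript
// Build the details object for Greasemonkey's GM_xmlhttpRequest, tagging the
// request so server operators can tell script traffic apart in their logs.
function buildLookupRequest(url, baseUserAgent, gmVersion, onload) {
  return {
    method: "GET",
    url: url,
    headers: {
      "User-Agent": baseUserAgent + "; Greasemonkey " + gmVersion,
    },
    onload: onload,
  };
}

// Inside a userscript this would be passed straight to GM_xmlhttpRequest:
//   GM_xmlhttpRequest(buildLookupRequest(lookupUrl, navigator.userAgent,
//                                        "0.2.6", handleResponse));
const details = buildLookupRequest(
  "http://del.icio.us/url/?url=example",
  "Mozilla/5.0 Firefox/1.0.2", // made-up placeholder browser string
  "0.2.6",
  function (response) { /* parse response.responseText here */ }
);
console.log(details.headers["User-Agent"]);
// Mozilla/5.0 Firefox/1.0.2; Greasemonkey 0.2.6
```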
Jeremy Dunck - 12th April 2005 13:07 - #
Mark Wubben - 12th April 2005 18:22 - #
Aaron - 13th April 2005 20:44 - #
I edited Jon's script on my machine to use the Coral network, such as:
var deliciousUrl = 'http://del.icio.us.nyud.net:8090/url/?url=' + currentPage;
Coral is a peer-to-peer content distribution network, comprising a worldwide network of web proxies and nameservers.
http://coralcdn.org/
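The same rewrite can be applied to any lookup URL, not just the del.icio.us one. A minimal sketch, assuming a browser with the standard URL API; the function name coralize is made up, and the ".nyud.net:8090" suffix is taken from the deliciousUrl line in this comment.

```javascript
// Rewrite a URL so it is fetched through the Coral CDN, using the
// ".nyud.net:8090" host suffix from the deliciousUrl example.
function coralize(url) {
  const u = new URL(url);
  u.hostname = u.hostname + ".nyud.net"; // route via Coral's nameservers
  u.port = "8090";                       // Coral's proxy port
  return u.toString();
}

console.log(coralize("http://del.icio.us/url/?url=example"));
// http://del.icio.us.nyud.net:8090/url/?url=example
```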
It seems to be working, but obviously the caching means the results aren't as up to date.
Brady Joslin - 17th April 2005 03:53 - #
splintor - 17th April 2005 08:15 - #
Simon Willison - 17th April 2005 11:58 - #
splintor - 20th April 2005 05:56 - #
Matthew A. Schneider - 21st April 2005 19:50 - #
http://udell.roninhouse.com/GroupwareReport.html
I've never used that method since; it's just too intrusive for most purposes. I like the dynamic way better, but until (or unless) a lot of people use publishing tools that implement the strategy, it seems like an uphill swim.
Jon Udell - 22nd April 2005 13:08 - #
Matth - 10th May 2005 15:11 - #
Mark Russell - 11th May 2006 14:25 - #