
Simon Willison’s Weblog

Greasemonkey etiquette

In Meme tracking with Greasemonkey, Jon Udell introduces a userscript which grabs the number of references from del.icio.us and Bloglines and appends that information to the top of every page you visit. To be fair to Jon, the version he has released defaults to only doing this for pages on InfoWorld.com, but modifying it to run on every web page is trivial.

The obvious downside of this kind of script is the amount of additional web traffic it generates. Every page you load in your browser triggers an extra HTTP request to both del.icio.us and Bloglines. Multiply that by several hundred users and those sites are going to be serving thousands of requests every minute.
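
As a rough back-of-the-envelope sketch of that load: each page view costs one extra request per service queried. The helper name `estimateRequestsPerMinute` and every figure below are illustrative assumptions, not measurements from the script or from either site.

```javascript
// Rough estimate of the extra load such a script generates.
// All inputs are illustrative assumptions, not measured figures.
function estimateRequestsPerMinute(users, pageViewsPerUserPerMinute, servicesQueried) {
  // Each page view triggers one extra request per queried service.
  return users * pageViewsPerUserPerMinute * servicesQueried;
}

// Hypothetical example: 500 users, each loading 2 pages a minute,
// with the script querying 2 services.
var total = estimateRequestsPerMinute(500, 2, 2);
var perService = total / 2;
console.log(total + ' extra requests per minute in total');
console.log(perService + ' extra requests per minute for each service');
```

The estimate is linear in every factor, so widening the script from one site to every page you visit scales the load by however much more browsing that covers.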

Can this kind of thing scale? The Google toolbar retrieves the PageRank for every page you view, and the Alexa toolbar (and thousands of spyware applications) request information for every page viewed as well. The difference is that the developers host their own servers, and are responsible for their own bandwidth bills.

There are also very serious privacy implications involved in this kind of activity. Right now, Joshua Schachter’s del.icio.us access logs are collecting a detailed record of Jon Udell’s browsing history, and that of anyone else using the script.

This is a frustrating quandary, because the technique used in Jon’s script can be extended in almost limitless ways. Sadly, in a world where bandwidth and server resources are limited, such scripts must be approached with caution.

This is Greasemonkey etiquette by Simon Willison, posted on 11th April 2005.


21 comments

  1. I'm having that exact problem with my project - a Firefox toolbar that gets a reputation for the site being browsed.

    The asynchronous call doesn't really hurt the browser but it would certainly impact the server given enough users.

    My solution is to distribute the servers - in my system the 'server' is simply a PHP script which serves a Web Service and provides ratings aggregated from its own table, and other trusted servers.

    Installation of the server is just a single FTP copy, which will be facilitated by the toolbar. Users who have PHP-enabled webspace can therefore host their own rating/review servers, which their friends (or anyone) can subscribe to.

    The social network grows through email recommendation - user A hosts a server and an option within his toolbar allows him to send an email to his friend user B in order to recommend my rating system. User B then downloads the Firefox toolbar (a .xpi) and then clicks a link in the email to configure their toolbar to source ratings from user A's server. Hence the social network grows --- hopefully!

    Chris Beach - 11th April 2005 18:06 - #

  2. A set of caching reverse proxies fronting del.icio.us would be one way of moving some of the load off. I wonder what Joshua would think about people throwing up caching proxies like this? He's been extremely liberal with issues of this nature in the past.

    For example, I've just set up a reverse proxy (not caching yet) on my home machine:

    http://delicious.naeblis.cx/

    Most cable/DSL connections use very little of their upstream bandwidth. It would be interesting to see if we could create some kind of caching-proxy-mesh for community tools like delicious.

    Being able to support various types of caching intermediaries is one of the major design goals of HTTP after all.

    Ryan Tomayko - 11th April 2005 18:17 - #

  3. Figured I'd throw out the Apache configuration needed to enable the reverse proxy:

    RewriteEngine on
    
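    # Match only requests addressed to the proxy's own hostname (dots escaped)
    RewriteCond %{HTTP_HOST}  ^delicious\.naeblis\.cx$
    # Proxy the request path through to del.icio.us ([P] requires mod_proxy)
    RewriteRule ^(.*)         http://del.icio.us$1  [P,L]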

    You may also need to enable mod_proxy.

    Ryan Tomayko - 11th April 2005 18:21 - #

  4. Tough question. Perhaps it can be considered okay to do this stuff if the functionality is exposed through an API, instead of screen scraping?

    Mark Wubben - 11th April 2005 18:43 - #

  5. I ran right up against this same thing when writing the bookmarklet for delicious linkbacks. I would have loved for it to be inline and automatic like Jon's user script, or to live in a sidebar, but made the choice to release it as a bookmarklet, so that it would always be user-initiated, not automatic. Not as slick or serendipitous or cool as Greasemonkey inline rendering, but maybe a little friendlier (no diss to Jon, his script as released is fine). I'm glad this discussion is taking place too.

    alan taylor - 11th April 2005 18:54 - #

  6. Also, note that Greasemonkey has had GM_registerMenuCommand for quite some time now. It would be trivial to make Jon's script require bookmarklet-like initiation, while still taking advantage of the cross-domain XMLHTTP and other features that GM offers.

    Aaron - 11th April 2005 20:35 - #

  7. Is anyone else getting "GM_xmlhttprequest is undefined" with that script? Also, in response to Mark's comment, I think that the del.icio.us API specifies that any page that hits delicious once per load is considered abusive. I think that would definitely apply in this case.

    pete - 12th April 2005 00:30 - #

  8. Pete, you need at least GM 0.2.6 to use that script. Which version do you have?

    Aaron - 12th April 2005 05:08 - #

  9. Thanks, that sorted it out.

    pete - 12th April 2005 05:41 - #

  10. Pete, yeah. However, in this case it's the user who's hitting delicious, not the page itself (even though it'll look that way to delicious). This brings up another issue I think: the page on which you run your userscript gets the blame for what it does.

    Mark Wubben - 12th April 2005 11:50 - #

  11. Mark, you can specify a user agent string in user scripts. Perhaps we should default it to something like <Firefox string> + "; Greasemonkey 0.2.6"?

    Jeremy Dunck - 12th April 2005 13:07 - #

  12. Jeremy, yeah, I think that would take away the responsibility issues. However, then the question becomes: do we want sites to be able to determine if it's a Greasemonkey script or not?

    Mark Wubben - 12th April 2005 18:22 - #

  13. "do we want sites to be able to determine if it's a Greasemonkey script or not?" Well, it will be up to the user script author to make this determination. My goal is to make GM itself completely invisible so that it cannot be blocked wholesale. If somebody writes a nasty script, that script should be blocked -- not all scripts the user may be running. On the whole, though, I think that people should add something distinctive to their XMLHTTP requests to identify themselves when creating user scripts.

    Aaron - 13th April 2005 20:44 - #

  14. I edited Jon's script on my machine to use the Coral network, like so:

    var deliciousUrl = 'http://del.icio.us.nyud.net:8090/url/?url=' + currentPage;

    Coral is a peer-to-peer content distribution network, comprising a worldwide network of web proxies and nameservers.

    http://coralcdn.org/

    It seems to be working, but obviously the caching system makes the results not as real-time.

    Brady Joslin - 17th April 2005 03:53 - #

  15. Just a small question: what's the difference between subscribing to an RSS feed from del.icio.us and having a user script that sends a request on every site I browse? The average user browses to, say, 20-50 sites a day, whereas a feed aggregator sends a request to the server once a minute. I would assume that a popular site that has RSS support is ready for a large amount of traffic. Adding a request from users on every site they browse shouldn't make that much of a change.

    splintor - 17th April 2005 08:15 - #

  16. RSS feed readers default to polling once an hour, not once a minute.

    Simon Willison - 17th April 2005 11:58 - #

  17. Even so, Meme Tracking will only double this amount rather than grow it exponentially, so I don't think this should be a real problem.

    splintor - 20th April 2005 05:56 - #

  18. I noticed Jon Udell pointed to comment2 in one of his recent blog posts [1]. I suspect he didn't realize he was utilizing Purple Numbers at the time. He likely saw the # and recognized it as a handle to get his readers back here. While the war has raged regarding the visibility, color, placement, and form of Purple Numbers, I can't help but (still) wonder whether he (and others) mightn't use Purple Numbers more often if they were in fact always visible. [1] http://weblog.infoworld.com/udell/2005/04/13.html#a1214

    Matthew A. Schneider - 21st April 2005 19:50 - #

  19. I did actually intend to point to comment2. But you're right, I had not noticed the Purple Numbers until I saw this comment. An example of an always-visible version of this idea is here:

    http://udell.roninhouse.com/GroupwareReport.html

    I've never used that method since; it's just too intrusive for most purposes. I like the dynamic way better, but until/unless a lot of people use publishing tools that implement the strategy, it seems like an uphill swim.

    Jon Udell - 22nd April 2005 13:08 - #

  20. Perhaps we'll have to have a situation where those offering web services require you to sign up for a token ID valid for a certain number of uses of the API, like Google does. Puts a bit of a roadblock on innovation but may be necessary to stop third-party tools that make abusive bandwidth demands.

    Matth - 10th May 2005 15:11 - #

  21. Why not access del.icio.us via Coral?

    Mark Russell - 11th May 2006 14:25 - #
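
Two suggestions from the thread, user-initiated lookups via GM_registerMenuCommand (comment 6) and adding something distinctive to a script's XMLHTTP requests (comment 13), could be combined roughly as follows. This is a minimal sketch rather than Jon's actual script: the lookup URL is inferred from the Coral variant quoted in comment 14, and the `X-Userscript` header name, the script name and the helper functions are hypothetical. The Greasemonkey functions are passed in as parameters so the logic can also run outside the browser.

```javascript
// Sketch: a user-initiated lookup that identifies itself to the server.
// LOOKUP_BASE is inferred from comment 14; the header name is an assumption.
var LOOKUP_BASE = 'http://del.icio.us/url/?url=';

// Build the details object for one lookup request.
function buildLookupRequest(pageUrl, scriptName) {
  return {
    method: 'GET',
    // Encoding the page URL as a query value is an assumption about the endpoint.
    url: LOOKUP_BASE + encodeURIComponent(pageUrl),
    // A distinctive header lets a site throttle or block this one script
    // rather than every Greasemonkey user.
    headers: { 'X-Userscript': scriptName }
  };
}

// Register a menu command so that no request is sent until the user asks.
function installLookupCommand(registerMenuCommand, xmlhttpRequest, pageUrl, onResult) {
  registerMenuCommand('Look up this page on del.icio.us', function () {
    var details = buildLookupRequest(pageUrl, 'meme-tracker-example');
    details.onload = function (response) {
      onResult(response.responseText);
    };
    xmlhttpRequest(details);
  });
}

// Inside Greasemonkey the GM_* functions exist; elsewhere this does nothing.
if (typeof GM_registerMenuCommand === 'function' &&
    typeof GM_xmlhttpRequest === 'function') {
  installLookupCommand(GM_registerMenuCommand, GM_xmlhttpRequest,
    window.location.href, function (text) {
      // Hypothetical handling: how the result is displayed is up to the script author.
      alert('Received ' + text.length + ' characters from del.icio.us');
    });
}
```

Because the request only fires from the menu command, ordinary browsing sends nothing to del.icio.us, which addresses both the bandwidth and the privacy concerns at the cost of the automatic display.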

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2005/04/11/etiquette
