Simon Willison’s Weblog

Pingback redux

I think I’ve worked out a way of implementing Pingback (or a Pingback-like system) without any need for XML-RPC, <link> elements or custom HTTP headers.

There are three principal reasons for using Pingback to “detect” a link to a page rather than relying on referrals:

  1. A referral from a blog is likely to come from that blog’s front page, whereas any link back to that content should target the permalink of the specific entry from which the incoming link was made.
  2. Pingbacks are deliberate: they show that the page sending the Pingback is a deliberate response to the linked item, and that its author wishes it to be listed as such.
  3. It is possible (although unlikely) for a link to remain undetected simply because that link has never been “clicked” by anyone.

Pingback solves these problems with “alert me if you link to me” information embedded in a page’s HTTP headers or metadata, combined with a simple XML-RPC server for accepting alerts. While this addresses the problems outlined above, the overhead of carrying out a Pingback is quite large and implementing the client and server is quite challenging. The system is also of no use at all unless both parties have Pingback installed.
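For comparison, here is a minimal PHP sketch of the discovery and ping steps a conventional Pingback client has to perform. The X-Pingback header, the <link rel="pingback"> element and the pingback.ping(sourceURI, targetURI) XML-RPC method come from the Pingback 1.0 specification; the function names findPingbackServer() and buildPingbackPing() are hypothetical, the response headers are assumed to arrive as an associative array, and the HTTP POST that would actually deliver the payload is left out.

```php
<?php
// Minimal sketch of conventional Pingback server discovery:
// use the X-Pingback response header if present, otherwise fall back to a
// <link rel="pingback" href="..."> element in the HTML.
// Assumes $headers is an associative array of header name => value.
function findPingbackServer(array $headers, $body)
{
    foreach ($headers as $name => $value) {
        // HTTP header names are case-insensitive.
        if (strcasecmp($name, 'X-Pingback') === 0) {
            return trim($value);
        }
    }
    // Pattern modelled on the one suggested in the specification.
    if (preg_match('/<link rel="pingback" href="([^"]+)" ?\/?>/', $body, $m)) {
        // Decode entity-encoded characters (such as &amp;) in the URL.
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return false;
}

// Build the XML-RPC body for pingback.ping(sourceURI, targetURI), which a
// client would then POST to the discovered server URL.
function buildPingbackPing($sourceUri, $targetUri)
{
    $params = '';
    foreach ([$sourceUri, $targetUri] as $uri) {
        $params .= '<param><value><string>'
            . htmlspecialchars($uri, ENT_QUOTES) . '</string></value></param>';
    }
    return '<?xml version="1.0"?>' . "\n"
        . '<methodCall><methodName>pingback.ping</methodName>'
        . '<params>' . $params . '</params></methodCall>';
}
```

Even in this stripped-down form the client needs an HTML scan, an XML payload and a second round trip to a separate server, which is the overhead described above.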

My solution is an extension of my own Pingback implementation. Whenever I link to a site from my blog, a script running on my server requests each of the pages I have linked to and checks for information on a related Pingback server (this is standard behaviour for any conformant Pingback client). As a nod towards those users who do not have Pingback enabled, the script sends the permalink of the linking item as the Referer header, ensuring that their logs contain at least one hit from the entry in question.

It dawned upon me that if this single “hit” were identifiable as a Pingback probe, the process could stop there: the target server would already have the required information that “Page X linked to Page Y at Time T” and could process the Pingback straight away.

How should the hit be identified? Two methods come to mind: the request could include an additional header (X-Pingback-Probe: yes), or the User-Agent string could include some agreed standard string. Since some scripting languages (such as PHP) do not provide access to non-standard headers in the HTTP request, the second option seems immediately favourable.
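A probe under this scheme is just an ordinary GET request with two headers set. The sketch below assumes PHP's HTTP stream wrapper; buildProbeHeaders(), sendPingbackProbe() and the client name ExampleBlog/1.0 are hypothetical placeholders, and only the pingback-probe marker in the User-Agent and the permalink in the Referer come from the proposal.

```php
<?php
// Hypothetical client-side helpers for the proposed probe request.
// 'ExampleBlog/1.0' is a placeholder client name; the only agreed part is the
// 'pingback-probe' marker that the server looks for in the User-Agent.
function buildProbeHeaders($permalink, $clientName = 'ExampleBlog/1.0')
{
    return [
        'User-Agent: ' . $clientName . ' (pingback-probe)',
        // Permalink of the linking entry, not the blog's front page.
        'Referer: ' . $permalink,
    ];
}

// Send one GET request to the linked page. For a probe-aware server this
// single request is the whole Pingback; for anyone else it is simply a log
// entry showing the linking permalink as the referrer.
function sendPingbackProbe($target, $permalink)
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", buildProbeHeaders($permalink)),
            'timeout' => 10,
        ],
    ]);
    // Returns the page body (false on failure), which a conventional client
    // could still scan for standard Pingback server information.
    return @file_get_contents($target, false, $context);
}
```

Appending the marker to an otherwise normal User-Agent, rather than replacing it, should keep the request looking like an ordinary client hit on servers that ignore the convention.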

Here is some outline PHP code for spotting and responding to my proposed “pingback-probe” requests:


if (isset($_SERVER['HTTP_REFERER'], $_SERVER['HTTP_USER_AGENT']) &&
    strpos($_SERVER['HTTP_USER_AGENT'], 'pingback-probe') !== false) {
    // User Agent contains 'pingback-probe' and referer information is present
    $linkFrom = $_SERVER['HTTP_REFERER'];
    $linkTo = 'http://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
    if ($info = checkPingback($linkFrom, $linkTo)) {
        addPingback($linkFrom, $linkTo, $info['title'], $info['extract']);
    }
}

function checkPingback($linkFrom, $linkTo) {
    /* This function loads the $linkFrom page and checks that it really does 
       contain a link to $linkTo. If not, it returns false. If the link exists,
       it grabs the title of the page and an extract of text found surrounding 
       the $linkTo link and returns them in a small associative array. */
}

function addPingback($linkFrom, $linkTo, $title, $extract) {
    /* This function saves the pingback information, presumably by logging it 
       to a file or saving it to a database. It would almost certainly save 
       the time the Pingback was received as well. */
}
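The two outline functions above could be filled in along the following lines. This is a sketch under stated assumptions rather than the implementation described here: link detection, the title and the extract are handled with regular expressions and strip_tags(), which is crude but dependency-free; extractLinkContext() is a hypothetical helper split out so the parsing can be tried without network access; and addPingback() gains a hypothetical $logFile parameter so records go to a plain text file instead of a database.

```php
<?php
// One possible way to fill in the two outline functions (a sketch, not the
// original implementation). extractLinkContext() is a hypothetical helper.
function extractLinkContext($html, $linkTo, $radius = 100)
{
    // Look for an <a> element whose href points exactly at $linkTo.
    $pattern = '/<a\s[^>]*href\s*=\s*["\']' . preg_quote($linkTo, '/') . '["\']/i';
    if (!preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
        return false; // no genuine link, so no Pingback
    }
    $title = '';
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $t)) {
        $title = trim(html_entity_decode($t[1], ENT_QUOTES));
    }
    // Crude extract: up to $radius bytes of tag-stripped text either side of
    // the start of the link (byte-based, so it may split multibyte characters).
    $offset = $m[0][1];
    $before = strip_tags(substr($html, 0, $offset));
    $after = strip_tags(substr($html, $offset));
    $extract = substr($before, -$radius) . substr($after, 0, $radius);
    return ['title' => $title, 'extract' => trim(preg_replace('/\s+/', ' ', $extract))];
}

function checkPingback($linkFrom, $linkTo)
{
    $html = @file_get_contents($linkFrom);
    return $html === false ? false : extractLinkContext($html, $linkTo);
}

// Appends one tab-separated record, including the time it was received.
// The $logFile parameter is an addition to the original signature.
function addPingback($linkFrom, $linkTo, $title, $extract, $logFile = 'pingbacks.log')
{
    $fields = [date('c'), $linkFrom, $linkTo, $title, $extract];
    foreach ($fields as $i => $field) {
        // Keep each record on a single line.
        $fields[$i] = str_replace(["\t", "\r", "\n"], ' ', $field);
    }
    $line = implode("\t", $fields) . "\n";
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

Checking that the linking page really contains the link is what stops anyone from forging a probe simply by sending a fake Referer.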

Pingback client implementations would stay similar to the way they work now, except that instead of having to retrieve the target page, check for Pingback server information and send an XML-RPC ping, they would just have to send a single request with the specified Referer and User-Agent information. Implementation is thus simpler for both the client and server sides of the system, while keeping the required functionality.
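On the client side, the whole job then reduces to collecting the outbound links of a new entry and requesting each one once. The following PHP sketch assumes that only absolute http(s) links in <a href> attributes matter and uses a placeholder client name; extractLinks() and probeAll() are hypothetical names.

```php
<?php
// Hypothetical sketch of the simplified client: collect the absolute http(s)
// links in a newly published entry and send one probe request to each.
function extractLinks($entryHtml)
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\'](https?:\/\/[^"\']+)["\']/i', $entryHtml, $m);
    $links = array_map(function ($url) {
        return html_entity_decode($url, ENT_QUOTES);
    }, $m[1]);
    return array_values(array_unique($links)); // probe each target only once
}

function probeAll($entryHtml, $permalink)
{
    // The two headers a probe-aware server checks; the client name is a placeholder.
    $headers = "User-Agent: ExampleBlog/1.0 (pingback-probe)\r\nReferer: " . $permalink;
    $results = [];
    foreach (extractLinks($entryHtml) as $target) {
        $context = stream_context_create(['http' => [
            'method'  => 'GET',
            'header'  => $headers,
            'timeout' => 10,
        ]]);
        // No discovery and no XML-RPC: the request itself is the notification.
        $results[$target] = @file_get_contents($target, false, $context) !== false;
    }
    return $results;
}
```

probeAll($entryHtml, $permalink) would be called once when an entry is published; the returned array only records whether each request succeeded, since under this proposal the request itself is the notification.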

This is Pingback redux by Simon Willison, posted on 24th February 2003.

Previously hosted at http://simon.incutio.com/archive/2003/02/24/pingbackRedux