Simon Willison’s Weblog

More thoughts on RSS

I helped my girlfriend set up an RSS feed for her (home brewed) weblog last night. Explaining what RSS was was easy. Explaining what she needed to put in her feed took a little bit longer. All she needed to do was provide a feed of entries, each with a title, the full body of the entry, the date it was posted and a permalink to the archived entry. In working out how to do this, we identified the following steps:

  1. Pick an RSS version, out of RSS 0.9x, RSS 1.0 or RSS 2.0. The 0.9x series was out because there was no way of presenting the date of an entry, so it was between 1.0 and 2.0. 1.0 had all that unnecessary RDF stuff in (is there really a good reason for the <rdf:Seq> element?) so she went with RSS 2.0.
  2. Pick a date format, out of <dc:date> and <pubDate> (see Mark Pilgrim’s recent entry for information on the difference). <pubDate> won on the strength of being part of the core RSS 2.0 spec.
  3. Pick a way of serving the actual entries, from the following alternatives:
    1. Entity encoded HTML in the <description> element.
    2. Entity encoded HTML in a <content:encoded> element.
    3. Unencoded XHTML in a <body> element with a namespace.
    I advised her to go with the first option as I don’t know how good aggregator support for the other two is.
  4. Pick a way of providing the permalinks. Up until RSS 2.0 this would have been done with the nice and simple <link> element, but 2.0 introduced the more confusing <guid> element as well. The <link> element was chosen as the move obvious of the two.
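
The four choices above can be sketched in code. The PHP itself is not shown in this entry, so what follows is a minimal Python sketch rather than the feed as it was actually written. It assumes each entry is a dict with the hypothetical keys title, html, posted and permalink; the function name build_feed and the example.com URLs are likewise made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.etree import ElementTree as ET


def build_feed(site_title, site_link, site_description, entries):
    """Return an RSS 2.0 document as a string (illustrative sketch only)."""
    # Step 1: RSS 2.0, so no RDF wrapper and no <rdf:Seq>.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = site_description

    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        # Step 4: the permalink goes in <link> rather than <guid>.
        ET.SubElement(item, "link").text = entry["permalink"]
        # Step 3, option 1: the full HTML body goes in <description>.
        # ElementTree entity-encodes the markup when it serialises the text.
        ET.SubElement(item, "description").text = entry["html"]
        # Step 2: <pubDate> takes an RFC 822 style date.
        # usegmt=True requires a timezone-aware UTC datetime.
        ET.SubElement(item, "pubDate").text = format_datetime(
            entry["posted"], usegmt=True
        )

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical entry data, for illustration only.
    print(build_feed(
        "Example weblog",
        "http://example.com/",
        "A home brewed weblog",
        [{
            "title": "First post",
            "html": "<p>Hello &amp; welcome</p>",
            "posted": datetime(2003, 6, 25, 12, 0, tzinfo=timezone.utc),
            "permalink": "http://example.com/archive/1",
        }],
    ))
```

Leaving the escaping to the XML library, rather than calling an escape function by hand, avoids the risk of encoding the body twice.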

It took less time to write the PHP than it did to decide on the elements to use in the feed in the first place. I’ve been following RSS for a while so I was in a position to guide her through the quagmire, but I can’t imagine it would have been much fun working it out without a guide.

I support the Road map for a new format. Another format may seem like the last thing we need but the current situation is pretty much untenable and forward motion has got to be a good thing. The few concerns I had have been answered by Tim Bray in his latest essay on the subject.

This is More thoughts on RSS by Simon Willison, posted on 25th June 2003.

10 comments

  1. Quick question, Simon: should your link to Mark's "recent entry" point to this URL instead?

    Love your blog--thanks for the great posts, as always.

    Ethan Marcotte - 25th June 2003 02:32 - #

  2. Simon I think you might have steered her astray.

    pubDate being part of the RSS 2.0 core is just unfortunate legacy, and encoded HTML in the description field is an ugly hack.

    Support for content:encoded fields is pretty broad, and could probably be one of the litmus tests of an aggregator that took more than an afternoon to write.

    It's sort of like the choice between CSS and spacer gifs. Spacer gifs worked, but it's so nice to use something that someone actually *designed* and put some thought into, don't you know?

    Not that it's worth changing it now that you've got running code, but a few sign posts for your next pass through the swamp :)

    kellan - 25th June 2003 04:06 - #

  3. Simon, you made basically all the same decisions I made when upgrading to 2.0 recently, and for the same reasons :)

    Only difference (besides the fact that I never considered using 1.0) was that I threw a guid in there anyway (with isPermaLink="false") so that in case I ever change the name of a post it won't screw aggregators up.

    Seriously, <description> and <content:encoded> are the same damn thing, and I tried <xhtml:body> but it turns out that if you just throw XHTML in there you can wind up with invalid XML because HTML defines entities that aren't recognized in XML. I thought namespaces were supposed to solve this problem?

    Keith - 25th June 2003 06:04 - #

  4. Keith, namespaces solve the problem of foreign elements and attributes, but entities must be separately defined. You'd have to add a reference to the HTML character entity definitions to your RSS feed. (Note: I don't know how well current RSS parsers actually deal with this.)

    Check out the XHTML spec for the proper incantations.

    Of course, converting the character refs to numeric references is probably the most straightforward solution.

    Jan - 25th June 2003 07:51 - #

  5. was chosen as the move obvious of the two

    This is a spelling mistake isn't it? (if not, don't blame me, i'm dutch :))

    And a sidenote: When I went to your homepage I got an error the first time and when I clicked on comments I got another error :s.

    Anne van Kesteren - 25th June 2003 09:59 - #

  6. I'd like to put together a FAQ to help people in this situation. In the past that would have started a flamewar; maybe now people won't mind so much. BTW, I think you should use guid for the permalink, not link. I could explain that in a FAQ. Maybe that's a good place to start. Otherwise I think you made all the right decisions.

    Dave Winer - 25th June 2003 12:35 - #

  7. Simon, I put together a rather long-winded explanation of why I think guid is the way to go.

    Dave Winer - 25th June 2003 13:01 - #

  8. I would like to see <description> used as a synopsis, not as an excerpt. I do not think that <description> and <content:encoded> are the same thing; I think that the former should be used as a synopsis (this is in line with the RSS 2.0 spec), and the latter should be used to include the full entry, perhaps optionally.

    There are RSS producers who prefer not to include the full bodies of their entries, and there are RSS consumers who prefer not to use the full bodies of entries. Likewise, there are both RSS producers and consumers who want more than a description/synopsis/excerpt. I hope that the new syndication format accommodates both preferences.

    jacob - 25th June 2003 17:52 - #

  9. I agree with Jacob. Not only is entity encoding dumb from an XML standpoint, it's dumb from an aggregator standpoint. Why? Because the only aggregator I use is the Trillian Pro news plugin. It seems to handle different versions of RSS just fine, although it's only designed for 0.91. The problem is that it shows the description in a tooltip.

    Now to me, that makes perfect sense. A description of what's linked to should be shown in a tooltip, like using the title attribute in HTML on a link. So just imagine what entity-encoded feeds look like in this, especially long entries. Just now I've added your feed to it, to illustrate the point to myself. Every time I mouseover for just a little bit on a title (which I like doing, because feeds like the BBC ones provide extra information) I get a freaking huge tooltip. On this entry in particular it's 'interesting', because the tooltip takes up almost the full height of my screen and a good chunk of the width. That and, of course, the HTML isn't rendered.

    I also think I'm one of the few people to see use in the rdf:Seq element. If you're hand-rolling a feed, you don't have to worry about the order of the items (in theory) as long as the rdf:Seq is done right.

    Sean - 25th June 2003 18:48 - #

  10. (Yikes, the comments form was prefilled with Jacques Distler's info.)

    I don't really understand entity encoding the contents of <description>, either. Isn't that just a lot of unnecessary work for the RSS consumer? Wrapping everything in a CDATA block within <content:encoded> just makes the most sense to me, and unlike <xhtml:body>, it doesn't assume that XHTML is used by the weblogger. Isn't Mark Pilgrim using HTML 4.01, for instance? (Though perhaps I'm missing something.)

    jacob - 26th June 2003 08:06 - #
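
Two of the alternatives raised in the comments are easy to sketch: converting named HTML entities to numeric references (comment 4) and wrapping the entry body in a CDATA block inside <content:encoded> (comment 10). The Python below is a rough illustration under stated assumptions, not something taken from any commenter's code. The function names named_to_numeric and feed_with_cdata_item are hypothetical, and the channel metadata a real feed needs is left out for brevity.

```python
import re
from html.entities import name2codepoint
from xml.dom.minidom import Document

# The five entities XML itself predefines; these are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Namespace of the RSS content module that defines <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def named_to_numeric(xhtml):
    """Rewrite HTML named entities such as &eacute; as numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML's own entities and unknown names
        return "&#%d;" % name2codepoint[name]
    return NAMED_ENTITY.sub(replace, xhtml)


def feed_with_cdata_item(title, link, html):
    """Build a bare RSS 2.0 document with one item carrying CDATA markup."""
    doc = Document()
    rss = doc.createElement("rss")
    rss.setAttribute("version", "2.0")
    rss.setAttribute("xmlns:content", CONTENT_NS)
    doc.appendChild(rss)
    channel = doc.createElement("channel")
    rss.appendChild(channel)
    item = doc.createElement("item")
    channel.appendChild(item)

    for name, value in (("title", title), ("link", link)):
        element = doc.createElement(name)
        element.appendChild(doc.createTextNode(value))
        item.appendChild(element)

    encoded = doc.createElement("content:encoded")
    # A CDATA section cannot contain "]]>", so any occurrence is split
    # across two adjacent sections.
    parts = html.split("]]>")
    for index, part in enumerate(parts):
        if index > 0:
            part = ">" + part
        if index < len(parts) - 1:
            part = part + "]]"
        encoded.appendChild(doc.createCDATASection(part))
    item.appendChild(encoded)
    return doc.toxml()
```

The first helper addresses the invalid XML Keith ran into with <xhtml:body>, without adding a DTD reference to the feed. The second places HTML or XHTML in the feed as-is, which is why it makes no assumption about which of the two the weblogger uses.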

Previously hosted at http://simon.incutio.com/archive/2003/06/25/moreOnRss