5.1 RSS

RSS, which stands for either Rich Site Summary or Really Simple Syndication (the latter is currently the official name), is one of the most popular uses of XML today. Many websites are adding RSS capability so that web browsers such as Firefox and Internet Explorer 7, various syndication-tracking programs, and even MP3 players can check for the latest updates to a website without downloading and reading its HTML pages. RSS is reasonably simple to learn, and a great way to get better acquainted with "real world" XML. As RSS becomes more popular, there is high demand for websites, particularly very large ones with many server-side programs behind the scenes, to implement RSS feeds, each of which is an individual XML document. Note, however, that not every site is right for RSS. RSS should be used only on sites that are driven by updates: news sites, blogs, stores adding new products, or any site that adds new content regularly. A site like Fred's Restaurant would be a silly place to run an RSS feed.

Historically, the idea of using XML to syndicate web content came from Microsoft. They created the Channel Definition Format (CDF), which was released with Internet Explorer 4 and used in conjunction with the "Active Desktop" feature. This feature was not widely used, partly because it was overcomplicated yet offered few features. One hassle was the need to create not one, not two, but three logo images to be displayed in the various favorites menus in IE4. The CDF vocabulary also did not carry much information to the user; instead it facilitated offline browsing. Microsoft submitted CDF to the W3C in 1997 for consideration as a recommendation, but nothing ever came of it. RSS improves upon CDF by carrying a short summary of the latest news items that can be read, in the case of Firefox, right from the bookmarks menu. The so-called "Live Bookmarks" display a list of the titles of updates, and you can go straight to the update that you find interesting.

RSS is another free standard, although it is not maintained by the W3C. It was developed by Dan Libby at Netscape in 1999 for Netscape's "My Netscape" portal, as a deliberate competitor to CDF. By 2003, RSS had become popular enough to gain its own standardizing body, the RSS Advisory Board. This book covers the RSS 2.0.1 Specification.

A valid RSS file must first be a well-formed XML file, so be sure to follow all the rules of a well-formed XML document. The root element of an RSS document is, simply enough, rss.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
</rss>

Note that RSS has no DOCTYPE declaration. You can create an XML vocabulary without creating a Document Type Definition, although you are then unable to take advantage of the features of DTDs. The RSS Advisory Board chose to take this route, so there is no DTD for RSS. In its Netscape days, RSS did have a DTD, but it was phased out. Also note the version attribute on the rss element; that attribute is required.

Once you have the rss element in place, you can add a channel to the feed. There can be only one channel per RSS document, which leaves me wondering why channel was not chosen as the root element for RSS. Anyway, the channel element has no attributes, only children. Only three child elements are required in a channel:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
 <channel>
  <title>Name of Channel</title>
  <link>http://address.of/site</link>
  <description>Description of Channel</description>
 </channel>
</rss>

The title element contains the channel name, the link element holds the URL of the site being syndicated, and the description element describes the channel. Notice, already, how intuitive these element names are. This is what you should expect from any good XML design. If you go out and look at a CDF file, you will see an example of a very poor XML design. Many of its elements and attributes are arbitrary and do not make sense, especially in terms of nesting.

At this point, you have a description of a feed, but no content. This is not much of an RSS file, so why isn't more information required? Basically, this allows a new site that does not have any content yet to start a feed right from the beginning. It would be a tad annoying to forbid webmasters from creating an RSS feed and adding the information above until they have content to syndicate. This is another good design point.
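The rules so far (an rss root with a version attribute, a single channel, and three required children) are easy to check mechanically. The following is only a minimal sketch in Python using the standard library's ElementTree module; the function name missing_channel_elements is made up for this illustration and is not part of any RSS tool.

```python
import xml.etree.ElementTree as ET

# The three children that every channel must have.
REQUIRED = ("title", "link", "description")

FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
 <channel>
  <title>Name of Channel</title>
  <link>http://address.of/site</link>
  <description>Description of Channel</description>
 </channel>
</rss>"""


def missing_channel_elements(feed_xml):
    """Return the names of required elements that the feed lacks."""
    # Parsing fails with ParseError if the document is not well-formed.
    root = ET.fromstring(feed_xml.encode("utf-8"))
    if root.tag != "rss" or "version" not in root.attrib:
        raise ValueError("root element must be rss with a version attribute")
    channel = root.find("channel")
    if channel is None:
        return ["channel"]
    return [name for name in REQUIRED if channel.find(name) is None]


print(missing_channel_elements(FEED))  # an empty list means nothing is missing
```

A check like this catches only the structural requirements described above; it does not replace a full feed validator.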

To begin syndicating content, you add items. You should also update lastBuildDate every time you change the feed, so a browser can simply check the date to see whether anything has changed. Dates must follow the RFC 822 date and time format (for example, Mon, 1 Jan 2007 00:00:00 GMT), which is referenced from the RSS 2.0.1 Specification.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
 <channel>
  <title>Name of Channel</title>
  <link>http://address.of/site</link>
  <description>Description of Channel</description>
  <lastBuildDate>Mon, 1 Jan 2007 00:00:00 GMT</lastBuildDate>

  <item>

   <title>Welcome to my Site</title>
   <link>http://address.of/site/0107/welcome</link>
   <description>My site is now open. I will be syndicating my content with RSS.</description>

   <pubDate>Mon, 1 Jan 2007 01:00:00 GMT</pubDate>
   <guid isPermaLink="true">http://address.of/site/0107/welcome</guid>

  </item>

 </channel>
</rss>

I've spaced out the item to make it a bit easier to look at. It is never a bad idea to do the same in your own RSS or XML documents. As you can see, there is a title, link, and description, just like before. These describe the item in question, rather than the whole channel. RSS only requires that either title or description is present as a child of item, but I suggest that you always include title, as that is most often used by browsers. description may contain HTML only if it is escaped with entities (see chapter 3, HTML for information on entities). These are self-explanatory fields, but as a side note, it is best to be brief with titles and descriptions. Many RSS syndication programs display feeds in a narrow box, such as a favorites menu or sidebar, and longer titles and descriptions may get cut off.
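To illustrate what "escaped with entities" means for a description, here is a small Python sketch. The sample text in html_summary is invented for the example. It shows the escaping done by hand with xml.sax.saxutils.escape, and also that ElementTree performs the same escaping on its own when it serializes element text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical description text that contains HTML markup.
html_summary = "My site is <b>now open</b> & syndicated."

# Escaping by hand: &, < and > become &amp;, &lt; and &gt;.
print(escape(html_summary))

# ElementTree escapes text automatically when writing the element out.
item = ET.Element("item")
ET.SubElement(item, "title").text = "Welcome to my Site"
ET.SubElement(item, "description").text = html_summary
item_xml = ET.tostring(item, encoding="unicode")
print(item_xml)

# A reader that parses the item gets the original HTML string back.
round_trip = ET.fromstring(item_xml).findtext("description")
```

If you build feeds with an XML library, do not escape the text yourself as well, or the HTML will end up escaped twice.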

guid is a unique identifier for the news article. Syndication programs use it to determine whether they have already seen an item. If you modify the information about an item but its guid stays the same, the syndication program can detect that it has already seen the item and will not present it as a new article. You should place a URL for that one item here, although any string is allowed. isPermaLink is set to true, indicating that the content of the element can be treated as a URL. This attribute is optional and true by default, so you may skip it unless you do not want the content to be treated as a URL. pubDate is the publication date and follows the same formatting rules as lastBuildDate.
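The way a syndication program might use guid can be sketched in a few lines of Python. This is an assumption about how a reader could work, not a description of any particular program: the function new_items and the fallback to link when guid is absent are choices made only for this illustration.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0">
 <channel>
  <title>Name of Channel</title>
  <link>http://address.of/site</link>
  <description>Description of Channel</description>
  <item>
   <title>Welcome to my Site</title>
   <link>http://address.of/site/0107/welcome</link>
   <description>My site is now open.</description>
   <guid isPermaLink="true">http://address.of/site/0107/welcome</guid>
  </item>
 </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return titles of items not seen before and remember their guids."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        # Assumption: fall back to the link when an item has no guid.
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None or guid.strip() in seen_guids:
            continue
        seen_guids.add(guid.strip())
        fresh.append(item.findtext("title"))
    return fresh


seen = set()
first_check = new_items(FEED, seen)    # the item is new on the first check
edited = FEED.replace("now open.", "now open. (edited)")
second_check = new_items(edited, seen)  # same guid, so nothing is new
print(first_check, second_check)
```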

Multiple items may be present, and newer entries should come higher than older ones. Usually the order in which items appear in the RSS feed is the order in which they are presented to the user.
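Both the date format and the newest-first ordering can be handled with Python's standard email.utils module, which reads and writes RFC 822 style dates. The sketch below is illustrative only; the second and third entry titles and their dates are invented for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Produce a lastBuildDate string from a UTC timestamp.
build_date = format_datetime(
    datetime(2007, 1, 1, 0, 0, 0, tzinfo=timezone.utc), usegmt=True
)
print(build_date)  # Mon, 01 Jan 2007 00:00:00 GMT

# Hypothetical (title, pubDate) pairs in no particular order.
entries = [
    ("Welcome to my Site", "Mon, 1 Jan 2007 01:00:00 GMT"),
    ("Second Post", "Wed, 3 Jan 2007 09:30:00 GMT"),
    ("First Update", "Tue, 2 Jan 2007 12:00:00 GMT"),
]

# Sort so that the newest entry comes first, as it should in the feed.
newest_first = sorted(
    entries, key=lambda entry: parsedate_to_datetime(entry[1]), reverse=True
)
for title, pub_date in newest_first:
    print(pub_date, title)
```

Note that format_datetime writes the day of the month with two digits (01), which is also acceptable in this date format.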

To let browsers discover an RSS feed automatically when a webpage is loaded, add this tag to the head section of the HTML:

<link rel="alternate" type="application/rss+xml" title="RSS Feed Title"
href="rssfile.xml" />

Firefox and Internet Explorer 7 will display an orange icon to notify the user that an RSS feed is available.
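A browser finds the feed by scanning the page's head for that link tag. The following Python sketch imitates the idea with the standard html.parser module; the class name FeedLinkFinder and the sample page are made up for this example, and real browsers are certainly more tolerant than this.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs for RSS link tags found in a page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "link"
                and attr.get("rel") == "alternate"
                and attr.get("type") == "application/rss+xml"):
            self.feeds.append((attr.get("title"), attr.get("href")))


PAGE = """<html><head><title>My Site</title>
<link rel="alternate" type="application/rss+xml" title="RSS Feed Title"
href="rssfile.xml" />
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
print(finder.feeds)
```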

5.2 Podcasting

Podcasting has got to be one of the most Apple-centric terms that has ever been coined on the internet. It conveys the notion that a podcast can be used only by the Apple iPod player. In reality, a podcast is just an ordinary RSS file, and it could be used by any audio player.

Podcasting involves subscribing to a feed from the synchronization software for a portable audio player, such as iTunes for the Apple iPod, and having that software download and transfer new content whenever the feed is updated. iTunes expands on the RSS format by embedding extra information within the itunes namespace. I will provide an example of this momentarily.

The basic podcast consists of one extra element within each item: an enclosure.

...
  <item>
   <title>Barking Dog</title>
   <link>http://address.of/site/bark.mp3</link>
   <description>My podcast is now live. Listen to this dog barking.</description>

   <pubDate>Mon, 1 Jan 2007 01:00:00 GMT</pubDate>
   <guid isPermaLink="true">http://address.of/site/bark.mp3</guid>

   <enclosure url="http://address.of/site/bark.mp3" length="1234567" type="audio/mpeg" />

  </item>
...

The enclosure is one of the very few empty elements in RSS. All three attributes are required: url holds the URL of the audio file (or video, or any other type of file), length holds its size in bytes, and type holds its MIME type. A MIME (Multipurpose Internet Mail Extensions) type is a categorized description of the type of file in use, and is standardized by the IETF. A great listing of MIME types can be found at W3Schools.
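If you generate your feed with a script, the three enclosure attributes can be filled in programmatically. This Python sketch uses a hypothetical helper, make_enclosure, written only for this illustration. When no MIME type is given it asks the standard mimetypes module to guess one from the file extension, which is a convenience and not a guarantee of correctness.

```python
import mimetypes
import xml.etree.ElementTree as ET


def make_enclosure(url, size_in_bytes, mime_type=None):
    """Build an enclosure element with its three required attributes."""
    if mime_type is None:
        # Guess from the extension, e.g. .mp3; may return None if unknown.
        mime_type, _encoding = mimetypes.guess_type(url)
    if mime_type is None:
        raise ValueError("could not determine a MIME type for " + url)
    # length must be a string holding the file size in bytes.
    return ET.Element(
        "enclosure", url=url, length=str(size_in_bytes), type=mime_type
    )


enclosure = make_enclosure(
    "http://address.of/site/bark.mp3", 1234567, "audio/mpeg"
)
print(ET.tostring(enclosure, encoding="unicode"))
```

For a file on disk, os.path.getsize gives the byte count to pass as the second argument.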

The iTunes extensions are added by declaring a namespace and tying it to the itunes prefix. A simple example of this is the itunes:explicit element, which causes a parental advisory icon to appear in the iTunes interface to flag explicit content. Its values are yes, no, or clean. To indicate that this audio stream is clean, the above example might be modified in this way:

<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
...
  <item>
   <title>Barking Dog</title>
   <link>http://address.of/site/bark.mp3</link>
   <description>My podcast is now live. Listen to this dog barking.</description>

   <pubDate>Mon, 1 Jan 2007 01:00:00 GMT</pubDate>
   <guid isPermaLink="true">http://address.of/site/bark.mp3</guid>

   <enclosure url="http://address.of/site/bark.mp3" length="1234567" type="audio/mpeg" />

   <itunes:explicit>clean</itunes:explicit>

  </item>
...

This is an example of extending an XML vocabulary through an additional namespace. The RSS 2.0.1 standard has been "frozen" by the RSS Advisory Board, and the specification recommends that extensions be developed using the same namespace method. LiveJournal, a popular blogging site, has added the elements lj:music and lj:mood as children of each item to represent its trademark music and mood indicators on each post. Most RSS readers simply ignore this information, but should one ever be able to recognize it, it's there.
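To see why namespaced extensions are harmless to readers that do not understand them, consider this Python sketch with ElementTree. The feed is a cut-down version of the example above. The core RSS elements are looked up by their plain names, while the extension element is found only when the reader explicitly asks for the itunes namespace.

```python
import xml.etree.ElementTree as ET

# The namespace URI from the xmlns:itunes declaration on the rss element.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

FEED = """<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
 <channel>
  <title>Name of Channel</title>
  <link>http://address.of/site</link>
  <description>Description of Channel</description>
  <item>
   <title>Barking Dog</title>
   <itunes:explicit>clean</itunes:explicit>
  </item>
 </channel>
</rss>"""

root = ET.fromstring(FEED)
results = []
for item in root.iter("item"):
    # Core RSS elements are in no namespace, so plain names work.
    title = item.findtext("title")
    # The extension is visible only through its namespace.
    explicit = item.findtext(
        "itunes:explicit", default="(not set)",
        namespaces={"itunes": ITUNES_NS},
    )
    results.append((title, explicit))

print(results)
```

A reader that never looks up the itunes namespace simply never sees the extra element, which is the behavior described above.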

5.3 Chapter Review & Exercises

This chapter covered the syntax for the RSS vocabulary. You should now know how to create regular website syndication feeds as well as multimedia "podcasts."

  1. Find a site that is appropriate for RSS as described above, but that does not have an RSS feed associated with it. Create an RSS feed that includes the three latest entries on this site. Then save the HTML file from the site and add a link to the new RSS feed in its head section. Try adding the feed to Live Bookmarks in Firefox or to the feed list in Internet Explorer 7 and see how it behaves.

  2. Create a podcast for a weekly radio show. You can make up the information for the channel, and give each item a title and description. The MP3 files you need to include are as follows:

    Show1.mp3 8,465,134 bytes
    Show2.mp3 7,510,978 bytes
    Show3.mp3 9,219,035 bytes
    Show4.mp3 9,932,805 bytes

    Note that the first show is the oldest, and the fourth show is the most recent.