Syndication formats – RSS 0.92, 2.0 & Atom 1.0
  1. General info
    1.1. Useful links
  2. The basic XML
    2.1. Standalone XML
  3. Feed data
    3.1. Channel information
    3.2. Entry (story)
  4. Atom constructs
    4.1. Text construct
    4.2. Person construct
    4.3. The link element

This is a reference document I made for myself in preparation for writing a universal PHP feed creator class. I have studied both the RSS 0.92 and 2.0 specs and RFC 4287, which defines The Atom Syndication Format (Atom 1.0).

General info

I’ll give a brief introduction to the RSS versions here because they can be confusing at first glance (unlike Atom, whose versioning is straightforward).

There are 2 branches of the RSS format: RDF-based (0.90, 1.0 and 1.1) and plain XML (0.91, 0.92 and 2.0). The first retains more of the original Netscape specification, where RSS stood for RDF Site Summary. The second is a simplified, improved and XML-compliant format. It is the most popular of the three (the two RSS branches and Atom), accounting for at least 50% of all feeds according to Wikipedia.

This article doesn’t cover the first branch but covers both RSS 0.92 and 2.0.

Useful links

But before we begin let me provide you with some useful links so you won’t have to look for these resources later:

The basic XML

All 3 feed formats are XML documents, and Atom even uses an XML namespace. Each XML document has one root element (naturally), which in turn contains a single <channel> element (for RSS) or several elements (for Atom). Let us look at the table:

RSS 0.92 (MIME type: text/xml)

<?xml version="1.0" encoding="utf-8"?>
<rss version="0.92">
  <channel> [FEED DATA] </channel>
</rss>

RSS 2.0 (MIME type: text/xml)

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel> [FEED DATA] </channel>
</rss>

Atom 1.0 (MIME type: application/atom+xml)

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  [FEED DATA]
</feed>

Atom-in-RSS (MIME type: text/xml)

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:atom="http://www.w3.org/2005/Atom">
  <channel> [FEED DATA] </channel>
</rss>

There’s some confusion about the RSS MIME type: some feeds use application/rss+xml, others text/xml (like WordPress).
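The following sketch shows how a feed creator might set up the three basic skeletons above. It is written in Python with the standard xml.etree.ElementTree module, and a PHP class would follow the same steps. The format keys (rss092, rss20, atom10) and the functions feed_skeleton and serialize are hypothetical names. Atom-in-RSS is left out. For RSS the sketch uses text/xml, as in the table, although application/rss+xml is the other common choice.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# MIME types from the table above; application/rss+xml is the common
# alternative for RSS.
MIME_TYPES = {
    "rss092": "text/xml",
    "rss20": "text/xml",
    "atom10": "application/atom+xml",
}


def feed_skeleton(fmt):
    """Return (mime_type, root, container) for a hypothetical format key.

    `container` is where the [FEED DATA] goes: the <channel> element for
    RSS, the <feed> root itself for Atom.
    """
    if fmt in ("rss092", "rss20"):
        version = "0.92" if fmt == "rss092" else "2.0"
        root = ET.Element("rss", version=version)
        container = ET.SubElement(root, "channel")
    elif fmt == "atom10":
        # Serialize the Atom namespace as the default (unprefixed) one.
        # Note: register_namespace changes module-wide state.
        ET.register_namespace("", ATOM_NS)
        root = ET.Element(f"{{{ATOM_NS}}}feed")
        container = root
    else:
        raise ValueError(f"unknown feed format: {fmt}")
    return MIME_TYPES[fmt], root, container


def serialize(root):
    """Serialize to UTF-8 bytes with an explicit XML declaration."""
    body = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0" encoding="utf-8"?>\n' + body).encode("utf-8")
```

A caller would add the channel elements to the returned container, then send serialize(root) with the returned MIME type as its Content-Type.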

Standalone XML

Note that, as an extra bit of technicality, you can include the standalone="yes" attribute in the XML declaration (XmlDecl) of your RSS documents. In a nutshell, a standalone document is self-contained: it doesn’t rely on markup declarations from an external DTD (an internal DTD subset is fine). A feed with no DTD at all can therefore always be declared standalone. Namespaces have no bearing on this, and the attribute seems to act mostly as a hint to the parsing application.

For example (this can also be used with Atom-in-RSS):

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0">
  ...
</rss>

This attribute doesn’t affect XML validation as far as I know.

Feed data

Now that we have the initial structure laid out for all 3 formats, we can move on to the actual feed data and the differences between the formats: channel elements come first, followed by entry elements.

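Each element described below has an attached marker telling how many times it may/must be used:

  • REQ – required, exactly once;
  • OPT – optional, at most once;
  • 0+ – any number of times, including none.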


Channel information

The nodes in this section go in place of [FEED DATA] in the examples above: inside <channel> for RSS and directly inside <feed> for Atom.

Note: string limits indicated here only apply to RSS 0.91 as RSS 0.92 has removed them.

RSS 0.92 RSS 2.0 Atom 1.0 Description
title REQ title REQ title REQ

The channel title; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML.

description REQ description REQ subtitle OPT

The channel description; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML.

RSS 0.91 imposes a maximum limit of 500 characters.

link REQ link REQ link rel=alternate 0+

Specifies the HTML page this feed belongs to. Note that Atom’s link is more complex than that of RSS.

<link>http://example.com/doc.html</link>
<link rel="alternate" title="Document title"
      type="text/html"
      href="http://example.com/doc.html" />
language
OPT (0.92)
REQ (0.91)
language OPT xml:lang OPT

Any Atom node, either channel’s or entry’s, may contain the xml:lang attribute.

The channel language. Language codes for RSS and Atom (RFC 3066) differ slightly, but generally a two-letter ISO 639-1 code like «en» is valid for both.

<language>en-us</language>
<feed xml:lang="en">
xml:base OPT

Any Atom node, either channel’s or entry’s, may contain the xml:base attribute.

Specifies a base URL used to resolve all relative addresses within the current node. This is useful not only feed-wise but also in elements such as entry content.
RSS has no equivalent of this attribute, so RSS feeds must always refer to addresses by their full URLs.

<feed xml:base="http://example.com">
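RSS has no xml:base, so an RSS generator has to resolve relative addresses itself before it writes them out. The following minimal Python sketch does this with urllib.parse.urljoin, which applies the standard URI resolution rules that a reader also uses for xml:base. The item dictionary layout and the URL_FIELDS names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical item fields that may hold relative addresses.
URL_FIELDS = ("link", "comments", "enclosure_url")


def absolutize_item(item, base):
    """Return a copy of `item` with its URL fields resolved against `base`.

    An Atom feed could keep the relative URLs and declare xml:base instead;
    RSS has no such attribute, so the generator resolves them itself.
    """
    fixed = dict(item)
    for key in URL_FIELDS:
        if fixed.get(key):
            fixed[key] = urljoin(base, fixed[key])
    return fixed
```

The function returns a copy and leaves the caller’s data untouched. That way the same item can still be written to an Atom feed with its relative URLs and an xml:base.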
image REQ image OPT icon, logo OPT (both)

This node must contain:

  • url (image URL);
  • title (used for the <img alt="..." /> attribute, ≤ 100 chars for RSS 0.91);
  • link (linked page, e.g. site index)

This node may contain:

  • width (≤ 144, default is 88);
  • height (≤ 400, default is 31);
  • description (the spec says it’s meant for <a title="..."> attribute that the <img /> might be wrapped in) – must be ≤ 500 chars for RSS 0.91.
<image>
  <url>http://example.com/logo.png</url>
  <title>Example.com</title>
  <link>http://example.com/index.php</link>
  <width>50</width>
  <height>102</height>
  <description>An Example.com.</description>
</image>
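The following minimal Python sketch builds the RSS image node and enforces the size limits and defaults listed above. The function name is hypothetical. Rejecting zero or negative sizes is an added assumption, since the spec only gives maximums.

```python
import xml.etree.ElementTree as ET

# Limits and defaults from the list above.
MAX_WIDTH, DEFAULT_WIDTH = 144, 88
MAX_HEIGHT, DEFAULT_HEIGHT = 400, 31


def rss_image(url, title, link, width=None, height=None, description=None):
    """Build an RSS <image> node (hypothetical helper)."""
    # Rejecting non-positive sizes is an added assumption; the spec only
    # gives maximums.
    if width is not None and not 0 < width <= MAX_WIDTH:
        raise ValueError(f"width must be between 1 and {MAX_WIDTH}")
    if height is not None and not 0 < height <= MAX_HEIGHT:
        raise ValueError(f"height must be between 1 and {MAX_HEIGHT}")

    image = ET.Element("image")
    # The three required children.
    for tag, value in (("url", url), ("title", title), ("link", link)):
        ET.SubElement(image, tag).text = value
    # Optional children; width/height are left out when they equal the
    # defaults a reader assumes anyway.
    if width is not None and width != DEFAULT_WIDTH:
        ET.SubElement(image, "width").text = str(width)
    if height is not None and height != DEFAULT_HEIGHT:
        ET.SubElement(image, "height").text = str(height)
    if description:
        ET.SubElement(image, "description").text = description
    return image
```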

Atom feeds may have two associated images – icon for a small glyph and logo for expanded site/channel/post image.

  • icon should be square (1:1) and «suitable for presentation at a small size» (spec);
  • logo should have an aspect ratio of 2:1 (twice as wide as it is tall).

Unlike in RSS, Atom image elements simply contain the URL of an image file.

<icon>http://example.com/icon.png</icon>
<logo>http://example.com/logo.png</logo>
copyright OPT copyright OPT rights OPT

A copyright string; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. RSS 0.91 imposes a maximum limit of 100 characters.

Atom feeds may have at most one rights node.

managingEditor OPT managingEditor OPT author 0+

For RSS this is an e-mail and/or person’s name in the suggested format: e@mail.com (Name). For Atom this is a Person construct with several different fields.

Atom feeds may have none, one or more author nodes; if a feed has none, every entry must have its own author. See also the RSS webMaster node.

webMaster OPT webMaster OPT

The format is identical to that of managingEditor above.

contributor 0+

Atom feeds may have any number of contributor nodes including none. This is a Person construct. See also author node.

rating OPT rating OPT

Channel rating in PICS notation (now superseded by POWDER).

lastBuildDate OPT lastBuildDate OPT updated REQ

The time of the last channel update. RSS uses RFC 822 format (W3C suggests RFC 1123), while Atom uses RFC 3339.

<lastBuildDate>Fri, 04 Nov 11 05:48:38 +0000</lastBuildDate>
<updated>2011-11-04T05:48:38+00:00</updated>
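You can produce both date formats from one timezone-aware datetime with Python’s standard library. In the following minimal sketch, email.utils.format_datetime emits the RFC 822 layout with a four-digit year (the RFC 1123 style the W3C suggests), and datetime.isoformat emits RFC 3339. The function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(dt):
    """RFC 822 date for RSS, with a four-digit year (RFC 1123 style).

    `dt` should be timezone-aware; astimezone() treats a naive datetime
    as local time.
    """
    return format_datetime(dt.astimezone(timezone.utc))


def atom_date(dt):
    """RFC 3339 date-time for Atom (updated, published)."""
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")
```

For the example dates above, rss_date() returns Fri, 04 Nov 2011 05:48:38 +0000, which is the four-digit-year form of the lastBuildDate value. atom_date() returns the updated value exactly.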
pubDate OPT pubDate OPT

For RSS this is the publication date for the content in the channel. For feeds that are published regularly (e.g. a newspaper coming out each Friday at 12:00 AM), it changes with each new issue. The date is in RFC 822 format (W3C suggests RFC 1123).

skipDays, skipHours OPT skipDays, skipHours OPT

For RSS these nodes tell a reader/aggregator when it is safe to skip its periodic feed fetches because the feed won’t change during the specified hours and/or days. Most aggregators don’t use them.

The following snippet states that the feed doesn’t update on Sundays and Mondays:

<skipDays>
  <day>Sunday</day>
  <day>Monday</day>
</skipDays>

The following snippet states that the feed doesn’t update at midnight (hour 0) or at 11 PM (hour 23). RSS 2.0 defines these hours in GMT:

<skipHours>
  <hour>0</hour>
  <hour>23</hour>
</skipHours>
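The following minimal Python sketch shows how an aggregator might honour these two nodes. The function name is hypothetical. The sketch treats the days in GMT as well as the hours, which is an assumption.

```python
from datetime import datetime, timezone

# Index matches datetime.weekday(): Monday == 0 ... Sunday == 6.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def may_skip_fetch(when, skip_days=(), skip_hours=()):
    """Return True if a fetch at `when` may be skipped (hypothetical helper).

    skip_days holds <day> values ("Sunday", ...); skip_hours holds <hour>
    values as integers 0-23. RSS 2.0 defines the hours in GMT, so `when`
    is converted to UTC first; applying UTC to the days too is an assumption.
    """
    utc = when.astimezone(timezone.utc)
    return DAY_NAMES[utc.weekday()] in skip_days or utc.hour in skip_hours
```

With the two snippets above, an aggregator would skip every fetch on Sundays and Mondays, and also any fetch at 0:xx or 23:xx GMT on other days.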
generator OPT generator OPT

Not supported by RSS 0.92. In RSS 2.0 this is a plain text string. Atom’s generator may have two attributes (uri and version) and must have text content.

<generator>UverseWiki R475</generator>
<generator uri="http://uverse.i-forge.net/blog"
           version="R475">UverseBlog</generator>
textInput OPT textInput OPT

For RSS this node and its children specify an arbitrary input form – a search form, commenting, etc. Doesn’t seem to be supported by most aggregators.

This node requires 4 child nodes:

  • title – caption for the submit button (≤ 100 chars for RSS 0.91);
  • description – an expanded form description (≤ 500 chars for RSS 0.91);
  • name – the name for the input used when sending form data (≤ 20 chars for RSS 0.91);
  • link – the URL of handling script.
<textInput>
  <title>Search Example.com</title>
  <description>Search us</description>
  <name>q</name>
  <link>http://example.com/search.php</link>
</textInput>
ttl OPT

RSS 2.0 only: time to live, a number of minutes indicating how long the feed can be cached before it is fetched again from the source.
cloud OPT cloud OPT

A subscribing protocol on a cloud domain – see the official RSS 2.0 description. Should have 5 attributes: domain, port, path, registerProcedure and protocol (can be http-post, xml-rpc or soap).

<cloud domain="rpc.sys.com" port="80" path="/RPC2" protocol="xml-rpc"
       registerProcedure="myCloud.rssPleaseNotify" />
category 0+ category 0+

None, one or more channel categories; RSS 2.0 and Atom handle them differently.

For RSS 2.0 this is a node with an optional domain attribute; if it’s given then the node’s content is some kind of identifier used by that domain (e.g. a category ID); if it’s omitted then it’s a textual description separated by "/".

For Atom this is a childless node that must have term attribute and may have scheme (URL) and label (a human-readable caption if term isn’t enough).

<category domain="http://worldcat.com">
  100F-AA5R
</category>

<category>Personal/Vacation 2011</category>
<category scheme="http://example.com/cat/15"
          term="15" label="Cats" />
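The following minimal Python sketch contrasts the two category flavours. The function names are hypothetical, and the Atom element is created in the Atom namespace.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def rss_category(text, domain=None):
    """RSS 2.0: the category is the element text; domain is optional."""
    node = ET.Element("category")
    if domain:
        node.set("domain", domain)
    node.text = text
    return node


def atom_category(term, scheme=None, label=None):
    """Atom: a childless element; term is required, scheme/label optional."""
    if not term:
        raise ValueError("an Atom category requires a term")
    attrs = {"term": term}
    if scheme:
        attrs["scheme"] = scheme
    if label:
        attrs["label"] = label
    return ET.Element(f"{{{ATOM_NS}}}category", attrs)
```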
id REQ

Unique channel identifier (must be an IRI). The Atom specification requires it to be universally unique and permanent, and it describes how such identifiers are compared.

Ideally, this identifier must not change even if the feed generator is reinstalled on the same resource. For this, the URI of the feed generator script itself can be used.

<id>http://example.com/feeds/atom.php</id>

Entry (story)

Each channel is described by its information, but of course the entire purpose of a feed is to deliver stories (news, posts, etc.; items in RSS terminology, entries in Atom).

This is done by placing any number of <item> (RSS) or <entry> (Atom) elements under the channel root node ([FEED DATA] in the structure outline), on the same level as the channel information.

RSS 0.91 limited a feed to at most 15 items; RSS 0.92 removed this limit.

RSS 0.92 RSS 2.0 Atom 1.0 Description
title
OPT (0.92)
REQ (0.91)
title OPT title REQ

Entry title; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. RSS 0.91 imposes a maximum limit of 100 characters.

description
OPT (0.92)
REQ (0.91)
description OPT summary OPT

Entry synopsis; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. See also Atom content node.

<description>
  This node may contain
  &lt;em&gt;encoded&lt;/em&gt; HTML.
</description>

In RSS this field may contain encoded HTML that, once decoded, is rendered as HTML. In title and other fields, decoded HTML entities are treated as plain text.
RSS 0.91 imposes a maximum limit of 500 characters.
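When you generate RSS, you don’t have to encode the HTML by hand, because an XML serializer escapes <, > and & in element text automatically. The following minimal Python sketch uses xml.etree.ElementTree; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def rss_description(html):
    """Wrap an HTML fragment in an RSS <description> (hypothetical helper).

    The raw HTML is stored as element text; on serialization ElementTree
    escapes &, < and > as entities, producing the encoded HTML RSS expects.
    """
    node = ET.Element("description")
    node.text = html
    return node
```

The serializer escapes & too, so an entity already in the HTML (e.g. &amp;) becomes &amp;amp;. The reader’s own HTML decoding step then turns it back into the original entity.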

link
OPT (0.92)
REQ (0.91)
link OPT link rel=alternate 0+

Specifies the URL of the full story (a web page).

Atom’s link is more complex than that of RSS. It lets you attach documents, link to different representations (HTML, PDF, etc.) and more.

<link>http://example.com/entries/174.php</link>
<link rel="alternate" title="PHP tricks"
      type="text/plain"
      href="http://example.com/entries/174.txt" />
author OPT author 0+

For RSS this is an e-mail and/or person’s name in the suggested format: e@mail.com (Name). For Atom this is a Person construct with several different fields.

Atom entries may have none, one or more author nodes. The same element also exists in the channel information; see above for details.

category 0+ category 0+ category 0+

None, one or more entry categories; RSS 2.0 and Atom handle them differently. The same element also exists in the channel information; see above for details.

comments OPT link rel=related 0+

Address of the page where comments for this story can be viewed and posted. Atom’s link element can be used to mimic the RSS <comments> element.

<comments>http://example.com/153/comments</comments>
<link rel="related" type="text/html" title="Comments"
      href="http://example.com/153/comments" />
enclosure OPT enclosure OPT link rel=enclosure 0+

An attachment, e.g. a song or a video clip. The RSS node is childless and has 3 required attributes: url, length (in bytes) and type (the MIME type). See also Atom’s multipurpose link element.

<enclosure length="1845603" type="text/plain"
           url="http://example.com/log.txt" />
<link rel="enclosure" type="text/plain"
      href="http://example.com/log-03.txt"
      length="1845603" title="The log" />
guid OPT id REQ

Unique item identifier. Atom has this element for the channel node as well, while RSS has it for items only.

This is usually the item’s permalink, although the Atom standard doesn’t specifically require this; see the channel id node above for details.

<guid isPermaLink="true">http://example.com/post/20110315-2240.html</guid>

If isPermaLink is omitted, it’s assumed to be true. At first glance the link and guid nodes seem to serve the same purpose, but that isn’t necessarily so:

  • link is the address where the feed aggregator must go when the user wants to read the full entry. This means that the page might only be a synopsis of a very large article.
  • guid, on the other hand (assuming isPermaLink is true), always refers to the full blog post/magazine article/etc.

See also the last but one paragraph in the Comments section of the RSS 2.0 spec for the explanation.

<id>http://example.com/post/20110315.php</id>
pubDate OPT published OPT

The time at which the story was initially posted. RSS uses RFC 822 format (W3C suggests RFC 1123), while Atom uses RFC 3339.
See also Atom’s updated node.

<pubDate>Sat, 05 Nov 11 08:58:41 +0000</pubDate>
<published>2011-11-05T08:58:41+00:00</published>
updated REQ

The time of last significant (from the publisher’s view) entry update. Atom has this field in RFC 3339 date/time format.
See also entry publication date.

source OPT source OPT source OPT

Reference to a channel that this story has come from. For RSS this is a simple node with one required attribute (url) and the text for the reference caption; for Atom this element has the same format as the entire channel – it just can’t have any <entry> nodes.

<source url="http://example.net/395.php">The Semees' Place</source>
<source>
  <title>The Semees' Place</title>
  <icon>http://example.com/icon-32.png</icon>
  <rights type="html">Copyright &amp;copy;</rights>
</source>
content OPT

This Atom-only node contains, or links to, the content of the entry. For the text, html and xhtml types it works like a Text construct that allows text or HTML. See also the entry description node.

In Atom you can use xml:lang and xml:base attributes in any element to set specific content language and base URL for their children.

<content type="text">
  Plain &apos;text&apos;
</content>

<content type="text/rtf" xml:lang="ru">
  {\rtf Некий \b RTF\b0 -документ.}
</content>

<content type="application/zip" src="posts/20110315.zip" />

This is a Text construct that in addition to standard text, html and xhtml types allows any valid MIME type – in this case:

  • an extra attribute named src may be present to refer to a remote document; if src is present, the content node must be empty;
  • if there’s no src attribute, the data goes inline: XML media types are embedded as XML, other text/* types as text with all special characters escaped (< as &lt;, etc.), and any other type must be Base64-encoded.
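The rules above, taken from RFC 4287’s processing model for atom:content, can be expressed as a small decision function. In the following minimal Python sketch, the function names and the returned mode labels are hypothetical. MIME parameters such as charset are simply stripped. The less common XML media types (e.g. application/xml-dtd) are not special-cased. The case where src is present, in which content must be empty, is not handled.

```python
import base64


def atom_content_mode(mime_type):
    """Classify an atom:content type value for inline content.

    Returns one of these hypothetical labels:
      "text", "html", "xhtml": the three Text construct keywords;
      "xml": XML media types, embedded as a child element;
      "escaped": other text/* types, stored as escaped text;
      "base64": every other type, stored Base64-encoded.
    """
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    t = mime_type.split(";")[0].strip().lower()
    if t in ("text", "html", "xhtml"):
        return t
    if t.endswith("+xml") or t.endswith("/xml"):
        return "xml"
    if t.startswith("text/"):
        return "escaped"
    return "base64"


def inline_base64(data: bytes, mime_type):
    """Encode raw bytes for inline atom:content of a non-text, non-XML type."""
    if atom_content_mode(mime_type) != "base64":
        raise ValueError(f"{mime_type} content is not Base64-encoded")
    return base64.b64encode(data).decode("ascii")
```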
rights OPT

An Atom entry may have at most one rights node. It contains the copyright string for the current story in Text construct format that allows text or HTML. The same element also exists in the channel information; see above for details.

contributor 0+

Atom entries may have any number of contributor nodes containing information in Person construct format. The same element also exists in the channel information; see above.

Atom constructs

Text construct

Text constructs are nodes with a type attribute that can have 3 values: text (the default), html and xhtml.
<title type="text">
  HTML chars like
  &lt; are quoted.
</title>

This node contains plain text, so even <tag>-looking strings encoded in it are treated as-is and not as HTML or any other markup.

<title type="html">
  &quot;Quoted
  HTML&quot; here.
</title>

This node contains escaped HTML that the aggregator renders after unescaping its value.

<summary type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    Unquoted XHTML <b>nodes</b> thanks to
    the <em>namespace</em>.
  </div>
</summary>

Since XHTML is XML, its nodes can be embedded directly, just like any other XML node, as long as they are wrapped in a single div in the XHTML namespace.

Note that you can only have one Text construct with the same tag name, even if they have different types. In other words, you can’t have this:

<entry>
  <content type="text">Content in plain text form.</content>
  <content type="html">Some &lt;b&gt;HTML&lt;/b&gt; here.</content>
</entry>

Person construct

This node and its children describe a person. There are 3 child elements: name (required), uri and email.

For example (taken from the Atom spec):

<author>
  <name>Mark Pilgrim</name>
  <uri>http://example.org/</uri>
  <email>f8dy@example.com</email>
</author>
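The following minimal Python sketch produces both forms from the same data. It builds an Atom Person construct, and it builds the RSS e@mail.com (Name) string used by managingEditor, webMaster and the item author. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_person(tag, name, uri=None, email=None):
    """Build an Atom Person construct such as <author> or <contributor>."""
    if not name:
        raise ValueError("a Person construct requires a name")
    person = ET.Element(f"{{{ATOM_NS}}}{tag}")
    # Children in the order used by the example above; empty ones are skipped.
    for child, value in (("name", name), ("uri", uri), ("email", email)):
        if value:
            ET.SubElement(person, f"{{{ATOM_NS}}}{child}").text = value
    return person


def rss_person(name=None, email=None):
    """RSS person string in the suggested e@mail.com (Name) format."""
    if email and name:
        return f"{email} ({name})"
    return email or name or ""
```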

The link element

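This element is much like the HTML <link /> tag. It’s a childless node with the following attributes:

  • href – the address of the linked resource (required);
  • rel – the link relation (see below; defaults to alternate);
  • type – the MIME type of the linked resource;
  • hreflang – the language of the linked resource;
  • title – a human-readable title for the link;
  • length – the advisory size of the linked resource in bytes.

As with any other node in an Atom feed, <link /> can also have xml:lang (content language code) and xml:base (base URL) attributes.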

Possible relations (the rel attribute):

alternate
Links an alternative representation – most often it’s the web page that this feed is generated for but it can also be a link to the RSS feed or a PDF version or something else.
<link rel="alternate" type="text/html" href="http://example.com/article.html" title="Article title" />
related
Some related document. Let’s say we have a site reviewing CPU chips; then we can link to the manufacturers’ websites like this:
<link rel="related" type="text/html" href="http://amd.com" title="AMD (Advanced Micro Devices), Inc." />
self
The canonical link to the same feed being read:
<link rel="self" type="application/atom+xml" href="http://example.com/feeds/atom.php" />
enclosure
Attaches a file; for this rel, the length attribute should be specified too:
<link rel="enclosure" type="image/png" href="screenshot.php" length="258409" />
via
The URL of the source of the information; for example, a link to the press release that the current entry/feed is about:
<link rel="via" type="text/html" href="http://example.com/press/20110212" />
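The following minimal Python sketch is a link builder that covers the attributes and the five relations listed above. The function name is hypothetical, and it makes two simplifications of the spec’s rules. First, it treats the spec’s advice on enclosure length as mandatory. Second, it accepts a relation that isn’t one of the five only if it looks like an IRI (contains a colon).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# The five relations defined by RFC 4287; names registered later would
# have to be added here.
KNOWN_RELS = {"alternate", "related", "self", "enclosure", "via"}


def atom_link(href, rel="alternate", mime_type=None, title=None,
              hreflang=None, length=None):
    """Build a childless Atom <link> element (hypothetical helper)."""
    # Anything that is not a known name must be a full IRI; the colon test
    # is a simplification.
    if rel not in KNOWN_RELS and ":" not in rel:
        raise ValueError(f"unknown link relation: {rel}")
    # The spec says length SHOULD be given for enclosures; this sketch insists.
    if rel == "enclosure" and length is None:
        raise ValueError("enclosure links need a length")
    attrs = {"rel": rel, "href": href}
    for name, value in (("type", mime_type), ("title", title),
                        ("hreflang", hreflang), ("length", length)):
        if value is not None:
            attrs[name] = str(value)
    return ET.Element(f"{{{ATOM_NS}}}link", attrs)
```

The sketch always writes rel explicitly, even for alternate, so that the output matches the examples above.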