This is a reference document I made for myself in preparation for writing a universal PHP feed creator class. I have studied both the RSS 0.92 and 2.0 specs and RFC 4287, which defines the Atom Syndication Format (Atom 1.0).
I’ll give a brief introduction to the RSS versions here because they can be confusing at first glance (unlike Atom, whose versioning is straightforward).
There are 2 branches of the RSS format: RDF-like (0.90, 1.0 and 1.1) and plain XML (0.91, 0.92 and 2.0). The first retains more of the original Netscape specification, where RSS stands for RDF Site Summary. The second is a simplified and improved format that is the most popular of the three, counting both RSS branches and Atom (no less than 50% of all feeds according to Wikipedia).
This article doesn’t cover the first branch but covers both RSS 0.92 and 2.0.
But before we begin let me provide you with some useful links so you won’t have to look for these resources later:
All 3 feed formats are XML documents, and Atom additionally uses an XML namespace. Each XML document has one root element (naturally) which, in turn, has a single child element (for RSS) or several child elements (for Atom). Let us look at the table:
RSS 0.92
text/xml |
xml<?xml version="1.0" encoding="utf-8"?> <rss version="0.92"> <channel> [FEED DATA] </channel> </rss> |
---|---|
RSS 2.0
text/xml |
xml<?xml version="1.0" encoding="utf-8"?> <rss version="2.0"> <channel> [FEED DATA] </channel> </rss> |
Atom 1.0
application/atom+xml |
xml<?xml version="1.0" encoding="utf-8"?> <feed xmlns="http://www.w3.org/2005/Atom"> [FEED DATA] </feed> |
Atom-in-RSS
text/xml |
xml<?xml version="1.0" encoding="utf-8"?> <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom"> <channel> [FEED DATA] </channel> </rss> |
There’s some confusion about the RSS MIME type: some use application/rss+xml, others text/xml (like WordPress).
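The skeletons in the table can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration: the `make_root` helper, the format keys and the `MIME_TYPES` mapping are names made up for this example, and the MIME types simply follow the table above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# MIME types as listed in the table above (hypothetical lookup table).
MIME_TYPES = {
    "rss0.92": "text/xml",
    "rss2.0": "text/xml",
    "atom1.0": "application/atom+xml",
}

def make_root(fmt):
    """Return (root, container); container is where [FEED DATA] goes."""
    if fmt in ("rss0.92", "rss2.0"):
        # RSS: <rss version="..."> holds exactly one <channel>.
        root = ET.Element("rss", {"version": fmt[3:]})
        container = ET.SubElement(root, "channel")
    elif fmt == "atom1.0":
        # Atom: feed data sits directly under the namespaced <feed> root.
        root = ET.Element("feed", {"xmlns": ATOM_NS})
        container = root
    else:
        raise ValueError("unknown feed format: " + fmt)
    return root, container

root, channel = make_root("rss2.0")
print(MIME_TYPES["rss2.0"], ET.tostring(root, encoding="unicode"))
```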
Note that for an extra bit of technicality you can include the standalone="yes" attribute in the XML declaration of your RSS documents. In a nutshell, a standalone XML document is self-contained: it does not rely on markup declarations from an external DTD. A document with no DTD at all therefore qualifies (namespaces don’t matter here), and a document that does reference an external DTD can still be standalone as long as those external declarations don’t change its content; the attribute is mostly a hint to the parsing application.
For example (this can also be used with Atom-in-RSS):
xml<?xml version="1.0" encoding="utf-8" standalone="yes"?> <rss version="2.0"> ... </rss>
This attribute doesn’t affect XML validation as far as I know.
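As a rough sketch of how a generator might emit this declaration in Python: `xml.etree.ElementTree` is used for the tree and the declaration is simply prepended as a string. The `serialize` helper is a hypothetical name for this example only.

```python
import xml.etree.ElementTree as ET

def serialize(root, standalone=False):
    """Serialize a tree to text with an explicit XML declaration."""
    decl = '<?xml version="1.0" encoding="utf-8"'
    if standalone:
        decl += ' standalone="yes"'
    decl += "?>\n"
    # The result is a str; encode it as UTF-8 when writing it out so the
    # bytes agree with the declared encoding.
    return decl + ET.tostring(root, encoding="unicode")

rss = ET.Element("rss", {"version": "2.0"})
ET.SubElement(rss, "channel")
print(serialize(rss, standalone=True))
```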
Now that we have the initial structure laid out for all 3 formats we can move on to the actual feed contents and the differences between them; channel elements come first and after that we’ll see the entry elements.
Each element described below has an attached marker telling how many times it may/must be used: REQ (required), OPT (optional, at most once) and 0+ (any number of times, including none).
Thick gray underline like here means that this has changed from the previous version.
This section places nodes under the feed’s root node ([FEED DATA] in the examples above).
Note: string limits indicated here only apply to RSS 0.91 as RSS 0.92 has removed them.
RSS 0.92 | RSS 2.0 | Atom 1.0 | Description |
---|---|---|---|
title REQ | title REQ | title REQ |
The channel title; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. |
description REQ | description REQ | subtitle OPT |
The channel description; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. |
link REQ | link REQ | link rel=alternate 0+ |
Specifies the HTML page this feed belongs to. Note that Atom’s link is more complex than that of RSS. |
xml<link>http://example.com/doc.html</link> |
xml<link rel="alternate" title="Document title" type="text/html" href="http://example.com/doc.html" /> |
||
language OPT (0.92) REQ (0.91) |
language OPT |
xml:lang OPT
Any Atom node, either channel’s or entry’s, may contain the xml:lang attribute. |
The channel language. Language codes for RSS and Atom (RFC 3066) differ slightly but generally if you specify an ISO 639-1 code (2-character codes like «en») you’ll satisfy both. |
xml<language>en-us</language> |
xml<feed xml:lang="en"> |
||
— | — |
xml:base OPT
Any Atom node, either channel’s or entry’s, may contain the xml:base attribute. |
Specifies a base URL used to resolve all relative addresses under the current node. This is
useful not only for the feed as a whole but also in elements such as entry content.
|
xml<feed xml:base="http://example.com"> |
|||
image REQ | image OPT | icon, logo OPT (both) | |
xml<image> <url>http://example.com/logo.png</url> <title>Example.com</title> <link>http://example.com/index.php</link> <width>50</width> <height>102</height> <description>An Example.com.</description> </image> |
Atom feeds may have two associated images – icon for a small glyph and logo for expanded site/channel/post image.
Unlike RSS, Atom image elements simply contain the URL of an image file. xml<icon>http://example.com/icon.png</icon> <logo>http://example.com/logo.png</logo> |
||
copyright OPT | copyright OPT | rights OPT |
A copyright string; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. RSS 0.91 imposes a maximum limit of 100 characters. |
managingEditor OPT | managingEditor OPT | author 0+ |
For RSS this is an e-mail and/or person’s name in the suggested format: e@mail.com (Name). For Atom this is a Person construct with several different fields. Atom feeds may have none, one or more author nodes. See also RSS webMaster node. |
webMaster OPT | webMaster OPT | — |
The format is identical to that of managingEditor above. |
— | — | contributor 0+ |
Atom feeds may have any number of contributor nodes including none. This is a Person construct. See also author node. |
rating OPT | rating OPT | — | |
lastBuildDate OPT | lastBuildDate OPT | updated REQ |
The time of last channel update; RSS has this in RFC 822 format (W3C suggests RFC 1123) while Atom – in RFC 3339. |
xml<lastBuildDate> Fri, 04 Nov 11 05:48:38 +0000 </lastBuildDate> |
xml<updated> 2011-11-04T05:48:38+00:00 </updated> |
||
pubDate OPT | pubDate OPT | — |
For RSS this is the publication date of the channel’s content; for feeds that come out on a schedule (e.g. a newspaper published each Friday at 12:00 AM) it changes with each issue. The date is in RFC 822 format (W3C suggests RFC 1123). |
skipDays, skipHours OPT | skipDays, skipHours OPT | — |
For RSS these nodes specify when a reader/aggregator can safely skip its periodic feed fetches because the feed won’t change on the specified days and/or at the specified hours. Most aggregators don’t use them. |
The following snippet states that the feed doesn’t update on Sundays and Mondays: xml<skipDays> <day>Sunday</day> <day>Monday</day> </skipDays> |
The following snippet states that the feed doesn’t update during hour 0 (midnight to 1 AM) and hour 23 (11 PM to midnight), both in GMT: xml<skipHours> <hour>0</hour> <hour>23</hour> </skipHours> |
||
— | generator OPT | generator OPT |
Names the software that generated the feed. Unsupported by RSS 0.92; in RSS 2.0 it is a plain string, while Atom’s version may have two attributes (uri and version) and must have text content. |
xml<generator> UverseWiki R475 </generator> |
xml<generator uri="http://uverse.i-forge.net/blog" version="R475">UverseBlog</generator> |
||
textInput OPT | textInput OPT | — |
For RSS this node and its children specify an arbitrary input form – a search form, commenting, etc. Doesn’t seem to be supported by most aggregators. |
This node requires 4 child nodes: title, description, name and link.
|
xml<textInput> <title>Search Example.com</title> <description>Search us</description> <name>q</name> <link>http://example.com/search.php</link> </textInput> |
||
— | ttl OPT | — | Time-to-live - value in minutes indicating how long this feed can be cached. |
cloud OPT | cloud OPT | — |
A subscribing protocol on a cloud domain – see the official RSS 2.0 description. Should have 5 attributes: domain, port, path, registerProcedure and protocol (can be http-post, xml-rpc or soap). |
xml<cloud domain="rpc.sys.com" port="80" path="/RPC2" protocol="xml-rpc" registerProcedure="myCloud.rssPleaseNotify" /> |
|||
— | category 0+ | category 0+ |
None, one or more channel categories; RSS 2.0 and Atom handle them differently. For RSS 2.0 this is a node with an optional domain attribute; if it’s given then the node’s content is some kind of identifier used by that domain (e.g. a category ID); if it’s omitted then the content is a textual description with hierarchy levels separated by "/". For Atom this is a childless node that must have a term attribute and may have scheme (a URL) and label (a human-readable caption if term isn’t enough). |
xml<category domain="http://worldcat.com"> 100F-AA5R </category> <category>Personal/Vacation 2011</category> |
xml<category scheme="http://example.com/cat/15" term="15" label="Cats" /> |
||
— | — | id REQ |
Unique channel identifier (must be an IRI); the Atom specification states that it should be as unique and as permanent as possible and suggests an algorithm for achieving this. Ideally, this identifier must not change even between different installations of a feed generator on the same resource. For this, a URI of the feed generator script itself can be used. |
xml<id>http://example.com/feeds/atom.php</id> |
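The two date formats used in the table above (RFC 822/1123 for lastBuildDate and pubDate, RFC 3339 for updated) can be produced with the Python standard library. This is a minimal sketch; `rss_date` and `atom_date` are illustrative names, and it assumes timezone-aware datetime objects.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def rss_date(dt):
    """Format an aware datetime for RSS (RFC 822 layout, 4-digit year)."""
    return format_datetime(dt)

def atom_date(dt):
    """Format an aware datetime for Atom (RFC 3339)."""
    return dt.isoformat()

# Must be timezone-aware; a naive datetime would lose the UTC offset.
moment = datetime(2011, 11, 4, 5, 48, 38, tzinfo=timezone.utc)
print(rss_date(moment))   # Fri, 04 Nov 2011 05:48:38 +0000
print(atom_date(moment))  # 2011-11-04T05:48:38+00:00
```

Note that `format_datetime` writes the year with 4 digits, as RFC 1123 recommends, rather than the 2-digit year of the original RFC 822.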
Each channel is described by the information above but of course the entire purpose of a feed is to deliver stories (news, posts, etc. in RSS terminology).
This is done by placing any number of <item> (RSS) or <entry> (Atom) elements under the channel root node ([FEED DATA] in the structure outline) – in other words, on the same level with the channel information.
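A small Python sketch of this placement, assuming the container node ([FEED DATA]) already exists. The `add_story` helper and its parameters are invented for this example; a real Atom entry would also need the required id and updated nodes, which are left out here for brevity.

```python
import xml.etree.ElementTree as ET

def add_story(container, fmt, title, url):
    """Append one story next to the channel information."""
    # RSS uses <item>, Atom uses <entry>.
    tag = "entry" if fmt == "atom" else "item"
    story = ET.SubElement(container, tag)
    ET.SubElement(story, "title").text = title
    if fmt == "atom":
        # Atom's link is a childless node with attributes.
        ET.SubElement(story, "link", {"rel": "alternate", "href": url})
    else:
        # RSS's link simply contains the URL as text.
        ET.SubElement(story, "link").text = url
    return story

channel = ET.Element("channel")
ET.SubElement(channel, "title").text = "Example.com"
add_story(channel, "rss", "First post", "http://example.com/1")
print(ET.tostring(channel, encoding="unicode"))
```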
RSS 0.91 had the limit of maximum 15 items in a feed – RSS 0.92 has removed it.
RSS 0.92 | RSS 2.0 | Atom 1.0 | Description |
---|---|---|---|
title OPT (0.92) REQ (0.91) |
title OPT | title REQ |
Entry title; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. RSS 0.91 imposes a maximum limit of 100 characters. |
description OPT (0.92) REQ (0.91) |
description OPT | summary OPT |
Entry synopsis; for RSS this is a plain-text field with encoded HTML entities while for Atom this is a Text construct that allows text or HTML. See also Atom content node. |
xml<description> This node may contain &lt;em&gt;encoded&lt;/em&gt; HTML. </description>
In RSS this field may contain encoded HTML that upon decoding will be used as
such, as opposed to title and other fields that upon decoding will treat HTML
entities as plain-text.
|
|||
link OPT (0.92) REQ (0.91) |
link OPT | link rel=alternate 0+ |
Specifies the URL of the full story (a web page). Atom’s link is more complex than that of RSS – it lets you attach documents, provide links to different representations (HTML, PDF, etc.) and more. |
xml<link> http://example.com/entries/174.php </link> |
xml<link rel="alternate" title="PHP tricks" type="text/plain" href="http://example.com/entries/174.txt" /> |
||
— | author OPT | author 0+ |
For RSS this is an e-mail and/or person’s name in the suggested format: e@mail.com (Name). For Atom this is a Person construct with several different fields. Atom entries may have none, one or more author nodes. The same element also exists in channel information – look there for details. |
category 0+ | category 0+ | category 0+ |
None, one or more entry categories; RSS 2.0 and Atom handle them differently. The same element also exists in channel information – look here for details. |
— | comments OPT | link rel=related 0+ |
Address of the page where comments for this story can be viewed and posted. Atom’s link can be used to mimic RSS’ <comments> element. |
xml<comments> http://example.com/153/comments </comments> |
xml<link rel="related" type="text/html" title="Comments" href="http://example.com/153/comments" /> |
||
enclosure OPT | enclosure OPT | link rel=enclosure 0+ |
An attachment – e.g. a song or a video clip. RSS’ node has 3 required attributes (url, length and type, which holds the MIME type) and is childless. See also Atom’s multipurpose link element. |
xml<enclosure length="1845603" type="text/plain" url="http://example.com/log.txt" /> |
xml<link rel="enclosure" type="text/plain" href="http://example.com/log-03.txt" length="1845603" title="The log" /> |
||
— | guid OPT | id REQ |
Unique item identifier; Atom has this element for channel node as well while RSS – for items only. This is usually the item’s permalink although the Atom standard doesn’t specifically tell this – look here for details. |
xml<guid isPermaLink="true"> http://example.com/post/20110315-2240.html </guid> If isPermaLink is omitted it’s assumed to be true. At first glance the link and guid nodes serve the same purpose but that is not necessarily so.
See also the last but one paragraph in the Comments section of the RSS 2.0 spec for the explanation. |
xml<id>http://example.com/post/20110315.php</id> |
||
— | pubDate OPT | published OPT |
The time on which the story was initially posted; RSS has this in
RFC 822 format (W3C suggests RFC 1123) while Atom – in
RFC 3339.
|
xml<pubDate> Sat, 05 Nov 11 08:58:41 +0000 </pubDate> |
xml<published> 2011-11-05T08:58:41+00:00 </published> |
||
— | — | updated REQ |
The time of last significant (from the publisher’s view) entry update. Atom has this
field in RFC 3339 date/time format.
|
source OPT | source OPT | source OPT |
Reference to a channel that this story has come from. For RSS this is a simple node with one required attribute (url) and the text for the reference caption; for Atom this element has the same format as the entire channel – it just can’t have any <entry> nodes. |
xml<source url="http://example.net/395.php"> The Semees' Place </source> |
xml<source> <title>The Semees' Place</title> <icon>http://example.com/icon-32.png</icon> <rights type="html">Copyright ©</rights> </source> |
||
— | — | content OPT |
This Atom-only node carries the content of the entry, or a link to it via the src attribute, in a format similar to a Text construct that allows text or HTML. See also the entry description node. In Atom you can use xml:lang and xml:base attributes in any element to set specific content language and base URL for their children. |
xml<content type="text"> Plain 'text' </content> <content type="application/rtf" xml:lang="ru"> {\rtf Некий \b RTF\b0 -документ.} </content> <content type="application/zip" src="posts/20110315.zip" /> This is a Text construct that in addition to standard text, html and xhtml types allows any valid MIME type – in this case:
|
|||
— | — | rights OPT |
Atom entries may have one rights node containing a copyright string for the current story in Text construct format that allows text or HTML. The same element also exists in channel information – look here for details. |
— | — | contributor 0+ |
Atom entries may have any number of contributor nodes containing information in Person construct format. The same element also exists in channel information – look here. |
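To illustrate the RSS 2.0 guid and enclosure rows of the table above, here is a hedged Python sketch. The helper names `add_guid` and `add_enclosure` are made up for this example; the attribute names follow the snippets shown in the table.

```python
import xml.etree.ElementTree as ET

def add_guid(item, value, permalink=True):
    """Add <guid>; isPermaLink is only written when it isn't the default."""
    attrs = {} if permalink else {"isPermaLink": "false"}
    ET.SubElement(item, "guid", attrs).text = value

def add_enclosure(item, url, length, mime):
    """Add a childless <enclosure> with its 3 required attributes."""
    ET.SubElement(item, "enclosure",
                  {"url": url, "length": str(length), "type": mime})

item = ET.Element("item")
add_guid(item, "http://example.com/post/20110315-2240.html")
add_enclosure(item, "http://example.com/log.txt", 1845603, "text/plain")
print(ET.tostring(item, encoding="unicode"))
```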
Text constructs are nodes with a type attribute that can have 3 values (defaults to text):
text | html | xhtml |
---|---|---|
xml<title type="text"> HTML chars like &lt; are escaped. </title> This node contains plain text, so even <tag>-looking strings that happen to be encoded there are treated as-is and not as HTML or any other markup. |
xml<title type="html"> "Quoted HTML" here. </title> This node holds proper HTML that the aggregator renders after unescaping its value. |
xml<summary type="xhtml"> <div xmlns="http://www.w3.org/1999/xhtml"> Unquoted XHTML <b>nodes</b> thanks to the <em>namespace</em>. </div> </summary> Since XHTML is a valid XML document its nodes can be embedded just like any other XML node. |
Note that you can only have one Text construct with the same tag name even if they have different types. In other words, you can’t have this:
xml<entry> <content type="text">Content in plain text form.</content> <content type="html">Some <b>HTML</b> here.</content> </entry>
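A possible way to build the text and html variants in Python is sketched below. The `text_construct` helper is a hypothetical name; it relies on `xml.etree.ElementTree` escaping markup characters automatically when serializing, and it leaves out the xhtml type, which needs real child nodes instead of a string.

```python
import xml.etree.ElementTree as ET

def text_construct(tag, value, kind="text"):
    """Build a Text construct of type "text" or "html"."""
    if kind not in ("text", "html"):
        raise ValueError("this sketch handles only text and html")
    node = ET.Element(tag, {"type": kind})
    # For both types the value is stored as character data; the serializer
    # escapes <, > and & so HTML markup ends up quoted inside the node.
    node.text = value
    return node

summary = text_construct("summary", "Some <b>HTML</b> here.", "html")
print(ET.tostring(summary, encoding="unicode"))
```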
This node and its children describe a person. There are 3 child elements: name (required), uri and email.
For example (taken from the Atom spec):
xml<author> <name>Mark Pilgrim</name> <uri>http://example.org/</uri> <email>f8dy@example.com</email> </author>
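The same Person construct could be generated in Python roughly like this. The `person` helper is an illustrative name only; it enforces the required name and skips the optional children when they aren’t given.

```python
import xml.etree.ElementTree as ET

def person(tag, name, uri=None, email=None):
    """Build a Person construct such as <author> or <contributor>."""
    if not name:
        raise ValueError("name is required")
    node = ET.Element(tag)
    ET.SubElement(node, "name").text = name
    if uri:
        ET.SubElement(node, "uri").text = uri
    if email:
        ET.SubElement(node, "email").text = email
    return node

author = person("author", "Mark Pilgrim",
                uri="http://example.org/", email="f8dy@example.com")
print(ET.tostring(author, encoding="unicode"))
```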
This element is much like the HTML <link /> tag. It’s a childless node with the following attributes:
Note that as with any other node in an Atom XML feed <link /> can also have xml:lang (content language code) and xml:base (base URL) attributes:
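A minimal Python sketch of such a link node follows. The `atom_link` helper and its parameters are assumptions made for this example; only attributes that appear in this article’s snippets (href, rel, type, title) plus xml:lang and xml:base are used.

```python
import xml.etree.ElementTree as ET

# ElementTree maps this namespace to the reserved "xml:" prefix.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def atom_link(href, rel="alternate", type_=None, title=None,
              lang=None, base=None):
    """Build a childless Atom <link /> node."""
    node = ET.Element("link", {"rel": rel, "href": href})
    if type_:
        node.set("type", type_)
    if title:
        node.set("title", title)
    if lang:
        node.set(XML_NS + "lang", lang)   # serialized as xml:lang
    if base:
        node.set(XML_NS + "base", base)   # serialized as xml:base
    return node

link = atom_link("entries/174.txt", type_="text/plain", title="PHP tricks",
                 lang="en", base="http://example.com/")
print(ET.tostring(link, encoding="unicode"))
```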
Possible relations (the rel attribute):