
Atom vs. RSS

November 9th, 2022

Atom is the superior XML standard for syndication on the Internet. Atom is very well thought-out, offering all of the common features of RSS while improving on versatility and extensibility. Conversely, many of the most important features of RSS appear to have been duct-taped to the specification as an afterthought. When starting a new feed today, there is no reason to choose RSS over Atom.

Atom is part of the XML ecosystem

Among applications of XML, RSS is the odd one out. It doesn't conform to many of the conventions and standards commonly used with XML.

Atom is very much part of the XML ecosystem. It has an XML namespace, making it easy to include Atom elements in non-Atom XML. It also uses standards like xml:lang and xml:base which provide a universal format for certain metadata.
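As a rough illustration of what the namespace buys you, here is a minimal Python sketch using only the standard library. The `<catalog>` document and the example.org URL are made up for this example; only the Atom and XML namespace URIs are real.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XML = "http://www.w3.org/XML/1998/namespace"

# A hypothetical non-Atom document that borrows an Atom element.
doc = """\
<catalog xmlns:atom="http://www.w3.org/2005/Atom" xml:lang="en">
  <product>
    <name>Widget</name>
    <atom:link rel="alternate" href="https://example.org/widget"/>
  </product>
</catalog>"""

root = ET.fromstring(doc)

# The namespace makes the Atom element unambiguous inside foreign XML.
link = root.find(f".//{{{ATOM}}}link")

# xml:lang lives in the predefined XML namespace, so any XML tool can find it.
lang = root.get(f"{{{XML}}}lang")

print(link.get("href"), lang)
```

A consumer that knows nothing about `<catalog>` can still pick out the link and the language, because both are identified by namespace rather than by document type.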

The Atom specification even includes rules for applying standard XML cryptographic signatures and encryption to feeds and entries, although this is very uncommon in the wild.

Atom has an official media type

Although application/rss+xml is now widely understood to be the media type for an RSS feed and it's used by the RSS Advisory Board, it isn't officially registered with the IANA. This has led to some confusion in the past.

Atom's media type is application/atom+xml, which is registered with the IANA.

Atom has better timestamps

RSS uses RFC 822 timestamps (Fri, 04 Nov 2022 12:21:47 -0700) which are difficult to parse and sort. Atom uses RFC 3339 timestamps (2022-11-04T12:21:47-07:00), a much more modern format which makes things easier for developers and users alike.
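To make the difference concrete, here is a small Python sketch using the two timestamps above. The second date in each list is invented for the comparison. One caveat: `datetime.fromisoformat` covers this offset form, but Python versions before 3.11 reject the trailing `Z` that RFC 3339 also allows, so treat this as an illustration rather than a complete RFC 3339 parser.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

rss_date = "Fri, 04 Nov 2022 12:21:47 -0700"
atom_date = "2022-11-04T12:21:47-07:00"

# Both strings describe the same instant.
same_instant = parsedate_to_datetime(rss_date) == datetime.fromisoformat(atom_date)

# RFC 3339 strings that share a UTC offset sort correctly as plain text.
atom_dates = [atom_date, "2022-01-03T09:00:00-07:00"]
atom_sorted = sorted(atom_dates)

# RFC 822 strings do not: "Fri" sorts before "Mon" regardless of the date.
rss_dates = [rss_date, "Mon, 03 Jan 2022 09:00:00 -0700"]
rss_sorted_as_text = sorted(rss_dates)
rss_sorted_by_time = sorted(rss_dates, key=parsedate_to_datetime)
```

With RSS, every consumer has to parse before sorting. With Atom, a plain string sort is often enough, although timestamps with different offsets still need to be parsed.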

RSS uses the same element ambiguously

The <description> element in RSS is either a "synopsis" of the item or the full text of the item. There is no good way to tell which convention any given <description> element follows.

Additionally, the contents of the <description> element can either be entity-encoded HTML (which we'll talk about later) or not. Feed readers must probe the text contents of the element to see if it looks like HTML before rendering it as such.

This could have been fixed very easily in the specification by defining an XML attribute to specify the content type, but RSS has no such attribute.

Atom solves both of these problems.

  1. The content and the summary have been given their own elements: <atom:content> and <atom:summary>.
  2. The <atom:content> element supports multiple content types, but has a type attribute making it completely unambiguous which one is used.
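The contrast can be sketched in Python. The regular expression below is one possible heuristic, not anything the RSS specification defines, and both function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def rss_description_is_html(item):
    """Guess whether an RSS <description> holds HTML. Only a heuristic."""
    text = item.findtext("description") or ""
    return re.search(r"</?[a-zA-Z][^>]*>", text) is not None

def atom_content_type(entry):
    """Read the declared type. Atom treats a missing attribute as "text"."""
    content = entry.find(f"{ATOM}content")
    return content.get("type", "text")

html_item = ET.fromstring(
    "<item><description>&lt;p&gt;Hi&lt;/p&gt;</description></item>")
plain_item = ET.fromstring(
    "<item><description>In C, include &lt;stdio.h&gt; first.</description></item>")
entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="html">&lt;p&gt;Hi&lt;/p&gt;</content></entry>')

guess_html = rss_description_is_html(html_item)
guess_plain = rss_description_is_html(plain_item)  # misfires on plain text
declared = atom_content_type(entry)
```

The heuristic wrongly flags the plain-text item as HTML because `<stdio.h>` looks like a tag. The Atom reader never has to guess.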

RSS has confusing defaults

RSS has a <link> element to specify a URL pointing to the item on the Internet. There is also a <guid> element which is a string that uniquely identifies the item.

By default, the <guid> element also represents a "permalink" to the item on the Internet. Users must add the isPermaLink="false" attribute to the <guid> element to stop this behavior.

The specification does not define which should be preferred by a feed reader if an item has both a <link> element and a <guid> element with a missing or true isPermaLink attribute.
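Because the specification is silent here, every feed reader has to pick its own policy. The Python sketch below shows one such policy: prefer <link>, and fall back to <guid> only when it is allowed to act as a permalink. The function name `item_url` and the preference order are assumptions of this example, not anything the specification mandates.

```python
import xml.etree.ElementTree as ET

def item_url(item):
    """Pick a URL for an RSS item. Preferring <link> is this sketch's choice."""
    link = (item.findtext("link") or "").strip()
    if link:
        return link
    guid = item.find("guid")
    # A missing isPermaLink attribute counts as "true".
    if guid is not None and guid.get("isPermaLink", "true") == "true":
        return (guid.text or "").strip() or None
    return None

both = ET.fromstring(
    "<item><link>https://example.org/a</link>"
    "<guid>https://example.org/b</guid></item>")
guid_only = ET.fromstring("<item><guid>https://example.org/b</guid></item>")
opaque = ET.fromstring('<item><guid isPermaLink="false">abc123</guid></item>')

urls = [item_url(both), item_url(guid_only), item_url(opaque)]
```

Another reader could just as reasonably prefer <guid>, and the same item would then open a different URL.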

RSS also does not require an item to have a unique identifier, forcing feed readers to handle deduplication of incoming items based on the contents of other elements. Elements like <description> and <title> could change at any time, leading to duplicated entries in feed readers.

Atom has the mandatory <atom:id> element which must contain an IRI (an internationalized URI). However, this element is never used as a link to the resource; it exists purely as a unique identifier that should never change.[1]

Atom's <atom:link> element is significantly more versatile than the <link> element in RSS. It has several attributes which can be used to specify relationships, languages, media types, and information sources, among other things.
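Here is a short Python sketch of reading those attributes. The entry and its example.org and example.net URLs are invented for this example. The attribute names and `rel` values are real, and a missing `rel` is treated as "alternate", as the Atom specification requires.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A hypothetical entry with three different kinds of link.
entry = ET.fromstring("""\
<entry xmlns="http://www.w3.org/2005/Atom">
  <link href="https://example.org/post.html" type="text/html" hreflang="en"/>
  <link rel="enclosure" href="https://example.org/post.mp3" type="audio/mpeg"/>
  <link rel="via" href="https://example.net/source"/>
</entry>""")

def links_by_rel(entry):
    """Group link hrefs by relationship, defaulting to "alternate"."""
    result = {}
    for link in entry.findall(f"{ATOM}link"):
        rel = link.get("rel", "alternate")
        result.setdefault(rel, []).append(link.get("href"))
    return result

links = links_by_rel(entry)
```

One element covers the page itself, an attached media file, and the source of the information, with no element names to memorize beyond <atom:link>.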

  1. ^ Some people consider it bad practice to use a 'permalink' in this field, because permalinks don't tend to be very permanent. It is sometimes suggested to use a 'tag' URI instead. Alternatively, you could use a UUID presented as a URN using the assigned uuid namespace.
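Both kinds of identifier are easy to produce. In the Python sketch below, the helper names, the example.org authority, and the choice to derive the UUID from a URL are all assumptions. Generating a random UUID once and storing it with the post would work just as well.

```python
import uuid

def tag_uri(authority, date, specific):
    """Build a 'tag' URI of the form tag:authority,date:specific."""
    return f"tag:{authority},{date}:{specific}"

def uuid_urn(name):
    """Derive a stable UUID from a name and present it as a URN."""
    return uuid.uuid5(uuid.NAMESPACE_URL, name).urn

tag_id = tag_uri("example.org", "2022-11-09", "atom-vs-rss")
urn_id = uuid_urn("https://example.org/blog/20221109.html")
```

Either value stays valid as an <atom:id> even if the post later moves to a different URL, as long as the stored identifier is never regenerated.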

RSS can't handle relative URIs

The RSS specification does not specify the handling of relative URIs, e.g. /blog/20221109.html. This means that the HTML content of an item must be transformed, changing all relative URIs to absolute URIs.

Atom supports the xml:base attribute for resolving relative URIs.
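A minimal Python sketch of that resolution follows. The feed and the example.org base are invented, and the code only looks at a single xml:base on the root element. A real consumer must also honor xml:base on nested elements, each resolved against its ancestors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# A hypothetical feed that declares its base URI once, at the top.
feed = ET.fromstring("""\
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://example.org/">
  <entry>
    <link href="/blog/20221109.html"/>
  </entry>
</feed>""")

base = feed.get(XML_BASE, "")
href = feed.find(f"{ATOM}entry/{ATOM}link").get("href")

# Resolve the relative reference against the declared base.
absolute = urljoin(base, href)
```

The publisher writes the relative URI once and the reader resolves it, so the content never has to be rewritten.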

Atom is better at dealing with (X)HTML

RSS handles HTML content poorly. It has to be escaped, i.e. put into a form that can't be mistaken as XML data.

A common method for this is entity-encoding, meaning all the special characters used in XML (<>&") are encoded as XML character entities. Entity-encoded HTML looks like the following.

&lt;a href=&quot;https://mckinley.cc/&quot;&gt;My Website&lt;/a&gt;

Another method is to put the HTML into a CDATA block, instructing the XML parser to treat anything within as text.

<![CDATA[<a href="https://mckinley.cc/">My Website</a>]]>
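Both escaping methods give the XML parser the same thing: an opaque string. The Python sketch below shows this, using a <description> wrapper chosen just for the example.

```python
import html
import xml.etree.ElementTree as ET

snippet = '<a href="https://mckinley.cc/">My Website</a>'

# Method 1: entity-encode the markup.
entity_encoded = f"<description>{html.escape(snippet)}</description>"

# Method 2: wrap the markup in a CDATA block.
cdata = f"<description><![CDATA[{snippet}]]></description>"

decoded_entity = ET.fromstring(entity_encoded)
decoded_cdata = ET.fromstring(cdata)

# Either way, the parser sees text, not elements.
same_text = decoded_entity.text == decoded_cdata.text == snippet
child_count = len(decoded_entity) + len(decoded_cdata)
```

Neither element has any children, so a consumer that wants to do anything with the link has to run a second, HTML-aware parser over the string.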

These methods are ugly, inefficient, error-prone, and they make the content difficult to work with.[2] For example, Firefox-based browsers cannot properly display the content of an RSS feed when using a client-side XSLT stylesheet. The stylesheet can show the content on Chromium-based browsers if you use the disable-output-escaping attribute.

There has been an open bug report[3] about this since before the first release of Phoenix, the Web browser that would eventually become Firefox.[4]

Since not all HTML is valid XML, Atom still supports escaped HTML. However, if your HTML snippet is valid XHTML,[5] it can be inserted directly into the document tree, like so.

<content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
        <a href="https://mckinley.cc/">My Website</a>
    </div>
</content>

When the content is just part of the XML, it can be easily read and transformed like anything else.
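For instance, a consumer can reach the link with the same XML parser it used for the feed. This Python sketch wraps a well-formed version of the content in a minimal <entry>. The wrapper is an assumption made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

entry = ET.fromstring("""\
<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <a href="https://mckinley.cc/">My Website</a>
    </div>
  </content>
</entry>""")

# The XHTML is part of the document tree, so no second parser is needed.
anchors = list(entry.iter(f"{XHTML}a"))
hrefs = [a.get("href") for a in anchors]
labels = [a.text for a in anchors]
```

The same holds for an XSLT stylesheet, which can copy or transform these nodes directly.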

  2. ^ In my opinion, entity-encoding HTML goes against the whole point of XML. It should just be XHTML instead. Here is another example of entity-encoded HTML in XML. That document is an absolutely terrible use of XML and every hard drive containing a copy of it should be incinerated.
  3. ^ If you want to watch a bunch of people argue about XSLT for 10 years and then forget about the whole thing, that bug report is a great read. At the very least, you should read Tony Marston's thoughts on the situation.
  4. ^ 384 days (1 year, 19 days) before Phoenix 0.1, to be exact. It wouldn't be called Firefox for 888 days (2 years, 5 months, 5 days).
  5. ^ It's not difficult to make your content conform to the XHTML specification, especially if you're using HTML 5. The content of my blog posts is XHTML, so my Atom feed's XSLT stylesheet works everywhere.

Atom is now widely supported

Historically, there was a compatibility benefit to providing an RSS feed. Atom is much newer than RSS and took some time to reach the level of adoption its predecessor enjoyed. Now, Atom is probably supported in any feed reader your visitors are likely to be using.

Wikipedia has a list of 57 feed readers, of which 41 have been checked for Atom support. 39 out of those 41 support Atom. This is definitely not the most scientific analysis, but it still shows that Atom is supported in at least 39 of the most popular feed readers.

You can get all the benefits of the Atom format without any significant downside where compatibility is concerned.[6]

  6. ^ It's probably not a good idea to discontinue your RSS feed entirely if you're already providing one. When I switched to Atom for my blog, I decided to continue providing the RSS feed, but without the full content of my posts. I put up a notice in the feed to notify RSS users of the change.

References