It's no secret that the RSS 2.0 format contains many questionable design decisions that were developed with an even more questionable approach. Guidelines for developing namespaces are virtually non-existent in the current documentation. The default namespace cannot be declared. Entity-encoded HTML is not only permitted (and evil) but encouraged. (My thoughts here.) Overlapping tags such as language and webmaster are not deprecated, and no guidelines are given on precedence between these tags and their modular equivalents. Then there is my favorite: all tags within an item are optional, as long as either a title or a description is present. The list goes on and is quite long. The RSS 2.0 design team had the opportunity to rectify these issues and failed to do so. In fact, in some cases it made the situation worse.
Some will argue that RSS's current design is advantageous because it lets publishers choose what suits them. I disagree, because these choices undermine predictability for consumers, burdening them with a plethora of variables that seem to change daily. This is unfortunate because developers of RSS consumer software, such as aggregators or toolkits, are forced to repeatedly put effort into addressing these odd, yet "legal," variants. That effort would better serve the community and the proliferation of RSS if it were focused on advancing RSS applications. I believe in flexibility and evolvable formats to a degree, but these issues cross that line into design flaws.
Recently, while testing the latest release of my RSS Feed plugin for MovableType, I found a feed that uses a guid tag instead of the commonplace link tag that most software expects to be present. Perfectly legal, but it breaks existing software. So what is the point?
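As a rough illustration of the defensive code this forces on consumers, here is a minimal Python sketch that looks for a link first and falls back to the guid. The function name item_url is hypothetical, and the sketch assumes the RSS 2.0 rule that a guid counts as a permalink unless isPermaLink is set to "false"; it is not taken from any particular aggregator.

```python
import xml.etree.ElementTree as ET


def item_url(item):
    """Return the best available URL for an RSS 2.0 <item>, or None.

    Hypothetical helper: prefers <link>, then falls back to <guid>
    when the guid is (by default or explicitly) a permalink.
    """
    link = item.findtext("link")
    if link and link.strip():
        return link.strip()

    guid = item.find("guid")
    if guid is not None and guid.text and guid.text.strip():
        # isPermaLink is optional and is treated as "true" when omitted.
        if guid.get("isPermaLink", "true").strip().lower() == "true":
            return guid.text.strip()

    # Nothing usable: the caller has to cope with an item without a URL.
    return None
```

Every consumer ends up writing some variation of this fallback, which is exactly the kind of repeated effort a tighter specification would make unnecessary.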
Brent Simmons asked whether RSS feeds should allow relative hyperlinks in embedded HTML and, if so, what an aggregator like his NetNewsWire should do. The general consensus was that hyperlinks should be absolute, but there were those who disagreed and felt that, like browsers, aggregators should resolve relative URLs. The lack of documentation leaves the question open to interpretation. (For the record, I believe hyperlinks should be absolute. It's easier for the publisher to resolve them, and RSS feeds often exist in a different location than the HTML that is embedded.) The fact that this question needs to be asked seems silly to me.
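For publishers, resolving links before the HTML goes into the feed can be fairly cheap. The following Python sketch rewrites href and src attributes against the URL of the page the HTML came from. The name absolutize and the example URLs are assumptions for illustration, and the regular expression only handles quoted attribute values; a real tool would more likely use a proper HTML parser.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' with a quoted value (sketch only).
_URL_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html, base_url):
    """Rewrite relative href/src values in html as absolute URLs."""
    def fix(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        # urljoin leaves already-absolute URLs untouched.
        return prefix + quote + urljoin(base_url, url) + quote

    return _URL_ATTR.sub(fix, html)


if __name__ == "__main__":
    page = "http://example.com/blog/index.html"  # hypothetical source page
    print(absolutize('<a href="/about.html">About</a>', page))
```

The base URL must be the location of the original HTML page, not the location of the feed, since the two often differ.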
Sam Ruby points out that there are "a number of things which are quite legal RSS, but are less than neighborly." He is calling for the community to discuss best practices that he and Mark Pilgrim will then implement as warnings in their RSS Validator service. I completely agree and back this as the next logical step.
The purpose of the XSS profile I drafted was to add constraints to the existing RSS 2.0 format that make feeds more predictable and more "neighborly." The XSS profile draft attempts to balance ease of use and authoring with ease of consumption by applications, while maintaining the richness necessary to be extended and adapted. I resubmit XSS for the community's consideration in the discussion Sam has proposed.
Being valid RSS (or XSS for that matter) does not guarantee that a feed's content is well-formed enough to be useful to an end user. Here are some additional editorial recommendations for making feeds more useful and well-formed:
Use CDATA for embedding HTML in description tags. I can't advocate this enough. This is perhaps the most important recommendation I can make because it goes a long way toward avoiding malformed XML/RSS files with almost no fuss. Entity-encoded HTML, also known as double entity-encoding, is quite common and not going away anytime soon, but you should consider avoiding it and saving yourself and others some headaches. Besides being a nonstandard practice within the XML specification, this method requires more processing cycles and adds to the file size unnecessarily. It's also prone to occasional error.
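A minimal Python sketch of the CDATA approach follows. The helper names cdata and description_element are hypothetical. The one case that needs care is HTML that itself contains the sequence "]]>", which would end the section early; the sketch splits it across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET


def cdata(text):
    """Wrap text in a CDATA section.

    A CDATA section cannot contain "]]>", so any occurrence is split
    across two adjacent sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def description_element(html):
    """Return a <description> element carrying html verbatim."""
    return "<description>" + cdata(html) + "</description>"


if __name__ == "__main__":
    xml = description_element("<p>Tom & Jerry</p>")
    print(xml)
    # An XML parser hands the HTML back without any extra decoding pass.
    print(ET.fromstring(xml).text)
```

The publisher writes the HTML as-is, and the consumer gets it back exactly as written after a single XML parse.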
Minimize the use of HTML in descriptions. Jon Postel's maxim on robust protocols says "be conservative in what you do..." and it's in this same spirit that I make this recommendation. None of the RSS specifications actually limits what you can embed in a description tag. While feed consumers should be prepared to strip out unwanted formatting, it's simply good manners as a content publisher to help avoid issues that could break their aggregator or layout.
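On the consumer side, stripping markup might look something like this Python sketch built on the standard library's HTMLParser. The name strip_html is hypothetical, and the sketch simply discards every tag and keeps the text; it does not, for instance, drop the contents of script or style elements.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text content and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the text of html with tags removed and whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b> &amp; all</p>"))
```

The less markup a publisher embeds, the less a step like this can mangle the content.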
Include a descriptive title for each item. Examine any collection of written thought, such as a magazine, newspaper or book, and you will note how information is organized in layers that can be easily scanned and processed by a reader. A good title, subtitle or summary (referred to as heads, decks and leads in media parlance) will not say anything that isn't contained in the main body of the piece. Without scannability, content consumption becomes so laborious and time-intensive that most of us would hardly bother reading a thing. Try removing the titles from any magazine or newspaper and you'll come to appreciate what I'm referring to. Besides being good for scannability, descriptive titles are good for accessibility. Despite these time-tested best practices, many feeds fail to include titles, let alone informative ones. Some webloggers claim it's too time-consuming and difficult to create a title for the numerous short posts that they make daily. While I appreciate their standpoint, even a title made of the site or collection name with a timestamp ("tima thinking outloud: September 1, 2002 20:13 -4:00") is more helpful and perhaps more appropriate than none. The end user's software then does not have to guess at a title, usually by taking some number of characters or words from the beginning of the description ("Today I saw something that...").
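Since most feeds are generated by a tool, the fallback can be automated. A minimal Python sketch, assuming a hypothetical item_title helper in the publishing tool and an ISO-style timestamp format chosen purely for illustration:

```python
from datetime import datetime, timedelta, timezone


def fallback_title(site_name, published):
    """Build a 'site name: timestamp' title for an untitled item."""
    stamp = published.strftime("%Y-%m-%d %H:%M %z")
    return site_name + ": " + stamp


def item_title(title, site_name, published):
    """Use the author's title when present, else the fallback."""
    title = (title or "").strip()
    return title if title else fallback_title(site_name, published)


if __name__ == "__main__":
    when = datetime(2002, 9, 1, 20, 13, tzinfo=timezone(timedelta(hours=-4)))
    print(item_title(None, "tima thinking outloud", when))
    print(item_title("Best practices for feeds", "tima thinking outloud", when))
```

The publisher's own title always wins; the generated one only fills the gap so that consumers never have to invent a title from the description.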
Avoid embedding HTML in the title. The channel and item titles in RSS, like their counterpart in HTML, are considered metadata and are therefore not expected to contain display elements such as HTML tags. Embedding markup, even when wrapped in CDATA, could break an end user's application. Keep HTML in the description only, if at all.
Consider writing a meaningful and concise summary for the description. Like a descriptive title, a meaningful summary improves the scannability, and thereby the utility, of your feeds. It helps readers determine whether they want to continue reading, and it communicates the main point of the content to readers who lack time.
If you insist on including the full content of items in a feed, offer end users a choice. This can be a bit of a controversial issue, as some users prefer to have the full content of an item included in the feed so they can read it in their aggregator. Others prefer concise excerpts that can be quickly scanned or consumed over low-bandwidth connections. These viewpoints are neither right nor wrong. It is the content publisher's decision, based on the use of their content and the needs of their intended audience. However, publishers would be wise to offer end users a choice. Since most feeds are generated by a tool, this is not difficult to provide. Also consider that end users may only be interested in a particular topic or resource. RSS is highly versatile and can be used to create feeds on a specific topic or resource, such as a calendar of events, mailing list archive, recent comments, or document repository.
Include contact information in your feed. With vague documentation and varying interpretations of RSS, implementation issues will happen. Publishing an email contact for the person responsible for the generation and management of the feed opens the lines of communication for rectifying these issues and collecting feedback.
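In RSS 2.0 the channel-level webMaster element is one place for such a contact. The Python sketch below adds or updates it; the helper name add_contact and the address are hypothetical, and the "address (name)" form follows the style of the examples in the RSS 2.0 documentation.

```python
import xml.etree.ElementTree as ET


def add_contact(channel, email, name=None):
    """Set the channel's <webMaster> element to the given contact."""
    value = email + " (" + name + ")" if name else email
    element = channel.find("webMaster")
    if element is None:
        # No contact yet: append one to the channel.
        element = ET.SubElement(channel, "webMaster")
    element.text = value
    return channel


if __name__ == "__main__":
    channel = ET.fromstring("<channel><title>Example feed</title></channel>")
    add_contact(channel, "feeds@example.com", "Feed Admin")
    print(ET.tostring(channel, encoding="unicode"))
```

A publishing tool could fill this in once from its configuration so that every generated feed carries the contact.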
I look forward to the feedback and insights that these community discussions will certainly produce.