Recently in Syndication Category
Commenting on the application of "desperate heuristics" to make sense of feeds like RSS, Phil Ringnalda adds:
In Atom, doing so is equivalent to gathering up everyone who spent years enduring the endless arguments on the mailing list, and urinating on them.
Brilliant! Well put! The original post about some Atom tests he ran and additional comments are here.
Nick Bradbury notes that the "full vs. partial feed content debate has risen from the ashes again" in this post. My comment there turned into a post of sorts, so I thought I'd repost a lightly edited version here.
I'm with Nick. I'm rather tired of this whole argument, really. Some of the full feed arguments I've read seem to want to replace going to web pages and viewing the HTML entirely. That seems a bit off and extreme to me. I don't think this is a black and white issue anyway. For some sites it makes sense (read: has value) to have full feeds and for some it does not. Still, I can see the need for both and agree it is a user interface issue more than anything.
I'm a proponent of Atom because you can include both in one feed in a clear and concise manner. Atom also provides the aggregator better meta data for interfaces that work consistently. My expectation in subscribing to Atom feeds is that the aggregator won't have to make guesses and make me suffer from issues like these highlighted in this post over on Signal vs. Noise. This entry illustrates an interface issue that is created by the ambiguities found in RSS feeds.
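As a rough illustration of carrying both versions in one feed, here is a minimal Python sketch that builds a single Atom entry with a short summary and the full content, each labeled with an explicit type. It assumes the namespace Atom eventually standardized on; the function name, URLs, and sample values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace from the finished Atom format.
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def build_entry(entry_id, updated, title, link, summary, full_html):
    """Build one Atom entry carrying both a summary and the full content."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link, rel="alternate")
    # The partial version, explicitly declared as plain text.
    summary_el = ET.SubElement(entry, f"{{{ATOM_NS}}}summary", type="text")
    summary_el.text = summary
    # The full version, explicitly declared as escaped HTML.
    content_el = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="html")
    content_el.text = full_html
    return entry


# Hypothetical sample values.
entry = build_entry(
    "tag:example.com,2005:post-1",
    "2005-08-01T12:00:00Z",
    "Full vs. partial feeds",
    "http://example.com/posts/1",
    "A short teaser.",
    "<p>The whole post, markup and all.</p>",
)
print(ET.tostring(entry, encoding="unicode"))
```

An aggregator reading such an entry can offer a headline view from the summary and a reading view from the content without guessing which is which.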
Just my view as a user and a developer.
During the syndication wars that eventually led to the Atom format's emergence, many argued that users shouldn't care about formats. That is true, but formats and user interfaces are not entirely divorced from each other. If you use a crappy format, making a good user interface becomes difficult if not impossible. This is why the suggestion of using OPML for attention data is concerning to me. I don't need to pound any more nails into the floor with my forehead.
Tim O'Reilly notes a remix of Cory Doctorow's Someone Comes to Town, Someone Leaves Town using a syndication feed where, once subscribed, you get a couple of pages every day. No matter when you subscribe to it, it sends you the book starting from the beginning.
This is a very clever example of syndication's potential to do more than broadcast news headlines. Bravo!
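How the actual remix tracks subscribers isn't described, but the idea can be sketched under one assumption: each subscriber's feed knows the date they subscribed. The function name, page size, and dates below are hypothetical.

```python
from datetime import date


def todays_installment(pages, subscribed_on, today, per_day=2):
    """Return the pages a subscriber should receive today.

    The offset is counted from the subscriber's own start date, so
    everyone begins at page one regardless of when they sign up.
    """
    days_in = (today - subscribed_on).days
    if days_in < 0:
        return []  # not subscribed yet
    start = days_in * per_day
    return pages[start:start + per_day]


# A stand-in "book" of ten pages.
book = [f"page {n}" for n in range(1, 11)]
first = todays_installment(book, date(2005, 7, 1), date(2005, 7, 1))
third = todays_installment(book, date(2005, 7, 1), date(2005, 7, 3))
print(first, third)
```

A reader who subscribes a month later gets the same first installment on their own first day.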
Bottom line: the imprecise RSS specification resulted in a lot of guess work, which complicated things for developers, end users and feed producers.
The solution? We clarified the RSS spec. While problems with entity-encoded HTML haven't disappeared completely, in my experience they're far less common than they used to be (and when they do occur, we now have examples to point to).
And that's all that's needed here, too. Clarify the OPML spec, and we can skip another prolonged format battle.
Clearly RSS feeds have gotten much better, but I think the Feed Validator had a much greater impact on that clean-up than the clarification Nick cites. Don't forget RSS 1.0 and 0.9x feeds were just as busted as the 2.0 version he points to.
I've had my fair share of frustration with both RSS and OPML, and OPML's "spec" is far more ridiculous than RSS ever was, and that is saying something. I have to wonder -- is it worth saving OPML? I'm not so sure. By default OPML is used as an import/export format by aggregators -- there was nothing else proposed and it just spread as the market for tools exploded. This isn't too dissimilar to how RSS grew. The difference here, though, is that OPML's use is quite limited in comparison to its intended scope. OPML was specified to be a general purpose outline format, yet it is only really used for representing blog rolls, and it does that poorly. Without a real specification of the values, the attributes or even the attributes' case, there are many variants of the OPML blogroll format in the wild. So is it "really" fixable if it were specified? Not without a lot of breakage, and you'd still need to be ready for all the crazy variants that grew in the void left by the lack of a good specification.
I guess what I'm saying is it doesn't really matter whether it gets better specified or not. Best of luck to those who are taking it on. My view is that the toothpaste is out of the tube and OPML as a blog roll format will only be a bit player at best.
I'm much more interested in and intrigued by XHTML outlines -- the microformat better known as XOXO. I've used it before in a few instances and it's worked very well. (X)HTML's system of providing outlines has been around longer, is better specified and is so much more widely supported in the grand scheme of Internet software that I have to wonder: what does OPML provide that makes it better? What does specifying and developing another format buy us? I can't think of anything. The current universal support of import/export by aggregators is nothing to sneer at, but how hard would it be to convert current blog rolls to XOXO? Trivial. OPML blog rolls don't contain any more information per entry than the XHTML link tag. Which requires more effort: fixing the spec and then all the OPML blog rolls, or just converting to XOXO? I think it's a tie. Which would provide the better footing going forward? I think clearly XHTML, because you can do more now and it's already here, well specified and well supported.
It's understandable why OPML is supported in aggregators, and that support should continue to be exploited; however, I just don't see furthering OPML for blog rolls. Let sleeping dogs lie. It's served its purpose. Let's not get trapped in past foibles, and move on to new and better things.
UPDATE: Sam Ruby, who I believe is the most patient and persistent person you'll ever find in technology, has entered into the OPML conversation. The Feed Validator that he was instrumental in driving now has OPML validation, with a call for more tests. This, I believe, supports my assertion that his validator was more instrumental in cleaning up bad RSS than clarifying the specification, as Nick Bradbury wrote. So perhaps there is a bit more hope of a clean-up than I had before.
I'm on my way out to the O'Reilly Open Source Convention. Ben Hammersley and I will be presenting 45 syndication hacks in 45 minutes. I have the utmost faith in Ben to pull off something so mad and only hope I can keep up. We got the time slot I always get – last day, last session. Oh, well. It still should be fun.
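To give a sense of how small the conversion is, here is a minimal Python sketch that turns an OPML blogroll into an XOXO list. It assumes the common blogroll attributes (text or title, htmlUrl, xmlUrl) and looks them up without regard to case, since files in the wild vary; the helper names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def attr(el, name):
    """Look up an attribute ignoring case, since OPML files vary."""
    for key, value in el.attrib.items():
        if key.lower() == name.lower():
            return value
    return None


def outline_to_li(outline):
    """Convert one OPML outline element (and its children) to an li."""
    li = ET.Element("li")
    label = attr(outline, "title") or attr(outline, "text") or ""
    url = attr(outline, "htmlUrl") or attr(outline, "xmlUrl")
    if url:
        link = ET.SubElement(li, "a", href=url)
        link.text = label
    else:
        li.text = label  # a folder or plain heading
    children = outline.findall("outline")
    if children:
        nested = ET.SubElement(li, "ol")
        for child in children:
            nested.append(outline_to_li(child))
    return li


def opml_to_xoxo(opml_text):
    """Return an XOXO ordered list for the outlines in an OPML body."""
    root = ET.fromstring(opml_text)
    xoxo = ET.Element("ol", {"class": "xoxo"})
    body = root.find("body")
    if body is not None:
        for outline in body.findall("outline"):
            xoxo.append(outline_to_li(outline))
    return ET.tostring(xoxo, encoding="unicode")


# Hypothetical one-entry blogroll.
SAMPLE = """<opml version="1.1"><head><title>Blogroll</title></head><body>
<outline text="Example Blog" htmlurl="http://example.com/"
         xmlUrl="http://example.com/index.xml"/>
</body></opml>"""

xoxo_html = opml_to_xoxo(SAMPLE)
print(xoxo_html)
```

This sketch keeps only one link per entry, preferring the site URL; a fuller converter would also carry the feed URL across, for instance as a second link.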
Last week Tim Bray wrote on his weblog:
I recently proposed to the IETF Atom Working Group that we might be nearly finished. Some people think that’s a mistake because, as they point out, Atom doesn’t have much more in the way of features than RSS.
He goes on to explain why he disagrees and relates the Atom work to his work on XML itself.
He concludes with an excellent observation that all standards groups should heed.
The worst thing the Atom WG could possibly do would be to spend another year or two trying to invent wonderful new syndication goodies. What on earth would give us the idea that we’re smart enough to predict what features the world is going to want? Our job is to write down what we already know works, to do it as cleanly and clearly as possible in as few pages as possible, then get out of the way.
You don’t think this can change the world? Just watch.
I've made it out to the west coast to attend the O'Reilly Mac OS X DevCon (quite happy to be here and to present) and am listening to the first keynote by Chris Bourdon on the new features of Tiger, Apple's next version of Mac OS X. I like what I'm seeing a lot, but one slide bothered me as being a bit uncharacteristically off. It said something like RSS support – RSS 0.9x, RSS1, RSS2 and Atom.
I'm glad to see this support of course, but Atom is not RSS. Rather than pick one over the other, it seems to me the term syndication would be more accurate and encompassing while being a bit less geeky for normal folk.
Phil Windley posted that there have been some interesting questions and discussions on his forum, highlighting a reader's post about RSS vs. Atom.
I thought it was worth reposting the reply I made on the forum here since I've been too occupied to post much else:
One thing that is always missing or overlooked in the discussion/debate/furor of RSS vs. Atom is that Atom provides a unified feed format and API – RSS, in any form, does not. The real value is when you need a publishing API and feed in your system. (The more powerful case for the Atom effort is the Blogger API vs. MetaWeblog API vs. Atom.) If all you need is a one-way syndication format then one of the RSS formats will suffice. That is why Google, SixApart and more recently Nokia are signing on to Atom.
As someone developing software for these syndication formats, I believe the eventual benefit of the Atom feed to the average Joe/Jane user will be that their aggregator software will provide a more reliable and consistent experience. Because of the extremely loose specs, (many) multiple versions, and the large number of optional and overlapping tags with similar meanings, it requires a lot of work, independent research and trial-and-error to reliably present any and all feeds to the average user. I've also found it requires on-going tweaking as new patterns emerge. (I wince when someone refers to RSS as simple. I realize that it's the part of me that is a developer doing the wincing though.) Given the effort and care going into the Atom feed format and its pending submission to the IETF as a formal standard, I'm fairly optimistic that Atom will be an improvement in this regard.
I too will use both and let users decide.
While I was out on my latest blogging hiatus, James Snell picked up on my response to an earlier discussion by Jeremy Zawodny and Diego Doval about developing a means of more robust RSS auto-discovery. He writes:
As much as I like WSIL, it's pretty much officially dead. There is no further work going into it at all. So while I like where Tim is going with this, I think an alternative approach needs to be developed.
He goes on to lay out an example of a new alternate WSIL-like format he calls The Automatic Discovery indeX (ADX).
I agree with James and had suspected as much about WSIL. Given that James's employer is one of the authors of that spec, I'll take it as fact.
I also like where he's gone with ADX. Like WSIL, it's more RESTful than UDDI (which just needs to die) and relatively simple and versatile enough to integrate numerous format pointers into one mechanism – SOAP, RSS, OPML, Atom and so on. This could also be used as a more robust and eventually better formed and documented blogroll format.
While it's a good start, I think ADX as James has detailed it could use some refinement – mostly what I think are nits.
- Keep the tags all lowercase. Most formats do it that way, so I see nothing gained by switching to proper-case tag names.
- I like the reuse of existing RSS modules such as Dublin Core; however, their use seems inconsistent. For instance, James uses dc:title, but does not namespace Description. Name is also not namespaced and is about the equivalent of dc:title. I realize that James was transcribing my WSIL examples, but since this is a new format we might as well clean that up. Perhaps if Dublin Core is going to (and should) play such an important role, those elements should just be folded into the syntax of this format?
- In the spirit of Dublin Core, I'd reuse this element set's naming conventions as much as possible. Service.EndPoint becomes source or perhaps the RSS standard link and so on.
- IndexRef should be expanded to allow for additional meta data to be associated with a reference to another index. For instance, what type of index is on the other side of this link? Another ADX? Or perhaps a UDDI directory? Or perhaps an OPML file? This is also an important allowance for blogroll use.
- For argument's sake, I'd like to see an example of a WSDL file and a UDDI pointer.
- Having a schema is good, but I think it should be optional in an ADX document.
- I'm really hesitant about the DNS Service Discovery method because most users do not have the knowledge or access to implement such a thing.
Mostly nits. So here is my riff on James' original ADX proposal where I incorporate my feedback into an example:
<?xml version="1.0" encoding="UTF-8"?>
<index xmlns:dcterms="http://purl.org/dc/terms/" xmlns="urn:temporary:uri">
  <title>News4Humans feedOnFeeds</title>
  <dcterms:modified>2003-09-12T23:45:37-00:00</dcterms:modified>
  <source>http://news4humans.com/index.adx</source>
  <description>All the news preferred by highly evolved primates.</description>
  <language>en-us</language>
  <creator>newsfor@news4humans.com</creator>
  <service>
    <name>Latest News</name>
    <description>A syndication feed of the 15 most recent news posts.</description>
    <source>http://news4humans.com/feeds/index.xml</source>
    <format>http://purl.org/rss/2.0/</format>
    <dcterms:modified>2003-09-12T23:35:52-00:00</dcterms:modified>
  </service>
  <service>
    <name>Google Search</name>
    <description>A SOAP interface to the Google search engine.</description>
    <format>http://schemas.xmlsoap.org/wsdl/</format>
    <source>http://api.google.com/GoogleSearch.wsdl</source>
  </service>
  <!-- This was IndexRef -->
  <link>
    <title>News4Humans Technology News Feeds</title>
    <description>All the news preferred by highly evolved primates.</description>
    <source>http://news4humans.com/tech.adx</source>
    <format>urn:temporary:uri</format>
    <creator>News4Humans</creator>
  </link>
</index>
Thoughts?
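To show how a client might consume an index like the riff above, here is a minimal Python sketch that pulls out the service and link pointers and filters them by format. It uses a trimmed-down copy of the example so it stands alone; the element names and the urn:temporary:uri namespace come from the riff, while the function name is made up and nothing here reflects a real ADX implementation.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the riff, not an official one.
NS = {"adx": "urn:temporary:uri"}

# Trimmed-down copy of the example index.
SAMPLE = """<index xmlns="urn:temporary:uri">
  <title>News4Humans feedOnFeeds</title>
  <service>
    <name>Latest News</name>
    <source>http://news4humans.com/feeds/index.xml</source>
    <format>http://purl.org/rss/2.0/</format>
  </service>
  <service>
    <name>Google Search</name>
    <source>http://api.google.com/GoogleSearch.wsdl</source>
    <format>http://schemas.xmlsoap.org/wsdl/</format>
  </service>
  <link>
    <title>News4Humans Technology News Feeds</title>
    <source>http://news4humans.com/tech.adx</source>
    <format>urn:temporary:uri</format>
  </link>
</index>"""


def pointers(index_xml, tag):
    """Return source/format pairs for each top-level element named tag."""
    root = ET.fromstring(index_xml)
    found = []
    for el in root.findall(f"adx:{tag}", NS):
        found.append({
            "source": el.findtext("adx:source", namespaces=NS),
            "format": el.findtext("adx:format", namespaces=NS),
        })
    return found


services = pointers(SAMPLE, "service")
# An aggregator would keep only the formats it understands.
rss_feeds = [s["source"] for s in services
             if s["format"] == "http://purl.org/rss/2.0/"]
# Links whose format is the index namespace point at further indexes.
more_indexes = [l["source"] for l in pointers(SAMPLE, "link")
                if l["format"] == "urn:temporary:uri"]
print(rss_feeds, more_indexes)
```

Because every pointer carries a format, a client can skip the WSDL entry without having to fetch it first.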
Summarizing the discussion on a more advanced RSS auto-discovery format that was recently started by Jeremy Zawodny, Diego Doval writes:
if Tima or someone else would have a bit of time to re-write my mock-up structure using WSIL, it would be most welcome!
Done. Here is a quick mockup of both approaches Diego used to represent hierarchical content in RSS.
The first is a single file example where I use the dc:subject element to define the category in which a client could group feed pointers.
http://www.timaoutloud.org/files/diego/index.wsil
I think this example is pretty self-explanatory. service is the equivalent of RSS's item.
The second example I constructed uses multiple files and WSIL's ability to point to feeds and other WSIL files.
http://www.timaoutloud.org/files/diego/index2.wsil
http://www.timaoutloud.org/files/diego/tech.wsil
http://www.timaoutloud.org/files/diego/world.wsil
http://www.timaoutloud.org/files/diego/various.wsil
index2.wsil contains links to the other files (I used the fictitious news4humans domain in the URLs so you'll have to do the mappings.)
I think the second option is the way to go because it scales for sites like Yahoo, though I don't have a problem with looking at supporting both. I added a latest news feed to index2.wsil just to demonstrate that services and links to other WSIL files can sit side by side.
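The multi-file approach implies a client that follows links from one index to the next. Here is a minimal Python sketch of that walk. It assumes the stock WSIL 1.0 layout (service/description and link elements with referencedNamespace and location attributes) rather than the liberties taken in the mockups, and it uses an in-memory dictionary in place of HTTP fetches; the feed URLs and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

WSIL_NS = "http://schemas.xmlsoap.org/ws/2001/10/inspection/"
NS = {"wsil": WSIL_NS}

# Hypothetical documents keyed by URL, standing in for HTTP fetches.
DOCS = {
    "http://news4humans.com/index2.wsil": f"""
<inspection xmlns="{WSIL_NS}">
  <service>
    <description referencedNamespace="http://purl.org/rss/2.0/"
                 location="http://news4humans.com/feeds/latest.xml"/>
  </service>
  <link referencedNamespace="{WSIL_NS}"
        location="http://news4humans.com/tech.wsil"/>
</inspection>""",
    "http://news4humans.com/tech.wsil": f"""
<inspection xmlns="{WSIL_NS}">
  <service>
    <description referencedNamespace="http://purl.org/rss/2.0/"
                 location="http://news4humans.com/feeds/tech.xml"/>
  </service>
  <link referencedNamespace="{WSIL_NS}"
        location="http://news4humans.com/index2.wsil"/>
</inspection>""",
}


def collect_feeds(url, fetch, seen=None):
    """Gather service locations from url and every index it links to."""
    seen = set() if seen is None else seen
    if url in seen:
        return []  # guard against indexes that link back to each other
    seen.add(url)
    root = ET.fromstring(fetch(url))
    feeds = [d.get("location")
             for d in root.findall("wsil:service/wsil:description", NS)]
    for link in root.findall("wsil:link", NS):
        # Only follow links that point at another inspection document.
        if link.get("referencedNamespace") == WSIL_NS:
            feeds.extend(collect_feeds(link.get("location"), fetch, seen))
    return feeds


feeds = collect_feeds("http://news4humans.com/index2.wsil", DOCS.__getitem__)
print(feeds)
```

The fetch argument is just a callable from URL to document text, so swapping in a real HTTP client would not change the traversal.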
There are a few caveats to what I did here.
- I took a few liberties with the WSIL 1.0 spec, but the files are completely legal XML. Mainly I used RSS modules to bring in additional meta data wherever needed instead of making up my own tags.
- RSS 0.9x and 2.0 don't have an official namespace, which continues to be an unfortunate design flaw. I made up one for the example – http://purl.org/rss/2.0/.
- dc:language should probably be on a per-service basis, but for the sake of replicating Diego's example I left it where it was.
- service.dc:date should probably be dcterms:modified.
- Added an abstract, error reports to the examples.
- I could very easily have added pointers to web services via WSDL files or UDDI directories. I could just as easily have added Atom feeds or archives or OPML files for that matter.
I'm pretty convinced that WSIL is along the lines of what would be optimal in creating one scalable format that is inclusive enough to handle many formats in addition to web services. That said, I think WSIL in its current form leaves much to be desired. For instance, I think supporting extensibility through namespaces is the way to go, but there are probably too many elements with namespaces in my mockups. Some of the tag names are a bit off and could be better. Let me go out on a limb here – I'm also not sure the RDF syntax is really of much value in this non-RDF format. It would be worth factoring those out. (I'm sure the semantic web mob will be on me for that.)
