Parsing RSS Without A Parser.


The latest installment of Mark Pilgrim's Dive Into XML column is out. This month's column covers coping, without an XML parser, with invalid XML disguised as RSS feeds. Before you get too excited, note how Mark concludes:

Hopefully we're trying to use a real XML parser first and only falling back on this messy regular expressions-based sgmllib parser when that fails. However, in flagrant abuse of all things pure and sacred, I have managed to extend this script into a full-fledged parse-at-all-costs RSS parser that supports all the advanced features of RSS, including namespaces. It even handles exotic variations of RSS 0.90 and 1.0, where everything is explicitly placed in a namespace (even the basic title, link, and description tags). I don't recommend it, but it works for me.

Mark makes excellent observations as he presents his case and shows that he has the balls to write an article for XML.com demonstrating how to parse an XML-based format without an XML parser. (His words from his weblog.)
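
The strategy in Mark's conclusion (try a real XML parser first, fall back to a messy one only when that fails) can be sketched roughly as follows. This is not Mark's script: his fallback is built on sgmllib, which is no longer in the Python standard library, so a pair of regular expressions stands in for it here. The function names are hypothetical, and the sketch only pulls item titles from un-namespaced RSS.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical stand-ins for a liberal parser; far cruder than sgmllib.
ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item>", re.IGNORECASE | re.DOTALL)
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def titles_strict(feed_text):
    """Use a real XML parser; raises ET.ParseError if the feed is not well-formed."""
    root = ET.fromstring(feed_text)
    return [item.findtext("title", default="").strip() for item in root.iter("item")]


def titles_liberal(feed_text):
    """Last resort: scrape titles with regular expressions, ignoring XML rules."""
    titles = []
    for body in ITEM_RE.findall(feed_text):
        match = TITLE_RE.search(body)
        titles.append(unescape(match.group(1)).strip() if match else "")
    return titles


def item_titles(feed_text):
    """Return (titles, method), preferring the strict parser."""
    try:
        return titles_strict(feed_text), "xml"
    except ET.ParseError:
        return titles_liberal(feed_text), "regex"
```

A feed with a bare ampersand or an unclosed tag makes the strict path raise, and only then does the regular-expression path run. Returning the method alongside the titles lets a caller log how many feeds needed the fallback.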

At the same time, this article is more of the same news we RSS-aware people have already heard. I suppose it can't be reiterated enough.

Incidentally, it was Mark's initial observations on RSS and the release of his ultra-liberal RSS parser that led me into the foray with RSS that still curses me to this day.

I will still foolishly continue to advocate well-formedness and hope for the day when only 1% of feeds are malformed.


About this Entry

This page contains a single entry by Timothy Appnel published on January 23, 2003 1:49 PM.
