TagSoup is a library for parsing HTML/XML. It supports the HTML 5
specification and can parse either well-formed XML or
unstructured and malformed HTML from the web. The library also
provides useful functions to extract information from an HTML
document, making it ideal for screen-scraping.
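
As a rough illustration of the extraction functions mentioned above, the
following minimal sketch assumes the Haskell tagsoup package and its
Text.HTML.TagSoup module. The helper name linkTargets and the sample
markup are hypothetical and chosen only for this example; parseTags,
isTagOpenName and fromAttrib are the library functions it relies on.

```haskell
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- Hypothetical helper: collect the href attribute of every <a> tag.
-- parseTags turns the input into a flat list of tags without requiring
-- the markup to be well formed; fromAttrib yields "" when the
-- attribute is absent.
linkTargets :: String -> [String]
linkTargets html =
  [ fromAttrib "href" tag
  | tag <- parseTags html
  , isTagOpenName "a" tag
  ]

main :: IO ()
main =
  -- The sample input is deliberately malformed: the second <a> and the
  -- <p> are never closed.
  mapM_ putStrLn
    (linkTargets "<p>See <a href='/docs'>docs</a> and <a href='/faq'>faq")
```

Working on a flat tag list rather than a tree is what lets a sketch like
this tolerate unclosed elements; a real scraper would likely also filter
out empty results.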
