I’ve used HTML Tidy for a looooong time and have always loved its power.
Need to clean out the crap older versions of MS Word toss in when you save a document as HTML? Tidy to the rescue. Need to check HTML syntax, correctness, or proper authoring practices? Tidy’s your tool. Want to just get a human-readable, nicely indented file from SGML/HTML/XML input? Tidy’s just the thing.
Tidy’s long had the ability to convert HTML, even files which aren’t well-formed, into XML. That’s a pretty sweet feature, and newer versions of Tidy will even spit out lovely XHTML.
Tidy’s got billions of handy config options, like getting all those stinky elements properly indented or wrapping text automatically at a specific column.
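If you end up running Tidy over a pile of files, it can help to script the command line. Here's a rough Python sketch of that. The flags (`-quiet`, `-indent`, `-wrap`, `-asxhtml`, `-asxml`, `-o`, and the `--word-2000 yes` config option for stripping Word cruft) are how I understand Tidy's command line, so check `tidy -help` on your own build. The helper names `build_tidy_command` and `run_tidy` are just made up for illustration.

```python
import shutil
import subprocess
from typing import List


def build_tidy_command(src: str, dest: str, wrap_column: int = 80,
                       xhtml: bool = True, clean_word: bool = False) -> List[str]:
    """Hypothetical helper: build a Tidy command line.

    Flag spellings are assumptions based on Tidy's documented options;
    verify them against `tidy -help` for your version.
    """
    # Quiet output, indent elements, wrap text at the given column
    cmd = ["tidy", "-quiet", "-indent", "-wrap", str(wrap_column)]
    # -asxhtml emits XHTML; -asxml emits well-formed XML
    cmd.append("-asxhtml" if xhtml else "-asxml")
    if clean_word:
        # Config option for stripping MS Word 2000-style markup
        cmd += ["--word-2000", "yes"]
    # Write the cleaned markup to dest instead of stdout
    cmd += ["-o", dest, src]
    return cmd


def run_tidy(cmd: List[str]) -> int:
    """Hypothetical helper: run Tidy and return its exit status.

    Tidy exits with 0 for no problems, 1 for warnings only, 2 for errors.
    """
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy was not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode


if __name__ == "__main__":
    # Just print the command; call run_tidy(...) to actually execute it
    print(" ".join(build_tidy_command("legacy.html", "legacy.xhtml", wrap_column=72)))
```

Splitting "build the command" from "run it" makes it easy to eyeball the flags before you let it loose on a directory full of ugly HTML.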
I just grabbed the latest Win32 version this morning for some work I’m doing converting badly structured HTML into XML as part of a migration to DocBook format.
In a very, very weird karmic-wheel-of-life experience, I’m once again working on a technical manual viewer, awfully similar to the Air Force Common Viewer I worked on nearly a decade ago. But I digress. (Which is a frequent occurrence around here, but I digress further yet.)
To the point: Grab Tidy if you’re ever in need of checking or manipulating XML or HTML files. It’s a sweet little tool. It’s awfully simple to use, but I see there are even GUI front ends for it if you feel the need.