One important detail I didn't mention in yesterday's post on converting between XML and HTML was the disable-output-escaping attribute. Setting disable-output-escaping="yes" makes the transform write "<" and "&" to the resulting document as they are. Otherwise, they get turned into &lt; and &amp;. You may have found that HTML tags, <, and & all get stripped from your XML every time you do a transform.
This happened to me. I fought with disable-output-escaping="yes" and tried enclosing my HTML-tagged parts (the <review> section from yesterday's examples) in a CDATA block. It turns out you need both to get your HTML tags to come through unaltered. In the XML, use a CDATA block around any section whose tags you want carried through as they are (not processed as XML or stripped). For the fruit review, the XML would look like this:
<review>
<![CDATA[
Grapes can be <i>green</i> or <b>purple</b>
]]>
</review>
Notice I used the old <i> and <b> tags, not the XHTML <strong> and <em> tags, to emphasize where this should be applied: existing, old-style HTML content. The HTML inside the CDATA section is treated as one big text string, so tags inside the block won't be processed as XML. That's why it's great as an intermediate step for moving your site toward a purely XML-based format. When you're ready to start handling the formatting of all your content with XML, you can do that with new documents and avoid redoing old content where there's not much benefit.
To recap: the CDATA block lets < and & into the transform as plain text, and disable-output-escaping="yes" then writes those characters to the final document unescaped. The result is XML transforms where you need them, dumb old text copying where you don't.
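For reference, here is a minimal sketch of the stylesheet side. It assumes the review text lives in a <review> element as in the XML above; the template and the surrounding <p> are hypothetical, not taken from yesterday's examples, so adjust them to match your own stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical template: assumes the CDATA-wrapped HTML sits in <review> -->
  <xsl:template match="review">
    <p>
      <!-- Without the attribute, < and & in the text would be written
           as &lt; and &amp; in the result -->
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Keep in mind that the XSLT 1.0 spec does not require processors to support disable-output-escaping, and it only has an effect when the processor serializes the result to text itself. If your tags still come out escaped, check how your processor handles the attribute.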