XML in PHP5, Let me Count the Ways

I've decided that I have to start using XML on PHP 5, so I thought I'd take some notes on my quick skim of the support available. Read on to get an idea of how the different PHP extensions relate to each other.

I've already had some success using the DOM XML approach from PHP4. If you've been to my SVG site (svgbasics.com) lately, the documents you read there are being transformed from XML input. The reason I took that approach is kind of convoluted, but it's working just great now. It also got me into parsing XML with PHP, which is turning out to have quite a few practical applications for me.

I don't want to spend forever using DOM XML for PHP4 though. I understand PHP5 has improved XML support, and the DOM XML documentation for PHP4 at php.net is very slim on details. Most of the useful information is in the user comments. Couple that with the fact that the computer I develop my sites on has a webserver running PHP5, and I feel compelled to investigate XML on PHP5. No work before its time, I always say.

With a quick call to phpinfo() on my local install of PHP5, I can count four distinct sections that seem to indicate different approaches to handling XML. I see DOM XML, libxml, SimpleXML, and just plain XML. I'm going to dig around the PHP manual and see if I can find out just what's what. Maybe I can save someone else the same effort.
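
If scrolling through the phpinfo() page gets tedious, the same check can probably be done in a few lines. This is only a sketch: the extension names in the array are the ones I'd expect PHP 5 to report for those four sections, which is an assumption on my part, and an older install may use different names.

```php
<?php
// Assumed extension names for the four phpinfo() sections on PHP 5.
$extensions = array('libxml', 'dom', 'simplexml', 'xml');

$status = array();
foreach ($extensions as $name) {
    // extension_loaded() reports whether the extension is compiled in or loaded.
    $status[$name] = extension_loaded($name);
    echo $name . ': ' . ($status[$name] ? 'enabled' : 'missing') . "\n";
}
```

Anything that comes back as missing was either disabled at build time or isn't loaded in php.ini.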

The first find in the manual is XML Parser Functions.
This one is an extension based on expat. I can't tell if it's PHP5 only or if it's available for PHP4. The idea is that you use the library to create an XML parser. Then you set some handlers, which are essentially functions for the parser to call when it hits certain conditions, such as the start or end of an element. Sounds simple and the examples look straightforward.
My intent now is to use XSLT to transform documents (some kind of masochistic streak I guess), so this one doesn't look like what I need.
To find out if your PHP installation supports it, call phpinfo() and look for "--disable-xml" in the configure command. If that's not in the output, it looks like you've got it (it's on by default).
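
Here's roughly what the create-a-parser-then-set-handlers idea looks like. It's a minimal sketch, not lifted from the manual: the NameCollector class and the sample XML are made up for illustration, and I'm passing the handlers as array(object, method) pairs so the collected data has somewhere to live.

```php
<?php
// Hypothetical handler object: records element names and text as the parser reports them.
class NameCollector
{
    public $names = array();
    public $text = '';

    public function startElement($parser, $name, $attrs)
    {
        $this->names[] = $name;
    }

    public function endElement($parser, $name)
    {
        // Nothing to do at the end of an element in this sketch.
    }

    public function characterData($parser, $data)
    {
        $this->text .= $data;
    }
}

$xml = '<notes><note id="1">DOM</note><note id="2">SimpleXML</note></notes>';

$collector = new NameCollector();
$parser = xml_parser_create();

// Case folding is on by default and would upper-case the element names.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    array($collector, 'startElement'),
    array($collector, 'endElement')
);
xml_set_character_data_handler($parser, array($collector, 'characterData'));

// The third argument marks this as the final chunk of input.
if (!xml_parse($parser, $xml, true)) {
    echo 'Parse error: ' . xml_error_string(xml_get_error_code($parser)) . "\n";
}

echo implode(', ', $collector->names) . "\n";
echo $collector->text . "\n";
```

The parser never builds a tree. It just calls you back as it reads, which is why it's a poor fit for XSLT work but handy for pulling a few values out of a large file.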

In the phpinfo() output, there's a dom section that says DOM/XML is enabled.
The manual indicates that the older DOM XML extension is in good shape in PHP 4.3, but that in PHP 5 it has been removed in favour of the new DOM extension. I think the DOM extension is the path that I need to follow. Maybe I can check the PHP version in my current code and add in some support for PHP 5 where appropriate. I don't really like this route, but it seems better than the alternative: writing and then rewriting my scripts.
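
One way to handle both versions might be to test for the feature rather than the version number. The sketch below assumes the PHP 5 class is called DOMDocument and the PHP 4 loader is domxml_open_mem(); load_xml_document() is a helper name I made up, and the PHP 4 branch is untested here.

```php
<?php
// Hypothetical helper: load an XML string with whichever DOM flavour is installed.
function load_xml_document($xml)
{
    if (class_exists('DOMDocument')) {
        // PHP 5 and later: the DOM extension.
        $doc = new DOMDocument();
        return $doc->loadXML($xml) ? $doc : false;
    }
    if (function_exists('domxml_open_mem')) {
        // PHP 4: the older DOM XML extension.
        return domxml_open_mem($xml);
    }
    return false;
}

$doc = load_xml_document('<svg><title>Basics</title></svg>');
echo $doc ? "loaded\n" : "no DOM support found\n";
```

This only solves the loading step. The two extensions name their methods differently, so any code that walks the document afterwards would still need its own branches.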

The other name I want to investigate from the phpinfo() page is SimpleXML. This extension is also enabled by default and disabled only if the "--disable-simplexml" configure option is used. The idea of SimpleXML is that you can let the library parse out your XML and turn it into a PHP object. Sounds great, but the comments indicate some caveats when dealing with namespaces and with adding nodes to the tree. That stuff's not exactly a simple use of XML anyway, and I can see some applications for this library. The one function in this library that really stands out is simplexml_import_dom(). It takes a node from a DOMDocument (the object created by the DOM extension mentioned above) and turns it into a SimpleXML object. That could be great in cases when you've got a huge XML document and need to deal extensively with just one part of it.
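
To make that concrete, here's a small sketch of both routes: parsing a string straight into a SimpleXML object, and handing a single DOM node over with simplexml_import_dom(). The sample document and variable names are invented for the example.

```php
<?php
$xml = '<site><page id="intro"><title>Shapes</title></page>'
     . '<page id="paths"><title>Paths</title></page></site>';

// Route 1: let SimpleXML parse the whole string into an object.
$site = simplexml_load_string($xml);
$firstTitle = (string) $site->page[0]->title;
echo $firstTitle . "\n";

// Route 2: load with the DOM extension, then pass just one node to SimpleXML.
$doc = new DOMDocument();
$doc->loadXML($xml);
$second = $doc->getElementsByTagName('page')->item(1);
$page = simplexml_import_dom($second);

// Attributes read like array keys, child elements like properties.
$summary = (string) $page['id'] . ': ' . (string) $page->title;
echo $summary . "\n";
```

The (string) casts matter: without them you get SimpleXML objects back rather than plain strings.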

For my application, I think a combination of the DOM extension and the SimpleXML extension will do the job.

In searching around for this info, I get the impression that libxml is the PHP5 (and maybe PHP4) foundation for XML access.

That about sums up all I wanted to find out for now. When I find out that everything I've said is wrong I'll come back and leave a note. Or you can leave a comment.

Damn, I had a book out from the library at work on PHP 5 and it talked about the new XML stuff - the problem is that GoDaddy is still using PHP 4.3.x so I didn't bother reading it yet (much like you say: No work before its time). Returned the book last week, otherwise I might have helped clarify some things.

Anyway, I found your entry a little confusing because you talk about 4 areas: "DOM XML, libxml, SimpleXML, and just plain XML" and, while you describe what "DOM XML" and "SimpleXML" are, you don't really describe what "libxml" and "just plain XML" give you (or what the difference is). Also, where does the "DOM extension" come into the picture (libxml or "just plain XML")?

If you ever rewrite this as a full-blown article, I'd suggest a little cleanup, but otherwise a good resource.

Yeah, I didn't spend a lot of time editing, I just wanted to take down my thoughts and throw some leads out. “libxml” is, like I mentioned, the foundation for all the XML access methods in PHP. I got that impression since it's mentioned in the documentation for every one of the others (dom, dom/xml and the XML Parser functions).

Just plain XML is the section of the phpinfo() output that's just titled "xml." I think it refers to support for the XML Parser Functions.

The DOM extension is the replacement in PHP5 for what was called the DOM XML (or dom/xml) extension in PHP4.

As it turns out, the DOM extension had everything I needed to pull out the data that I wanted. You can also use it to make XPath queries and, as far as I can tell with the help of the companion XSL extension, apply XSLT. There's also a function called dom_import_simplexml() which is the inverse of the simplexml_import_dom() function.
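
For anyone following along, here's a rough sketch of those three things together. The sample XML and stylesheet are made up, and I'm assuming XSLT comes from a separate XSLTProcessor class that may not be installed everywhere, so that part is guarded.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<lessons><lesson level="1">Lines</lesson>'
    . '<lesson level="2">Curves</lesson></lessons>'
);

// XPath query against the loaded document.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//lesson[@level="2"]');
$title = $nodes->item(0)->nodeValue;
echo $title . "\n";

// Going the other way: SimpleXML object to DOM node.
$sx = simplexml_load_string('<note>hello</note>');
$domNode = dom_import_simplexml($sx);
echo $domNode->nodeName . "\n";

// XSLT, only if the XSL extension is available (an assumption about your build).
if (class_exists('XSLTProcessor')) {
    $xsl = new DOMDocument();
    $xsl->loadXML(
        '<xsl:stylesheet version="1.0" '
        . 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        . '<xsl:output method="text"/>'
        . '<xsl:template match="/">'
        . '<xsl:value-of select="count(//lesson)"/>'
        . '</xsl:template>'
        . '</xsl:stylesheet>'
    );
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    // This stylesheet just counts the lesson elements in $doc.
    echo $proc->transformToXML($doc) . "\n";
}
```

In a real script the stylesheet would come from a file via load() rather than an inline string.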

I think this post might be useful, but I expect it to stay a draft...