Converting XML Into A PHP Data Structure
XML has received a lot of attention because most languages support parsing XML documents and extracting data from them. As a programmer you will probably have to deal with XML in many situations, and it is a useful data storage format for PHP programmers.
Why use XML? Before you begin, you must first decide whether the project needs what XML offers. There are alternative data storage formats, such as CSV files and database tables, but XML is more powerful and useful in many situations. XML provides several additional benefits for programmers, such as:
- data format abstraction;
- simple document tag/data validation;
- the ability to store data in a tree-like hierarchy;
- platform independence;
- ease of integration.
XML files can be validated against a DTD, and they store data in a format similar to what you'd see in an HTML document. The tags are made up by the document's author (and can be formally defined in a DTD), and they can represent a tree structure. Consider the following example:
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a comment -->
<drive desc="Letters and Numbers Harddrive">
  <folder name="folder01">
    <file name="file1.txt"/>
    <file name="file2.txt"></file>
  </folder>
  <folder name="folder02">
    <file name="file3.txt"/>
    <file name="file4.txt" owner="john"> This is a comment about file4. We like comments.</file>
  </folder>
</drive>
Notice that this document has one <drive> tag with two <folder> tags inside it, and each <folder> tag contains two <file> tags. The result is a tree-like structure of data. I would like to access this data from within PHP, and there are several ways to do so, including:
- manually parsing the XML file,
- parsing the file with the PHP SAX parser,
- using the XPath libraries to search and pull data,
- or using the DOM parser
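For comparison with the SAX route chosen below, here is a minimal sketch of the XPath option. It uses the DOMDocument and DOMXPath classes of PHP 5 and later, and it assumes the sample document above is held in a string. It pulls out one attribute and counts the <file> elements without building any custom data structure.

```php
<?php
// Sketch: query the sample document with XPath via PHP's DOM extension.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a comment -->
<drive desc="Letters and Numbers Harddrive">
  <folder name="folder01">
    <file name="file1.txt"/>
    <file name="file2.txt"></file>
  </folder>
  <folder name="folder02">
    <file name="file3.txt"/>
    <file name="file4.txt" owner="john"> This is a comment about file4. We like comments.</file>
  </folder>
</drive>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);            // returns false if the XML is malformed
$xpath = new DOMXPath($doc);

// string(...) makes evaluate() return the attribute value as a PHP string.
$owner = $xpath->evaluate('string(//file[@name="file4.txt"]/@owner)');

// count(...) returns a float, so cast it to int.
$fileCount = (int) $xpath->evaluate('count(//file)');

echo "owner: $owner, files: $fileCount\n";
```

This works well for one-off lookups. It does not, however, give you the plain PHP arrays the rest of this article is after.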
Manual parsing is not the most robust solution, and the DOM support in PHP is still experimental, so I've chosen the SAX parser route. However, unlike other similar solutions, I'd like to write an object in PHP that parses the XML document into a PHP data structure. That way I can access the data like any other PHP data instead of writing a custom parser each time I use XML in an application.
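To see what "the SAX parser route" means in practice, here is a small sketch that records the events PHP's event-based XML parser fires for a tiny fragment. It does not build any structure yet. It uses xml_parser_create, xml_set_element_handler, xml_set_character_data_handler, and xml_parse from PHP's XML Parser extension, with closures as handlers (a choice made for this sketch).

```php
<?php
// Sketch: log the start/end/character-data events for a small fragment.
$xml = '<folder name="folder01"><file name="file1.txt"/>hi</folder>';

$events = [];

$parser = xml_parser_create();
// Case folding is on by default and upper-cases tag names; turn it off.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) {
        $events[] = "start $name";   // $attrs holds the tag's attributes
    },
    function ($p, $name) use (&$events) {
        $events[] = "end $name";
    }
);

xml_set_character_data_handler($parser, function ($p, $data) use (&$events) {
    $events[] = "data '$data'";
});

if (!xml_parse($parser, $xml, true)) {   // true: this is the final chunk
    die(xml_error_string(xml_get_error_code($parser)));
}

echo implode("\n", $events), "\n";
```

Note that the self-closing <file/> tag still produces both a start and an end event. The handlers only ever see one tag at a time, so building a tree means keeping track of which tags are currently open.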
Since the XML file is structured as a tree, we need to find the best way to represent that tree in PHP. My first idea is a PHP array. Another option would be to build objects, similar to the DOM parser approach. I've decided not to write a DOM parser, though (which you could easily do), because PHP's DOM support is coming along quickly enough. Why duplicate that effort?
For simple XML, PHP arrays are perfect for the task: you can create arrays of arrays of arrays and thus build a tree structure, which is exactly what we need for this learning exercise. Besides, the core PHP language already has a plethora of built-in functions for iterating through arrays, pushing, popping, shifting, unshifting, splitting, joining, slicing, and so on.
Taking the DOM model as inspiration, though, we'll need to store several pieces of information about each XML tag. There are four pieces of information per tag that we want to store:
- name of the tag,
- tag attributes (keys and values),
- data (the content inside the tag open and close),
- and possibly other nested tags.
A PHP array representing this simple XML tag (also referred to as a node in the tree) might look as follows:
<?php
$node = array();
$node['_NAME'] = 'folder';      // stores the node (tag) name
$node['_DATA'] = 'content';     // stores the text content inside the tags
$node['_ELEMENTS'] = array();   // stores sub-nodes in order
$node['key1'] = 'value1';       // stores all other node attributes
$node['key2'] = 'value2';       // stores all other node attributes
$node['key3'] = 'value3';       // stores all other node attributes
?>
Here I've created an array of key/value pairs for all the attributes in the node. I've also added three internal-use keys, '_NAME', '_DATA', and '_ELEMENTS', to store the tag name, the tag data, and the sub-node array. The leading underscore ('_') makes a conflict with an attribute name unlikely, though not impossible, since XML names are also allowed to begin with an underscore. Using the sub-node array, we can now nest arrays of arrays of arrays and build our tree.
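As a quick check that this node layout really forms a tree, the sketch below builds the sample document by hand and walks it recursively through '_ELEMENTS'. The helpers make_node() and walk_tree() are hypothetical names used only for illustration. Skipping every key that starts with an underscore is how the walk tells attributes apart from the reserved keys.

```php
<?php
// Sketch: hand-built nodes in the '_NAME'/'_DATA'/'_ELEMENTS' layout.
// make_node() and walk_tree() are hypothetical helpers for illustration.

function make_node(string $name, array $attrs = [], string $data = '', array $children = []): array
{
    $node = $attrs;                  // attributes as plain key/value pairs
    $node['_NAME'] = $name;          // reserved keys are set last, so they
    $node['_DATA'] = $data;          // overwrite any attribute that happens
    $node['_ELEMENTS'] = $children;  // to use the same name
    return $node;
}

// Return one line per node, indented by depth, listing its attributes.
function walk_tree(array $node, int $depth = 0): array
{
    $attrs = [];
    foreach ($node as $key => $value) {
        if (strpos((string) $key, '_') !== 0) {   // skip reserved keys
            $attrs[] = "$key=$value";
        }
    }
    $line = str_repeat('  ', $depth) . $node['_NAME'];
    if ($attrs) {
        $line .= ' ' . implode(' ', $attrs);
    }
    $lines = [$line];
    foreach ($node['_ELEMENTS'] as $child) {
        $lines = array_merge($lines, walk_tree($child, $depth + 1));
    }
    return $lines;
}

$drive = make_node('drive', ['desc' => 'Letters and Numbers Harddrive'], '', [
    make_node('folder', ['name' => 'folder01'], '', [
        make_node('file', ['name' => 'file1.txt']),
        make_node('file', ['name' => 'file2.txt']),
    ]),
    make_node('folder', ['name' => 'folder02'], '', [
        make_node('file', ['name' => 'file3.txt']),
        make_node('file', ['name' => 'file4.txt', 'owner' => 'john'],
            'This is a comment about file4. We like comments.'),
    ]),
]);

$lines = walk_tree($drive);
echo implode("\n", $lines), "\n";
```

The last line printed is `    file name=file4.txt owner=john`. Because the children keep their document order in '_ELEMENTS', the walk reproduces the nesting of the original XML.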
Using our XML example again, suppose you want to read some information about the file whose name is 'file4.txt'. You would first convert the XML into a PHP array of arrays and then access the data with code like the following. The indexes are zero-based, so ['folder'][1]['file'][1] is the second file in the second folder.
<?php
$file_name = $data['drive'][0]['folder'][1]['file'][1]['name'];
$owner     = $data['drive'][0]['folder'][1]['file'][1]['owner'];
$comment   = $data['drive'][0]['folder'][1]['file'][1]['_DATA'];
?>
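PHP has a built-in, event-based (SAX-style) parser for XML documents. You create a parser, register handler functions for the events you care about (such as start tags, end tags, and character data), and pass a string containing the XML text to the xml_parse function. As the document is parsed, the configured handlers are called as many times as necessary.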
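Here is one way the pieces could fit together: a minimal, stack-based sketch that turns the sample document into nested arrays using the node layout above. The function name xml_to_array() is hypothetical, and this is not necessarily how the full tutorial implements it. The access code above indexes children by tag name, while the node layout lists them under '_ELEMENTS'. To support both styles, this sketch stores each finished child in both places, which is an assumption on my part. A side effect is that a child tag sharing its name with one of its parent's attributes would collide with that attribute.

```php
<?php
// Sketch: convert XML into nested '_NAME'/'_DATA'/'_ELEMENTS' arrays.
// xml_to_array() is a hypothetical name. Children are stored twice by
// assumption: in document order and grouped by tag name.

function xml_to_array(string $xml): array
{
    // Pseudo-root that receives the document element; the stack holds
    // every node that has been opened but not yet closed.
    $stack = [['_NAME' => '', '_DATA' => '', '_ELEMENTS' => []]];

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            // Open a node: attributes as plain keys plus the reserved keys.
            $node = $attrs;
            $node['_NAME'] = $name;
            $node['_DATA'] = '';
            $node['_ELEMENTS'] = [];
            $stack[] = $node;
        },
        function ($p, $name) use (&$stack) {
            // Close the node and attach it to its parent.
            $node = array_pop($stack);
            $node['_DATA'] = trim($node['_DATA']);
            $parent = count($stack) - 1;
            $stack[$parent]['_ELEMENTS'][] = $node;   // document order
            $stack[$parent][$name][] = $node;         // grouped by tag name
        }
    );

    // Character data may arrive in several chunks, so append to it.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$stack) {
        $stack[count($stack) - 1]['_DATA'] .= $data;
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error at line %d: %s',
            xml_get_current_line_number($parser),
            xml_error_string(xml_get_error_code($parser))
        ));
    }

    return $stack[0];
}

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a comment -->
<drive desc="Letters and Numbers Harddrive">
  <folder name="folder01">
    <file name="file1.txt"/>
    <file name="file2.txt"></file>
  </folder>
  <folder name="folder02">
    <file name="file3.txt"/>
    <file name="file4.txt" owner="john"> This is a comment about file4. We like comments.</file>
  </folder>
</drive>
XML;

$data = xml_to_array($xml);

$file_name = $data['drive'][0]['folder'][1]['file'][1]['name'];   // "file4.txt"
$owner     = $data['drive'][0]['folder'][1]['file'][1]['owner'];  // "john"
$comment   = $data['drive'][0]['folder'][1]['file'][1]['_DATA'];  // "This is a comment about file4. We like comments."
echo "$file_name / $owner / $comment\n";
```

Trimming '_DATA' when a node closes discards the indentation whitespace between tags. The trade-off is that significant leading or trailing spaces in text content are lost as well.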
