DataStream
LANGUAGES: XML | C#
ASP.NET VERSIONS: 1.0 | 1.1
Get There with XPathNavigator
Exploiting the .NET XPath Query Engine to Navigate Hierarchical Data
By Brian Noyes
In "XPath Basics" I gave a quick introduction to the syntax of XPath
expressions to help the uninitiated get comfortable with XPath, an important
technology to understand for working with XML data in its many forms.
The
thing about XPath is that it can't do anything on its own; it needs a
processing engine to perform work based on the expressions. That processing
engine could come in many forms. In .NET 1.x, XPath comes into play both for
querying and navigating XML data in documents, and also for transforming XML
documents using XSLT. In this article I'm going to give a quick introduction to
working with the XPath processing engine that you use to query and navigate XML
data in .NET - specifically the XPathNavigator class and how to use it.
Getting to the Root of Things
One reader correctly pointed out that in "XPath Basics" I did not cover an
important concept of XPath: absolute and relative paths. That was partially
intentional, so now let me set things straight on that account. In "XPath
Basics" I emphasized that the evaluation of an XPath statement is always
relative to the current context node. This is true whether you are talking
about an individual location step within an XPath statement, or about an entire
expression. The context node can be set by a previous location step, or it can
be set based on the context of the processing engine that's evaluating the
expression.
So if
XPath expressions are always relative to the current context node, how can you
have an absolute path? The answer is that you can still think of an absolute
path in XPath as being relative to the current context node. The way to specify
an absolute path in XPath is to use the "/" character at the beginning of the
expression. This basically says "start at the root of the document". So from
that perspective, an expression like /Music/Album is an absolute path that is
evaluated starting at the root of the document, looking for a root element
named Music, containing a child element named Album. You can still view this
as relative to the current context node because you can legally evaluate the
expression against a reference to a node anywhere within the document: the
leading "/" selects the root of the document that contains the context node,
and the rest of the path is applied from there.
I waited to mention this because a statement like the previous one really
starts to involve the use of XPath with a particular processing engine.
Because I was going to wait until this article to talk about the XPath
processing engine in .NET, I decided to clarify the relative vs. absolute
path issue here as well. That being done, let's get on with some processing!
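To make the absolute versus relative distinction concrete, here is a minimal sketch. It assumes a small inline document shaped like the music sample used later in this article, and it uses the XPathNavigator class described in the next section. The class name and the Count helper are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class AbsoluteVersusRelative
{
    // Small inline document so the sketch is self-contained (assumed data).
    public const string Xml =
        "<Music><Artist name='Evanescence'><Album name='Fallen'>" +
        "<Track number='1'>Going Under</Track>" +
        "<Track number='2'>Bring Me To Life</Track>" +
        "<Track number='3'>Everybody's Fool</Track>" +
        "</Album></Artist></Music>";

    // Hypothetical helper: number of nodes an expression selects
    // when evaluated with the given navigator as the context node.
    public static int Count(XPathNavigator context, string xpath)
    {
        return context.Select(xpath).Count;
    }

    public static void Main()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        XPathNavigator root = doc.CreateNavigator();

        // Position a second navigator on the Album element.
        XPathNodeIterator albums = root.Select("/Music/Artist/Album");
        albums.MoveNext();
        XPathNavigator album = albums.Current.Clone();

        // Absolute path: the same result from either context node.
        Console.WriteLine(Count(root, "/Music/Artist/Album"));  // 1
        Console.WriteLine(Count(album, "/Music/Artist/Album")); // 1

        // Relative path: the result depends on the context node.
        Console.WriteLine(Count(root, "Track"));  // 0
        Console.WriteLine(Count(album, "Track")); // 3
    }
}
```

The absolute expression matches the same Album from both navigators, while the relative expression Track only finds anything when the context node is the Album element.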
XPathNavigator Knows the Way
The
primary object for querying and navigating XML in .NET is XPathNavigator.
If you've been using the Document Object Model (DOM) for dealing with XML for a
long time, you may feel more comfortable dealing with the XmlDocument
class and using the SelectNodes method to perform queries. The truth is
that, under the covers, SelectNodes is using XPathNavigator for
you. And if you start using XPathNavigator directly, you can adopt a
consistent programming approach that will work with XmlDocument, XmlDataDocument,
or XPathDocument objects. This will become even more important in .NET
2.0 when XPathDocument gets a serious overhaul to its implementation,
allowing it to track changes made to the document in a similar way that
DataSets do today.
The XPathNavigator
class basically encapsulates a cursor into an XML node set, and allows you to
navigate or perform queries relative to that node. The class exposes a set of
methods to move to sibling, parent, or child nodes, as well as a set of methods
focused on executing a query using an XPath expression. Using an XPathNavigator
object you can also pre-compile an expression and reuse the compiled version to
run the same query repeatedly much more efficiently.
You can
get an XPathNavigator from any of the .NET XML document types by calling
the CreateNavigator method. What you get is an instance of an XPathNavigator with its underlying
cursor initialized to the root of the document. From there you can perform
queries to obtain sets of other XPathNavigator objects that point to the
results of the query, or you can move the current cursor through the document
using the navigation methods of the class. You can also access a number of
properties on a navigator to extract the data contained in the node to which it
is currently pointing so you can perform processing on that data.
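As a rough illustration of that portability, the sketch below obtains a navigator from both an XmlDocument and an XPathDocument through the IXPathNavigable interface they share. The class name, the RootElementName helper, and the inline XML are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public class NavigatorSources
{
    // Inline sample data (assumed) so the sketch is self-contained.
    public const string Xml =
        "<Music><Artist name='Evanescence'><Album name='Fallen'/></Artist></Music>";

    // Hypothetical helper: works against any document type that
    // implements IXPathNavigable.
    public static string RootElementName(IXPathNavigable source)
    {
        // A new navigator starts on the root node of the document.
        XPathNavigator nav = source.CreateNavigator();
        // The first child of the root node here is the document element.
        nav.MoveToFirstChild();
        return nav.LocalName;
    }

    public static void Main()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(Xml);
        XPathDocument readOnly = new XPathDocument(new StringReader(Xml));

        Console.WriteLine(RootElementName(dom));      // Music
        Console.WriteLine(RootElementName(readOnly)); // Music
    }
}
```

The helper never needs to know which document type it was given, which is the consistency argument made above.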
To
perform a query with an XPathNavigator instance,
you can call its Select method, passing in an XPath expression. What you
get back is an XPathNodeIterator that allows you to step through the
results. This is another lightweight object that allows you to obtain an XPathNavigator reference to each of the
nodes that matched the query. Using these references, you can then either
extract data from the nodes, or you can use the navigator to perform subsequent
queries or navigation that will be done relative to the matching nodes.
Query for Music
Let's
look at an example. First we need some XML to work against. Say you have some
XML that contains information about music. If you had a schema as shown in
Figure 1, you would have a Music root element, Artist elements under that,
Album elements under Artist, and Track elements under Album. Each of those
elements has certain attributes, as shown in Figure 1, that you might be
interested in extracting for processing. The resulting XML looks like Figure 2.
Figure 1: The Music XML data schema.
<Music>
  <Artist name="Evanescence">
    <Album name="Fallen">
      <Track number="1">Going Under</Track>
      <Track number="2">Bring Me To Life</Track>
      <Track number="3">Everybody's Fool</Track>
    </Album>
  </Artist>
</Music>
Figure 2: A Music XML file.
Given
that schema, let's say we first wanted to use an XPathNavigator to query for all the Album elements within a
document. The code for doing so would look like that shown in Figure 3.
public void ProcessAlbums()
{
    // Load a document.
    XPathDocument doc = new XPathDocument("Music.xml");
    // Get a navigator initialized to the root.
    XPathNavigator nav = doc.CreateNavigator();
    // Perform a query.
    XPathNodeIterator iter = nav.Select("//Album");
    // Iterate through the results.
    while (iter.MoveNext())
    {
        // ProcessAlbum moves the navigator it receives, so hand it a
        // clone and leave the iterator's own navigator in place.
        XPathNavigator navCurrent = iter.Current.Clone();
        ProcessAlbum(navCurrent);
    }
}
Figure 3: Querying the XML document for Album nodes.
In the code in Figure 3, I first load the XML into an instance of
XPathDocument. The XPathDocument class is the best choice in .NET if you don't
need to modify the contents of the document while processing it. I obtain an
XPathNavigator from the document by calling CreateNavigator. Using that
navigator, I execute a simple XPath query for all descendant elements named
Album (using the XPath abbreviation //, which is short for
/descendant-or-self::node()/). That query returns an XPathNodeIterator that
can be used to iterate through the results.
To use the iterator, you call MoveNext, which returns true as long as there is
another node to process. If so, the Current property on the iterator returns
a reference to an XPathNavigator positioned on the current node represented by
the iterator. I pass a navigator for that node to another method to process
the results (which you can see in Figure 4).
public void ProcessAlbum(XPathNavigator navAlbum)
{
    // Clone navigator to move off axis.
    XPathNavigator navArtist = navAlbum.Clone();
    // Move to the parent (Artist) node.
    navArtist.MoveToParent();
    // Move to its name attribute.
    navArtist.MoveToFirstAttribute();
    // Output the artist name.
    Console.WriteLine(navArtist.Value);
    // Move to the album name attribute.
    navAlbum.MoveToFirstAttribute();
    Console.WriteLine("\t" + navAlbum.Value);
    // Move back up to the parent element.
    navAlbum.MoveToParent();
    // Move down to first track element and output its text.
    navAlbum.MoveToFirstChild();
    Console.WriteLine("\t\t" + navAlbum.Value);
    // Loop through the rest of the track elements.
    while (navAlbum.MoveToNext())
    {
        Console.WriteLine("\t\t" + navAlbum.Value);
    }
}
Figure 4: Navigating results with the XPathNavigator.
In the ProcessAlbum
method, I switch from using a navigator as a query tool to using it to navigate
a known schema of nodes. The code embeds the knowledge of the schema in the
form of some explicit navigation steps from node to node using the navigator
that was passed into the method representing an Album.
The
first thing the code in Figure 4 does is to clone the navigator. If you are
going to move "off axis" to move up to a parent or down into a collection of
child nodes, and you want to resume processing where you started, you'll need
to clone the navigator before you start calling navigation methods. Remember
that the navigator maintains a single reference (or cursor) into the nodes
saying what the current context node is as far as it's concerned.
As soon as you call a MoveXXX method, that cursor has changed, and you'll have
no easy way to get the context back to where you started - short of reversing
all the navigation steps you have taken. So if you clone a navigator, you can
hold onto either the original or the clone and use the other to move away from
the current node. When you're done with that processing path, you can simply
resume using the navigator you left in place, which is still where it was when
you cloned it, and throw away the other navigator.
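The effect of cloning can be sketched in a few lines. This is an illustrative example under assumed inline data, not part of the article's download; the class name and the FirstAlbum helper are hypothetical. It uses IsSamePosition to show that a clone starts where the original is and then moves independently.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class CloneSketch
{
    // Inline sample data (assumed) so the sketch is self-contained.
    public const string Xml =
        "<Music><Artist name='Evanescence'><Album name='Fallen'>" +
        "<Track number='1'>Going Under</Track></Album></Artist></Music>";

    // Hypothetical helper: a navigator positioned on the first Album element.
    public static XPathNavigator FirstAlbum()
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNodeIterator iter = nav.Select("//Album");
        iter.MoveNext();
        return iter.Current.Clone();
    }

    public static void Main()
    {
        XPathNavigator album = FirstAlbum();
        XPathNavigator scout = album.Clone();

        // Immediately after cloning, both cursors are on the same node.
        Console.WriteLine(album.IsSamePosition(scout)); // True

        // Moving the clone leaves the original where it was.
        scout.MoveToParent();
        Console.WriteLine(scout.LocalName);             // Artist
        Console.WriteLine(album.LocalName);             // Album
        Console.WriteLine(album.IsSamePosition(scout)); // False
    }
}
```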
Once the
code in Figure 4 has a cloned copy of the Album node navigator, it uses the
cloned copy to move up to the parent node, which, based on the schema, should
be an Artist node with a name attribute. So it uses a couple of MoveXXX methods to move to that
attribute, and then simply spits out to the console the name of the Artist for
the album.
After
that, it resumes using the original Album navigator and moves down to its first
attribute, which should be the Album name. After spitting that out to the
console, the code backs the navigator up to the parent, which is the original
Album element when you have moved to an attribute. That's one thing to get used
to when moving to attributes. They are not treated as child nodes of an
element, but the element itself is treated as a parent to the attribute node.
Once the cursor is back on the Album element, the code moves it down to the
first child element, which should be a Track element based on the schema.
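The attribute behavior described above can be seen in a short sketch. It is an illustration under assumed inline data, with a hypothetical class name and FirstAlbum helper. Besides moving onto an attribute by name with MoveToAttribute, it shows GetAttribute, which reads an attribute value without moving the cursor at all and so avoids the trip back up with MoveToParent.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class AttributeSketch
{
    // Inline sample data (assumed) so the sketch is self-contained.
    public const string Xml =
        "<Music><Artist name='Evanescence'><Album name='Fallen'>" +
        "<Track number='1'>Going Under</Track></Album></Artist></Music>";

    // Hypothetical helper: a navigator positioned on the first Album element.
    public static XPathNavigator FirstAlbum()
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNodeIterator iter = nav.Select("//Album");
        iter.MoveNext();
        return iter.Current.Clone();
    }

    public static void Main()
    {
        XPathNavigator album = FirstAlbum();

        // Option 1: move onto the attribute, read it, then move back up.
        if (album.MoveToAttribute("name", String.Empty))
        {
            Console.WriteLine(album.Value); // Fallen
            album.MoveToParent();           // back on the Album element
        }

        // Option 2: read the attribute without moving the cursor.
        Console.WriteLine(album.GetAttribute("name", String.Empty)); // Fallen
        Console.WriteLine(album.LocalName);                          // Album

        // Attributes are not children: the first child is the Track element.
        album.MoveToFirstChild();
        Console.WriteLine(album.LocalName); // Track
    }
}
```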
From there it extracts the Value property of the current node, which for an
element such as Track is simply the text it contains. After processing the
first child, it processes the remaining Tracks by calling MoveToNext on the
navigator, which keeps moving the cursor to the next sibling node until there
are no more, at which point it returns false and the loop exits.
Stepping through the nodes in the schema with the MoveXXX methods is very
fast. I could have used the Select method repeatedly to get to each node of
interest, issuing a different XPath expression each time to get back the
desired results. Performing a query, however, is much less efficient than
simply bumping the node reference using a Move method.
Pre-compile for Speed
There
are many other things you can do with XPathNavigator
to process the contents of an XML document. The first to be aware of is that if
you're going to perform the same query a number of times, perhaps on a
collection of documents, then the query will execute significantly faster if
you pre-compile the expression.
You do
this by calling the Compile method on the navigator, passing in an XPath
expression as a string and getting back an instance of an XPathExpression
object. You can pass that XPathExpression object to the Select
method, and the execution of the Select method will be much quicker than
if you passed in the XPath as a string every time. Figure 5 shows a variation
on the ProcessAlbums method that uses this approach.
public void ProcessAlbumsCompiled()
{
    // Load a document.
    XPathDocument doc = new XPathDocument("MusicBase.xml");
    // Get a navigator initialized to the root.
    XPathNavigator nav = doc.CreateNavigator();
    // Compile the query first.
    XPathExpression exp = nav.Compile("//Album");
    // Perform a query using the compiled expression.
    XPathNodeIterator iter = nav.Select(exp);
    // Iterate through the results.
    while (iter.MoveNext())
    {
        // ProcessAlbum moves the navigator it receives, so hand it a
        // clone and leave the iterator's own navigator in place.
        XPathNavigator navCurrent = iter.Current.Clone();
        ProcessAlbum(navCurrent);
    }
}
Figure 5: Executing a compiled expression.
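Figure 5 compiles the expression and uses it once, so the payoff only shows when the compiled expression is reused. The sketch below compiles "//Album" a single time and runs it against several documents. The inline documents, the class name, and the CountAlbums helper are assumptions made for illustration, and no timing claims are made here.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class CompiledReuse
{
    // Hypothetical helper: runs an already compiled expression
    // against a document supplied as a string.
    public static int CountAlbums(XPathExpression compiled, string xml)
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(xml)).CreateNavigator();
        return nav.Select(compiled).Count;
    }

    public static void Main()
    {
        // Assumed sample documents standing in for a collection of files.
        string[] documents = new string[]
        {
            "<Music><Artist name='A'><Album name='One'/></Artist></Music>",
            "<Music><Artist name='B'><Album name='Two'/>" +
                "<Album name='Three'/></Artist></Music>",
            "<Music/>"
        };

        // Compile once, using any navigator.
        XPathNavigator first =
            new XPathDocument(new StringReader(documents[0])).CreateNavigator();
        XPathExpression albums = first.Compile("//Album");

        // Reuse the compiled expression for every document.
        foreach (string xml in documents)
        {
            Console.WriteLine(CountAlbums(albums, xml));
        }
        // Prints 1, 2, 0 on separate lines.
    }
}
```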
The last thing to mention about XPathNavigator is that if you're evaluating an
XPath expression that results in a value instead of a set of nodes, you can
use the Evaluate method instead of Select. Remember from last time that XPath
expressions can result in a numeric, string, or Boolean value. The Evaluate
method returns the result as an object reference, so you'll have to cast it to
the appropriate type. Numeric values come back to the .NET code as a double,
so you'll have to cast accordingly (see Figure 6).
int GetAlbumCount()
{
    // Load a document.
    XPathDocument doc = new XPathDocument("MusicBase.xml");
    // Get a navigator initialized to the root.
    XPathNavigator nav = doc.CreateNavigator();
    // Compute the count of Album elements.
    double d = (double)nav.Evaluate("count(//Album)");
    return (int)d;
}
Figure 6: Returning a value from an XPath expression with Evaluate.
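Figure 6 covers the numeric case. For completeness, here is a minimal sketch of the string and Boolean cases as well, run against assumed inline data; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class EvaluateSketch
{
    // Inline sample data (assumed) so the sketch is self-contained.
    public const string Xml =
        "<Music><Artist name='Evanescence'><Album name='Fallen'>" +
        "<Track number='1'>Going Under</Track>" +
        "<Track number='2'>Bring Me To Life</Track>" +
        "<Track number='3'>Everybody's Fool</Track>" +
        "</Album></Artist></Music>";

    public static XPathNavigator Load()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // String result: string() of a node set uses its first node.
    public static string FirstAlbumName(XPathNavigator nav)
    {
        return (string)nav.Evaluate("string(//Album/@name)");
    }

    // Boolean result: a comparison evaluates to true or false.
    public static bool HasMoreThanTwoTracks(XPathNavigator nav)
    {
        return (bool)nav.Evaluate("count(//Track) > 2");
    }

    // Numeric result: always unboxed as a double first.
    public static double SumOfTrackNumbers(XPathNavigator nav)
    {
        return (double)nav.Evaluate("sum(//Track/@number)");
    }

    public static void Main()
    {
        XPathNavigator nav = Load();
        Console.WriteLine(FirstAlbumName(nav));       // Fallen
        Console.WriteLine(HasMoreThanTwoTracks(nav)); // True
        Console.WriteLine(SumOfTrackNumbers(nav));    // 6
    }
}
```

Casting to the wrong type throws an InvalidCastException, so the cast has to match the kind of value the expression produces.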
That's a
quick tour of using the XPath processing engine with the XPathNavigator class to query and navigate a document. This should
be your preferred mode of dealing with XML (over using SelectNodes in
the XmlNode class) because it's portable across all the XML document
types in .NET and will be the way of the future when XPathDocument in
.NET 2.0 introduces change tracking. I'll write more on that topic when we get
a little closer to the .NET 2.0 release.
The files referenced in this article are available for download.
Brian Noyes is a software architect with IDesign, Inc. (http://www.idesign.net),
a .NET-focused architecture and design consulting firm. Brian is a Microsoft
MVP in ASP.NET who specializes in designing and building data-driven
distributed Windows and Web applications. Brian writes for a variety of
publications and is working on a book for Addison-Wesley on building Windows
Forms Data Applications with .NET 2.0. Contact him at mailto:brian.noyes@idesign.net.