XPath

XPath uses path expressions to select nodes or node-sets in an XML document. As you will see, these expressions closely resemble the paths you use when working with a traditional file system.

A quick note on terminology: in this chapter, child nodes are the direct children of their parent node, with no intermediate nodes between the two. Descendants include both direct children and indirect descendants of their parent, with any number of intervening nodes in between.
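The distinction can be tried out with Python's standard-library xml.etree.ElementTree module, which implements a limited subset of XPath. In this minimal sketch (the tiny a/b/c document is made up purely for illustration), "*" selects only the children of the current element, while ".//*" selects all of its descendants:

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-document: <a> has two direct children (both <b>),
# and the second <b> has a child <c> of its own.
doc = ET.fromstring("<a><b/><b><c/></b></a>")

# Children of <a>: only the elements directly below it.
child_tags = [e.tag for e in doc.findall("*")]

# Descendants of <a>: every element below it at any depth, children included.
descendant_tags = [e.tag for e in doc.findall(".//*")]

print(child_tags)       # ['b', 'b']
print(descendant_tags)  # ['b', 'b', 'c']
```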

Syntax

Here are the most basic XPath expressions:

Expression | Description
nodename | Select all child elements of the current node whose name is nodename.
/root/nodename | A leading / denotes an absolute path, starting from the document root. Select all nodes matching the given path.
parent//descendant | Select all elements named descendant that are descendants of parent at any depth, i.e. both direct children and deeper descendants.
. | Select the current node. Mostly used in templates, for-each loops, etc.
.. | Select the parent of the current node. Mostly used in templates, for-each loops, etc.
nodename/@attribute | Select the attribute named attribute of each node selected by the preceding expression.

As you can see, a single slash / at the beginning of an expression denotes an absolute path, while a single slash / anywhere else selects the direct children of the result of the preceding expression. A double slash // selects descendants at any depth, i.e. both direct children and indirect descendants.
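To experiment with these expressions, here is a minimal sketch using Python's xml.etree.ElementTree. It supports only a subset of XPath 1.0 and always evaluates paths relative to the element you call find() or findall() on. The library/shelf/box document is hypothetical. Note two ElementTree-specific deviations: a leading // must be written as .//, and attributes are read with .get() because /@attribute is not supported.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this illustration.
root = ET.fromstring(
    "<library>"
    "<shelf><book lang='en'>A</book></shelf>"
    "<shelf><box><book lang='fr'>B</book></box></shelf>"
    "</library>"
)

# nodename: direct children of the current node (<library>) named "shelf".
shelves = root.findall("shelf")

# nodename alone never looks deeper: no <book> is a direct child of <library>.
direct_books = root.findall("book")

# parent//descendant: <book> elements at any depth below each <shelf>.
nested_books = root.findall("shelf//book")

# ElementTree rejects a leading "//" on an element; ".//" searches from here.
all_books = root.findall(".//book")

# "." is the current node itself.
current = root.find(".")

# ".." is the parent; ElementTree only resolves it below the starting element.
box_parent = root.find("shelf/box/..")

# nodename/@attribute is not supported, so read attribute values with .get().
langs = [b.get("lang") for b in all_books]

print([b.text for b in nested_books])  # ['A', 'B']
print(langs)                           # ['en', 'fr']
```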

These are the basics. Now let's look at some filtering predicates:

Expression | Description
nodename[1] | Select the first child element of the current node whose name is nodename.
nodename[last() - 1] | Select the penultimate (second-to-last) child element of the current node whose name is nodename.
nodename[position()<4] | Select the first three child elements of the current node whose name is nodename.
nodename[@attribute='value'] | Select all child elements of the current node whose name is nodename and whose attribute named attribute has the value value.

As you can see, you can use positional and conditional predicates in square brackets to filter the results of the preceding expression. Note that XPath positions start at 1, not 0.
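The same predicates can be tried with Python's xml.etree.ElementTree, again only as a sketch within its limitations: it accepts [1], [last()-1] (its parser needs this written without spaces) and [@attribute='value'], but not position()<4, which is emulated here by slicing the result list. The list/item data is made up for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical list of items, each with a "type" attribute.
root = ET.fromstring(
    "<list>"
    "<item type='a'>1</item><item type='b'>2</item>"
    "<item type='a'>3</item><item type='b'>4</item><item type='a'>5</item>"
    "</list>"
)

first = root.find("item[1]")               # positions are 1-based
penultimate = root.find("item[last()-1]")  # second-to-last <item>
type_a = root.findall("item[@type='a']")   # attribute-value predicate

# ElementTree does not support position()<4; slicing the full result list
# yields the same selection as item[position()<4] in full XPath.
first_three = root.findall("item")[:3]

print(first.text, penultimate.text)     # 1 4
print([e.text for e in type_a])         # ['1', '3', '5']
print([e.text for e in first_three])    # ['1', '2', '3']
```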

Learn by example

Enough of this obscure theory, right? There is nothing like learning XPath while actually seeing the results of your queries, so let's use the following sample XML data:

<books>
  <book>
    <author>Brian Christian</author>
    <title data-type="string">The Most Human Human</title>
    <pages data-type="number">303</pages>
  </book>
  <book>
    <author>Aldous Huxley</author>
    <title data-type="string">Brave New World</title>
    <pages data-type="number">268</pages>
  </book>
  <book>
    <author>Ian Goodfellow</author>
    <title data-type="string">Deep Learning</title>
    <pages data-type="number">787</pages>
  </book>
</books>

The following table shows the results of some typical XPath queries you might use (or, in some cases, might want to avoid) on the data above:

Expression | Result | Comment
/books/book[1]/author | "Brian Christian" | Author of the first book, using an absolute path.
books/book[2]/author | "Aldous Huxley" | Author of the second book, using a path relative to the current node, in this case the document itself.
/book[3]/author | "" (empty) | Empty selection, as no such absolute path exists in the XML document.
book[1]/author | "" (empty) | Empty selection, as the current node is the document, which has no book children: its only child is books.
//book[3]/author | "Ian Goodfellow" | Author of every book that is the 3rd book child of its parent, anywhere in the document. Since all books here share one parent, this is simply the 3rd book; (//book)[3] would instead select the 3rd book in document order regardless of its parent.
//book[3]/author[1] | "Ian Goodfellow" | 1st author of that same book which, incidentally, is its only author.
//book[3]/author[2] | "" (empty) | Empty selection, as that book has no 2nd author element.
/books/book[1]/*[@data-type='string'] | "The Most Human Human" | All child elements of the 1st book whose data-type attribute has the value "string". That happens to be the book's title.
/books/book[1]/title/@data-type | "string" | The data-type attribute of the 1st book's title.
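Most of the queries in the table can be reproduced with Python's xml.etree.ElementTree in the following sketch. Because ElementTree evaluates paths relative to an element and has no separate document node, the table's paths are rewritten relative to the books root element, and // becomes .//. The text_of helper is a hypothetical convenience that returns "" for an empty selection, and the last query mimics (//book)[3] by indexing the list of all book descendants:

```python
import xml.etree.ElementTree as ET

# The sample data from above.
BOOKS_XML = """<books>
  <book>
    <author>Brian Christian</author>
    <title data-type="string">The Most Human Human</title>
    <pages data-type="number">303</pages>
  </book>
  <book>
    <author>Aldous Huxley</author>
    <title data-type="string">Brave New World</title>
    <pages data-type="number">268</pages>
  </book>
  <book>
    <author>Ian Goodfellow</author>
    <title data-type="string">Deep Learning</title>
    <pages data-type="number">787</pages>
  </book>
</books>"""

root = ET.fromstring(BOOKS_XML)  # the <books> element


def text_of(path):
    """Hypothetical helper: text of the first match, or "" if nothing matches."""
    elem = root.find(path)
    return "" if elem is None else elem.text


# Each path is relative to <books>; the table's XPath is shown in the comment.
first_author = text_of("book[1]/author")            # /books/book[1]/author
second_author = text_of("book[2]/author")           # books/book[2]/author
third_author = text_of(".//book[3]/author")         # //book[3]/author
no_second_author = text_of(".//book[3]/author[2]")  # //book[3]/author[2]
string_child = text_of("book[1]/*[@data-type='string']")
data_type = root.find("book[1]/title").get("data-type")  # .../title/@data-type

# (//book)[3]: the 3rd <book> in document order, regardless of its parent.
third_in_doc_order = root.findall(".//book")[2].find("author").text

print(first_author, "|", third_author, "|", repr(no_second_author))
# Brian Christian | Ian Goodfellow | ''
```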

Relative vs. Absolute: Which One to Use?

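Absolute XPaths are more error-prone, as even slight changes in the document structure (DOM) may render them invalid or make them refer to the wrong element. On the other hand, they are unambiguous by definition and can perform better on larger data sets, since an explicit path avoids scanning the whole tree the way a // search does.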
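This trade-off can be sketched with Python's xml.etree.ElementTree. ElementTree has no true absolute paths, so a path that spells out every step from the root element stands in for one, and .// stands in for //. The second document, which wraps the book in a hypothetical shelf element, simulates a small structural change:

```python
import xml.etree.ElementTree as ET

# Version 1 of the data, and a hypothetical version 2 in which the books
# have been wrapped in an extra <shelf> element.
v1 = ET.fromstring("<books><book><author>Aldous Huxley</author></book></books>")
v2 = ET.fromstring(
    "<books><shelf><book><author>Aldous Huxley</author></book></shelf></books>"
)

FULL_PATH = "book/author"  # every step spelled out from the root, like /books/book/author
SEARCH = ".//author"       # any depth below the root, like //author

full_v1 = v1.find(FULL_PATH)  # matches
full_v2 = v2.find(FULL_PATH)  # None: the new <shelf> level breaks the full path
search_v2 = v2.find(SEARCH)   # still matches, but would match ANY <author>

print(full_v1.text, full_v2, search_v2.text)  # Aldous Huxley None Aldous Huxley
```

The full path fails visibly with an empty result, whereas the .// search keeps working but would also pick up any other author element added elsewhere in the document.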

As a rule of thumb, we recommend using absolute paths: they are clearer, and should the data structure change, style sheets need to be revised anyway, regardless of whether they mostly use relative or absolute paths.
