XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (545 page)

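| Expression | Possible Result |
| --- | --- |
| distinct-values((1, 2, 3, 3.5, 2.0, 1.0)) | 3.5, 2.0, 1, 3 |
| distinct-values(("A", "B", "C", "a", "b", "c")) | "B", "c", "a" |
| distinct-values((xs:time("12:20:02Z"), xs:time("13:20:02+01:00"))) | xs:time("13:20:02+01:00") |
| distinct-values((1, "a", current-date())) | "a", 1, 2008-05-08Z |

The second result is possible only when the default collation ignores case; under the Unicode codepoint collation all six strings are distinct.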
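Because the order of the result, and the choice between equal values such as 1 and 1.0, are left to the implementation, expressions that depend only on the set of values are the portable ones. A small sketch that counts the distinct values instead of listing them:

```xquery
(
  (: 1 = 1.0 and 2 = 2.0, so four distinct numbers remain :)
  count(distinct-values((1, 2, 3, 3.5, 2.0, 1.0))),

  (: the two times denote the same instant, so one value remains :)
  count(distinct-values((xs:time("12:20:02Z"), xs:time("13:20:02+01:00")))),

  (: values of types that cannot be compared are treated as distinct :)
  count(distinct-values((1, "a", current-date())))
)
(: returns 4, 1, 3 :)
```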

Usage

The distinct-values() function provides the only direct way of eliminating duplicate values in XPath 2.0 and in XQuery 1.0. In XSLT 2.0, however, richer functionality is available in the form of the <xsl:for-each-group> instruction.

If you apply the function to a sequence of nodes, the result will be the distinct values present in those nodes, not the nodes themselves. To process the nodes, you will have to find the nodes having each value. The typical logic is the following, which returns a sequence of integers representing the number of employees in each department:

for $x in distinct-values(//employee/@dept)
   return count(//employee[@dept = $x])

In practice the processing of the result will probably be done in XSLT, XQuery, or some other host language, because it will usually involve generating nodes in the output, which XPath cannot do on its own.
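As a sketch of what that host-language processing might look like in XQuery 1.0, the following builds one output element per department. The <staff> data and the <dept> output element are invented for illustration; only the employee element and its dept attribute come from the expression above.

```xquery
(: XQUERY 1.0 SKETCH: the staff data and the dept output element are made up :)
let $staff :=
  <staff>
    <employee name="Ann"  dept="sales"/>
    <employee name="Bob"  dept="sales"/>
    <employee name="Cara" dept="research"/>
  </staff>
for $x in distinct-values($staff/employee/@dept)
return <dept name="{$x}" headcount="{count($staff/employee[@dept = $x])}"/>
```

This yields one <dept> element for sales (headcount 2) and one for research (headcount 1), in an order that depends on the order in which distinct-values() returns its results.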

Having found the distinct values that appear in a sequence, it is possible to determine the positions of each of these values using the index-of() function. For example, if you are using XQuery, then you can sort the distinct values in order of their first appearance in the sequence by writing:

(: XQUERY 1.0 EXAMPLE :)
 for $d in distinct-values($sequence)
 order by index-of($sequence, $d)[1]
 return $d

Alternatively, you could sort them in order of their frequency of occurrence by writing:

(: XQUERY 1.0 EXAMPLE :)
 for $d in distinct-values($sequence)
 order by count(index-of($sequence, $d))
 return $d
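To see both orderings at work, here is a self-contained sketch that binds $sequence to a made-up list of strings, chosen so that no two values share the same frequency:

```xquery
(: XQUERY 1.0 SKETCH with made-up data :)
let $sequence := ("b", "a", "b", "c", "a", "b")
return (
  (: order of first appearance: "b", "a", "c" :)
  for $d in distinct-values($sequence)
  order by index-of($sequence, $d)[1]
  return $d,

  (: ascending frequency: "c" (once), "a" (twice), "b" (three times) :)
  for $d in distinct-values($sequence)
  order by count(index-of($sequence, $d))
  return $d
)
```

Adding the keyword descending after the sort key in the second query would put the most frequent value first.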

XPath 2.0 has no sorting capability, so this operation can only be done in the host language. In XSLT, it is usually more convenient to use the <xsl:for-each-group> instruction.

See Also

index-of() on page 807

<xsl:for-each-group> in Chapter 6 on page 326

doc, doc-available

The doc() function retrieves an external XML document by means of a URI, and returns the document node at the root of the tree representation of that XML document. Its companion function doc-available() determines whether an equivalent call on the doc() function would succeed in locating a document.

Changes in 2.0

These functions are new in XPath 2.0. The doc() function is a simplified version of the document() function that was provided in XSLT 1.0 and which remains available in XSLT 2.0 (see page 754). When combined with functions such as resolve-uri() and base-uri(), the doc() function provides most of the capability of the XSLT 2.0 document() function, but with a much simpler interface.

Signatures

The doc() function

| Argument | Type | Meaning |
| --- | --- | --- |
| uri | xs:string? | The URI of the document to be loaded |
| Result | document-node()? | The document node of the document identified by this URI |

The doc-available() function

| Argument | Type | Meaning |
| --- | --- | --- |
| uri | xs:string? | The URI of the document to be loaded |
| Result | xs:boolean | True if a call on the doc() function with the same argument would succeed; false if it would fail |
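The question marks in these signatures mean that the argument, and the result of doc(), may be an empty sequence. A minimal sketch of that edge case, based on a reading of the Functions and Operators specification (a given processor's behavior is worth checking):

```xquery
(
  empty(doc(())),      (: true: an empty argument gives an empty result :)
  doc-available(())    (: false: there is no document to load :)
)
```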

Effect

The doc() function gives XPath a window on the outside world, by allowing it to retrieve documents identified by a URI. Potentially this makes any XML document anywhere on the Web available for processing.

However, because the doc() function is an interface between the XPath processor and the world outside, many aspects of its behavior depend on the implementation, or on the way that the implementation is configured. XPath 2.0 is expected to be used in a great variety of environments (for example, some XPath processors might only work with XML documents that have been preloaded into a purpose-designed database) and the spec therefore gives a great deal of freedom to implementors. In fact, the formal specification of this function simply says that the evaluation context for processing an XPath expression provides a mapping of URIs to document nodes; if you specify a URI for which a mapping exists, then you get back the corresponding document node, and if you specify a URI for which no mapping exists, you get an error.

The term "mapping" here is deliberately abstract. It actually allows the implementation to do anything it likes to get from the URI you specify to the tree that comes back. Many implementations will allow users to control the process, either by implementing user hooks like the URIResolver in Java's JAXP interface and the XmlResolver in .NET, or by setting options in configuration files or command line parameters.

Before the URI is used, it is first resolved into an absolute URI. You can resolve the URI yourself using the resolve-uri() function, in which case you have a free choice of the base URI to use, but if you pass a relative URI to the doc() function then it will always be resolved against the base URI from the static context of the XPath expression. In XSLT 2.0 this generally means the URI of the containing stylesheet module; in XQuery it means the base URI given in the query prolog. If the relative URI was read from a source document, then it should normally be resolved against the base URI of the document from where it was read, but this is left to the application to do.
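A minimal sketch of resolving a reference yourself before handing it to doc(). Both values are hypothetical: $href stands for a relative reference found in a source document, and $base for the base URI of that document, which in practice would usually be obtained with base-uri() applied to the node the reference was read from.

```xquery
(: hypothetical values: a relative reference read from a source document,
   and the base URI of that source document :)
let $href := "countries.xml"
let $base := "http://example.com/data/input.xml"
return resolve-uri($href, $base)
(: returns http://example.com/data/countries.xml :)
```

Passing the resolved URI to doc() then loads the document relative to the source document, not relative to the stylesheet or query.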

One rule that the implementation must enforce is that if you call doc() twice with the same absolute URI, you get the same document node back each time. In XSLT, this rule applies for the duration of a transformation, not just for a single XPath expression evaluation.
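This identity rule can be observed with the is operator, which compares node identity. A sketch, assuming a document named countries.xml can be reached from the static base URI; the guard keeps the expression from failing where no such document exists:

```xquery
(: "countries.xml" is an assumed file name :)
let $uri := "countries.xml"
return
  if (doc-available($uri))
  then doc($uri) is doc($uri)   (: true: both calls return the same node :)
  else ()
```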

What is likely to happen in a typical implementation is this:

  • The URI (once resolved into an absolute URI) is checked against a list of documents that are already loaded. If the URI is in the list, the same document node is returned again.
  • Otherwise, the absolute URI is used to identify and fetch an XML document, for example, by using the file or http URI schemes.
  • The XML document is parsed, and optionally validated using a DTD validator or schema processor.
  • A tree representation of the document is built in memory, and the document node at the root of this tree is returned as the result of the function.

Many processors are likely to allow users to control aspects of this process, including:

  • Locating the physical resource containing the source XML (if indeed it is source XML). Mechanisms such as catalogs or user hooks (like the JAXP URIResolver) might be used to provide an indirection between the URI and the location of the resource.
  • Selecting an XML parser, and setting options to determine whether it performs DTD and/or schema validation.
  • Setting options that define whether XInclude directives in the source document are expanded, and whether any information in the source document (such as insignificant whitespace, comments, processing instructions, or unused namespaces) is to be excluded from the tree representation.
  • Setting tuning options; for example, parameters that control space/time tradeoffs in the way the tree is built.
  • Setting error-handling options; for example, whether a parsing error is to be treated as fatal, or whether an empty sequence (or perhaps a fallback document) should be returned in such cases.

If a schema is used to validate the document, then it must be compatible with any schema that was used when compiling the XPath expression. Here again, the detailed rules have been left to the implementation. The processor may require that the input document is validated against a schema that was known at compile time, or it may allow validation using a different schema, provided that the tree that comes back contains enough information to allow the type definitions to be located at runtime. The processor is supposed to ensure that there is no version incompatibility between the compile time and runtime schemas, but it wouldn't be surprising to come across a processor that simply passes this responsibility back to the user.

The doc-available() function works exactly the same way as the doc() function, except that instead of returning a document node when a document can be loaded and throwing an error when it can't, doc-available() returns true in the first case and false in the second. In the absence of any try/catch capability either in XSLT or XPath, this allows you to test for errors before they occur, so that processing can continue when the required document does not exist or has invalid content.

Usage and Examples

There are three main ways an XPath expression can access nodes in input documents.

  • The input document (or a node within it) can be supplied as the context node.
  • A node can be included in the value of a variable available in the context.
  • The XPath expression can invoke the doc() function (or the collection() function) to access the document by URI.

Which of these three approaches is used is a matter of application convenience, and may be influenced by the facilities available in the host language or the processor API for configuring the behavior of the different options.

The following example shows an expression that uses a look-up table in an external document. The look-up table might have the form shown below and be held in a document called countries.xml:


   

   

   

   

   …

 

A query that uses this table to display the number of employees located in each country might look like this:

string-join(
   for $c in doc("countries.xml")/*/country return
     concat($c/@name, ": ",
            count(//employee[location/country = $c/@code])),
   "&#xa;")

This will return a string of the form:

Andorra: 0
United Arab Emirates: 12
Afghanistan: 1
Antigua and Barbuda: 25
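To try the query without any external files, both documents can be constructed inline in XQuery 1.0. In this sketch the wrapper elements, the code values, and the employee data are all assumptions; only the country element with its name and code attributes, and the employee/location/country path, come from the text.

```xquery
(: XQUERY 1.0 SKETCH. Assumed, not from the original: the countries and
   staff wrapper elements, the code values, and the employee data. :)
let $countries :=
  <countries>
    <country name="Andorra" code="ad"/>
    <country name="United Arab Emirates" code="ae"/>
    <country name="Afghanistan" code="af"/>
  </countries>
let $staff :=
  <staff>
    <employee><location><country>ae</country></location></employee>
    <employee><location><country>ae</country></location></employee>
    <employee><location><country>af</country></location></employee>
  </staff>
return
  string-join(
    for $c in $countries/country
    return concat($c/@name, ": ",
                  count($staff//employee[location/country = $c/@code])),
    "&#xa;")
```

With this data the result is a three-line string reporting 0 employees for Andorra, 2 for the United Arab Emirates, and 1 for Afghanistan.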


If you want to process a document if and only if it actually exists, you can use logic of the form
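A minimal sketch of such logic, assuming a hypothetical variable $uri that holds the URI and an empty sequence as the result when the document is absent:

```xquery
(: $uri and the file name are hypothetical :)
let $uri := "countries.xml"
return
  if (doc-available($uri))
  then doc($uri)/*/country    (: process the document :)
  else ()                     (: nothing to process :)
```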


You should be aware of a few points:

  • It's reasonable to expect that the document will actually be read and parsed only once.
  • If the document can't be read, or if it exists but can't be parsed as XML or validated, then you won't get any explanation as to why.
  • If the URI is invalid, or if a runtime error occurs in computing the value of the URI, then doc-available() will throw an error rather than returning false. To avoid this failure (for example, when reading the URI from an input document), add the test if ($uri castable as xs:anyURI)...
  • Once a document URI has been reported as unavailable, it remains unavailable for the rest of the transformation. In theory at least, you can't use doc-available() repeatedly to see whether the document has been created yet, and you can't use it to test for the existence of files created during the transformation by the <xsl:result-document> instruction.
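The castable test mentioned in the list can be combined with doc-available() as sketched below. The variable $uri is hypothetical and stands for a value read from an input document. The two tests are nested, not joined with and, because XPath 2.0 does not guarantee the order in which the operands of and are evaluated, so the second test might otherwise run on an invalid URI.

```xquery
(: $uri stands for a value read from an input document; hypothetical :)
let $uri := "countries.xml"
return
  if ($uri castable as xs:anyURI)
  then (if (doc-available($uri)) then doc($uri) else ())
  else ()
```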
