XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (177 page)

This is the kind of expression that is used to sort out the sheep from the goats when doing XPath conformance testing, and you'll find lots of cases like this in the W3C test suites for XSLT and XQuery. In practice, however, the issues rarely arise in real applications.

Recursion

Stylesheet functions can be recursive: they can call themselves either directly or indirectly. Because XSLT is a functional programming language without side effects (at any rate, without updateable variables), recursion plays a key role in writing algorithms of any complexity.

In XSLT 1.0, such algorithms were written using recursive named templates. Apart from being rather long-winded, this had the drawback that it was difficult to return certain kinds of result: templates in XSLT 1.0 could only return results by constructing new nodes. In XSLT 2.0 both templates and functions offer much more flexibility in the types of result they can return, and stylesheet functions offer additional flexibility in the way they can be called (for example, they can be called from within a predicate used in a match pattern).

As it happens, many of the simple problems where recursion was needed in XSLT 1.0 can now be solved in other ways, because XPath 2.0 offers a wider range of aggregation functions, and functions such as
tokenize()
to break up a string into its parts. But there are still cases where recursion is needed, and even when it isn't, many people find the recursive solution more elegant than an iterative solution using

or the XPath
for
expression.

Here is a function that extracts the part of a string after the last
/
character:


   

   

         if (contains($in, ‘/’)

         then str:suffix(substring-after($in, ‘/’))

         else $in”/>


Note how all the logic is contained in a single XPath expression. I could have used

to express the same logic, but with this kind of function I personally find it clearer to write the whole algorithm at the XPath level.

The function result is returned using an

instruction. It sometimes feels a little strange to use

when the value being computed is a single integer or string, but in the XDM model a single value is the same as a sequence of length 1, so it's worth getting used to the idea that everything is a sequence. When returning an atomic value, one could just as easily use

, but I prefer

because it works in all cases; when you are returning nodes, you don't want to copy them unnecessarily. You might also be tempted to use

, and in this case it would work, but the semantics aren't quite what we want:

would convert the selected string into a text node, which would then be atomized back to a string by virtue of the function's declared return type. Even if the optimizer can sort this out, it's making unnecessary work.

Another observation about this function is that it is tail-recursive. This means that after calling itself, the function does nothing else before it returns to its caller. This property is important because tail-recursive functions can be optimized to save memory. Sometimes a problem with recursion is that if you recurse too deeply, say to 1000 levels or so, the system runs out of stack space and terminates with a fatal error. A good XSLT processor can optimize a tail-recursive function so that this doesn't happen. Basically the trick is that instead of making the recursive call and then unwinding the stack when it regains control, the function can unwind the stack first, and then make the recursive call. This is because the stack frame isn't needed once the recursive call returns.

I mentioned earlier that many functions can now be implemented easily without using recursion, and this one is no exception. It could also be written using regular expressions:


  

  
/]*)’, ‘$1’)”/>


But this is a matter of personal style. It's a good idea to have more than one tool in your kitbag, and recursion is the most powerful one available.

One kind of problem where you will need recursion is when you need to analyze a graph (that is, a network of related objects). The next example shows the technique.

Example: Looking for Cycles among Attribute Sets

This example examines a stylesheet module (we'll stick to a single module for simplicity) and determines whether it contains any cycles among its attribute set definitions. Also to keep it simple, I'll assume that attribute sets have simple names and ignore the complications caused by namespace prefixes.

Source

The source document is any stylesheet module. But the example is only interesting if you run it on a stylesheet module that contains attribute sets that are (incorrectly) defined in terms of themselves. I've included such a sample as
cyclic-stylesheet.xsl
.

This example requires a schema-aware processor and assumes that the source document is validated against the schema for XSLT 2.0 stylesheets, which is included in the download as
schema-for-xslt20.xsd
.

The AltovaXML 2008 schema processor, unfortunately, rejects this schema as invalid.

Output

The stylesheet will report the name of any attribute set that is defined directly or indirectly in terms of itself.

Stylesheet

This stylesheet is available as
find-cycles.xsl
. We'll start with a function that returns the attribute sets that are directly referenced from a given attribute set:


  

  

                                   [@name=$in/@use-attribute-sets]”/>


This returns any
attribute-set
in the containing document whose
name
attribute is equal to any of these strings. Notice what's going on: The schema for XSLT stylesheets tells us that the
use-attribute-sets
attribute is a sequence of strings. The
=
operator causes this attribute to be atomized, which returns the typed value of the attribute, namely this sequence of strings; it then returns true if any of the strings in this sequence is equal to the other operand. This will only work if the input document has been validated against its schema.

Now observe that an attribute set
A
references another attribute set
B
if either
cyc:direct(A)
includes
B
, or there is an attribute set in
cyc:direct(A)
that references
B
. This suggests the recursive function:


  

  

  

       (cyc:direct($A) intersect $B)

       or (some $X in cyc:direct($A) satisfies cyc:references($X, $B))”/>


Unfortunately, this doesn't quite work: if we apply it to a node that doesn't participate in a cycle, but which references a node that does, then we'll go into an infinite loop. To protect against this we need to remember the route by which a node was reached and avoid processing it if it is reached again. The corrected function is:


  

  

  

  

  

           (some $C in ($direct except $route)

                 satisfies cyc:references($C, $B, ($route, $C)))”/>


Now, finally, you can discover whether there are any cycles, that is, any attribute sets that reference themselves directly or indirectly:

                      satisfies cyc:references($X, $X, ())”/>

This function isn't tail-recursive. The recursive call is inside a loop (the
some
expression). However, this is unlikely to matter in this case, because the chains of references from one attribute set to another are likely to be short.

An interesting observation about this function is that it is written entirely using constructs that are new in XSLT 2.0 and XPath 2.0: stylesheet functions, type checking based on schema-defined types, atomization of a list-valued attribute, the

instruction that allows XSLT instructions to return values other than nodes, the XPath
intersect
and
except
operators, and the XPath
some..satisfies
expression.

Other books

Savage Smoke by Kay Dee Royal
Playing the Playboy by Noelle Adams
Asgard's Heart by Brian Stableford
The Photograph by Penelope Lively
Lucía Jerez by José Martí
The Skrayling Tree by Michael Moorcock