XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (7 page)

As you can see, XSLT is not just for converting XML documents to HTML.

How Does XSLT Transform XML?

By now you are probably wondering exactly how XSLT goes about processing an XML document in order to convert it into the required output. There are usually two aspects to this process:

1.
The first stage is a structural transformation, in which the data is converted from the structure of the incoming XML document to a structure that reflects the desired output.

2.
The second stage is formatting, in which the new structure is output in the required format such as HTML or PDF.

The second stage covers the ground we discussed in the previous section; the data structure that results from the first stage can be output as HTML, as a text file, or as XML. HTML output allows the information to be viewed directly in a browser by a human user or be input into any modern word processor. Plain text output allows data to be formatted in the way an existing application can accept, for example comma-separated values or one of the many text-based data interchange formats that were developed before XML arrived on the scene. Finally, XML output allows the data to be supplied to applications that accept XML directly. Typically, this will use a different vocabulary of XML tags from the original document: for example, an XSLT transformation might take the monthly sales figures as its XML input and produce a histogram as its XML output, using the XML-based SVG standard for vector graphics. Or, you could use an XSLT transformation to generate Voice XML output, for aural rendition of your data.

Information about VoiceXML can be found at
http://www.voicexml.org/
.

Let's now delve into the first stage, transformation—the stage with which XSLT is primarily concerned and which makes it possible to provide output in all of these formats. This stage might involve selecting data, aggregating and grouping it, sorting it, or performing arithmetic conversions such as changing centimeters to inches.

XSLT is not the only way of processing or transforming XML. For example, you can write applications in Java or C# that build a tree-like representation of an XML document (using the W3C-defined Document Object Model, or other similar models such as JDOM or XOM). Your program could interrogate this tree structure to find the specific information needed. It would do so by defining a specific
sequence of steps
to be followed in order to produce the required output.

So, how is using XSLT to perform transformations on XML better than writing custom applications? Well, the design of XSLT is based on a recognition that these programs are all very similar, and it should therefore be possible to describe what they do using a high-level
declarative
language rather than writing each program from scratch in Java or C#. The required transformation can be expressed as a set of rules. These rules are based on defining what output should be generated when particular patterns occur in the input. The language is declarative in the sense that you describe the transformation you require, rather than providing a sequence of procedural instructions to achieve it. XSLT describes the required transformation and then relies on the XSLT processor to decide the most efficient way to go about it.

XSLT still relies on an XML parser—it might be a DOM parser or a SAX-compliant one, or one of the new breed of “pull parsers”—to convert the XML document into a tree structure. It is the structure of this tree representation of the document that XSLT manipulates, not the document itself. If you are familiar with the DOM, then you will be happy with the idea of treating every item in an XML document (elements, attributes, processing instructions, and so on) as a node in a tree. With XSLT we have a high-level language that can navigate around a node tree, select specific nodes, and perform complex manipulations on these nodes.

The XSLT tree model is similar in concept to the DOM, but it is not the same. The full XSLT processing model is discussed in Chapter 2.

The description of XSLT given thus far (a declarative language that can navigate to and select specific data and then manipulate that data) may strike you as being similar to that of the standard database query language, SQL. Let's take a closer look at this comparison.

XSLT and SQL: An Analogy

In a relational database, the data consists of a set of tables. By themselves, the tables are not of much use, the data might as well be stored in flat files in comma-separated values format. The power of a relational database doesn't come from its data structure; it comes from the language that processes the data, SQL. In the same way, XML on its own just defines a data structure. It's a bit richer than the tables of the relational model, but by itself it doesn't actually do anything very useful. It's when we get a high-level language expressly designed to manipulate the data structure that we start to find we've got something interesting on our hands, and for XML data the main language that does that is XSLT.

Superficially, SQL and XSLT are very different languages. But if you look below the surface, they actually have a lot in common. For starters, in order to process specific data, be it in a relational database or an XML document, the processing language must incorporate a declarative query syntax for selecting the data that needs to be processed. In SQL, that's the
SELECT
statement. In XSLT, the equivalent is the
XPath expression
.

The XPath expression language forms an essential part of XSLT, though it is actually defined in a separate W3C Recommendation (
http://www.w3.org/TR/xpath
) because it can also be used independently of XSLT (the relationship between XPath and XSLT is discussed further on page 22).

The XPath syntax is designed to retrieve nodes from an XML document, based on a path through the XML document or the context in which the node appears. It allows access to specific nodes, while preserving the hierarchy and structure of the document. XSLT instructions are then used to manipulate the results of these queries, for example by rearranging selected nodes and constructing new nodes.

There are further similarities between XSLT and SQL:

  • Both languages augment the basic query facilities with useful additions for performing arithmetic, string manipulation, and comparison operations.
  • Both languages supplement the declarative query syntax with semiprocedural facilities for describing the processing to be carried out, and they also provide hooks to escape into conventional programming languages where the algorithms start to get too complex.
  • Both languages have an important property called
    closure
    , which means that the output has the same data structure as the input. For SQL, this structure is tables, for XSLT it is trees—the tree representation of XML documents. The closure property is extremely valuable because it means operations performed using the language can be combined end-to-end to define bigger, more complex operations: you just take the output of one operation and make it the input of the next operation. In SQL you can do this by defining views or subqueries; in XSLT you can do it by passing your data through a series of stylesheets, or by capturing the tree produced by one transformation phase in a variable and using that variable as the input of another transformation phase. This last feature is new in XSLT 2.0, though most XSLT 1.0 processors offered a similar capability as a language extension.

In the real world, of course, XSLT and SQL have to coexist. There are many possible relationships, but typically, data is stored in relational databases and transmitted between systems in XML. The two languages don't fit together as comfortably as one would like, because the data models are so different. But XSLT transformations can play an important role in bridging the divide. All of the major relational database vendors have released extensions that allow XML data to be stored and manipulated directly within what is still nominally a relational database.

Before we look at a simple working example of an XSLT transformation, we should briefly discuss a few of the XSLT processors that are available to effect these transformations.

XSLT Processors

The job of an XSLT processor is to apply an XSLT stylesheet to an XML source document and produce a result document.

With XSLT 1.0, there are quite a few good XSLT processors to choose from, and many of them can be downloaded free of charge (but do read the licensing conditions).

If you're using Microsoft technology, there is a choice of two products. The most widely used option is MSXML (Google for “Download MSXML” to find it—the current version is MSXML 6). Usually, I'm not Microsoft's greatest fan, but with this processor it's generally agreed that they have done an excellent job. This product comes as standard with Internet Explorer and is therefore the preferred choice for running transformations in the browser. For the .NET environment, however, Microsoft developed a new processor. This doesn't have a product name of its own, other than its package name within the .NET framework, which is
System.Xml.Xsl
. This processor was initially said to be significantly slower than the MSXML product, but that seems to be no longer true, and if you're writing your application using .NET technologies such as C# and ASP.NET, it's probably the one that you'll find more convenient.

In the Java world, there's a choice of XSLT 1.0 products. There's my own Saxon product (version 6.5.5 is the version that supports XSLT 1.0) available from
http://saxon.sf.net/
, and there's the Xalan-J product which is bundled with Sun's Java JDK software from JDK 1.4 onwards. There's also an Oracle implementation, and a new processor from Intel.

For C and C++ the most popular processor is the
libxslt
engine (
http://xmlsoft.org/XSLT/
).

Most of these products are XSLT interpreters, but there are two well-known XSLT compilers: XSLTC, which is distributed as part of the Xalan-J package mentioned earlier, and Gregor, from Jacek Ambroziak (
http://www.ambrosoft.com/gregor.html
). At the top end of the market, for those with a large budget, you can also get hardware-accelerated engines from IBM and Intel.

With XSLT 2.0, at the time of writing, the choice is more limited. There are currently three XSLT 2.0 processors available:

  • My own Saxon product. This is available in two variants, corresponding to the two conformance levels defined in the W3C specification: Saxon-B 9.x (
    http://saxon.sf.net/
    ) is an open-source implementation of a basic XSLT 2.0 processor, and Saxon-SA 9.x (
    http://www.saxonica.com/
    ) is a commercial implementation of a schema-aware XSLT processor. Both variants are available for both the Java and .NET platforms. You can run most of the examples in this book, using a basic XSLT processor, but Chapter 4 and Chapter 11 focus on the additional capability of XSLT when used with XML Schemas, with examples that will only run with a schema-aware XSLT processor.
  • Saxon as delivered offers a command line interface and a Java or .NET API. A number of toolsets provide graphical user interfaces on top: the simplest is the open-source Kernow product (
    kernowforsaxon.sourceforge.net
    ); other options are the comprehensive Stylus Studio development environment (
    www.stylusstudio.com
    ) and the oXygen XML editor (
    www.oxygenxml.com
    ), both of which support step-by-step Saxon debugging.
  • Altova (
    www.altova.com
    ) have released their own XSLT 2.0 processor, which is available either as a freestanding COM component with a command line interface and Windows API, or built in to the popular XML Spy development environment. As a standalone component the product is free (but not open source), and offers both schema-aware and non-schema-aware processing. The level of conformance of the Altova product has been improving with each release, and nearly all the examples in this book run successfully with the 2008 edition of this product.
  • Gestalt is an open-source XSLT 2.0 processor written in the Eiffel language by Colin Adams (
    http://sourceforge.net/projects/gestalt
    ). The product is work in progress, and although the software is under active development, the documentation is a little sketchy and this implementation is unlikely to appeal to you unless you are something of an Eiffel enthusiast. We have not tested the examples in this book against the Gestalt processor.

By reading papers published at research conferences and blog postings by employees, you can make an educated guess that implementations are underway within IBM, Oracle and Microsoft, but none of these companies has announced any official product plans or release dates at the time of writing—and of course, not everything those companies do in their research labs sees the light of day as a product.

I think it's a little unlikely that there will be quite as many XSLT 2.0 processors as there are for XSLT 1.0 (there is bound to be some shakeout in a maturing market), but I'm confident there will be four or five, which should be enough.

Meanwhile, you can use Altova or Saxon, and those are the primary vehicles I will be using for all the examples in this book.

A Simple XSLT Stylesheet

We're now ready to take a look at an example of using XSLT to transform a very simple XML document.

Example: A “Hello, world!” XSLT Stylesheet

Kernighan and Ritchie in their classic
The C Programming Language
(Prentice-Hall, 1988) originated the idea of presenting a trivial but complete program right at the beginning of the book, and ever since then the
Hello
world
program has been an honored tradition. Of course, a complete description of how this example works is not possible until all the concepts have been defined, and so if you feel I'm not explaining it fully, don't worry—the explanations will come later.

Input

What kind of transformation would we like to do? Let's try transforming the following XML document.



Hello, world!

This document is available as file
hello.xml
in the download directory for this chapter.

A simple node-tree representation of this document is shown in
Figure 1-2
.

There are four nodes in this tree: a document node that represents the document as a whole; an

processing instruction that identifies the stylesheet to be used; the

element; and the text within the

element.

The document node in the XSLT model performs the same function as the document node in the DOM model (it was called the root node in XSLT 1.0, but the nomenclature has been brought into line with the DOM). The XML declaration is not visible to the XSLT processor and, therefore, is not included in the tree.

I've deliberately made it easy by including an

processing instruction in the source XML file. Many XSLT processors will use this to identify the stylesheet if you don't specify a different stylesheet to use. The
href
attribute gives the relative URI of the default stylesheet for this document.

Output

Our required output is the following HTML, which will simply change the browser title to
“Today's
Greeting”
and display whatever greeting is in the source XML file:



   Today's greeting



   

Hello, world!




XSLT Stylesheet

Without any more ado, here's the XSLT stylesheet
hello.xsl
to effect the transformation. This is an XSLT 1.0 stylesheet, hence the
version = “1.0”
in the

element.


   version=“1.0”

   xmlns:xsl=“http://www.w3.org/1999/XSL/Transform”>


   

   

      Today's greeting

   

   

      


   

   



Running the Stylesheet

There are a number of ways you can run this stylesheet, which we'll look at in the following sections. It may be worth trying several of the different approaches, to find out which you are most comfortable with—familiarity with the tools will help you master the examples in the rest of the book, as well as making you a more proficient XSLT developer.

Whichever approach you use, the first stage is to download the sample code for this book from the Wrox web site (
www.wrox.com
), as explained in the Introduction. The code is organized by chapter, so you'll find the two files
hello.xml
and
hello.xsl
in the directory
ch01
.

If you're a serious XSLT developer then your company may well have invested in an XML development environment such as XML Spy, Stylus Studio, or oXygen. All these products have the capability to run XSLT 2.0 stylesheets: XML Spy uses Altova's XSLT 2.0 processor (though it can also be configured to run Saxon), while Stylus Studio and oXygen both use Saxon. We won't cover the use of these development environments in this book.

Using the Browser

The easiest way to run this example is simply to load the XML file
hello.xml
into any recent version of Internet Explorer or Firefox (right-click the file and select
Open With…
). The browser will recognize the

processing instruction and will use this to fetch the stylesheet and execute it. The result is a display like the one in
Figure 1-3
.

We can run this example in the browser because it is actually an XSLT 1.0 stylesheet. Most modern browsers support XSLT 1.0 processing in this way, but at the time of writing none yet supports XSLT 2.0.

Using Saxon from Kernow

We'll be running most of the XSLT 2.0 examples in this book using Saxon, and one of the easiest ways to run Saxon is by using a graphical front-end called Kernow. You can download Kernow from
http://kernowforsaxon.sourceforge.net/
. It includes the Saxon JAR files, so the only prerequisite is that Java itself is installed. You will need Java Standard Edition 6 or later, which you can get from
http://java.sun.com/
. (Saxon itself works with JDK 1.4 or later, the requirement for Java SE 6 comes from Kernow.)

To install Kernow, the only thing you need to do is to unzip the download file into a suitable directory, and if you are running Windows you can then start the product by double-clicking the
Run.bat
file. In the dialog box that appears, select the “Single File” tab, and browse to the source and stylesheet files. Then click Run to run the transformation. The output is as shown in
Figure 1-4
.

Note that this time, the result is shown in terms of raw HTML. If you want to see what the HTML looks like in a browser, you can always save it to a file and then open the file in your browser. But Kernow is designed for developers, not for end users, and as a developer you need to see the HTML that you have generated. I wouldn't advise anyone to do serious XSLT development by testing directly in a browser, because it's hard to see what's gone wrong when you make a mistake.

Using Saxon on Java from the Command Line

If you're a professional software developer then you're probably not averse to running utilities from the command line, which is the natural way to test your stylesheets when you use Saxon “out of the box.” The steps are as follows:

1.
Ensure you have Java installed on your machine. Saxon will work with JDK 1.4 or later. Saxon is pure Java code, and so it will run on any platform that supports Java, but I will usually assume that you are using a Windows machine.

2.
Download the Saxon-B processor from
http://sf.net/saxon
. Choose the most recent version of Saxon-B for Java, and unzip the download file into a suitable directory, for example
c:\saxon
.

3.
Bring up an MSDOS-style console window (use Start ⇒ Run, and type “cmd”).

4.
Type the following at the command prompt:

java -jar c:\saxon\saxon9.jar -a -s:hello.xml

5.
Admire the HTML displayed on the standard output.

If you want to view the output using your browser, simply save the command line output as an HTML file, in the following manner:

java -jar c:\saxon\saxon8.jar -a -s:hello.xml -o:hello.html

(Using the command prompt in Windows isn't much fun. If you're a Unix fan, install Cygwin. Most editors have the ability to invoke a command line processor that is generally much more usable than the one provided by the operating system. I use UltraEdit, but there are many other choices available.)

Using Saxon on .NET

The Saxon XSLT processor also runs on the .NET platform. You'll need to make sure that .NET (either version 1.1 or 2.0) is installed—if it's not already present on your machine, it's a free download from Microsoft. Download the latest version of Saxon-B for .NET from
http://sf.net/saxon
, and unzip the download file into a directory such as
c:\saxon
.

You can then run it using the command line:

c:\saxon\Transform -a -s:hello.xml -o:hello.html

If you want to develop applications with this product, you'll need to install Saxon in the Global Assembly Cache, but for running from the command line, this isn't necessary.

Using the Altova XSLT processor

You can obtain the freestanding AltovaXML product from
www.altova.com
. The current version at the time of writing is AltovaXML 2008. This includes separate XSLT 1.0 and 2.0 processors, and an XQuery engine. The product installs by default into
c:\Program Files\Altova\AltovaXML2008
. Assuming this directory is on your PATH, you can then run our example transformation with the command line:

Other books

The Nightmare Charade by Mindee Arnett
La señal de la cruz by Chris Kuzneski
The Wine of Dreams by Brian Craig - (ebook by Undead)
The Forever Drug by Lisa Smedman
The Transformation of the World by Camiller, Patrick, Osterhammel, Jrgen
Wicked Autumn by G. M. Malliet
The Case of the Late Pig by Margery Allingham
Blood Red Roses by Lin Anderson
The Squad Room by John Cutter