XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

In late 1995, a W3C-sponsored workshop on stylesheet languages was held in Paris. In view of the subsequent role of James Clark as editor of the XSLT Recommendation, it is interesting to read the notes of his contribution on the goals of DSSSL, which can be found at http://www.w3.org/Style/951106_Workshop/report1.html#clark.

Here are a few selected paragraphs from these notes:

DSSSL contains both a transformation language and a formatting language. Originally the transformation was needed to make certain kinds of styles possible (such as tables of contents). The query language now takes care of that, but the transformation language survives because it is useful in its own right.

The language is strictly declarative, which is achieved by adopting a functional subset of Scheme. Interactive stylesheet editors must be possible.

A DSSSL stylesheet very precisely describes a function from SGML to a flow object tree. It allows partial stylesheets to be combined (“cascaded” as in CSS): some rule may override some other rule, based on implicit and explicit priorities, but there is no blending between conflicting styles.

James Clark closed his talk with the remark:

Creating a good, extensible style language is hard!

One suspects that the effort of editing the XSLT 1.0 Recommendation didn't cause him to change his mind.

The First XSL Proposal

Following these early discussions, the W3C set up a formal activity to create a stylesheet language proposal. The remit for this group specified that it should be based on DSSSL.

As an output of this activity came the first formal proposal for XSL, dated August 27, 1997. Entitled A Proposal for XSL, it lists 11 authors: James Clark (who works for himself), five from Microsoft, three from Inso Corporation, one from ArborText, and one (Henry Thompson) from the University of Edinburgh. The document can be found at http://www.w3.org/TR/NOTE-XSL.html.

The section describing the purpose of the language is worth reading.

XSL is a stylesheet language designed for the Web community. It provides functionality beyond CSS (e.g. element reordering). We expect that CSS will be used to display simply structured XML documents and XSL will be used where more powerful formatting capabilities are required or for formatting highly structured information such as XML structured data or XML documents that contain structured data.

Web authors create content at three different levels of sophistication given as follows:

  • markup: relies solely on a declarative syntax
  • script: additionally uses code “snippets” for more complex behaviors
  • program: uses a full programming language

XSL is intended to be accessible to the “markup” level user by providing a declarative solution to most data description and rendering requirements. Less common tasks are accommodated through a graceful escape to a familiar scripting environment. This approach is familiar to the Web publishing community as it is modeled after the HTML/JavaScript environment.

The powerful capabilities provided by XSL allow:

  • formatting of source elements based on ancestry/descendency, position, and uniqueness
  • the creation of formatting constructs including generated text and graphics
  • the definition of reusable formatting macros
  • writing-direction independent stylesheets
  • extensible set of formatting objects

The authors then explained carefully why they had felt it necessary to diverge from DSSSL and described why a separate language from CSS (Cascading Style Sheets) was thought necessary.

They then stated some design principles:

  • XSL should be straightforwardly usable over the Internet.
  • XSL should be expressed in XML syntax.
  • XSL should provide a declarative language to do all common formatting tasks.
  • XSL should provide an “escape” into a scripting language to accommodate more sophisticated formatting tasks and to allow for extensibility and completeness.
  • XSL will be a subset of DSSSL with the proposed amendment.
    (As XSL was no longer a subset of DSSSL, they cannily proposed amending DSSSL so it would become a superset of XSL.)
  • A mechanical mapping of a CSS stylesheet into an XSL stylesheet should be possible.
  • XSL should be informed by user experience with the FOSI stylesheet language.
  • The number of optional features in XSL should be kept to a minimum.
  • XSL stylesheets should be human-legible and reasonably clear.
  • The XSL design should be prepared quickly.
  • XSL stylesheets shall be easy to create.
  • Terseness in XSL markup is of minimal importance.

As a requirements statement, this doesn't rank among the best. It doesn't read like the kind of list you get when you talk to users and find out what they need. It's much more the kind of list that designers write when they know what they want to produce, including a few political concessions to the people who might raise objections. But if you want to understand why XSLT became the language it did, this list is certainly evidence of the thinking.

The language described in this first proposal contains many of the key concepts of XSLT as it finally emerged, but the syntax is virtually unrecognizable. It was already clear that the language should be based on templates that handled nodes in the source document matching a defined pattern, and that the language should be free of side effects, to allow “progressive rendering and handling of large documents.” I'll explore the significance of this requirement in more detail on page 34, and discuss its implications on the way stylesheets are designed in Chapter 17. The basic idea is that if a stylesheet is expressed as a collection of completely independent operations, each of which has no external effect other than generating part of the output from its input (for example, it cannot update global variables), then it becomes possible to generate any part of the output independently if that particular part of the input changes. Whether the XSLT language actually achieves this objective is still an open question.
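
The idea of independent, side-effect-free template rules can be illustrated outside XSLT. The following is a minimal Python sketch, not XSLT and not a description of how any XSLT processor works: the names RULES, apply_templates, and transform, and the sample document, are all hypothetical, and matching on element name is only a crude stand-in for pattern matching. Each rule is a pure function from a source node to output text, so the output for one chapter can be regenerated without rerunning the whole transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical "template rules": each is a pure function of its input node.
# None of them reads or updates a global variable while producing output.
# (For brevity this sketch does not escape text or attribute values.)
def rule_book(node):
    return "<html>" + apply_templates(node) + "</html>"

def rule_chapter(node):
    return "<h1>" + node.get("title", "") + "</h1>" + apply_templates(node)

def rule_para(node):
    return "<p>" + (node.text or "") + "</p>"

# Element name -> rule; a very simplified stand-in for match patterns.
RULES = {"book": rule_book, "chapter": rule_chapter, "para": rule_para}

def apply_templates(node):
    """Process each child element with the rule registered for its name."""
    return "".join(RULES[child.tag](child) for child in node)

def transform(node):
    """Apply the rule for the given node itself."""
    return RULES[node.tag](node)

SOURCE = ET.fromstring(
    "<book>"
    '<chapter title="One"><para>First.</para></chapter>'
    '<chapter title="Two"><para>Second.</para></chapter>'
    "</book>"
)

full = transform(SOURCE)
# Because the rules share no mutable state, one chapter can be processed
# on its own, and the result is identical to its slice of the full output.
chapter_two = transform(SOURCE[1])
```

If a rule kept a running counter in a global variable, say for chapter numbering, the result for the second chapter would depend on the first having been processed, and this kind of partial regeneration would no longer be safe.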

The first Working Draft of XSL (not to be confused with the Proposal) was published on August 18, 1998, and the language started to take shape, gradually converging on the final form it took in the November 16, 1999 Recommendation through a series of Working Drafts, each of which made radical changes, but kept the original design principles intact.

A Recommendation is the most definitive of documents produced by the W3C. It's not technically a standard, because standards can only be published by government-approved standards organizations. But I will often refer to it loosely as “the standard” in this book.

Saxon

At this point it might be a good idea to clarify how I got involved in the story. In 1998 I was working for the British computer manufacturer ICL, a part of Fujitsu. Fujitsu, in Japan, had developed an object database system, later marketed as Jasmine, and I was using this technology to build content management applications for large publishers. We developed a few successful large applications, but found that it was too complex for people who wanted something in six weeks rather than six months. So I was asked to look at what we could do with XML, which was just appearing on the horizon.

I came to the conclusion that XML looked like a good thing, but that there wasn't any software. So I developed the very first early versions of Saxon to provide a proof-of-concept demonstration. At that stage Saxon was just a Java library, not an XSLT processor, but as the XSL standards developed I found that my own ideas were converging more and more with what the W3C working group was doing, and I started implementing the language as it was being specified. ICL had decided that its marketing resources were spread thinly over too many products, and so the management took the imaginative decision to make the technology available as open source. Seventeen days after the XSLT 1.0 specification was published in November 1999, I announced the first conformant implementation. And on the day it was published, I started work on the first edition of this book.

When the book was published, the XSL Working Group invited me to join and participate in the development of XSLT 1.1. Initially, being based in the United Kingdom and with limited time available for the work, my involvement was fairly sporadic. But early in 2001 I changed jobs and joined Software AG, which wanted me to take a full role in the W3C work. The following year James Clark pulled out of the Working Group, and I stepped into his shoes as editor.

The reason I'm explaining this sequence of events is that I hope it will help you to understand the viewpoint from which this book is written. When I wrote the first edition I was an outsider, and I felt completely free to criticize the specification when I felt it necessary. I have tried to retain an objective approach in the present edition, but as editor of the language spec it is much more difficult to be impartial. I've tried to keep a balance: it wouldn't be fair to use the book as a platform to push my views over those of my colleagues of the working group, but at the same time, I've made no effort to be defensive about decisions that I would have made differently if they had been left to me.

Software AG continued to support my involvement in the W3C work (on the XQuery group as well as the XSL group), along with the development of Saxon and the writing of this book, through till February 2004, at which point I left to set up my own company, Saxonica.

Beyond XSLT 1.0

After XSLT 1.0 was published, the XSL Working Group responsible for the language decided to split the requirements for enhancements into two categories: XSLT 1.1 would standardize a small number of urgent features that vendors had already found necessary to add to their products as extensions, while XSLT 2.0 would handle the more strategic requirements that needed further research.

A working draft of XSLT 1.1 was published on December 12, 2000. It described three main enhancements to the XSLT 1.0 specification: the ability to produce multiple output documents, the ability to use temporary trees to create a multi-pass transformation, and standard bindings to extension functions written in Java or ECMAScript.

For a number of reasons XSLT 1.1 never got past the working draft stage. This was partly because of controversy surrounding the Java language bindings, but more particularly because it was becoming clearer that XSLT 2.0 would be a fairly radical revision of the language, and the Working Group didn't want to do anything in 1.1 that would get in the way of achieving the 2.0 goals. There were feelings, for example, that the facility for temporary trees might prejudice the ability to support sequences in 2.0, a fear which as it happens proved largely unfounded.

Convergence with XQuery

By the time work on XSLT 2.0 was starting, the separate XQuery Working Group in W3C had created a draft of its own language.

While the XSL Working Group had identified the need for a transformation language to support a self-contained part of the formatting process, XQuery originated from the need to search large quantities of XML documents stored in a database.

Work on an XML Query Language had started as early as 1998. A workshop was held in December 1998, and you can find all 66 position papers presented at this workshop at http://www.w3.org/TandS/QL/QL98/pp.html. It's interesting to see how the participants saw the relationship with XSL, as it was then known. The Microsoft position paper states the belief that a query language could be developed as an extension of XSLT, but in this it is almost alone. Many of the participants came from a database background, with ideas firmly rooted in the tradition of SQL and object database languages such as OQL, and to these people, XSL didn't look remotely like a query language. But in the light of subsequent events, it's interesting to read the position paper from the XSL Working Group, which states in its summary:

  1. The query language should use XSL patterns as the basis for information retrieval.
  2. The query language should use XSL templates as the basis for materializing query results.
  3. The query language should be at least as expressive as XSL is, currently.
  4. Development of the pattern and transformation languages should remain in the XSL Working Group.
  5. A coordination group should ensure either that a single query language satisfies all working group requirements or that all W3C query languages share an underlying query model.

(Remember that XPath had not yet been identified as a separate language, and that the expressions that later became XPath were then known as patterns.)

This offer to coordinate, and the strong desire to ensure consistency among the different W3C specifications, can be seen as directly leading to the subsequent collaboration between the two working groups to define XPath 2.0.

The XQuery group started meeting in September 1999. Its first requirements document was published the following January (http://www.w3.org/TR/2000/WD-xmlquery-req-20000131). It included a commitment to compatibility with XML Schema, and a rather cautiously worded promise to “take into consideration the expressibility and search facilities of XPath when formulating its algebra and query syntax.” August 2000 saw a revised requirements document that included a selection of queries that the language must be able to express. The first externally visible draft of the XQuery language was published in February 2001 (see http://www.w3.org/TR/2001/WD-xquery-20010215/), and it was at this stage that the collaboration between the two working groups began in earnest.

The close cooperation between the teams developing the two languages contrasts strangely with the somewhat adversarial position adopted by parts of the user community. XSLT users were quick to point out that XSLT 1.0 satisfied every single requirement in the first XQuery requirements document, and could solve all the use cases published in the second version in August 2000. At the same time, users on the XQuery side of the fence have often been dismissive about XSLT, complaining about its verbose syntax and sometimes arcane semantics. Even today, when the similarities of the two languages at a deep level are clearly apparent, there is little overlap between their user communities: I find that most users of the XQuery engine in Saxon have no XSLT experience. The difference between XSLT and XQuery is in many ways a difference of style rather than substance, but users often feel strongly about style.
