XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (788 page)

1. Use the value of the system property javax.xml.parsers.DocumentBuilderFactory if it is available. You can typically set system properties using the -D option on the Java command line, or by calling System.setProperty() from your application.

2. Look for a properties file $JAVA_HOME/lib/jaxp.properties, and within this file, for the property named javax.xml.parsers.DocumentBuilderFactory.

3. Use the services API, which is part of the JAR specification. In practice, this means that the DOM implementation used will be the first one found on the classpath.

It is likely that when you install a particular DOM implementation, it will contain a file in its .jar archive that makes that particular implementation the default, so if you don't do anything to select a specific implementation, the one chosen will depend on the order of files and directories on your classpath.
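
The lookup described above can be observed directly. The following is a minimal sketch, not a definitive recipe: the class name FactoryLookupSketch and the method selectedFactoryClassName() are invented for illustration, and the class name printed will depend on your JDK and classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;

final class FactoryLookupSketch {

    // Returns the class name of whichever factory the JAXP lookup selects.
    static String selectedFactoryClassName() {
        // newInstance() applies the lookup rules described above.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        // To force a particular implementation, set the system property
        // javax.xml.parsers.DocumentBuilderFactory before this call, either
        // with -D on the command line or with System.setProperty().
        System.out.println("Selected factory: " + selectedFactoryClassName());
    }
}
```

Running this once with and once without the -D option is a quick way to confirm which implementation your application is actually using.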

As with SAX, the default parser changed from Crimson to Xerces 2 with the introduction of Java 5.

Once you have a DocumentBuilderFactory, you can use a number of methods to configure it. Finally, you can call the newDocumentBuilder() method to return a DocumentBuilder. The methods available are:

| Method | Description |
| --- | --- |
| Object getAttribute(String) | Gets information about the properties of the underlying implementation |
| boolean isCoalescing() | Determines whether the resulting DocumentBuilder will merge CDATA nodes into their adjacent text nodes |
| boolean isExpandEntityReferences() | Determines whether the resulting DocumentBuilder will expand entity references and merge their content into the adjacent text nodes |
| boolean isIgnoringComments() | Determines whether the resulting DocumentBuilder will ignore comments in the source XML |
| boolean isIgnoringElementContentWhitespace() | Determines whether the resulting DocumentBuilder will ignore whitespace in element content |
| boolean isNamespaceAware() | Determines whether the resulting DocumentBuilder is namespace aware |
| boolean isValidating() | Determines whether the resulting DocumentBuilder will validate the XML source |
| DocumentBuilder newDocumentBuilder() | Returns a new DocumentBuilder configured as specified by previous calls |
| static DocumentBuilderFactory newInstance() | Returns a vendor-specific DocumentBuilderFactory selected according to the rules given above |
| void setAttribute(String, Object) | Sets vendor-specific properties on the underlying implementation |
| void setCoalescing(boolean) | Determines whether the resulting DocumentBuilder will merge CDATA nodes into their adjacent text nodes |
| void setExpandEntityReferences(boolean) | Determines whether the resulting DocumentBuilder will expand entity references and merge their content into the adjacent text nodes |
| void setIgnoringComments(boolean) | Determines whether the resulting DocumentBuilder will ignore comments in the source XML |
| void setIgnoringElementContentWhitespace(boolean) | Determines whether the resulting DocumentBuilder will ignore whitespace in element content |
| void setNamespaceAware(boolean) | Determines whether the resulting DocumentBuilder is namespace aware |
| void setValidating(boolean) | Determines whether the resulting DocumentBuilder will validate the XML source |
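
To show how these settings combine, here is a small sketch that configures a factory and then parses a document containing a comment and a CDATA section. The class FactoryConfigSketch, the method parseConfigured(), and the sample XML are assumptions made for the example; wrapping checked exceptions in IllegalStateException is a simplification, not a recommendation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class FactoryConfigSketch {

    // Parses the supplied XML with a namespace-aware, coalescing,
    // comment-ignoring DocumentBuilder.
    static Document parseConfigured(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);    // recognize XML namespaces
            factory.setCoalescing(true);        // turn CDATA sections into ordinary text
            factory.setIgnoringComments(true);  // drop comments from the tree
            // The builder inherits every setting made on the factory so far.
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Simplification for the sketch: surface any failure unchecked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseConfigured("<a>one<!-- note --><![CDATA[ & two]]></a>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Settings must be made before newDocumentBuilder() is called; changing the factory afterwards affects only builders created later.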

javax.xml.parsers.DocumentBuilder

A DocumentBuilder is always obtained by calling the newDocumentBuilder() method of a DocumentBuilderFactory.

A DocumentBuilder performs the task of parsing a source XML document and returning the resulting instance of org.w3c.dom.Document, containing the root of a tree representation of the document in memory.

The source document is specified in similar ways to the input for a SAX parser. This doesn't mean that a DocumentBuilder has to use a SAX parser to do the actual parsing: some will work this way and others won't. It's defined this way to avoid unnecessary differences between the SAX and DOM approaches.

You might be aware that in the Microsoft DOM implementation, the Document class has a method load() that parses a source XML file and constructs a Document object. This is a Microsoft extension; there is no corresponding method in the W3C DOM definition. This DocumentBuilder class fills the gap.

The methods available are:

| Method | Description |
| --- | --- |
| boolean isNamespaceAware() | Indicates whether the parser understands XML namespaces. |
| boolean isValidating() | Indicates whether the parser validates the XML source. |
| Document newDocument() | Returns a new Document object with no content. The returned Document can be populated using DOM methods such as createElement(). |
| Document parse(File) | Parses the XML in the supplied file, and returns the resulting Document object. |
| Document parse(InputSource) | Parses the XML in the supplied SAX InputSource, and returns the resulting Document object. |
| Document parse(InputStream) | Parses the XML in the supplied InputStream, and returns the resulting Document object. Note that the System ID of the source document will be unknown, so it will not be possible to resolve any relative URIs contained in the document. |
| Document parse(InputStream, String) | Parses the XML in the supplied InputStream, and returns the resulting Document object. The second argument supplies the System ID of the source document, which will be used to resolve any relative URIs contained in the document. |
| Document parse(String) | Parses the XML in the document identified by the supplied URI, and returns the resulting Document object. |
| void setEntityResolver(EntityResolver) | Supplies a SAX EntityResolver to be used during the parsing. |
| void setErrorHandler(ErrorHandler) | Supplies a SAX ErrorHandler to be used during the parsing. |
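
The two main uses of a DocumentBuilder, parsing existing XML and creating an empty Document to populate, might look like the sketch below. The class DocumentBuilderSketch, its method names, and the example System ID are hypothetical; the System ID is passed only to illustrate the two-argument parse() method.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

final class DocumentBuilderSketch {

    // Obtains a namespace-aware builder from the default factory.
    static DocumentBuilder newBuilder() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses XML held in memory; systemId supplies the base for relative URIs.
    static Document parseBytes(String xml, String systemId) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return newBuilder().parse(new ByteArrayInputStream(bytes), systemId);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates an empty Document and populates it with DOM methods.
    static Document buildFromScratch() {
        Document doc = newBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) {
        Document parsed = parseBytes("<book><title>XSLT</title></book>",
                "http://example.com/books/book.xml");
        System.out.println(parsed.getDocumentElement().getTagName());
        System.out.println(buildFromScratch().getDocumentElement().getTextContent());
    }
}
```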

The JAXP Transformation API

The previous sections provided a summary of the classes and methods defined in JAXP to control XML parsing. This section covers the classes and methods used to control XSLT transformation.

These classes are designed so they could be used with transformation mechanisms other than XSLT; for example, they could in principle be used to invoke XQuery (however, a different API called XQJ is under development for XQuery, which has more in common with JDBC). But XSLT is the primary target and is the one we will concentrate on.

There is one other kind of transformation that's worth mentioning, however, and this is an identity transformation in which the result represents a copy of the source. JAXP provides explicit support for identity transformations. These are more useful than they might appear, because JAXP defines three ways of supplying the source document (SAX, DOM, or lexical XML) and three ways of capturing the result document (SAX, DOM, or lexical XML), so an identity transformation can be used to convert any of these inputs to any of the outputs. For example, it can take SAX input and produce a lexical XML file as output, or it can take DOM input and produce a stream of SAX events as output. An implementation of JAXP can also support additional kinds of Source and Result objects if it chooses. This allows the “unofficial” document models such as JDOM, DOM4J, and XOM to coexist within the JAXP framework.
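
As an illustration of an identity transformation used as a format converter, the sketch below turns lexical XML into a DOM and a DOM back into lexical XML. A Transformer created by calling newTransformer() with no arguments performs the identity transformation. The class IdentitySketch and the methods toDom() and toLexical() are names invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

final class IdentitySketch {

    // Lexical XML in, DOM out.
    static Node toDom(String xml) {
        try {
            // No stylesheet argument: this Transformer copies source to result.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM in, lexical XML out.
    static String toLexical(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toLexical(toDom("<a><b>text</b></a>")));
    }
}
```

Swapping in a SAXSource or SAXResult covers the remaining combinations without any change to the Transformer itself.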

JAXP is also designed to control a composite transformation consisting of a sequence of transformation steps, each defined by an XSLT stylesheet in its own right. To do this, it builds on the SAX2 concept of an XMLFilter, which takes an input document as a stream of SAX events and produces its output as another stream of SAX events. Any number of such filters can be arranged end to end in a pipeline to define a composite transformation.
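
A two-stage pipeline of this kind might be assembled as follows. This is a sketch under several assumptions: the class FilterPipelineSketch and both stylesheets are invented for the example, and the stylesheets are written in XSLT 1.0 so that the processor bundled with the JDK can run them (an XSLT 2.0 processor would be selected through the TransformerFactory in the usual way).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

final class FilterPipelineSketch {

    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Hypothetical stylesheet: renames element 'from' to element 'to'.
    static String renameStylesheet(String from, String to) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
             + "<xsl:template match='" + from + "'>"
             + "<" + to + "><xsl:apply-templates/></" + to + ">"
             + "</xsl:template></xsl:stylesheet>";
    }

    // Runs the input through two stylesheets arranged as SAX filters.
    static String runPipeline(String xml) {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("This processor does not support XMLFilter");
        }
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) tf;
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            // Stage 1 reads from the parser; stage 2 reads from stage 1.
            XMLFilter first = stf.newXMLFilter(
                    new StreamSource(new StringReader(renameStylesheet("a", "b"))));
            first.setParent(reader);
            XMLFilter second = stf.newXMLFilter(
                    new StreamSource(new StringReader(renameStylesheet("b", "c"))));
            second.setParent(first);

            // An identity transformer serializes the events leaving the last filter.
            Transformer serializer = stf.newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                    new SAXSource(second, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("<a>x</a>"));
    }
}
```

Because each filter is itself an XMLReader, the last filter in the chain can be handed to anything that accepts a SAX parser.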

As with the JAXP SAXParser and DocumentBuilder interfaces, JAXP allows the specific XSLT implementation to be selected using a TransformerFactory object. Typically, the XSLT vendors will each provide their own subclass of TransformerFactory.

For performance reasons, the API separates the process of compiling a stylesheet from the process of executing it. A stylesheet can be compiled once and executed many times against different source documents, perhaps concurrently in different threads. The compiled stylesheet, following Microsoft's MSXML nomenclature, is known as a Templates object. To keep simple things simple, however, there are also methods that combine the two processes of compilation and execution into a single call.
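
The compile-once, run-many pattern can be sketched as follows. The class TemplatesSketch, its methods, and the item-counting stylesheet are assumptions made for the example; the stylesheet is XSLT 1.0 so that the JDK's bundled processor can compile it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplatesSketch {

    // Hypothetical stylesheet: outputs the number of item elements as text.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(//item)'/>"
        + "</xsl:template></xsl:stylesheet>";

    // Compiles the stylesheet once; the Templates object can be shared.
    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Executes the compiled stylesheet against one source document.
    static String run(Templates compiled, String xml) {
        try {
            // Each execution gets its own Transformer from the shared Templates.
            Transformer transformer = compiled.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates compiled = compile();
        System.out.println(run(compiled, "<list><item/><item/></list>"));
        System.out.println(run(compiled, "<list/>"));
    }
}
```

In a multithreaded application, the Templates object would typically be shared, with each thread calling newTransformer() to obtain its own Transformer.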

The classes defined in the javax.xml.transform package fall into several categories:

| Category | Class or interface | Description |
| --- | --- | --- |
| Principal classes | TransformerFactory | Selects and configures a vendor's implementation |
| | Templates | Represents a compiled stylesheet in memory |
| | Transformer | Represents a single execution of a stylesheet to transform a source document into a result |
| | SAXTransformerFactory | Allows a transformation to be packaged as a SAX XMLFilter |
| | Source | Represents the input to a transformation |
| | Result | Represents the output of a transformation |
| Source classes | SAXSource | Transformation input in the form of a SAX event stream |
| | DOMSource | Transformation input in the form of a DOM Document |
| | StreamSource | Transformation input in the form of a serial XML document |
| Result classes | SAXResult | Transformation output in the form of a SAX event stream |
| | DOMResult | Transformation output in the form of a DOM Document |
| | StreamResult | Transformation output in the form of a serial XML document (or HTML, or a plain text file) |
| Helper classes | URIResolver | User-supplied object that takes a URI contained in the stylesheet (for example, in the document() function) and fetches the relevant document as a Source object |
| | ErrorListener | User-supplied object that is notified of warnings and errors. The ErrorListener reports these conditions to the user and decides whether to continue processing. |
| | SourceLocator | Used primarily to identify where in the stylesheet an error occurred. |
| | DOMLocator | Subclass of SourceLocator, used when the source was a DOM. |
| | OutputKeys | A collection of constants defining the names of properties for serial output files. |
| Error classes | TransformerConfigurationException | Generally denotes an error in the stylesheet that is detected at compile time. |
| | TransformerException | A failure occurring in the course of executing a transformation. |
| | TransformerFactoryConfigurationError | A failure to configure the Transformer. |
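
To make the distinction between the error classes concrete, the sketch below attempts to compile a stylesheet and reports whether a TransformerConfigurationException was raised. The class CompileErrorSketch, the method compiles(), and both stylesheets are hypothetical; the faulty stylesheet contains a deliberately malformed XPath expression, and the processor may also write its own diagnostics to the standard error stream.

```java
import java.io.StringReader;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class CompileErrorSketch {

    private static final String HEAD =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>";
    private static final String TAIL = "</xsl:template></xsl:stylesheet>";

    // Hypothetical stylesheets: one valid, one with a malformed XPath expression.
    static final String GOOD = HEAD + "<xsl:value-of select='1 + 1'/>" + TAIL;
    static final String BAD  = HEAD + "<xsl:value-of select='1 +'/>" + TAIL;

    // Returns true if the stylesheet compiles, false on a compile-time error.
    static boolean compiles(String stylesheet) {
        try {
            TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheet)));
            return true;
        } catch (TransformerConfigurationException e) {
            // Raised before any source document has been read.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("good compiles: " + compiles(GOOD));
        System.out.println("bad compiles:  " + compiles(BAD));
    }
}
```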

In the following sections I will describe each of these classes, in alphabetical order of the class name (ignoring the name of the package).

javax.xml.transform.dom.DOMLocator

A DOMLocator is used to identify the location of an error when the document is supplied in the form of a DOM. This object will normally be created by the processor when an error occurs, and can be accessed using the getLocator() method of the relevant Exception object. It specializes SourceLocator, providing one additional method:

| Method | Description |
| --- | --- |
| org.w3c.dom.Node getOriginatingNode() | Returns the node at which the error or other event is located |
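
An application might use getOriginatingNode() along the following lines. This is only a sketch: the class DomLocatorSketch and its methods are invented, and because the locator is normally created by the processor, the example builds one by hand purely so that the code can be run without provoking a real error.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.SourceLocator;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMLocator;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class DomLocatorSketch {

    // Describes where an error occurred, preferring the DOM node if available.
    static String describeLocation(TransformerException e) {
        SourceLocator locator = e.getLocator();
        if (locator instanceof DOMLocator) {
            Node node = ((DOMLocator) locator).getOriginatingNode();
            return node == null ? "unknown node" : "node " + node.getNodeName();
        }
        if (locator != null) {
            return "line " + locator.getLineNumber() + " of " + locator.getSystemId();
        }
        return "unknown location";
    }

    // Hand-built locator, for demonstration only; a processor would supply its own.
    static DOMLocator locatorFor(final Node node) {
        return new DOMLocator() {
            @Override public Node getOriginatingNode() { return node; }
            @Override public String getPublicId() { return null; }
            @Override public String getSystemId() { return null; }
            @Override public int getLineNumber() { return -1; }
            @Override public int getColumnNumber() { return -1; }
        };
    }

    // Simulates an error located at a <para> element.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Node para = doc.createElement("para");
            TransformerException e =
                    new TransformerException("simulated failure", locatorFor(para));
            return describeLocation(e);
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```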

javax.xml.transform.dom.DOMResult
