XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

escape-uri-attributes
This attribute determines whether non-ASCII characters appearing in URI-valued attributes should be escaped using the %HH convention. The default is yes. Although HTML requires URIs to be escaped in this way, there are several reasons why you might choose to suppress this. Firstly, the URIs might already be in escaped form: you can do the escaping from within the stylesheet, with much greater control, using the escape-html-uri() function described in Chapter 13. Secondly, browsers do not always handle escaped URIs correctly. This is especially true when the URI is handled on the client side; for example, when it invokes JavaScript functions, or when it contains a fragment identifier.
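As a minimal sketch of the first option, the stylesheet below switches off the serializer's escaping and applies escape-html-uri() itself. The source element name link and its target attribute are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Tell the HTML serializer not to apply %HH escaping to URI attributes -->
  <xsl:output method="html" escape-uri-attributes="no"/>

  <!-- "link" and "@target" are assumed names in a hypothetical source document -->
  <xsl:template match="link">
    <!-- Escape explicitly, under the stylesheet's control: a value such as
         café.html becomes caf%C3%A9.html (the UTF-8 bytes of the character) -->
    <a href="{escape-html-uri(@target)}">
      <xsl:value-of select="."/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Doing the escaping in the stylesheet lets you decide case by case which attribute values are escaped, for example leaving a JavaScript URI untouched while escaping ordinary links.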
include-content-type
If this attribute is set to yes (or if it is omitted), the serializer will add a <meta> element as a child of the HTML <head> element, provided that the result tree contains a <head> element. This <meta> element contains details of the media type and encoding of the document. Any existing <meta> element containing this information will be replaced. You may want to suppress this by specifying the value no, for example if the stylesheet is copying a document that already includes such an element.
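The copying case mentioned above might be sketched as follows: an identity transformation over an existing HTML-like document, with the automatic <meta> element suppressed. The element the serializer would otherwise add typically takes a form such as <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">, though the exact details depend on the processor and the chosen encoding.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Keep whatever <meta> elements the source already has;
       do not let the serializer add or replace one -->
  <xsl:output method="html" include-content-type="no"/>

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```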
indent
If this attribute has the value yes, the idea is that the HTML output should be indented to show its hierarchic structure. The XSLT processor is not obliged to respect this request, and if it does so, the precise form of the output is not defined.
When producing indented output, the processor has much more freedom to add or remove whitespace than in the XML case, because of the way whitespace is handled in HTML. The processor can add or remove whitespace anywhere it likes so long as it doesn't change the way a browser would display the HTML.
media-type
This attribute defines the media type of the output file (often referred to as its MIME type). The default value is text/html. The specification doesn't say what use is made of this information; it doesn't affect the contents of the output file, but it may affect the way it is named, stored, or transmitted, depending on the environment. For example, the information might find its way into an HTTP protocol header.
normalization-form
This attribute is used in the same way as for the XML output method, described on page 934.
use-character-maps
This attribute is used in the same way as for the XML output method, described on page 935.
version
This attribute determines the version of HTML used in the output document. It is up to the implementation to decide which versions of HTML should be supported, though all implementations can be expected to support the default version, namely version 4.0.
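To show how the indent, media-type, and version attributes fit together, here is a sketch of a declaration that requests indented HTML 4.0 output. The media-type and version values are simply the defaults described above, written out for clarity; whether and how the output is actually indented remains up to the processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:output is a top-level declaration: a child of xsl:stylesheet -->
  <xsl:output method="html"
              indent="yes"
              media-type="text/html"
              version="4.0"/>

  <!-- Minimal illustrative template producing a small HTML result tree -->
  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body><p>Hello</p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```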

The XHTML Output Method

An XHTML document is an XML document, so when you specify method="xhtml", most of the rules for the XML output method are inherited without change. However, there are special guidelines for serializing XHTML so that it is rendered correctly in browsers that were designed originally to handle HTML, and in addition some of the features of HTML serialization, such as URI escaping and addition of <meta> elements, are also applicable to XHTML. So the XHTML output method is essentially a blend of features from the XML and HTML methods.

It's worth asking yourself whether you really need to use this method. If the browser understands XHTML, then serializing the result tree as XML will work fine. If the browser doesn't understand XHTML, and is going to handle it as if it were HTML, then why not serialize the tree as HTML to start with?

In fact, the XHTML output method works in the same way as the XML output method (and uses all the serialization parameters that control the XML method) with specific exceptions. These exceptions are:

  • The way that empty elements are output depends on the way the element is declared in the XHTML DTD. For an element whose content model is empty, such as <br> or <hr> or <img>, the serializer should use an XML empty-element tag, taking care to include a space before the final />, so that the tag looks like <br /> or <hr />. For an element that is empty but allowed to have content, such as a <p> element, the serializer should use a start tag followed by an end tag, thus: <p></p>.
  • The entity reference &apos; is not recognized by all browsers, so the serializer will probably use a numeric character reference such as &#39; instead.
  • The serializer needs to take care with whitespace (for example newlines) appearing in attribute values. The specification doesn't say exactly how this should be handled, but it's probably safest, if there is any whitespace other than a single space character, to represent it using numeric character references.
  • The serializer must not output redundant namespace declarations, since these would violate the XHTML DTD. (At one time this rule was wider and encouraged the serializer to put XHTML elements in the default namespace. However, the serializer has no discretion in this area—namespace prefixes are chosen by the user, not by the serializer.) Because DTDs are not namespace-aware, it's always the case that if you choose the wrong prefix, the result document of a transformation may be invalid against the DTD.
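The rules for empty elements can be illustrated with a small sketch. The stylesheet below is an assumed example, not taken from the text: it places the literal result elements in the XHTML namespace by declaring it as the default namespace, and it writes both empty elements in the same minimized form, leaving the serializer to choose the output syntax for each.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xhtml"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body>
        <!-- br has an EMPTY content model in the XHTML DTD -->
        <p>First line<br/>second line</p>
        <!-- p is empty here, but its content model allows content -->
        <p/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Under the rules above, a conforming serializer would be expected to write the first empty element as <br /> and the second as <p></p>, even though both were written the same way in the stylesheet. How an empty element is spelled in the stylesheet makes no difference, because the result tree does not record it.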
