XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (677 page)

  • Processing instructions are terminated with
    >
    rather than
    ?>
    . Processing instructions are not often used in HTML, but the HTML 4.0 standard recommends that any vendor extensions should be implemented this way, rather than by adding element tags to the language. So it is possible they will be seen more frequently in the future.
  • Attributes that are conventionally written with a keyword only, and no value, will be recognized and output in this form. Common examples are
    READONLY>
    and
    SELECTED>
    . This is shorthand, permitted in SGML but not in XML, for an attribute that has only one permitted value, which is the same as the attribute name. In XML, these tags must be written as
    READONLY="READONLY">
    and
    SELECTED="SELECTED">
    . The HTML output method will normally use the abbreviated form, as this is the only form that older HTML browsers will recognize.
  • The special use of the ampersand character in dynamic HTML attributes is recognized. For example, a tag with an attribute value containing a script macro of the form
    &{name};
    (where name stands for the script expression) is correct HTML, though it would not be correct in XML, because of the unescaped ampersand. To produce this output from a literal result element, the attribute value in the stylesheet would need to be written as
    &amp;{{name}};
    : note the double curly braces, to prevent them being interpreted with their special meaning in attribute value templates.
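
    The three serialization rules in the list above can be tried out with a small stylesheet. This is only a sketch: the processing instruction name vendor-ext, its content, and the select/option markup are hypothetical, and the bgcolor attribute follows the example given in the XSLT 1.0 Recommendation for the HTML output method. It can be run against any source document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Select the HTML output method -->
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <!-- Escaped ampersand plus doubled braces: the doubled braces stop
           the attribute value template mechanism from evaluating them -->
      <body bgcolor="&amp;{{randomrbg}};">
        <!-- Hypothetical vendor extension written as a processing instruction -->
        <xsl:processing-instruction name="vendor-ext">mode="demo"</xsl:processing-instruction>
        <select name="country">
          <!-- Written in the full form that XML requires -->
          <option selected="selected">Spain</option>
        </select>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

    With a conforming processor, the serialized HTML should contain a processing instruction closed by > rather than ?>, an option start tag with the minimized attribute selected, and a bgcolor value beginning with an unescaped ampersand followed by a single pair of curly braces. Indentation and other whitespace may differ between processors.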

    A common source of anxiety with HTML output is the use of ampersands in URLs. For example, suppose you want to generate a hyperlink with the following text, whose href URL contains an unescaped ampersand:

    Spanish Widgets

    However you try to produce this using standard XSLT, the ampersand will always come out as
    &amp;
    . The reason for this is simple: a bare
    &
    , although commonly used and widely accepted, is not actually correct HTML, and according to the standard it must be escaped as
    &amp;
    . All respectable browsers accept the correct escaped form, so the answer is: don't worry about it.
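
    As an illustration of the point about ampersands in URLs, here is a minimal sketch of a stylesheet that generates such a link. The URL is hypothetical (the example.com host and the parameter names are assumptions made for this sketch).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Hypothetical URL. The ampersand has to be escaped here simply
         so that the stylesheet itself is well-formed XML -->
    <a href="http://www.example.com/search?product=widgets&amp;country=spain">Spanish Widgets</a>
  </xsl:template>

</xsl:stylesheet>
```

    The serializer writes the href value with the ampersand still escaped as &amp; and the browser converts it back to a plain ampersand when it parses the attribute, so the request is made to the intended URL.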

    Although the serializer won't generally check that the result tree is valid HTML, there is one exception: it must not use characters that are allowed in XML but not in HTML, notably Unicode characters in the range x80 to x9F. If these characters appear in your XML, the chances are that they got there by accident. Microsoft's cp1252 character set (sometimes called
    ANSI
    ) is generally similar to iso-8859-1 but uses codes in this range to refer to special characters such as the Euro currency symbol, dagger, em-dash, bullet, and the trademark sign. If a document that uses these characters is correctly labeled, they will be translated into their Unicode equivalents (for example, the Euro sign will become x20AC), and all will be well. If, however, the document is wrongly labeled with
    encoding="iso-8859-1"
    , then these characters will be represented in the XML with codes in the range x80 to x9F, which will cause an error when you try to serialize as HTML, because HTML does not allow characters in that range. The remedy is to change the XML declaration of the source document from
    encoding="iso-8859-1"
    to
    encoding="cp1252"
    .
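
    The corrected declaration might look like the following sketch. The price element and its content are hypothetical. Whether a parser accepts the name cp1252 depends on the parser; the IANA-registered name for the same character set is windows-1252, which may be more widely recognized.

```xml
<?xml version="1.0" encoding="cp1252"?>
<!-- Hypothetical source document saved in the Windows code page.
     With the correct label, the Euro sign (stored as byte x80)
     is read as the Unicode character x20AC -->
<price currency="€">100</price>
```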
