XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (page 102)

There is no special null value, as there is in SQL. Where no other value is appropriate, a zero-length string or an empty sequence is used. These are not the same thing: an empty sequence is a sequence containing no items (count($x) returns 0), while a zero-length string is a sequence containing a single item, whose type is xs:string and whose value has a string-length of zero (count($x) returns 1, string-length($x) returns 0). However, although zero-length strings and empty sequences aren't the same thing, most of the functions in the standard library (see Chapter 13) give the same answer when an empty sequence is supplied as when a zero-length string is supplied.
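The distinction can be checked with a handful of XPath 2.0 expressions. The following is a small sketch that is not part of the original text; it is written as one sequence expression, and the comments give the value that the standard function library defines for each item.

```xpath
(
  count(()),                   (: 0: the empty sequence has no items :)
  count(""),                   (: 1: a zero-length string is one item :)
  "" instance of xs:string,    (: true :)
  string-length(""),           (: 0 :)
  string-length(()),           (: 0: the empty sequence is treated like "" :)
  concat("a", (), "b")         (: "ab": the empty sequence is treated like "" :)
)
```

The last two items illustrate the point about the standard library: string-length() and concat() give the same answer for an empty sequence as for a zero-length string, even though count() tells the two apart.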

The specifications try always to use the term zero-length string for the value "", to avoid any possible confusion, but occasionally the terms null string and empty string slip in by mistake.

The actual set of Unicode characters permitted in XML changes between XML 1.0 and XML 1.1. The XPath 2.0 specifications leave it to the implementor to decide which version of XML to align with.

In XML 1.0, the only ASCII control characters permitted (codes below x20) are the whitespace characters x09, x0A, and x0D (tab, newline, and carriage return). In XML 1.1, all control characters other than x00 are allowed, though you have to write them using XML character references rather than in their native encoding. For example, the BELL character, which in former times was used to ring the bell on a teletype machine, but nowadays is more likely to result in an irritating electronic beep, is represented as &#x07;. This form is required because some of these control characters have special meaning in communications protocols that may be used to carry XML documents. The exclusion of the x00 character (sometimes called NUL) is probably a concession to programmers writing XML parsers and related software in C, where this character is treated as a string delimiter. It also has the effect (probably deliberate) that you still can't use XML directly to convey binary data; you have to encode it as characters. As we will see (on page 208), XML Schema provides two types to help with this, xs:hexBinary and xs:base64Binary.
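One way to see which XML version a processor aligns with is to try constructing these characters from their code points. This is a hedged sketch rather than anything from the original text: it assumes the processor reports illegal characters with the standard error code FOCH0001, and the expressions are meant to be evaluated one at a time, since an error stops evaluation.

```xpath
(: Tab (x09) is a legal character in both XML 1.0 and XML 1.1. :)
string-to-codepoints(codepoints-to-string(9))    (: 9 :)

(: BELL (x07) is legal only if the processor aligns with XML 1.1;
   a processor aligned with XML 1.0 raises error FOCH0001. :)
codepoints-to-string(7)

(: NUL (x00) is not an XML character in either version,
   so this raises error FOCH0001 in every processor. :)
codepoints-to-string(0)
```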

Unicode was originally defined so that all characters would fit in two bytes (the highest code point was 65,535), but it has since outgrown that limit, and now defines characters using code points up to 1,114,111. In programming languages such as Java, there is poor support for Unicode characters above 65,535, and they appear in the application as a surrogate pair: two char values that have to be processed as a pair. (Java 5 provides library support for higher codepoints, but the basic types char and String have not changed.) In XPath, you don't have to worry about surrogate pairs. Each character, even those above 65,535, is counted as a single character. This affects functions such as string-length(), which counts the number of characters in a string, and substring(), which extracts the characters at particular positions in the string.
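The effect on string-length() and substring() can be sketched as follows. The example is an illustration added here, not part of the original text; it uses code point 119070 (U+1D11E, MUSICAL SYMBOL G CLEF), which lies above 65,535 and would therefore need a surrogate pair in a UTF-16 representation.

```xpath
(
  (: One character, even though UTF-16 would need two code units. :)
  string-length(codepoints-to-string(119070)),                  (: 1 :)

  (: "a", the G clef, "b": three characters in total. :)
  string-length(codepoints-to-string((97, 119070, 98))),        (: 3 :)

  (: Positions count characters, so position 3 is "b". :)
  substring(codepoints-to-string((97, 119070, 98)), 3, 1),      (: "b" :)

  (: The round trip returns the original code points unchanged. :)
  string-to-codepoints(codepoints-to-string((97, 119070, 98)))  (: 97, 119070, 98 :)
)
```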

Strings may be compared using the = and != operators, as well as <, >, <=, and >=. The exact way in which these work is context-dependent. Strings are always compared using a collation, and it is up to the collation to decide, for example, whether the two strings naive and naïve are equal.
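The role of the collation in comparing naive and naïve can be sketched with the compare() function, which accepts a collation URI as an optional third argument. This example is an addition, not part of the original text. The Unicode codepoint collation URI used here is the one that every processor must support; any other collation URI, and the choice of default collation, is implementation-defined.

```xpath
(
  (: Under the codepoint collation the strings differ: "i" (code point 105)
     sorts before "ï" (code point 239), so the result is -1. :)
  compare("naive", "naïve",
          "http://www.w3.org/2005/xpath-functions/collation/codepoint"),

  (: The = operator uses the default collation from the static context,
     so the result here depends on how the processor is configured. :)
  "naive" = "naïve"
)
```

With a collation that ignores accents, the same call to compare() could return 0, and the = comparison could return true.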
