XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

Collations

The static context for XPath expressions includes a set of collations, one of which is marked as the default collation. A collation is essentially a set of rules for comparing and sorting strings. One collation might decide that “pass” and “Paß” are equal; another might decide that they are distinct.
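To make the idea concrete, here is a minimal Java sketch in which a collation is modelled simply as a Comparator<String>. The class name TwoCollations and its "folding" rules are hypothetical: one collation compares strings exactly as written, while the other ignores case and treats ß as equivalent to ss, so the two disagree about “pass” and “Paß”.

```java
import java.util.Comparator;
import java.util.Locale;

// Two hypothetical collations, each modelled as a Comparator<String>.
final class TwoCollations {

    // Compares the strings exactly as written, so case and spelling both matter.
    static final Comparator<String> EXACT = Comparator.naturalOrder();

    // Hypothetical "folding" rules: ignore case and treat sharp s (U+00DF) as "ss".
    // Illustrative only; this is not the collation of any real locale.
    static final Comparator<String> FOLDING = Comparator.comparing(TwoCollations::fold);

    static String fold(String s) {
        return s.replace("\u00DF", "ss").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String pass = "pass";
        String passSharpS = "Pa\u00DF";
        System.out.println(EXACT.compare(pass, passSharpS) == 0);   // false
        System.out.println(FOLDING.compare(pass, passSharpS) == 0); // true
    }
}
```

Neither comparator is "right"; each simply embodies a different set of rules, which is exactly why XPath lets the choice of collation be made explicitly.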

As far as XPath is concerned, collations are defined outside the system, and a collation is treated as a black box. The XPath processor knows which collations exist (because they are listed in the static context), but it doesn't know anything about their characteristics, beyond the fact that it can use the collation to compare two strings.

Collations are identified by URIs. These are like namespace URIs, in that they don't necessarily identify real resources on the Web: they are just globally unique names, ensuring that collations defined by one vendor can't be confused with those defined by a different vendor. There is only one collation whose name has been standardized, namely:

http://www.w3.org/2005/xpath-functions/collation/codepoint
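One way a processor might hold the collations of the static context can be sketched in Java as a map from collation URIs to opaque comparison objects, one of which is nominated as the default. The class StaticCollations and the example.com URI are hypothetical; only the codepoint collation URI comes from the specification. The point of the sketch is that the processor never looks inside a collation: it finds one by URI and calls its compare method.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Hypothetical model of the collations held in an XPath static context.
// Each collation is seen only as a Comparator<String>: a black box.
final class StaticCollations {
    static final String CODEPOINT_URI =
            "http://www.w3.org/2005/xpath-functions/collation/codepoint";

    // Orders strings by their sequences of Unicode code points.
    static final Comparator<String> CODEPOINT =
            (a, b) -> Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray());

    private final Map<String, Comparator<String>> byUri;
    private final String defaultUri;

    StaticCollations(Map<String, Comparator<String>> byUri, String defaultUri) {
        if (!byUri.containsKey(defaultUri)) {
            throw new IllegalArgumentException("default collation not registered: " + defaultUri);
        }
        this.byUri = Map.copyOf(byUri);
        this.defaultUri = defaultUri;
    }

    // Looks a collation up by URI; a URI that is not in the context is an error.
    Comparator<String> lookup(String uri) {
        Comparator<String> c = byUri.get(uri);
        if (c == null) {
            throw new IllegalArgumentException("unknown collation: " + uri);
        }
        return c;
    }

    Comparator<String> defaultCollation() {
        return byUri.get(defaultUri);
    }

    public static void main(String[] args) {
        // Hypothetical vendor-defined URI for a case-insensitive collation.
        String ci = "http://example.com/collation/case-insensitive";
        StaticCollations ctx = new StaticCollations(
                Map.of(CODEPOINT_URI, CODEPOINT, ci, String.CASE_INSENSITIVE_ORDER),
                CODEPOINT_URI);
        System.out.println(ctx.defaultCollation().compare("ABC", "abc") == 0); // false
        System.out.println(ctx.lookup(ci).compare("ABC", "abc") == 0);         // true
    }
}
```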

This collation, called the Unicode Codepoint Collation, compares strings character by character, using the numeric values assigned to each character in the Unicode standard. So, for example, “Z” < “a” is true when using this collation, because the numeric code for Z is 90, and the code for a is 97.
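A direct implementation of the Unicode Codepoint Collation might look like the following sketch (the class name CodepointCollation is hypothetical). One subtlety is worth noting for Java programmers: String.compareTo compares UTF-16 code units rather than code points, so it gives a different answer when a character above U+FFFF (stored as a surrogate pair) is compared with a character in the range U+E000 to U+FFFF. The sketch therefore steps through the strings one code point at a time.

```java
// Sketch of the Unicode Codepoint Collation: strings are compared one
// Unicode code point at a time, using each code point's numeric value.
final class CodepointCollation {

    // Returns a negative number, zero, or a positive number, like compareTo.
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);   // reads a surrogate pair as one code point
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // All compared code points matched: the shorter string sorts first,
        // and strings of equal length compare as equal.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        System.out.println(compare("Z", "a") < 0);        // true: 90 < 97
        String bmp = "\uFF61";                           // U+FF61, one UTF-16 unit
        String astral = "\uD83D\uDE00";                  // U+1F600, a surrogate pair
        System.out.println(compare(bmp, astral) < 0);     // true: 0xFF61 < 0x1F600
        System.out.println(bmp.compareTo(astral) < 0);    // false: 0xFF61 > 0xD83D
    }
}
```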

As with other aspects of the static context, it's up to the host language to say what collations are available and how they are defined. In this area, however, XSLT as a host language has nothing to say: it leaves it entirely up to the implementation. Many implementations are likely to devise a scheme whereby URIs identify collations provided by the programming language environment, by a database system, or by the operating system.
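As an illustration of the kind of scheme an implementation might devise, the following Java sketch maps URIs consisting of an invented prefix, http://example.com/collation/, followed by a language tag onto the platform's collator for that locale. The prefix, the class name UriCollationResolver, and the URI layout are all hypothetical; they do not describe Saxon or any other product.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Optional;

// Hypothetical, implementation-defined scheme: a URI made of an invented
// prefix followed by a BCP 47 language tag names the JDK collator for that locale.
final class UriCollationResolver {
    static final String PREFIX = "http://example.com/collation/";   // invented prefix

    static Optional<Collator> resolve(String uri) {
        if (!uri.startsWith(PREFIX)) {
            return Optional.empty();        // not a URI that this scheme recognizes
        }
        String languageTag = uri.substring(PREFIX.length());
        return Optional.of(Collator.getInstance(Locale.forLanguageTag(languageTag)));
    }

    public static void main(String[] args) {
        Collator german = resolve(PREFIX + "de").orElseThrow();
        System.out.println(german.compare("apple", "Banana") < 0);  // true
        System.out.println(resolve("urn:unknown").isPresent());    // false
    }
}
```

Returning an empty Optional for an unrecognized URI leaves the caller free to try another scheme, or to report that the requested collation is not available.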

In Java, for example, a collator is an object of class java.text.Collator. You can obtain a collator for a particular Locale, which will give you the basic rules for a language (for example, ä collates after z in Swedish, but not in German). You can then parameterize the collator. For example, you can set its strength, which determines whether or not it ignores accents and case, and you can control whether it applies Unicode normalization to the characters before comparison. Normalization recognizes that there are alternative ways of coding the same character in Unicode, either as a combined character (one codepoint representing lower-case c with cedilla) or as separate characters (separate codepoints for the c and the cedilla). Saxon allows you to use a collation URI that sets these parameters explicitly, for example the URI:
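The Collator settings described above can be tried out directly. The following sketch (the class and method names are hypothetical) builds collators with a chosen locale, strength, and decomposition mode using the standard java.text.Collator methods getInstance, setStrength, and setDecomposition. The Swedish result relies on the Swedish collation data shipped with the JDK, so it is printed rather than assumed.

```java
import java.text.Collator;
import java.util.Locale;

// Exercises the java.text.Collator settings described in the text:
// locale-specific rules, strength, and decomposition (normalization).
final class CollatorDemo {

    // Builds a collator for a locale with the given strength and decomposition mode.
    static Collator collator(Locale locale, int strength, int decomposition) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(strength);
        c.setDecomposition(decomposition);
        return c;
    }

    public static void main(String[] args) {
        // Locale rules: a-umlaut (U+00E4) compared with z.
        // The Swedish result depends on the JDK's Swedish collation data being installed.
        Collator sv = collator(Locale.forLanguageTag("sv"), Collator.TERTIARY, Collator.NO_DECOMPOSITION);
        Collator de = collator(Locale.GERMAN, Collator.TERTIARY, Collator.NO_DECOMPOSITION);
        System.out.println("sv: " + Integer.signum(sv.compare("\u00E4", "z")));
        System.out.println("de: " + Integer.signum(de.compare("\u00E4", "z")));

        // Strength: PRIMARY ignores case and accents; TERTIARY does not.
        Collator primary = collator(Locale.ENGLISH, Collator.PRIMARY, Collator.NO_DECOMPOSITION);
        Collator tertiary = collator(Locale.ENGLISH, Collator.TERTIARY, Collator.NO_DECOMPOSITION);
        System.out.println(primary.compare("e", "E") == 0);       // true
        System.out.println(primary.compare("e", "\u00EA") == 0);  // true: e vs e-circumflex
        System.out.println(tertiary.compare("e", "E") == 0);      // false

        // Decomposition: precomposed c-cedilla (U+00E7) vs c + combining cedilla (U+0327).
        Collator canonical = collator(Locale.ENGLISH, Collator.TERTIARY, Collator.CANONICAL_DECOMPOSITION);
        System.out.println(canonical.compare("\u00E7", "c\u0327") == 0); // true
    }
}
```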
