Collation Classes
In most cases, ordering strings by the Unicode
values of their characters does not produce linguistically correct results. For example, in
the ASCII-based character sets, Z is ordered before a,
and z is ordered before ñ. Open Class
collation classes, however, support collation objects that
compare strings based not on the Unicode values of each
character, but on the rules of a natural language. This is what
enables language-sensitive string comparison.
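The ordering problem described above can be reproduced with plain standard C++. The following is a minimal sketch, not part of the Open Class API: the helper name sortByCodeValue is hypothetical, and it relies only on the fact that std::string's default comparison orders by raw character values.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not an Open Class function).
// Sorts the words with std::string's operator<, which compares raw
// character values rather than applying any natural-language rules.
std::vector<std::string> sortByCodeValue(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    return words;
}
```

Calling sortByCodeValue({"apple", "Zebra", "banana"}) places "Zebra" first, because every uppercase ASCII letter has a smaller value than every lowercase one. A language-sensitive collation would normally be expected to place it last.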
Each International Framework collation
object is based on a set of rules that define the results for
alphabetizing and comparing text in a particular natural
language. These rules define not only a ranking (such as a < b
< c) but also three levels of priority within the ranking.
For many European languages, the difference
between two base letters (a and b) is a primary
difference, the difference between an unaccented and an accented
base letter (ä and a) is secondary, and the
difference between an uppercase and lowercase letter (A
and a) is tertiary. These distinctions allow you to set
the level of comparison for more sophisticated sorting and
searching.
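The three levels can be illustrated with a small multi-level comparison. This is a toy sketch under stated assumptions, not the ICollation implementation: the names Strength, Weights, weightsFor, and compareAtStrength are hypothetical, the weight table covers only ASCII letters plus ä and Ä, and ranking lowercase before uppercase at the tertiary level is an arbitrary choice made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical names for this sketch; they are not ICollation constants.
enum class Strength { Primary, Secondary, Tertiary };

struct Weights {
    int primary;    // base letter
    int secondary;  // accent (0 = none)
    int tertiary;   // case (0 = lower, 1 = upper)
};

// Toy weight table: ASCII letters plus a-umlaut in both cases (assumption).
Weights weightsFor(char32_t c) {
    if (c >= U'a' && c <= U'z') return {static_cast<int>(c - U'a') + 1, 0, 0};
    if (c >= U'A' && c <= U'Z') return {static_cast<int>(c - U'A') + 1, 0, 1};
    if (c == U'\u00E4') return {1, 1, 0};  // lowercase a-umlaut
    if (c == U'\u00C4') return {1, 1, 1};  // uppercase a-umlaut
    return {1000 + static_cast<int>(c), 0, 0};  // anything else: after letters
}

// Compares the two strings on one weight level only.
int compareLevel(const std::u32string& a, const std::u32string& b,
                 int Weights::*level) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int wa = weightsFor(a[i]).*level;
        const int wb = weightsFor(b[i]).*level;
        if (wa != wb) return wa < wb ? -1 : 1;
    }
    if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
    return 0;
}

// Returns -1, 0, or 1. A lower level is consulted only when every higher
// level is equal and the requested strength includes it.
int compareAtStrength(const std::u32string& a, const std::u32string& b,
                      Strength s) {
    int r = compareLevel(a, b, &Weights::primary);
    if (r != 0 || s == Strength::Primary) return r;
    r = compareLevel(a, b, &Weights::secondary);
    if (r != 0 || s == Strength::Secondary) return r;
    return compareLevel(a, b, &Weights::tertiary);
}
```

With this table, a and ä compare equal at primary strength but differ at secondary strength, and a and A compare equal at secondary strength but differ at tertiary strength. Real collation rules are considerably richer; the sketch only shows how a strength setting limits which levels take part in the result.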
The ICollation interface is based on the protocols in the ANSI C++
standard library collate class, which provides string comparison
and hashing functions. The
ICollation comparison functions take two strings or substrings
and return a value that indicates whether the source string is
greater than (later in the alphabet), less than (earlier in the
alphabet), or equal to the target string. You can specify the
ordering strength of the comparison to control how differences
such as case and accents are handled.
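Because this document does not show the ICollation signatures, the following sketch uses the standard library std::collate facet that the interface is modeled on. The wrapper names collateCompare and collateHash are hypothetical, and the classic "C" locale is assumed as the default so that the behavior does not depend on which locales are installed.

```cpp
#include <locale>
#include <string>

// Hypothetical wrapper (not an ICollation function). Compares two strings
// with the collate facet of the given locale. The standard specifies a
// result of -1 (a sorts first), 0 (equal), or 1 (a sorts last).
int collateCompare(const std::string& a, const std::string& b,
                   const std::locale& loc = std::locale::classic()) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

// Hypothetical wrapper. Returns the facet's hash of the string; strings
// that compare equal under the facet produce the same hash value.
long collateHash(const std::string& s,
                 const std::locale& loc = std::locale::classic()) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.hash(s.data(), s.data() + s.size());
}
```

In the classic locale the comparison is not language-sensitive, so collateCompare("Z", "a") still reports that "Z" sorts first. Passing a named locale, if one is installed on the system, is what makes the result follow that language's rules. Note that std::collate has no ordering-strength parameter; that is one of the additions ICollation makes.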
You can compare styled text in an IText
object, but styling information is ignored.
This figure shows the collation class
architecture:

The collation classes include the
abstract base class ICollation, which defines the protocol for
language-sensitive string comparison, together with several
concrete subclasses, and ICollationIterator,
which lets you iterate through the list of
available localized collation objects.
ICollation provides the protocols you use to
create both language-sensitive and language-insensitive collation
objects. The ICollation interface is a superset of the interface
of the ANSI C++ standard library collate class. Based on the
locale you specify, the ICollation::createCollation function can
return:
This figure shows the ICollation interface:

Use ICollationIterator to iterate through
the list of international collation objects currently available
on the system. You can set the iterator to enumerate only host
collation objects, only portable collation objects, or both (both
is the default).
This figure shows the interface for
ICollationIterator:

The correct collation for each language or
script is determined by a set of rules that define a ranking,
from least to greatest, for each character. To allow more
comparison options, each character is assigned an ordering
priority within the ranking: primary, secondary, or tertiary. For
example, in an English collation:
In English, then, you can implement
case-insensitive comparison by setting the ordering strength to
kSecondaryDifference. Primary and secondary differences are
considered, but any tertiary (case) differences are ignored. Thus,
"pat," "Pat," and "PAT" would be
considered equivalent strings.
When you create a collation object, you
specify an ordering strength that determines which differences
are considered: all differences (primary, secondary, and
tertiary), only primary and secondary differences, or only
primary differences. The
types of differences that are considered primary, secondary, and
tertiary may vary based on the language you are working with.
This table shows the results for English
strings compared with different ordering strengths:
When you are using the collation object
for the POSIX locale (a portable collation), specifying an
ordering strength has no effect.
