Collation Classes

Overview of Collation Classes

Ordering strings by their raw Unicode values rarely produces the order a reader expects. For example, in ASCII-based character sets, Z is ordered before a, and z is ordered before ñ. The Open Class collation classes instead support collation objects that compare strings based not on the Unicode value of each character, but on the rules of a natural language. This is what enables language-sensitive string comparison.
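The ordering problem described above is easy to reproduce in standard C++ without any collation class. The following is a minimal sketch, not part of the Open Class API: the function name sortByCodeValue is hypothetical, and the strings are held as std::u32string so that each element is one Unicode value.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts words by the numeric value of each character, with no
// knowledge of any natural language. (Hypothetical helper name.)
std::vector<std::u32string> sortByCodeValue(std::vector<std::u32string> words) {
    // std::u32string::operator< compares element by element on code values.
    std::sort(words.begin(), words.end());
    return words;
}
```

Sorting apple, Zebra, zebra, and ñandú this way places Zebra first (Z is 0x5A, below a at 0x61) and ñandú last (ñ is 0xF1, above z at 0x7A), which is the kind of result a language-sensitive collation object is meant to avoid.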

Each International Framework collation object is based on a set of rules that define the results for alphabetizing and comparing text in a particular natural language. These rules define not only a ranking (such as a < b < c) but also three levels of priority within the ranking.

For many European languages, the difference between two base letters (a and b) is a primary difference, the difference between an unaccented and an accented base letter (ä and a) is secondary, and the difference between an uppercase and lowercase letter (A and a) is tertiary. These distinctions allow you to set the level of comparison for more sophisticated sorting and searching.

The ICollation interface is based on the protocols in the ANSI C++ standard library collate class, which provides string comparison and hashing functions. The ICollation comparison functions take two strings or substrings and return a value that indicates whether the source string is greater than (later in the alphabet), less than (earlier in the alphabet), or equal to the target string. You can specify the ordering strength of the comparison to control how differences such as case and accents are handled.
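Because the ICollation interface is described as modeled on the standard library collate class, the standard facet gives a rough picture of the comparison and hashing protocol. The sketch below uses only std::collate<char> with the classic "C" locale; it does not call ICollation, and the helper names compareStrings and hashString, as well as the numeric values given to the result constants, are assumptions made for illustration.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Result names follow the ones used in this document; the numeric
// values are an assumption for this sketch.
enum ComparisonResult { kSourceLess = -1, kSourceEqual = 0, kSourceGreater = 1 };

// Compares two strings with the collate facet of the given locale.
ComparisonResult compareStrings(const std::string& source,
                                const std::string& target,
                                const std::locale& loc = std::locale::classic()) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    int r = coll.compare(source.data(), source.data() + source.size(),
                         target.data(), target.data() + target.size());
    return r < 0 ? kSourceLess : (r > 0 ? kSourceGreater : kSourceEqual);
}

// Hashes a string with the same facet; strings that compare equal
// under the facet produce the same hash value.
long hashString(const std::string& s,
                const std::locale& loc = std::locale::classic()) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.hash(s.data(), s.data() + s.size());
}
```

The facet takes pointer ranges, so a substring can be compared by passing a narrower range. Note that the classic locale compares by character value, so this sketch shows only the shape of the protocol, not language-sensitive results.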

You can compare styled text in an IText object, but styling information is ignored.

This figure shows the collation class architecture:

Collation Subclasses

The collation classes include the abstract base class ICollation, which defines the protocol for language-sensitive string comparison; several concrete subclasses; and ICollationIterator, which lets you iterate through the list of available localized collation objects.

Class               Description
IBitwiseCollation   Provides bitwise, language-insensitive string comparison.
ICollation          Provides access to either host-specific or portable collation for a given language, as available. Primary class for language-sensitive string comparison.
ICollationIterator  Lets you iterate through the available collation objects.
IPortableCollation  Provides portable (non-host-specific) language-sensitive collation.
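The idea of using a host-specific collation when one is available and a portable one otherwise has a loose analogue in standard C++ locales. The sketch below is only that analogy: it uses std::locale, not the Open Class classes, and the function name localeOrPortableFallback is hypothetical. It treats a named system locale as the "host" choice and the classic "C" locale as the "portable" fallback.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Returns the named locale if the host system provides it; otherwise
// falls back to the classic "C" locale, which is always available.
std::locale localeOrPortableFallback(const std::string& name) {
    try {
        return std::locale(name);  // throws std::runtime_error if unknown
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

A caller can then take the collate facet from whichever locale comes back, without needing to know which of the two was selected.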

ICollation provides the protocols you use to create both language-sensitive and language-insensitive collation objects. The ICollation interface is a superset of the interface of the ANSI C++ Standard collate class. Based on the locale you specify, the ICollation::createCollation function can return:

This figure shows the ICollation interface:

Collation Iteration

Use ICollationIterator to iterate through the list of international collation objects currently available on the system. You can set the iterator to enumerate only host collation objects, only portable collation objects, or both (both is the default).

This figure shows the interface for ICollationIterator:

Ordering Strength

The correct collation for each language or script is determined by a set of rules that define a ranking, from least to greatest, for each character. To allow more comparison options, each character is assigned an ordering priority within the ranking: primary, secondary, or tertiary. For example, in an English collation:

In English, then, you can implement case-insensitive comparison by setting the ordering strength to kSecondaryDifference. Primary and secondary differences are considered, but any tertiary (case) differences are ignored; thus, "pat," "Pat," and "PAT" would be considered equivalent strings.

When you create a collation object, you specify an ordering strength that determines which differences are considered: all differences; primary and secondary differences only; or primary differences only. The types of differences that are considered primary, secondary, and tertiary may vary based on the language you are working with.

This table shows the results for English strings compared with different ordering strengths:

Source  Target  Ordering strength     Comparison result
abc     abc     kPrimaryDifference    kSourceEqual
äbc     abc     kPrimaryDifference    kSourceEqual
Abc     abc     kSecondaryDifference  kSourceEqual
abc     def     kPrimaryDifference    kSourceLess
abc     äbc     kSecondaryDifference  kSourceLess
abc     Abc     kTertiaryDifference   kSourceLess
def     abc     kPrimaryDifference    kSourceGreater
äbc     abc     kSecondaryDifference  kSourceGreater
Abc     abc     kTertiaryDifference   kSourceGreater
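
The effect of ordering strength can be sketched with a small multi-level comparison. This is an illustration under stated assumptions, not the ICollation implementation: the key table covers only a-z, A-Z, ä, and Ä, the weights are invented for the example, and the numeric values of the constants are arbitrary. Only the constant names are taken from this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Constant names follow this document; the values are assumptions.
enum Strength { kPrimaryDifference = 0, kSecondaryDifference = 1, kTertiaryDifference = 2 };
enum Result { kSourceLess = -1, kSourceEqual = 0, kSourceGreater = 1 };

// weight[0] = base letter, weight[1] = accent, weight[2] = case.
struct Key { int weight[3]; };

// Hypothetical key table for a tiny English-like alphabet.
Key keyFor(char32_t c) {
    if (c >= U'a' && c <= U'z') return {{static_cast<int>(c - U'a'), 0, 0}};
    if (c >= U'A' && c <= U'Z') return {{static_cast<int>(c - U'A'), 0, 1}};
    if (c == U'\u00E4') return {{0, 1, 0}};  // ä: base letter a, accented
    if (c == U'\u00C4') return {{0, 1, 1}};  // Ä: accented and uppercase
    return {{1000 + static_cast<int>(c), 0, 0}};  // others sort after letters
}

// Compares level by level, stopping at the requested strength.
Result compare(const std::u32string& source, const std::u32string& target,
               Strength strength) {
    const std::size_t n = std::min(source.size(), target.size());
    for (int level = 0; level <= strength; ++level) {
        for (std::size_t i = 0; i < n; ++i) {
            int a = keyFor(source[i]).weight[level];
            int b = keyFor(target[i]).weight[level];
            if (a != b) return a < b ? kSourceLess : kSourceGreater;
        }
        // A string that is a proper prefix of the other sorts first.
        if (source.size() != target.size())
            return source.size() < target.size() ? kSourceLess : kSourceGreater;
    }
    return kSourceEqual;
}
```

With these assumed weights, "pat" and "Pat" compare as kSourceEqual at kSecondaryDifference because case is only a tertiary difference, while at kTertiaryDifference "Pat" compares as kSourceGreater than "pat".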

When you are using the collation object for the POSIX locale (a portable collation), specifying an ordering strength has no effect.



Locale Names