Converting Text from Character Format to Unicode

Use ITranscoder to convert text data from a char-based format (either char* or IString) to Unicode (either UniChar* or IText).

To convert char text data into Unicode text with ITranscoder:

  1. Call ITranscoder::createTranscoder to create a transcoder for the desired character set. Use the transcoder name provided in the Transcoder Names table. You can also specify a mapping proximity. ITranscoder::kSupersetMapping is the default.
  2. If you want the transcoder to do something other than use substitution characters when it encounters exception characters, call ITranscoder::setUnmappedBehavior to specify the exception handling behavior.
  3. Transcode the text using the toUnicode function.
  4. Postprocess the line-breaking characters by calling ILineBreakConverter::convertInPlace or convert.
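The difference between substituting for exception characters and stopping on them (step 2) can be sketched independently of the library. The following is a minimal, self-contained illustration and does not use the ITranscoder API: the names UnmappedPolicy, ToyResult, toyLookup, and toyToUnicode, the toy one-byte character set, and the choice of U+FFFD as the substitution character are all assumptions made for this sketch. The actual options accepted by ITranscoder::setUnmappedBehavior may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical policy names for this sketch; these are NOT the ITranscoder
// enumerators.
enum class UnmappedPolicy { Substitute, Stop };

struct ToyResult {
    bool ok;              // false if transcoding stopped at an unmapped byte
    std::u16string text;  // UTF-16 output produced so far
};

// Toy single-byte character set (an assumption for illustration): bytes
// 0x00-0x7F map to the same code point, 0xA5 maps to U+00A5 (yen sign),
// and every other byte is an exception (unmapped) character.
inline bool toyLookup(unsigned char byte, char16_t& out) {
    if (byte < 0x80) { out = static_cast<char16_t>(byte); return true; }
    if (byte == 0xA5) { out = u'\u00A5'; return true; }
    return false;
}

// Converts bytes to UTF-16, handling unmapped bytes according to the policy.
inline ToyResult toyToUnicode(const std::string& bytes, UnmappedPolicy policy) {
    ToyResult result{true, {}};
    for (unsigned char byte : bytes) {
        char16_t unit = 0;
        if (toyLookup(byte, unit)) {
            result.text.push_back(unit);
        } else if (policy == UnmappedPolicy::Substitute) {
            result.text.push_back(u'\uFFFD');  // assumed substitution character
        } else {
            result.ok = false;  // stop at the first exception character
            break;
        }
    }
    return result;
}

// Usage example: the same input under both policies.
inline void toyUsageExample() {
    const std::string input = std::string("A\x80") + "B";  // 0x80 is unmapped
    ToyResult substituted = toyToUnicode(input, UnmappedPolicy::Substitute);
    assert(substituted.ok && substituted.text.size() == 3);
    ToyResult stopped = toyToUnicode(input, UnmappedPolicy::Stop);
    assert(!stopped.ok && stopped.text.size() == 1);
}
```

With substitution, the output keeps the same number of characters as the input and the caller can continue; with the stopping policy, the caller learns exactly where the source text failed to map.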

For example, this code shows how to transcode text from the Microsoft ShiftJIS character set (charText) into Unicode:

// Create the transcoder
ITranscoder* transcoder = ITranscoder::createTranscoder("Shift-JIS",
			ITranscoder::kExactMapping);

// Transcode the string
IText unicodeText;
ITranscoder::result res = transcoder->toUnicode(charText, unicodeText);
if (res == codecvt_base::ok) {
	// transcoding was successful
}

// Postprocess any line-breaking characters
ILineBreakConverter::convertInPlace(unicodeText);

delete transcoder;
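
To make the postprocessing step more concrete, here is a minimal, self-contained sketch of what an in-place line-break conversion might do on UTF-16 text. It does not use ILineBreakConverter. The function name normalizeLineBreaks and the choice of U+2028 LINE SEPARATOR as the target are assumptions for illustration only; the characters that ILineBreakConverter::convertInPlace actually produces may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed target for this sketch: U+2028 LINE SEPARATOR.
constexpr char16_t kAssumedLineSeparator = u'\u2028';

// Replaces each CR LF pair, lone CR, and lone LF with one separator.
// The write index never passes the read index, so the rewrite is done
// in place and the string is shrunk at the end.
inline void normalizeLineBreaks(std::u16string& text) {
    std::size_t out = 0;
    for (std::size_t in = 0; in < text.size(); ++in) {
        const char16_t unit = text[in];
        if (unit == u'\r') {
            // Treat CR LF as a single line break by skipping the LF.
            if (in + 1 < text.size() && text[in + 1] == u'\n') {
                ++in;
            }
            text[out++] = kAssumedLineSeparator;
        } else if (unit == u'\n') {
            text[out++] = kAssumedLineSeparator;
        } else {
            text[out++] = unit;
        }
    }
    text.resize(out);
}

// Usage example: mixed line endings collapse to one kind of separator.
inline void lineBreakUsageExample() {
    std::u16string text = u"one\r\ntwo\nthree";
    normalizeLineBreaks(text);
    assert(text.size() == 13);  // the CR LF pair became a single code unit
}
```

Running the conversion after transcoding, rather than before, means the line-break logic only has to understand one encoding (Unicode) instead of every source character set.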


Transcoder Names