Which Encoding Scheme Should I Use?

yakisha

Should I use a character set specific to a particular region or language, or Unicode? What are the advantages and disadvantages of each?
 
Specific character sets have these advantages over Unicode:
• Your current software probably uses them already.
• Single-byte language-specific character sets are small (at most 256 characters), unlike the Unicode character set. Version 4.0 of Unicode contains approximately 96,000 distinct characters.
• The software can have typefaces that are esthetically pleasing and optimized for each particular language. With Unicode, on the other hand, the software has to rely on whatever typeface is available. For example, of all the faces supplied with Microsoft’s Internet Explorer, only Arial Unicode MS and Lucida Sans Unicode cover a large part of the Unicode range of characters.
 
If users need to be able to enter or see more than one language at a time (for example, if they need to enter names and addresses in both Chinese and Japanese), then the software must use Unicode. Only Unicode lets people input and display text in multiple languages at the same time.
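As a rough illustration of that limit, the Python sketch below checks whether a piece of text fits into a given character set. The helper name can_encode and the sample names are made up for this example; windows-1251 stands in for any single-region character set.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical address-book entry: a Chinese name next to a Japanese name.
mixed = "张伟, 田中さん"

# Russian text fits into the Cyrillic code page.
print(can_encode("Иванов", "windows-1251"))
# The mixed Chinese/Japanese entry does not fit into that code page ...
print(can_encode(mixed, "windows-1251"))
# ... but it can always be encoded as UTF-8.
print(can_encode(mixed, "utf-8"))
```

A check like this can help decide whether existing data would survive in a regional character set or needs Unicode.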
 
Any time data are passed from one program to another, the first program’s character set should be indicated explicitly; otherwise, the receiving program (a browser, for example) may fall back on its default, whether or not the default is correct. The charset can be defined as part of the protocol or API, or declared explicitly in the head of the HTML document.
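Here is a minimal sketch of the receiving side in Python, assuming the sender supplies a Content-Type value along with the raw bytes. The function name decode_body and the UTF-8 fallback are assumptions for this example, not part of any particular protocol.

```python
from email.message import Message


def decode_body(content_type: str, body: bytes, default: str = "utf-8") -> str:
    """Decode body using the charset named in a Content-Type value.

    Falls back to `default` (an assumption of this sketch) when the
    sender did not name a charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body.decode(charset)


# The sender encodes Russian text as windows-1251 and says so explicitly.
payload = "Привет".encode("windows-1251")
text = decode_body("text/html; charset=windows-1251", payload)
print(text)
```

Because the charset travels with the data, the receiver never has to guess.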
 
The following statement indicates that the text is encoded in the Windows Cyrillic code page used for Russian (charset windows-1251):

<META http-equiv="Content-Type"
content="text/html; CHARSET=windows-1251">

This statement indicates that the text is encoded in UTF-8, the 8-bit encoding of Unicode:

<META http-equiv="content-type"
content="text/html; CHARSET=UTF-8">

However, keep in mind that if the browser is too old to recognize Unicode, it probably won’t display non-ASCII characters correctly. ASCII characters should still display, because they are the first 128 characters in the Unicode set and UTF-8 encodes them with the same single bytes as ASCII.
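The relationship between ASCII and Unicode can be checked with a few lines of Python. This is only a sketch; the sample strings are arbitrary.

```python
ascii_text = "Hello"
mixed_text = "Hello, Привет"

# ASCII characters keep their code points (0 to 127) in Unicode.
all_below_128 = all(ord(ch) < 128 for ch in ascii_text)

# UTF-8 encodes those characters as the same single bytes that ASCII uses.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters take more than one byte each in UTF-8,
# so the byte count exceeds the character count.
char_count = len(mixed_text)
byte_count = len(mixed_text.encode("utf-8"))

print(all_below_128, same_bytes, char_count, byte_count)
```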
 
Some internationalization tools are available from development platforms and packages. Apple, Sun, Microsoft, and others have developed APIs that specifically address many of the most difficult internationalization problems.
 
Make sure that the right charset is set

A web page can easily be saved with the wrong character set if the designer forgets to reset the defaults or to check what the development package put into the META tag. A mismatch between the user’s locale and the page’s charset can mean garbage on the screen.
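To see what such a mismatch looks like, the Python sketch below saves Russian text as windows-1251 and then reads it back with the wrong charset. The helper decodes_as is a made-up name for this example, and latin-1 simply stands in for an incorrect default.

```python
original = "Привет"
saved_bytes = original.encode("windows-1251")

# Reading the bytes with the wrong single-byte charset raises no error;
# it silently produces garbage on the screen.
garbled = saved_bytes.decode("latin-1")
print(garbled)


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if data is valid in the given charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# A strict check can catch some mismatches before the page ships:
# these bytes are not valid UTF-8, so a UTF-8 label would be wrong.
print(decodes_as(saved_bytes, "utf-8"))
# Decoding with the charset that was actually used restores the text.
print(saved_bytes.decode("windows-1251") == original)
```

Note that the strict check only helps when the declared charset can reject bytes at all; single-byte charsets such as latin-1 accept any byte sequence.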
 
If you’re using Java or other programming languages, put all text messages, labels, and localized terminology in resource files (also called resource bundles). Never embed text in software controls. HTML files can be translated easily and don’t need to employ resource files. However, make sure that commands don’t get translated by mistake; the application won’t work if they do!
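The resource-file idea can be sketched in Python as follows. This is not Java's ResourceBundle API, only an illustration of the same pattern; the RESOURCES table, its keys, and load_messages are hypothetical, and the JSON strings stand in for separate files on disk, one per locale.

```python
import json

# Hypothetical resource files, one per locale, inlined here as JSON text.
RESOURCES = {
    "en": '{"greeting": "Hello", "save": "Save"}',
    "ru": '{"greeting": "Привет"}',
}


def load_messages(locale: str, fallback: str = "en") -> dict:
    """Load the fallback messages, then overlay the requested locale."""
    messages = json.loads(RESOURCES[fallback])
    if locale in RESOURCES:
        messages.update(json.loads(RESOURCES[locale]))
    return messages


messages = load_messages("ru")
# The code refers to message keys only; the visible text lives in the resources.
print(messages["greeting"])
print(messages["save"])  # not translated yet, so the fallback text is used
```

Translators then work on the resource files alone, and the keys (like the commands mentioned above) stay untranslated.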
 