Table E-1 lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet's PrintWriter uses. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, appropriate for most Western European languages. To specify an alternate charset, pass the charset value to the setContentType() method before the servlet retrieves its PrintWriter. For example:
res.setContentType("text/html; charset=Shift_JIS"); // A Japanese charset
PrintWriter out = res.getWriter(); // Writes Shift_JIS Japanese
Note that not all web browsers support all charsets or have the fonts available to represent all characters, although at a minimum all clients support ISO-8859-1. The UTF-8 charset can represent all Unicode characters, so it may be treated as a viable alternative for any language.
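Whether a given charset can represent a particular string can be checked before committing to it. The following is a minimal sketch using only the standard java.nio.charset classes; the class name CharsetCheck and the helper canRepresent() are illustrative names, not part of the Servlet API.

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    // Hypothetical helper: returns true if every character of text
    // can be encoded in the named charset.
    static boolean canRepresent(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String french = "d\u00e9j\u00e0 vu";       // accented Latin-1 letters
        String japanese = "\u65e5\u672c\u8a9e";    // three kanji characters

        System.out.println(canRepresent("ISO-8859-1", french));   // true
        System.out.println(canRepresent("ISO-8859-1", japanese)); // false
        System.out.println(canRepresent("UTF-8", japanese));      // true
    }
}
```

A servlet could use such a check to decide between a language-specific charset and UTF-8 before calling setContentType().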
| Language | Language Code | Suggested Charsets |
|---|---|---|
| Albanian | sq | ISO-8859-2 |
| Arabic | ar | ISO-8859-6 |
| Bulgarian | bg | ISO-8859-5 |
| Byelorussian | be | ISO-8859-5 |
| Catalan (Spanish) | ca | ISO-8859-1 |
| Chinese (Simplified/Mainland) | zh | GB2312 |
| Chinese (Traditional/Taiwan) | zh (country TW) | Big5 |
| Croatian | hr | ISO-8859-2 |
| Czech | cs | ISO-8859-2 |
| Danish | da | ISO-8859-1 |
| Dutch | nl | ISO-8859-1 |
| English | en | ISO-8859-1 |
| Estonian | et | ISO-8859-1 |
| Finnish | fi | ISO-8859-1 |
| French | fr | ISO-8859-1 |
| German | de | ISO-8859-1 |
| Greek | el | ISO-8859-7 |
| Hebrew | he (formerly iw) | ISO-8859-8 |
| Hungarian | hu | ISO-8859-2 |
| Icelandic | is | ISO-8859-1 |
| Italian | it | ISO-8859-1 |
| Japanese | ja | Shift_JIS, ISO-2022-JP, EUC-JP[1] |
| Korean | ko | EUC-KR[2] |
| Latvian, Lettish | lv | ISO-8859-2 |
| Lithuanian | lt | ISO-8859-2 |
| Macedonian | mk | ISO-8859-5 |
| Norwegian | no | ISO-8859-1 |
| Polish | pl | ISO-8859-2 |
| Portuguese | pt | ISO-8859-1 |
| Romanian | ro | ISO-8859-2 |
| Russian | ru | ISO-8859-5, KOI8-R |
| Serbian | sr | ISO-8859-5, KOI8-R |
| Serbo-Croatian | sh | ISO-8859-5, ISO-8859-2, KOI8-R |
| Slovak | sk | ISO-8859-2 |
| Slovenian | sl | ISO-8859-2 |
| Spanish | es | ISO-8859-1 |
| Swedish | sv | ISO-8859-1 |
| Turkish | tr | ISO-8859-9 |
| Ukrainian | uk | |

[1] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-JP character set by the name EUCJIS, so for portability you can set the character set to EUC-JP and manually construct an EUCJIS PrintWriter.
[2] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-KR character set by the name KSC_5601, so for portability you can set the character set to EUC-KR and manually construct a KSC_5601 PrintWriter.
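
The portability approach described in these footnotes can be sketched as follows. This is an illustration under stated assumptions, not the book's own code: the class LegacyCharsetWriter and its open() helper are hypothetical names, and a ByteArrayOutputStream stands in for the stream a servlet would obtain from res.getOutputStream() after calling setContentType() with the standard charset name.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

public class LegacyCharsetWriter {
    // Hypothetical helper: try the standard encoding name first and fall
    // back to the legacy JDK name if the first is not recognized.
    static PrintWriter open(OutputStream out, String preferred, String legacy)
            throws UnsupportedEncodingException {
        try {
            return new PrintWriter(new OutputStreamWriter(out, preferred));
        } catch (UnsupportedEncodingException e) {
            return new PrintWriter(new OutputStreamWriter(out, legacy));
        }
    }

    public static void main(String[] args) {
        // Stand-in for the servlet's response output stream.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintWriter out = open(buffer, "EUC-JP", "EUCJIS");
            out.print("\u65e5\u672c\u8a9e");
            out.flush();
            System.out.println(buffer.size() + " bytes written");
        } catch (UnsupportedEncodingException e) {
            // Neither name is known to this Java runtime.
            System.out.println("Encoding not available: " + e.getMessage());
        }
    }
}
```

The same pattern applies to Korean by passing "EUC-KR" and "KSC_5601" as the two names.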

Copyright © 2001 O'Reilly & Associates. All rights reserved.