Index: head/en_US.ISO8859-1/books/handbook/l10n/chapter.xml
===================================================================
--- head/en_US.ISO8859-1/books/handbook/l10n/chapter.xml (revision 42505)
+++ head/en_US.ISO8859-1/books/handbook/l10n/chapter.xml (revision 42506)
@@ -1,977 +1,978 @@

Contributed by Andrey Chernov. Rewritten by Michael C. Wu.

Localization - <acronym>i18n</acronym>/<acronym>L10n</acronym> Usage and Setup

Synopsis

&os; is a distributed project with users and contributors located all over the world. This chapter discusses the internationalization and localization features of &os; that allow non-English speaking users to get real work done. Since there are many aspects of the i18n implementation at both the system and application levels, more specific sources of documentation are referred to where applicable.

After reading this chapter, you will know:

    How different languages and locales are encoded on modern operating systems.
    How to set the locale for a login shell.
    How to configure the console for non-English languages.
    How to use Xorg effectively with different languages.
    Where to find more information about writing i18n-compliant applications.

Before reading this chapter, you should:

    Know how to install additional third-party applications.

The Basics

What Is <acronym>i18n</acronym>/<acronym>L10n</acronym>?

The term internationalization has been shortened to i18n, where 18 is the number of letters between the first and the last letters of internationalization. L10n uses the same naming scheme, coming from localization. Combined, i18n/L10n methods, protocols, and applications allow users to use languages of their choice.

i18n applications are programmed using i18n kits provided by libraries. These allow developers to write a simple file and translate displayed menus and text into each language.

Why Use <acronym>i18n</acronym>/<acronym>L10n</acronym>?
Using i18n/L10n allows a user to view, input, or process data in non-English languages.

Which Languages Are Supported?

i18n and L10n are not &os; specific. Currently, one can choose from most of the major languages, including but not limited to: Chinese, German, Japanese, Korean, French, Russian, and Vietnamese.

Using Localization

Localization settings are based on three main terms: Language Code, Country Code, and Encoding. Locale names are constructed from these parts as follows:

    LanguageCode_CountryCode.Encoding

Language and Country Codes

In order to localize a &os; system to a specific language, the user needs to determine the codes for the specific country and language as the country code tells applications which variation of the given language to use. The following are examples of language/country codes:

    Language/Country Code    Description
    en_US                    English - United States
    ru_RU                    Russian for Russia
    zh_TW                    Traditional Chinese for Taiwan

A complete listing of available locales can be found by typing:

    &prompt.user; locale -a

Encodings

Some languages use non-ASCII encodings that are 8-bit, wide, or multibyte characters. For more information on these encodings, refer to &man.multibyte.3;. Older applications do not recognize these encodings and mistake them for control characters. Newer applications usually recognize 8-bit characters. Depending on the implementation, users may be required to compile an application with wide or multibyte character support, or configure it correctly. To provide application support for wide or multibyte characters, the &os; Ports Collection contains programs for several languages. Refer to the i18n documentation in the respective &os; port. Specifically, the user needs to look at the application documentation to decide how to configure it correctly or to determine which compile options to use when building the port.
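As a rough illustration of the locale naming scheme and the encodings discussed in this section, the output of locale -a can be filtered by encoding. The following is a minimal sketch, assuming a POSIX shell; locales_with_encoding is a hypothetical helper name chosen for this example, not a &os; utility.

```shell
#!/bin/sh
# Print only those locale names read from standard input that end in the
# requested encoding. Names are expected to follow the
# LanguageCode_CountryCode.Encoding scheme.
# locales_with_encoding is a hypothetical helper, not part of the system.
locales_with_encoding() {
    wanted=$1
    while IFS= read -r name; do
        case $name in
            # The quoted part of the pattern is matched literally.
            *_*."$wanted") printf '%s\n' "$name" ;;
        esac
    done
}

# Example: show the installed locales that use the KOI8-R encoding.
if command -v locale >/dev/null 2>&1; then
    locale -a | locales_with_encoding KOI8-R
fi
```

Names without a country code, such as C or POSIX, are skipped because the pattern requires an underscore before the encoding.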
Some things to keep in mind are:

    Language specific single-byte (single C char) character sets such as ISO8859-1, ISO8859-15, KOI8-R, and CP437. These are described in &man.multibyte.3;.
    Wide or multibyte encodings such as EUC and Big5.

The active list of character sets can be found at the IANA Registry.

&os; uses Xorg-compatible locale encodings instead.

In the &os; Ports Collection, i18n applications include i18n in their names for easy identification. However, they do not always support the language needed.

Setting Locale

Usually it is sufficient to export the value of the locale name as LANG in the login shell. This can be done in the user's ~/.login_conf or in the startup file of the user's shell (~/.profile, ~/.bashrc, or ~/.cshrc). There is no need to set the locale subsets such as LC_CTYPE or LC_TIME. Refer to language-specific &os; documentation for more information.

Each user should set the following two environment variables in their configuration files:

    LANG for &posix; &man.setlocale.3; family functions
    MM_CHARSET for applications' MIME character set

These should be set in the user's shell configuration, the specific application configuration, and the Xorg configuration.

Setting Locale Methods

This section describes the two methods for setting locale. The first is recommended and assigns the environment variables in the login class. The second method adds the environment variable assignments to the system's shell startup file.

Login Classes Method

This method allows environment variables needed for locale name and MIME character sets to be assigned once for every possible shell instead of adding specific shell assignments to each shell's startup file. User Level Setup can be performed by each user while Administrator Level Setup requires superuser privileges.
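Whichever of the two methods is used, it can be useful to confirm after logging in that both variables reached the environment. The following is a minimal sketch, assuming a POSIX shell; report_locale_env is a hypothetical helper name used only for this example.

```shell
#!/bin/sh
# Report whether LANG and MM_CHARSET, the two variables recommended in the
# text, are set to a non-empty value in the current environment.
# report_locale_env is a hypothetical helper, not part of the system.
report_locale_env() {
    status=0
    for var in LANG MM_CHARSET; do
        # Indirectly read the variable named by $var; empty if unset.
        eval "value=\${$var-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s is not set\n' "$var"
            status=1
        fi
    done
    return "$status"
}

report_locale_env ||
    echo "Set the missing variables in ~/.login_conf or the shell startup file."
```

The function returns a non-zero status when either variable is missing, so it can also be used as a check in a login script.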
User Level Setup

This provides a minimal example of a .login_conf located in a user's home directory which has both variables set for the Latin-1 encoding:

    me:\
        :charset=ISO-8859-1:\
        :lang=de_DE.ISO8859-1:

Here is an example of a user's .login_conf that sets the variables for Traditional Chinese in BIG-5 encoding. More variables are set because some applications do not correctly respect locale variables for Chinese, Japanese, and Korean.

    #Users who do not wish to use monetary units or time formats
    #of Taiwan can manually change each variable
    me:\
        :lang=zh_TW.Big5:\
        :setenv=LC_ALL=zh_TW.Big5:\
        :setenv=LC_COLLATE=zh_TW.Big5:\
        :setenv=LC_CTYPE=zh_TW.Big5:\
        :setenv=LC_MESSAGES=zh_TW.Big5:\
        :setenv=LC_MONETARY=zh_TW.Big5:\
        :setenv=LC_NUMERIC=zh_TW.Big5:\
        :setenv=LC_TIME=zh_TW.Big5:\
        :charset=big5:\
        :xmodifiers="@im=gcin": #Set gcin as the XIM Input Server

See Administrator Level Setup and &man.login.conf.5; for more details.

Administrator Level Setup

Verify that the user's login class in /etc/login.conf sets the correct language:

    language_name|Account Type Description:\
        :charset=MIME_charset:\
        :lang=locale_name:\
        :tc=default:

The previous Latin-1 example would look like this:

    german|German Users Accounts:\
        :charset=ISO-8859-1:\
        :lang=de_DE.ISO8859-1:\
        :tc=default:

Whenever this file is edited, execute the following command to update the capability database:

    &prompt.root; cap_mkdb /etc/login.conf

Changing Login Classes with &man.vipw.8;

When using vipw to add new users, use language to set the language:

    user:password:1111:11:language:0:0:User Name:/home/user:/bin/sh

Changing Login Classes with &man.adduser.8;

When using adduser to add new users, configure the language as follows:

    If all new users use the same language, set defaultclass = language in /etc/adduser.conf.
    Alternatively, input the specified language at this prompt:
        Enter login class: default []:
    when creating a new user using &man.adduser.8;.
    Another alternative is to use the following when creating a user that uses a different language than the one set in /etc/adduser.conf:
        &prompt.root; adduser -class language

Changing Login Classes with &man.pw.8;

If &man.pw.8; is used to add new users, call it in this form:

    &prompt.root; pw useradd user_name -L language

Shell Startup File Method

This method is not recommended because it requires a different setup for each shell. Use the Login Classes Method instead.

To add the locale name and MIME character set, set the two environment variables shown below in the /etc/profile or /etc/csh.login shell startup files. This example sets the German language:

In /etc/profile:

    LANG=de_DE.ISO8859-1; export LANG
    MM_CHARSET=ISO-8859-1; export MM_CHARSET

Or in /etc/csh.login:

    setenv LANG de_DE.ISO8859-1
    setenv MM_CHARSET ISO-8859-1

Alternatively, add the above settings to /usr/share/skel/dot.profile or /usr/share/skel/dot.login.

To configure Xorg, add one of the following to ~/.xinitrc, depending upon the shell:

    LANG=de_DE.ISO8859-1; export LANG

    setenv LANG de_DE.ISO8859-1

Console Setup

For all single-byte (single C char) character sets, set the correct console fonts in /etc/rc.conf for the language in question with:

    font8x16=font_name
    font8x14=font_name
    font8x8=font_name

The font_name is taken from /usr/share/syscons/fonts, without the .fnt suffix.

The keymap and screenmap for the single-byte character set can be set using sysinstall. Once inside sysinstall, choose Configure, then Console. Alternatively, add the following to /etc/rc.conf:

    scrnmap=screenmap_name
    keymap=keymap_name
    keychange="fkey_number sequence"

The screenmap_name is taken from /usr/share/syscons/scrnmaps, without the .scm suffix. A screenmap with a corresponding mapped font is usually needed as a workaround for expanding bit 8 to bit 9 on a VGA adapter's font character matrix. This will move letters out of the pseudographics area if the screen font uses a bit 8 column.
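Because font_name and screenmap_name are simply file names with the suffix removed, the available values can be listed directly from the directories mentioned in this section. The following is a minimal sketch, assuming a POSIX shell and that those directories exist on the system; list_console_names is a hypothetical helper name used only for this example.

```shell
#!/bin/sh
# Print the names usable in /etc/rc.conf for the files in a directory,
# that is, the file names with the given suffix removed.
# list_console_names is a hypothetical helper, not part of the system.
list_console_names() {
    dir=$1
    suffix=$2
    for path in "$dir"/*"$suffix"; do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$path" ] || continue
        basename "$path" "$suffix"
    done
}

# Candidate values for font8x16, font8x14, and font8x8.
list_console_names /usr/share/syscons/fonts .fnt
# Candidate values for scrnmap.
list_console_names /usr/share/syscons/scrnmaps .scm
```

If a directory is missing, the function prints nothing for it rather than failing.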
If moused is enabled in /etc/rc.conf, review the mouse cursor information in the next paragraph.

By default, the mouse cursor of the &man.syscons.4; driver occupies the 0xd0-0xd3 range in the character set. If the language uses this range, move the cursor's range. To enable this workaround for &os;, add the following line to /etc/rc.conf:

    mousechar_start=3

The keymap_name in the /etc/rc.conf example above is taken from /usr/share/syscons/keymaps, without the .kbd suffix. When uncertain as to which keymap to use, &man.kbdmap.1; can be used to test keymaps without rebooting.

The keychange is usually needed to program function keys to match the selected terminal type because function key sequences cannot be defined in the key map.

Be sure to set the correct console terminal type in /etc/ttys for all virtual terminal entries. Current pre-defined correspondences are:

    Character Set              Terminal Type
    ISO8859-1 or ISO8859-15    cons25l1
    ISO8859-2                  cons25l2
    ISO8859-7                  cons25l7
    KOI8-R                     cons25r
    KOI8-U                     cons25u
    CP437 (VGA default)        cons25
    US-ASCII                   cons25w

For languages with wide or multibyte characters, use the correct &os; port in /usr/ports/language. Some applications appear as serial terminals to the system. Reserve enough terminals in /etc/ttys for both Xorg and the pseudo-serial console. Here is a partial list of applications for using other languages in the console:

    Language                       Location
    Traditional Chinese (BIG-5)    chinese/big5con
    Japanese                       japanese/kon2-16dot or japanese/mule-freewnn
    Korean                         korean/han

Xorg Setup

Although Xorg is not installed with &os;, it can be installed from the Ports Collection. Refer to for more information on how to do this. This section discusses how to localize Xorg once it is installed.

Application specific i18n settings such as fonts and menus can be tuned in ~/.Xresources.

Displaying Fonts

After installing x11-servers/xorg-server, install the language's &truetype; fonts.
Setting the correct locale should allow users to view their selected language in graphical application menus.

Inputting Non-English Characters

The X Input Method (XIM) protocol is an input standard for Xorg clients. All Xorg applications should be written as XIM clients that take input from XIM input servers. There are several XIM servers available for different languages.

Printer Setup

Some single-byte (single C char) character sets are hardware coded into printers. Wide or multibyte character sets require special setup using a utility such as apsfilter. Documents can be converted to &postscript; or PDF formats using language specific converters.

Kernel and File Systems

The &os; fast filesystem (FFS) is 8-bit clean, so it can be used with any single-byte character set. However, character set names are not stored in the filesystem as it is raw 8-bit and does not understand encoding order. Officially, FFS does not support any form of wide or multibyte character sets. However, some wide or multibyte character sets have independent patches for enabling support on FFS. Refer to the respective languages' web sites for more information and the patch files.

&os;'s support for the &ms-dos; filesystem has the configurable ability to convert between &ms-dos;, Unicode character sets, and chosen &os; filesystem character sets. Refer to &man.mount.msdosfs.8; for details.

Compiling <acronym>i18n</acronym> Programs

Many applications in the &os; Ports Collection have been ported with i18n support. Some of these include -i18n in the port name. These and many other programs have built-in support for i18n and need no special consideration.

However, some applications such as MySQL need to have their Makefile configured with the specific charset. This is usually done in the port's Makefile or by passing a value to configure in the source.
Localizing &os; to Specific Languages

Russian Language (KOI8-R Encoding)

Originally contributed by Andrey Chernov.

For more information about KOI8-R encoding, refer to KOI8-R References (Russian Net Character Set).

Locale Setup

To set this locale, put the following lines into each user's ~/.login_conf:

    me:My Account:\
        :charset=KOI8-R:\
        :lang=ru_RU.KOI8-R:

Console Setup

Add the following lines to /etc/rc.conf:

    keymap="ru.koi8-r"
    scrnmap="koi8-r2cp866"
    font8x16="cp866b-8x16"
    font8x14="cp866-8x14"
    font8x8="cp866-8x8"
    mousechar_start=3

For each ttyv entry in /etc/ttys, use cons25r as the terminal type.

Printer Setup

Since most printers with Russian characters come with hardware code page CP866, a special output filter is needed to convert from KOI8-R to CP866. &os; installs a default filter as /usr/libexec/lpr/ru/koi2alt. A Russian printer /etc/printcap entry should look like:

    lp|Russian local line printer:\
        :sh:of=/usr/libexec/lpr/ru/koi2alt:\
        :lp=/dev/lpt0:sd=/var/spool/output/lpd:lf=/var/log/lpd-errs:

Refer to &man.printcap.5; for a more detailed description.

&ms-dos; and Russian Filenames

The following example &man.fstab.5; entry enables support for Russian filenames in mounted &ms-dos; filesystems:

    /dev/ad0s2 /dos/c msdos rw,-Lru_RU.KOI8-R 0 0

The -L option selects the locale name. Refer to &man.mount.msdosfs.8; for more details.

<application>Xorg</application> Setup

First, configure the non-X locale setup. When using &xorg;, install the x11-fonts/xorg-fonts-cyrillic package.

Check the "Files" section in /etc/X11/xorg.conf. The following line must be added before any other FontPath entries:

    FontPath "/usr/local/lib/X11/fonts/cyrillic"

Search the Ports Collection for more Cyrillic fonts.

To activate a Russian keyboard, add the following to the "Keyboard" section of /etc/xorg.conf:

    Option "XkbLayout" "us,ru"
    Option "XkbOptions" "grp:toggle"

Make sure that XkbDisable is commented out in that file.
For grp:toggle use Right Alt, for grp:ctrl_shift_toggle use Ctrl+Shift. For grp:caps_toggle use CapsLock. The old CapsLock function is still available in LAT mode only using Shift+CapsLock. grp:caps_toggle does not work in &xorg; for some unknown reason.

If the keyboard has &windows; keys, and some non-alphabetical keys are mapped incorrectly, add the following line to /etc/xorg.conf:

    Option "XkbVariant" ",winkeys"

The Russian XKB keyboard may not work with non-localized applications. Minimally localized applications should call XtSetLanguageProc(NULL, NULL, NULL); early in the program.

See KOI8-R for X Window for more instructions on localizing Xorg applications.

Traditional Chinese Localization for Taiwan

The &os;-Taiwan Project has a Chinese HOWTO for &os; at using many Chinese ports. The current editor for the &os; Chinese HOWTO is Shen Chuan-Hsing statue@freebsd.sinica.edu.tw.

German Language Localization for All ISO 8859-1 Languages

Slaven Rezic eserte@cs.tu-berlin.de wrote a tutorial on using umlauts on &os;. The tutorial is written in German and is available at .

Greek Language Localization

Nikos Kokkalis nickkokkalis@gmail.com has written a complete article on Greek support in &os;. It is available here, in Greek only, as part of
- the official &os; Greek documentation.
+ url="&url.doc.base;/el_GR.ISO8859-7/articles/greek-language-support/index.html">here,
+ in Greek only, as part of the official &os; Greek
+ documentation.

Japanese and Korean Language Localization

For Japanese, refer to , and for Korean, refer to .

Non-English &os; Documentation

Some &os; contributors have translated parts of the &os; documentation to other languages. They are available through links on the main site or in /usr/share/doc.