
Fix generation of colldef source files for non-UTF-8 locales

Authored by hrs on Dec 29 2020, 3:52 PM.


  • Fix generation of colldef files. They were generated by duplicating the UTF-8 collation files for each language, so they included characters that are invalid in the non-UTF-8 encodings. localedef(1) does not allow those characters. The generator now checks whether each character is valid based on the charmap files.

    TODO: The ja_JP.UTF-8 locale should not be generated solely from CLDR, because it was standardized in the document "UI-OSF Application Platform Profile for Japanese Environment", which is incompatible with the information in CLDR. Most commercial Unix vendors adopt this pre-Unicode-era document as the reference even for the UTF-8 locale. Newer versions of Solaris have since added a CLDR version as ja_JP.UTF-8@cldr, and IBM AIX uses JA_JP.UTF-8 for the UI-OSF specification and ja_JP.UTF-8 for CLDR. For compatibility, FreeBSD should also have at least two versions.

    Note that this commit does not change generation of ja_JP.UTF-8. Changes related to this issue will be committed separately later.
  • Generate a POSIX charmap for UTF-32 as a reference. It was confusing that charmap.xml used the Unicode names defined in UnicodeData.txt while the POSIX charmap used slightly different names for the same code points. A single information source is now used for Unicode symbol names and code points. Charset.xml is also updated to use them.
  • Fix a bug in get_encodings(), which did not correctly understand the 0x00+0x00 notation in charmaps/ISCII-DEV.TXT.
  • Do not regenerate posix/xx_Comm_C.UTF-8.src on every "make build".
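
The charmap-based validity check described in the first item can be illustrated with a minimal sketch. This is not the code from this commit: the function names load_charmap_symbols and filter_collation are hypothetical, the sketch is in Python for illustration only, and it makes a simplifying assumption that only the leading <symbol> of each collation line needs to exist in the charmap. A real colldef source also uses collating symbols that are not charmap entries, and those would need to be allowed separately.

```python
import re

# A POSIX charmap entry looks like "<U0041>  \x41"; trailing comments are ignored.
_ENTRY = re.compile(r"^(<[^>]+>)\s+((?:\\x[0-9A-Fa-f]{2})+)")
# A symbolic name such as <U0041>.
_SYMBOL = re.compile(r"<[^>]+>")


def load_charmap_symbols(text):
    """Return the set of symbolic names defined between CHARMAP and END CHARMAP."""
    symbols = set()
    in_map = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_map = True
        elif line == "END CHARMAP":
            in_map = False
        elif in_map:
            m = _ENTRY.match(line)
            if m:
                symbols.add(m.group(1))
    return symbols


def filter_collation(lines, symbols):
    """Drop collation lines whose leading <symbol> is not defined in the charmap.

    Lines that do not start with a <symbol> (keywords such as order_start)
    are kept unchanged. This is a simplification, see the note above.
    """
    kept = []
    for line in lines:
        m = _SYMBOL.match(line.lstrip())
        if m is None or m.group(0) in symbols:
            kept.append(line)
    return kept
```

Given a charmap that defines only <U0041> and <U0042>, a collation line starting with <U3042> would be dropped while the order_start and order_end keyword lines are kept, which mirrors the idea of not emitting characters that the target encoding cannot represent.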

Diff Detail

R10 FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.