Fix generation of colldef source files for non-UTF-8 locales
ClosedPublic

Authored by hrs on Dec 29 2020, 3:52 PM.

Details

Summary
  • Fix generation of colldef files. They were generated by duplicating the UTF-8 collation files for each language, so they included characters that are invalid in the non-UTF-8 encodings, and localedef(1) rejects those characters. cldr2def.pl now checks each character against the charmap files to determine whether it is valid.

    TODO: The ja_JP.UTF-8 locale should not be generated solely from CLDR, because it was standardized in the document "UI-OSF Application Platform Profile for Japanese Environment", which is incompatible with the information in CLDR. Most commercial Unix vendors adopted this pre-Unicode-era document as the reference even for the UTF-8 locale. Newer versions of Solaris later added a CLDR version as ja_JP.UTF-8@cldr, and IBM AIX uses JA_JP.UTF-8 for the UI-OSF specification and ja_JP.UTF-8 for CLDR. For compatibility, FreeBSD should also provide at least these two versions.

    Note that this commit does not change the generation of ja_JP.UTF-8. Changes related to this issue will be committed separately later.
  • Generate a POSIX charmap for UTF-32 as a reference. It was confusing that charmap.xml used the Unicode names defined in UnicodeData.txt while the POSIX charmaps used slightly different names for the same code points. cldr2def.pl now uses UTF-32.cm as the single source of information for Unicode symbol names and code points. Charset.xml is also updated to use them.
  • Fix a bug in get_encodings() in cldr2def.pl, which did not correctly parse the 0x00+0x00 notation in charmaps/ISCII-DEV.TXT.
  • Do not regenerate posix/xx_Comm_C.UTF-8.src on every "make build".
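
    The charmap-based validity check from the first item can be illustrated with a minimal sketch. This is not the code in cldr2def.pl (which is Perl); it is a simplified Python illustration, and the function names, the sample charmap, and the sample collation lines are all hypothetical. It assumes a POSIX-style charmap whose entries sit between CHARMAP and END CHARMAP, and it only inspects the leading symbol of each collation line.

```python
import re

# Matches a charmap entry such as "<LATIN_SMALL_LETTER_A>  \x61".
ENTRY = re.compile(r"^(<[^>]+>)\s+(\S+)")


def charmap_symbols(text):
    """Return the symbolic names defined between CHARMAP and END CHARMAP."""
    symbols = set()
    in_map = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_map = True
        elif line == "END CHARMAP":
            in_map = False
        elif in_map:
            m = ENTRY.match(line)
            if m:
                symbols.add(m.group(1))
    return symbols


def filter_collation(lines, symbols):
    """Keep collation lines whose leading <symbol> is defined in the charmap.

    Lines that do not start with a <symbol> are passed through unchanged.
    """
    kept = []
    for line in lines:
        m = re.match(r"^(<[^>]+>)", line)
        if m is None or m.group(1) in symbols:
            kept.append(line)
    return kept


# Hypothetical sample data, not taken from the FreeBSD tree.
CHARMAP = """\
<code_set_name> "EXAMPLE"
CHARMAP
<LATIN_SMALL_LETTER_A>  \\x61
<LATIN_SMALL_LETTER_B>  \\x62
END CHARMAP
"""

COLLATION = [
    "<LATIN_SMALL_LETTER_A> <LATIN_SMALL_LETTER_A>",
    "<HIRAGANA_LETTER_A> <HIRAGANA_LETTER_A>",
    "<LATIN_SMALL_LETTER_B> <LATIN_SMALL_LETTER_B>",
]

valid = filter_collation(COLLATION, charmap_symbols(CHARMAP))
print(valid)
```

    In this sketch the <HIRAGANA_LETTER_A> line is dropped because the sample charmap does not define that symbol, which mirrors the kind of entry localedef(1) would reject for a non-UTF-8 encoding.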

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 35743
Build 32632: arc lint + arc unit