
Implement NLS catalog encoding conversion
Needs ReviewPublic

Authored by AMDmi3 on Apr 5 2015, 11:20 PM.
Tags
None
Referenced Files
F108449851: D2232.diff
Fri, Jan 24, 9:55 PM

Details

Summary

In most base utilities that support NLS, translation catalogs are tied to specific encodings (for example, ee has a catalog for ru_RU.KOI8-R, but not for ru_RU.UTF-8). To avoid duplicating catalogs for different encodings, implement a mechanism that allows encoding conversion during the build. This works as follows:

  • New locales are added to NLS as usual
  • For each such locale, its source file should be defined via NLSSRCDIR_${locale} or NLSSRCFILE_${locale} (as usual), and the source locale name should be defined in NLSICONV_${localename}. If NLSICONV_${localename} is defined, iconv is called to convert the character set. Encodings are taken from the source and destination locale names.
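
As a rough illustration of the last step, the encodings can be read off the locale names and turned into an iconv command line. This is only a sketch in plain sh, not the actual bsd.nls.mk change: the helper names codeset_of and iconv_cmd are hypothetical, and it assumes every locale name has the form language_TERRITORY.CODESET with no @modifier.

```shell
#!/bin/sh
# Hypothetical helper (not part of bsd.nls.mk): print the codeset part of a
# locale name, i.e. everything after the first dot. A name without a dot is
# returned unchanged, so callers should pass fully qualified locale names.
codeset_of() {
    printf '%s\n' "${1#*.}"
}

# Hypothetical helper: print the iconv command that would convert a catalog
# source file from the source locale's encoding to the destination locale's.
# Arguments: source locale, destination locale, source file.
iconv_cmd() {
    printf 'iconv -f %s -t %s %s > %s.msg\n' \
        "$(codeset_of "$1")" "$(codeset_of "$2")" "$3" "$2"
}

# Prints: iconv -f KOI8-R -t UTF-8 nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
iconv_cmd ru_RU.KOI8-R ru_RU.UTF-8 nls/ru_RU.KOI8-R/ee.msg
```

The printed command has the same shape as the one that appears in the build log later in this review.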

To demonstrate and use the new mechanism, add ru_RU.UTF-8 NLS support for ee. Confirmed to work on a recent CURRENT: with LC_ALL=ru_RU.UTF-8, ee shows a localized interface.

Another solution would be to convert all catalogs to UTF-8 by default and call iconv automatically based on locale name. That'll make NLSICONV_${localename} unneeded, but is more intrusive.

Question for reviewers: is it safe to use verbatim iconv in bsd.nls.mk?

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

AMDmi3 retitled this revision from to Implement NLS catalog encoding conversion.
AMDmi3 updated this object.
AMDmi3 edited the test plan for this revision. (Show Details)
AMDmi3 added reviewers: glebius, antoine, brooks, ed.

Usually the mechanism in such a case is to add a converted file directly to the sources.

share/mk/bsd.nls.mk
48

This is unsafe because iconv might not be in base on the host where you are building. It would at least need to be in the bootstrap-tools.

I added @imp as a reviewer, as he may have an opinion on this. Introducing a new bootstrap tool may open a can of worms.

Adding a converted file to svn would imply data duplication, so I don't really like this.

It looks like iconv is a simple program with no dependencies (other than libc) so adding it to bootstrap-tools shouldn't be painful.

share/mk/bsd.nls.mk
48

I think that build-tools would be enough, since we disable man pages, and the like, during the early build (and if we don't, we should).

Yes, build-tools should be enough.

Do I need to do anything besides adding iconv to the list in the build-tools: target in Makefile.inc1?

AMDmi3 edited edge metadata.

I've added iconv to build-tools:, however it still doesn't build without iconv on the host system.

iconv is built:

/usr/obj/<path to src>/usr.bin/iconv/iconv

however build still complains:

iconv -f KOI8-R -t UTF-8 /home/amdmi3/projects/external/freebsd-head/usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
/bin/sh: iconv: not found

As suggested by bapt@, iconv should be in bootstrap-tools, not build-tools. Successful build confirmed on a host without an iconv binary.

You would need to add a condition on MK_ICONV in ee/Makefile; otherwise, a build without iconv will fail.

Makefile.inc1
1428

This looks like an unintended change :)

In D2232#19, @bapt wrote:

You would need to add a condition on MK_ICONV in ee/Makefile; otherwise, a build without iconv will fail.

It builds fine here with WITHOUT_ICONV set.

Makefile.inc1
1428

I've just sorted the list. Should I omit this?

No luck with cross-building. Simple cross-build:

iconv -f KOI8-R -t UTF-8 /home/amdmi3/projects/external/freebsd-head/usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
iconv: iconv_open(UTF-8, KOI8-R): Invalid argument

cross-build WITHOUT_ICONV:

/home/amdmi3/projects/external/freebsd-head/lib/libc_nonshared/../libc/iconv/iconv.c:(.text+0x0): undefined reference to `__bsd_iconv'
cc: error: linker command failed with exit code 1 (use -v to see invocation)

In D2232#21, @AMDmi3 wrote:

No luck with cross-building. Simple cross-build:

iconv -f KOI8-R -t UTF-8 /home/amdmi3/projects/external/freebsd-head/usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
iconv: iconv_open(UTF-8, KOI8-R): Invalid argument

cross-build WITHOUT_ICONV:

/home/amdmi3/projects/external/freebsd-head/lib/libc_nonshared/../libc/iconv/iconv.c:(.text+0x0): undefined reference to `__bsd_iconv'
cc: error: linker command failed with exit code 1 (use -v to see invocation)

The issue at hand is that iconv() is integrated into the C library and probably also uses a collection of data files (not sure). Just building the iconv tool itself is not sufficient. I hate to say it, but my suspicion is that what you're trying to achieve is not trivial and may not be worth the effort as long as we want to support building from systems that don't include iconv yet. What about adding a tiny shell script that iconvs the files manually and checking in its output for now?

Sidenote: ideally we should just patch up our system to only support UTF-8 locale data and having it converted to the locale's character set on the fly...
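
The shell-script fallback suggested above could look roughly like the following. This is a minimal sketch, not something from the revision: the function name convert_catalog and the ICONV override variable are hypothetical, and the layout (one directory per locale under an nls directory, as in usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg) is assumed to hold for the destination locale as well. The output would be generated once by hand and then committed.

```shell
#!/bin/sh
# Hypothetical helper: convert one catalog source file between two locales
# whose encodings are encoded in the locale names (text after the first dot).
# Arguments: source locale, destination locale, nls directory, catalog name.
# ICONV may be set to point at a specific iconv binary (assumption for this
# sketch); by default the iconv found in PATH is used.
convert_catalog() {
    src=$1 dst=$2 nlsdir=$3 name=$4
    mkdir -p "$nlsdir/$dst" &&
    ${ICONV:-iconv} -f "${src#*.}" -t "${dst#*.}" \
        "$nlsdir/$src/$name" > "$nlsdir/$dst/$name"
}

# Example invocation (run manually, then check in the result):
#   convert_catalog ru_RU.KOI8-R ru_RU.UTF-8 usr.bin/ee/nls ee.msg
```

Because the conversion happens outside the build, the build host never needs iconv. The cost is the duplication discussed in this thread.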

The issue at hand is that iconv() is integrated into the C library and probably also uses a collection of data files (not sure).

I've had the same suspicion. This is probably related to the Symbol.map mentioned in lib/libc/iconv/Makefile.inc.

Just building the iconv tool itself is not sufficient. I hate to say it, but my suspicion is that what you're trying to achieve is not trivial and may not be worth the effort as long as we want to support building from systems that don't include iconv yet. What about adding a tiny shell script that iconvs the files manually and checking in its output for now?

That should be easy, but I really don't like data duplication, or the risk that changes to one locale are missed in another. However, I've just had an idea: is it possible to create a test which checks that NLS files for the same language but different encodings are equal?
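
One possible shape for such a test is to convert both files to a common encoding and compare the results. This is only a sketch: the function name catalogs_match and the ICONV override variable are hypothetical, and it assumes an iconv that supports both encodings is available wherever the test runs.

```shell
#!/bin/sh
# Hypothetical check: succeed if two catalog source files carry the same text
# once both are converted to UTF-8.
# Arguments: file A, encoding of A, file B, encoding of B.
# Returns 0 if equal, 1 if different, 2 if a conversion failed.
# Note: command substitution drops trailing newlines, so files that differ
# only in trailing blank lines compare as equal.
catalogs_match() {
    left=$(${ICONV:-iconv} -f "$2" -t UTF-8 "$1") || return 2
    right=$(${ICONV:-iconv} -f "$4" -t UTF-8 "$3") || return 2
    [ "$left" = "$right" ]
}

# Example invocation:
#   catalogs_match nls/ru_RU.KOI8-R/ee.msg KOI8-R nls/ru_RU.UTF-8/ee.msg UTF-8
```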

Sidenote: ideally we should just patch up our system to only support UTF-8 locale data and having it converted to the locale's character set on the fly...

I've thought of this. However, while 8-bit is always convertible to UTF-8, the opposite is not true, and we may have problems because of it.

ed removed a reviewer: ed.

Adding @yuripv, who I believe is taking an interest in related things.

Reading through the discussion, I agree with @ed: we should have just one version in UTF-8 and convert to the current encoding on the fly. I agree with @AMDmi3 as well, but making sure we use only characters that can be converted to the related single-byte locale shouldn't be that hard. We don't need to use fancy Unicode characters in messages.
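
A check along these lines could be as small as a trial conversion. This is a sketch under a loud assumption: it relies on iconv exiting with a non-zero status when the input contains a character that the target encoding cannot represent, which is how GNU iconv behaves without -c or //TRANSLIT. Other implementations may substitute a replacement character instead, so the behavior would need to be verified for the iconv actually used. The function name representable_in and the ICONV override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if every character of a UTF-8 catalog source
# can be converted to the given single-byte encoding.
# Arguments: UTF-8 file, target encoding (for example KOI8-R).
# Assumption: the iconv in use fails on unconvertible characters rather than
# silently substituting them.
representable_in() {
    ${ICONV:-iconv} -f UTF-8 -t "$2" "$1" > /dev/null 2>&1
}

# Example invocation:
#   representable_in nls/ru_RU.UTF-8/ee.msg KOI8-R || echo "not convertible"
```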

Oops, I didn't mean to add Ed back to this review :-)