Implement NLS catalog encoding conversion
Needs Review, Public

Authored by AMDmi3 on Apr 5 2015, 11:20 PM.

Details

Summary

In most base utilities that support NLS, translation catalogs are tied to specific encodings (for example, ee has a catalog for ru_RU.KOI8-R, but not for ru_RU.UTF-8). To avoid duplicating catalogs for different encodings, implement a mechanism that allows encoding conversion during the build. This works as follows:

  • New locales are added to NLS as usual
  • For each such locale, its source file should be defined via NLSSRCDIR_${locale} or NLSSRCFILE_${locale} (as usual), and the source locale name should be defined in NLSICONV_${localename}. If NLSICONV_${localename} is defined, iconv is called to convert the character set. Encodings are taken from the source and destination locale names.
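
As a rough illustration of the last point, the sketch below derives the character set from a locale name and prints the iconv command that would result. It is not the bsd.nls.mk change under review; the helper names locale_charset and nls_iconv_cmd are hypothetical, and locale names without a dot or with an @modifier are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of bsd.nls.mk): extract the character
# set from a locale name, e.g. ru_RU.KOI8-R -> KOI8-R.
locale_charset() {
	# Strip everything up to and including the first dot.
	printf '%s\n' "${1#*.}"
}

# Hypothetical helper: print the iconv command that converts a catalog
# source file from the source locale's encoding to the destination's.
nls_iconv_cmd() {
	src_locale=$1
	dst_locale=$2
	src_file=$3
	printf 'iconv -f %s -t %s %s > %s.msg\n' \
	    "$(locale_charset "$src_locale")" \
	    "$(locale_charset "$dst_locale")" \
	    "$src_file" "$dst_locale"
}
```

For example, nls_iconv_cmd ru_RU.KOI8-R ru_RU.UTF-8 ee.msg prints a command of the form iconv -f KOI8-R -t UTF-8 ee.msg > ru_RU.UTF-8.msg.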

To demonstrate and use the new mechanism, add ru_RU.UTF-8 NLS support for ee. Confirmed to work on recent current: with LC_ALL=ru_RU.UTF-8, ee shows a localized interface.

Another solution would be to convert all catalogs to UTF-8 by default and call iconv automatically based on locale name. That'll make NLSICONV_${localename} unneeded, but is more intrusive.

Question for reviewers: is it safe to use verbatim iconv in bsd.nls.mk?

Diff Detail

Repository
rS FreeBSD src repository
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

AMDmi3 updated this revision to Diff 4689.Apr 5 2015, 11:20 PM
AMDmi3 retitled this revision from to Implement NLS catalog encoding conversion.
AMDmi3 updated this object.
AMDmi3 edited the test plan for this revision.
AMDmi3 added reviewers: glebius, antoine, brooks, ed.
AMDmi3 updated this object.Apr 6 2015, 1:28 AM
AMDmi3 updated this object.Apr 6 2015, 1:31 AM
AMDmi3 added a reviewer: bapt.Apr 7 2015, 7:53 PM
bapt edited edge metadata.Apr 7 2015, 7:56 PM

Usually the approach in such cases is to add a converted file directly to the sources.

share/mk/bsd.nls.mk
48

This is unsafe because iconv might not be in base on the host where you are building; it would at least need to be in bootstrap-tools.

bapt added a reviewer: imp.Apr 7 2015, 7:57 PM

I added @imp as a reviewer as he may have an opinion on this; introducing a new bootstrap tool may open a can of worms.

AMDmi3 added a comment.Apr 7 2015, 8:18 PM

Adding a converted file to svn would imply data duplication, so I don't really like this.

brooks edited edge metadata.Apr 7 2015, 8:32 PM

It looks like iconv is a simple program with no dependencies (other than libc) so adding it to bootstrap-tools shouldn't be painful.

imp added inline comments.Apr 8 2015, 1:46 AM
share/mk/bsd.nls.mk
48

I think that build-tools would be enough, since we disable man pages, and the like, during the early build (and if we don't, we should).

bapt added a comment.Apr 8 2015, 6:41 AM

Yes, build-tools should be enough.

AMDmi3 added a comment.Apr 8 2015, 2:03 PM

Do I need to do anything besides adding iconv to the list in the build-tools: target in Makefile.inc1?

AMDmi3 updated this revision to Diff 4751.Apr 9 2015, 11:16 AM
AMDmi3 edited edge metadata.

I've added iconv to build-tools:, however it still doesn't build
without iconv on the host system.

iconv is built:

/usr/obj/<path to src>/usr.bin/iconv/iconv

however build still complains:

iconv -f KOI8-R -t UTF-8 /home/amdmi3/projects/external/freebsd-head/usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
/bin/sh: iconv: not found
AMDmi3 updated this revision to Diff 4757.Apr 9 2015, 3:30 PM

As suggested by bapt@, iconv should be in bootstrap-tools, not build-tools. Successful build confirmed on a host without an iconv binary.

bapt added a comment.Apr 9 2015, 3:34 PM

You would need to add a condition on MK_ICONV in ee/Makefile; otherwise, if you build without iconv, you will get a failure.

Makefile.inc1
1433

This looks like an unintended change :)

AMDmi3 added a comment.Apr 9 2015, 5:42 PM
In D2232#19, @bapt wrote:

You would need to add a condition on MK_ICONV in ee/Makefile; otherwise, if you build without iconv, you will get a failure.

It builds fine here with WITHOUT_ICONV.

Makefile.inc1
1433

I've just sorted the list. Should I omit this?

No luck with cross-building. Simple cross-build:

iconv -f KOI8-R -t UTF-8 /home/amdmi3/projects/external/freebsd-head/usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
iconv: iconv_open(UTF-8, KOI8-R): Invalid argument

cross-build WITHOUT_ICONV:

/home/amdmi3/projects/external/freebsd-head/lib/libc_nonshared/../libc/iconv/iconv.c:(.text+0x0): undefined reference to `__bsd_iconv'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
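
The iconv_open failure above suggests that the iconv binary in use cannot perform the requested conversion. A minimal probe for that condition could look like the sketch below. The function name iconv_supports is hypothetical, and the probe assumes that the local iconv exits non-zero when it cannot open a conversion.

```shell
#!/bin/sh
# Hypothetical probe: succeed only if the iconv found in PATH can open
# a conversion from charset $1 to charset $2. Empty input is enough,
# because the conversion has to be opened before any data is read.
iconv_supports() {
	iconv -f "$1" -t "$2" < /dev/null > /dev/null 2>&1
}
```

A build could call iconv_supports KOI8-R UTF-8 before attempting the conversion and fall back or report a clearer error when the probe fails.
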
ed edited edge metadata.Apr 16 2015, 8:50 AM
In D2232#21, @AMDmi3 wrote:

No luck with cross-building. Simple cross-build:

iconv -f KOI8-R -t UTF-8 /home/amdmi3/projects/external/freebsd-head/usr.bin/ee/nls/ru_RU.KOI8-R/ee.msg > ru_RU.UTF-8.msg
iconv: iconv_open(UTF-8, KOI8-R): Invalid argument

cross-build WITHOUT_ICONV:

/home/amdmi3/projects/external/freebsd-head/lib/libc_nonshared/../libc/iconv/iconv.c:(.text+0x0): undefined reference to `__bsd_iconv'
cc: error: linker command failed with exit code 1 (use -v to see invocation)

The issue at hand is that iconv() is integrated into the C library and probably also uses a collection of data files (not sure). Just building the iconv tool itself is not sufficient. I hate to say it, but my suspicion is that what you're trying to achieve is not trivial and may not be worth the effort as long as we want to support building from systems that don't include iconv yet. What about adding a tiny shell script that iconvs the files manually and checking in its output for now?
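
A minimal sketch of such a script follows, assuming the nls/<locale>/<name> directory layout seen in the build log above. The function name regen_catalog and its argument order are hypothetical; the output would be checked in by hand.

```shell
#!/bin/sh
# Hypothetical helper: regenerate the catalog source for DST_LOCALE
# from SRC_LOCALE inside NLS_DIR.
# Usage: regen_catalog SRC_LOCALE DST_LOCALE NLS_DIR NAME
regen_catalog() {
	src=$1
	dst=$2
	dir=$3
	name=$4
	mkdir -p "$dir/$dst" || return 1
	# Convert into a temporary file first so that a failed conversion
	# does not leave a truncated catalog behind.
	if iconv -f "${src#*.}" -t "${dst#*.}" "$dir/$src/$name" \
	    > "$dir/$dst/$name.tmp"; then
		mv "$dir/$dst/$name.tmp" "$dir/$dst/$name"
	else
		rm -f "$dir/$dst/$name.tmp"
		return 1
	fi
}
```

For the ee case this would be run as regen_catalog ru_RU.KOI8-R ru_RU.UTF-8 usr.bin/ee/nls ee.msg from the top of the source tree, on a machine that has a working iconv.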

Sidenote: ideally we should just patch up our system to only support UTF-8 locale data and have it converted to the locale's character set on the fly...

The issue at hand is that iconv() is integrated into the C library and probably also uses a collection of data files (not sure).

I've had the same suspicion. This is probably related to the Symbol.map mentioned in lib/libc/iconv/Makefile.inc.

Just building the iconv tool itself is not sufficient. I hate to say it, but my suspicion is that what you're trying to achieve is not trivial and may not be worth the effort as long as we want to support building from systems that don't include iconv yet. What about adding a tiny shell script that iconvs the files manually and checking in its output for now?

That should be easy, but I really don't like data duplication, and changes made to one locale may be missed in another. However, I've just had an idea: is it possible to create a test which checks that NLS files for the same language but different encodings are equal?
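
One way such a test might be sketched is to convert both files to UTF-8 and compare the results byte for byte. The function name catalogs_match is hypothetical, and the sketch assumes the two files differ only in encoding, with no encoding-specific comments or directives.

```shell
#!/bin/sh
# Hypothetical check: return 0 if two catalog sources are identical
# once both are converted to UTF-8, 1 if they differ, 2 on a
# conversion or temporary-file error.
# Usage: catalogs_match FILE_A CHARSET_A FILE_B CHARSET_B
catalogs_match() {
	a_tmp=$(mktemp) || return 2
	b_tmp=$(mktemp) || { rm -f "$a_tmp"; return 2; }
	if iconv -f "$2" -t UTF-8 "$1" > "$a_tmp" &&
	    iconv -f "$4" -t UTF-8 "$3" > "$b_tmp"; then
		cmp -s "$a_tmp" "$b_tmp"
		rc=$?
	else
		rc=2
	fi
	rm -f "$a_tmp" "$b_tmp"
	return "$rc"
}
```

A test could then call it as catalogs_match nls/ru_RU.KOI8-R/ee.msg KOI8-R nls/ru_RU.UTF-8/ee.msg UTF-8 and fail whenever the result is non-zero.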

Sidenote: ideally we should just patch up our system to only support UTF-8 locale data and have it converted to the locale's character set on the fly...

I've thought of this; however, while 8-bit encodings are always convertible to UTF-8, the opposite is not true, and we may have problems because of that.

ed resigned from this revision.Aug 2 2015, 9:59 AM
ed removed a reviewer: ed.
brooks resigned from this revision.Feb 8 2018, 6:48 PM
emaste added a subscriber: yuripv.Nov 15 2018, 8:57 PM

Add @yuripv who I believe is taking an interest in related things.

yuripv added a subscriber: ed.Dec 8 2018, 9:15 PM

Reading through the discussion, I do agree with @ed: we should have just one version in UTF-8 and convert to the current encoding on the fly. I agree with @AMDmi3's concern as well, but making sure we use only characters that can be converted to the related single-byte locale shouldn't be that hard; we don't need to use fancy Unicode chars in messages.
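
A check along these lines could be sketched as follows. The function name representable_in is hypothetical, and the sketch assumes that the local iconv exits non-zero when it meets a character it cannot represent in the target character set, which may not hold for every iconv implementation.

```shell
#!/bin/sh
# Hypothetical check: succeed only if every character in the UTF-8
# catalog $1 can be converted to the character set $2.
representable_in() {
	iconv -f UTF-8 -t "$2" "$1" > /dev/null 2>&1
}
```

For instance, representable_in ee.msg KOI8-R could be run for each single-byte locale that a UTF-8 master catalog is expected to serve.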

yuripv removed a subscriber: ed.Dec 8 2018, 9:16 PM

Oops, I didn't mean to add Ed back to this review :-)