Page MenuHomeFreeBSD

Add a way to get version information from querylocale(3).
Needs ReviewPublic

Authored by munro_ip9.org on Sep 13 2018, 11:16 PM.

Details

Reviewers
bapt
0mp
Group Reviewers
manpages
Summary

Allow user programs to ask for a version string for the components of a locale_t. The structure of the version string is undefined, but can be used to detect changes in the locale's definition.

The intended use-case is databases that use libc collations to implement indexes. When collation definitions change, database indexes are frequently corrupted. By exposing the CLDR version string, databases can detect the change and issue an error or force an index rebuild. Without such versioning, database projects (such as PostgreSQL) are forced to move to other collation implementations (eg libicu) just to protect themselves from silent collation definition changes. Motivation discussed in slightly more detail here: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-September/053289.html

For now only LC_COLLATE components have a way to report their version. The LC_COLLATE file format is backwards and forwards compatible. Where previously 24 bytes held "BSD 1.0\n" followed by NUL characters, there are now 12 bytes for the file format identifier and then 12 bytes for the data version string.

I plan to file a POSIX defect report that proposes this new interface, but that process requires an existing implementation in a real operating system.

Test Plan

We'll make a fake locale xx_XX.UTF-8. Start by copying en_US.UTF-8:

# cp -r /usr/share/locale/en_US.UTF-8 /usr/share/locale/xx_XX.UTF-8

Now create a new LC_COLLATE file that has a version string stamped on it, and install it:

# localedef \
  -i /usr/src/share/colldef/en_US.UTF-8.src \
  -f /usr/src/tools/tools/locale/etc/final-maps/map.UTF-8 \
  -V '30.0.2' \
  xx_XX
# cp xx_XX/LC_COLLATE /usr/share/locale/xx_XX.UTF-8/

Now we can access the version string with querylocaleversion:

# cat test.c 
#include <locale.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
        const char *name;
        locale_t loc;

        if (argc < 2)
                return (1);
        name = argv[1];

        loc = newlocale(LC_ALL_MASK, name, NULL);
        if (!loc) {
                printf("failed to open locale\n");
                return (1);
        }
        printf("name = [%s], version = [%s]\n",
                querylocale(LC_COLLATE_MASK, loc),
                querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
        freelocale(loc);

        return (0);
}
# cc -Wall test.c
# ./a.out xx_XX.UTF-8
name = [xx_XX.UTF-8], version = [000:34.0]

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

munro_ip9.org created this revision.Sep 13 2018, 11:16 PM
munro_ip9.org retitled this revision from Add querylocaleversion(). to Add querylocaleversion(3) to libc..Sep 14 2018, 1:06 AM
munro_ip9.org added subscribers: yuripv, kib.

Please provide the diff with full context, e.g. by using -U with number of lines big enough to include full file contents if you are using git diff or git log -p.

yuripv added inline comments.Sep 14 2018, 6:17 AM
lib/libc/locale/querylocaleversion.3
46

Please run mandoc -Tlint on the man pages you are changing, e.g. it would tell you to start new sentences from new lines for this one :-) Also, it's likely should be .Dv LC_COLLATE_MASK.

50

Again, mandoc -Tlint would tell you that these needs to be sorted correctly, first by section, then alphabetically.

usr.bin/localedef/localedef.1
135

New sentence, new line; use .Xr querylocaleversion 3.

203

Sort.

Man page tidying + more context, based on feedback from yuripv.

munro_ip9.org marked 4 inline comments as done.Sep 14 2018, 7:12 AM
munro_ip9.org marked an inline comment as not done.

New version because I missed one detail from yuripv's feedback (.Xr querylocaleversion 3).

munro_ip9.org marked an inline comment as done.Sep 14 2018, 7:23 AM
0mp accepted this revision as: 0mp.Sep 14 2018, 8:12 AM
0mp added a subscriber: 0mp.
0mp added inline comments.
lib/libc/locale/querylocaleversion.3
3

Historically, the phrase 'All Rights Reserved.' was included in all copyright notices. The BSD releases had it to comply with the Buenos Aires Convention of 1910 in the Americas. With the ratification of the Berne Convention in 2000, it became obsolete. As such, the FreeBSD project recommends that new code omit the phrase and encourages existing copyright holders to remove it. In 2018, the project updated its templates to remove it.

https://www.freebsd.org/internal/software-license.html

52

Could you cross reference this new manual page from querylocale(3) as well?

usr.bin/localedef/localedef.1
36

Remember to dump Dd.

This revision is now accepted and ready to land.Sep 14 2018, 8:12 AM

Adjust man pages based on feedback from 0mp. While removing "All rights reserved", and after checking some nearby files and the documentation, it seemed that the thing to do with new files was to use my own name on the copyright message. I also added a new line to the querylocaleversion.3 page to mention the purpose of the version string.

This revision now requires review to proceed.Sep 14 2018, 9:53 AM
munro_ip9.org marked 3 inline comments as done.Sep 14 2018, 9:55 AM
yuripv added inline comments.Sep 14 2018, 11:00 AM
lib/libc/locale/collate.c
159

You likely want strlcpy() here to make sure it's NUL-terminated.

usr.bin/localedef/collate.c
1162

Just a nit: I know that current localedef code does that a lot, but still version != NULL would be a proper comparison here.

strncpy->strlcpy, as suggested by yuripv.

munro_ip9.org marked 2 inline comments as done.Sep 14 2018, 11:53 AM
munro_ip9.org added inline comments.
usr.bin/localedef/collate.c
1162

Hmm well it'd look odd next to all the nearby uses of implicit NULL comparison in this file, so I think it's probably better to keep it that way.

munro_ip9.org marked an inline comment as done.Sep 14 2018, 11:54 AM

Looks good to me, FWIW.

munro_ip9.org marked an inline comment as done.Sep 14 2018, 11:57 AM
kib added a comment.Sep 14 2018, 1:59 PM

I am not sure is it a good idea at all, but still cannot resist explaining it.

You could add a bit to the LC_XXX_MASK and make querylocale() returning either string or numeric representation of the version when the bit is set together with the real locale category bit.
BTW, how does querylocaleversion() behaves WRT locales created dynamically by newlocale(3) ?

In D17166#366077, @kib wrote:

You could add a bit to the LC_XXX_MASK and make querylocale() returning either string or numeric representation of the version when the bit is set together with the real locale category bit.

That is a nice idea. querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, my_locale). It's attractive because it avoids introducing a new libc function, and yet it still makes a lot of sense as an interface. Note that unlike uselocale(3) etc, querylocale(3) is not part of POSIX 1003.1-2018 (we apparently borrowed it from macOS because libc++ needed it, see r227753). If we go this way my proposal to POSIX will be slightly wider in scope. Maybe that's a good thing. I will try this idea out tomorrow and report back.

BTW, how does querylocaleversion() behaves WRT locales created dynamically by newlocale(3) ?

It works. You are allowed to mix and match components from different locales, which is why my proposal requires you to ask for the version for one specific category. So if you start with en_US and replace its LC_COLLATE with the one from fr_FR, you'll have a locale that does American style money formatting but French style sorting. That is: querylocale(LC_MONETARY_MASK, loc) -> "en_US", querylocale(LC_COLLATE_MASK, loc) -> "fr_FR", querylocaleversion(LC_COLLATE_MASK, loc) -> [the version from fr_FR's LC_COLLATE file].

munro_ip9.org retitled this revision from Add querylocaleversion(3) to libc. to Add a way to get version information from querylocale(3)..
munro_ip9.org edited the test plan for this revision. (Show Details)

Instead of introducing a new function querylocaleversion(3), teach the existing querylocale(3) to return versions, as suggested by kib. I think I prefer it this way, but I'm not sure if it's more or less likely to survive contact with the standards process.

yuripv removed a subscriber: yuripv.Sep 15 2018, 9:01 AM
kib added inline comments.Sep 15 2018, 1:28 PM
lib/libc/locale/querylocale.3
53

Do you intent to only allow to detect change, or also to define a 'newer than' relation ?

munro_ip9.org added inline comments.Sep 15 2018, 9:26 PM
lib/libc/locale/querylocale.3
53

I don't think 'newer than' would be very useful on its own, and it would require a complicated specification of the version string which wouldn't work well with my goal of convincing other operating systems to adopt this via a the standards process. Consider that not everyone uses CLDR, and sometimes OSes change to a different source of collation data (as we did). (A much more ambitious project would be to support multiple versions installed at the same time, and then you might want to be able to order them, but I don't have the resources to propose that; I merely want to fix the problem of silent changes that result in corrupted persistent data, because that is very arguably a "defect" in the standard).

bapt accepted this revision.Sep 18 2018, 9:02 AM
This revision is now accepted and ready to land.Sep 18 2018, 9:02 AM
feld added a subscriber: feld.Jan 12 2019, 7:43 PM
munro_ip9.org edited the test plan for this revision. (Show Details)

Sorry for the long delay. Here is a new version that addresses a theoretical problem not handled before: what if our code changes for some reason, but the CLDR version doesn't? So I added an extra part to the version string. Changes:

  • If the collation algorithm changes (because we fix a major bug in libc that affects collating in a user-visible way), the value of the macro COLLATE_ALG_VERSION in collate.c should be incremented, though I expect that would be rare. The value returned by querylocale() is <algorithm version> + ":" + <definition version>, where that last thing is taken directly from CLDR. The initial algorithm version is "000". So the complete version string is "000:34.0" currently.
  • I updated the locale generation scripts to extract the CLDR version and put it into all generated collation files.
  • The CLDR version finishes up in a variable in the generated-but-present-in-the-tree file share/colldef/Makefile alongside the generated .src files, so you can see that in the patch. I thought about putting the version into the .src files instead, but I wasn't sure that would be OK, since localedef's input file syntax is controlled by the standard.
This revision now requires review to proceed.Thu, Oct 3, 10:45 AM
tmunro added a subscriber: tmunro.Fri, Oct 4, 12:43 AM
tmunro edited the summary of this revision. (Show Details)Fri, Oct 4, 1:59 AM