Page MenuHomeFreeBSD

Add a way to get version information from querylocale(3).
AcceptedPublic

Authored by munro_ip9.org on Sep 13 2018, 11:16 PM.

Details

Reviewers
bapt
0mp
Group Reviewers
manpages
Summary

Allow user programs to ask for a version string for the components of a locale_t. The structure of the version string is undefined, but can be used to detect changes in the locale's definition.

The intended use-case is databases that use libc collations to implement indexes. When collation definitions change, database indexes are frequently corrupted. By exposing the CLDR version string, databases can detect the change and issue an error or force an index rebuild. Without such versioning, database projects (such as PostgreSQL) are forced to move to other collation implementations (eg libicu) just to protect themselves from silent collation definition changes. Motivation discussed in slightly more detail here: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-September/053289.html

For now only LC_COLLATE components have a way to report their version. The LC_COLLATE file format is backwards and forwards compatible. Where previously 24 bytes held "BSD 1.0\n" followed by NUL characters, there are now 12 bytes for the file format identifier and then 12 bytes for the data version string.

I plan to file a POSIX defect report that proposes this new interface, but that process requires an existing implementation in a real operating system.

Not yet done in this patch: the procedure for building the definitions that ship with the base system (see README and scripts under tools/tools/locale) should be updated so that some representation of the CLDR version used is passed to localedef -V (the new switch added by this patch). The exact format of the string isn't important, but whenever new CLDR version is used the string must change.

Test Plan

We'll make a fake locale xx_XX.UTF-8. Start by copying en_US.UTF-8:

# cp -r /usr/share/locale/en_US.UTF-8 /usr/share/locale/xx_XX.UTF-8

Now create a new LC_COLLATE file that has a version string stamped on it, and install it:

# localedef \
  -i /usr/src/share/colldef/en_US.UTF-8.src \
  -f /usr/src/tools/tools/locale/etc/final-maps/map.UTF-8 \
  -V '30.0.2' \
  xx_XX
# cp xx_XX/LC_COLLATE /usr/share/locale/xx_XX.UTF-8/

Now we can access the version string with querylocaleversion:

# cat test.c 
#include <locale.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
        const char *name;
        locale_t loc;

        if (argc < 2)
                return (1);
        name = argv[1];

        loc = newlocale(LC_ALL_MASK, name, NULL);
        if (!loc) {
                printf("failed to open locale\n");
                return (1);
        }
        printf("name = [%s], version = [%s]\n",
                querylocale(LC_COLLATE_MASK, loc),
                querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
        freelocale(loc);

        return (0);
}
# cc -Wall test.c
# ./a.out xx_XX.UTF-8
name = [xx_XX.UTF-8], version = [30.0.2]

Collation files that don't have a version return an empty string:

# ./a.out fr_FR.UTF-8
name = [fr_FR.UTF-8], version = []

The new LC_COLLATE files can still be successfully loaded by unpatched libc (because they only read the BSD 1.0 header up to to the first NUL).

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

munro_ip9.org created this revision.Sep 13 2018, 11:16 PM
munro_ip9.org retitled this revision from Add querylocaleversion(). to Add querylocaleversion(3) to libc..Sep 14 2018, 1:06 AM
munro_ip9.org added subscribers: yuripv, kib.

Please provide the diff with full context, e.g. by using -U with number of lines big enough to include full file contents if you are using git diff or git log -p.

yuripv added inline comments.Sep 14 2018, 6:17 AM
lib/libc/locale/querylocaleversion.3
45 ↗(On Diff #48024)

Please run mandoc -Tlint on the man pages you are changing, e.g. it would tell you to start new sentences from new lines for this one :-) Also, it's likely should be .Dv LC_COLLATE_MASK.

49 ↗(On Diff #48024)

Again, mandoc -Tlint would tell you that these needs to be sorted correctly, first by section, then alphabetically.

usr.bin/localedef/localedef.1
135

New sentence, new line; use .Xr querylocaleversion 3.

203

Sort.

Man page tidying + more context, based on feedback from yuripv.

munro_ip9.org marked 4 inline comments as done.Sep 14 2018, 7:12 AM
munro_ip9.org marked an inline comment as not done.

New version because I missed one detail from yuripv's feedback (.Xr querylocaleversion 3).

munro_ip9.org marked an inline comment as done.Sep 14 2018, 7:23 AM
0mp accepted this revision as: 0mp.Sep 14 2018, 8:12 AM
0mp added a subscriber: 0mp.
0mp added inline comments.
lib/libc/locale/querylocaleversion.3
2 ↗(On Diff #48035)

Historically, the phrase 'All Rights Reserved.' was included in all copyright notices. The BSD releases had it to comply with the Buenos Aires Convention of 1910 in the Americas. With the ratification of the Berne Convention in 2000, it became obsolete. As such, the FreeBSD project recommends that new code omit the phrase and encourages existing copyright holders to remove it. In 2018, the project updated its templates to remove it.

https://www.freebsd.org/internal/software-license.html

51 ↗(On Diff #48035)

Could you cross reference this new manual page from querylocale(3) as well?

usr.bin/localedef/localedef.1
36

Remember to dump Dd.

This revision is now accepted and ready to land.Sep 14 2018, 8:12 AM

Adjust man pages based on feedback from 0mp. While removing "All rights reserved", and after checking some nearby files and the documentation, it seemed that the thing to do with new files was to use my own name on the copyright message. I also added a new line to the querylocaleversion.3 page to mention the purpose of the version string.

This revision now requires review to proceed.Sep 14 2018, 9:53 AM
munro_ip9.org marked 3 inline comments as done.Sep 14 2018, 9:55 AM
yuripv added inline comments.Sep 14 2018, 11:00 AM
lib/libc/locale/collate.c
159

You likely want strlcpy() here to make sure it's NUL-terminated.

usr.bin/localedef/collate.c
1162

Just a nit: I know that current localedef code does that a lot, but still version != NULL would be a proper comparison here.

strncpy->strlcpy, as suggested by yuripv.

munro_ip9.org marked 2 inline comments as done.Sep 14 2018, 11:53 AM
munro_ip9.org added inline comments.
usr.bin/localedef/collate.c
1162

Hmm well it'd look odd next to all the nearby uses of implicit NULL comparison in this file, so I think it's probably better to keep it that way.

munro_ip9.org marked an inline comment as done.Sep 14 2018, 11:54 AM

Looks good to me, FWIW.

munro_ip9.org marked an inline comment as done.Sep 14 2018, 11:57 AM
kib added a comment.Sep 14 2018, 1:59 PM

I am not sure is it a good idea at all, but still cannot resist explaining it.

You could add a bit to the LC_XXX_MASK and make querylocale() returning either string or numeric representation of the version when the bit is set together with the real locale category bit.
BTW, how does querylocaleversion() behaves WRT locales created dynamically by newlocale(3) ?

In D17166#366077, @kib wrote:

You could add a bit to the LC_XXX_MASK and make querylocale() returning either string or numeric representation of the version when the bit is set together with the real locale category bit.

That is a nice idea. querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, my_locale). It's attractive because it avoids introducing a new libc function, and yet it still makes a lot of sense as an interface. Note that unlike uselocale(3) etc, querylocale(3) is not part of POSIX 1003.1-2018 (we apparently borrowed it from macOS because libc++ needed it, see r227753). If we go this way my proposal to POSIX will be slightly wider in scope. Maybe that's a good thing. I will try this idea out tomorrow and report back.

BTW, how does querylocaleversion() behaves WRT locales created dynamically by newlocale(3) ?

It works. You are allowed to mix and match components from different locales, which is why my proposal requires you to ask for the version for one specific category. So if you start with en_US and replace its LC_COLLATE with the one from fr_FR, you'll have a locale that does American style money formatting but French style sorting. That is: querylocale(LC_MONETARY_MASK, loc) -> "en_US", querylocale(LC_COLLATE_MASK, loc) -> "fr_FR", querylocaleversion(LC_COLLATE_MASK, loc) -> [the version from fr_FR's LC_COLLATE file].

munro_ip9.org retitled this revision from Add querylocaleversion(3) to libc. to Add a way to get version information from querylocale(3)..
munro_ip9.org edited the test plan for this revision. (Show Details)

Instead of introducing a new function querylocaleversion(3), teach the existing querylocale(3) to return versions, as suggested by kib. I think I prefer it this way, but I'm not sure if it's more or less likely to survive contact with the standards process.

yuripv removed a subscriber: yuripv.Sep 15 2018, 9:01 AM
kib added inline comments.Sep 15 2018, 1:28 PM
lib/libc/locale/querylocale.3
52

Do you intent to only allow to detect change, or also to define a 'newer than' relation ?

munro_ip9.org added inline comments.Sep 15 2018, 9:26 PM
lib/libc/locale/querylocale.3
52

I don't think 'newer than' would be very useful on its own, and it would require a complicated specification of the version string which wouldn't work well with my goal of convincing other operating systems to adopt this via a the standards process. Consider that not everyone uses CLDR, and sometimes OSes change to a different source of collation data (as we did). (A much more ambitious project would be to support multiple versions installed at the same time, and then you might want to be able to order them, but I don't have the resources to propose that; I merely want to fix the problem of silent changes that result in corrupted persistent data, because that is very arguably a "defect" in the standard).

bapt accepted this revision.Sep 18 2018, 9:02 AM
This revision is now accepted and ready to land.Sep 18 2018, 9:02 AM
feld added a subscriber: feld.Jan 12 2019, 7:43 PM