Page MenuHomeFreeBSD

locale: use per-thread-locale's __mb_sb_limit
Needs ReviewPublic

Authored by yuripv on Sep 21 2023, 1:39 AM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 1 2024, 4:54 PM
Unknown Object (File)
Oct 3 2024, 4:01 PM
Unknown Object (File)
Sep 26 2024, 4:41 PM
Unknown Object (File)
Sep 4 2024, 3:06 PM
Unknown Object (File)
Aug 6 2024, 3:29 PM
Unknown Object (File)
Jul 6 2024, 12:30 PM
Unknown Object (File)
Jul 3 2024, 12:12 PM
Unknown Object (File)
Jul 3 2024, 11:44 AM
Subscribers

Details

Reviewers
bapt
kevans
kib
Summary

As this adds (yet more stupid) public symbols to libc, I want to make sure this is acceptable before going further with testing on linux cross build.

__mb_sb_limit which defines the highest character that isctype
functions can process describes the global locale as it was introduced
before the xlocale work. It becomes a problem when global locale is set
to the one with e.g. UTF-8 encoding which sets __mb_sb_limit to 128 from
the original 256, and if per-thread locale is set to e.g. ISO8859-1 one,
all of the isctype functions will fail early for [128-255] range even
if the type check should succeed for a character.

To fix this, make isctype functions query the per-thread locale via the
added ___mb_sb_limit function. Sadly this requires making that symbol
public to not expose even more of xlocale internals as isctype helper
functions are inlined.

The change broke bootstrap localedef build (on FreeBSD, Linux cross
buld not tested yet) as it was including _ctype.h from the source tree
which it should not really do anyway. Fix this by extracting the needed
_CTYPE_* defines from the include/_ctype.h.

PR: 265950

Test Plan

See added test case.

buildworld/installworld/run the system

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

yuripv edited the summary of this revision. (Show Details)
yuripv edited the test plan for this revision. (Show Details)
yuripv added reviewers: bapt, kevans, kib.

complete change (except for linux cross build testing)

yuripv retitled this revision from pr265950 draft to locale: use per-thread-locale's __mb_sb_limit.
include/_ctype.h
97
lib/libc/Versions.def
52 ↗(On Diff #127619)

Please commit Versions.def part separately, for easier MFC.

  • Versions.def updated in separate commit
  • drop extern as suggested by Konstantin

Seems reasonable to me... kib@ is a lot pickier on this stuff (for good reason) than I. Once he's happy, I'm happy.
Are there any locale maintainers we need to loop into the review? I can't think of any, but we should check before rather than after the commit :)

I do not know much about runes/locales either.

include/_ctype.h
99

Why do you need this definition? Since you have to change all places, why not call the function by its real name?

In D41927#956261, @kib wrote:

I do not know much about runes/locales either.

I'm not that worried about the fix itself, more about adding the public symbol to libc, and wanted to hear from you if this is acceptable.

include/_ctype.h
99

This is not really needed, I was going after MB_CUR_MAX design at first, and then just kept it here as "looks better in the checks".

In D41927#956261, @kib wrote:

I do not know much about runes/locales either.

I'm not that worried about the fix itself, more about adding the public symbol to libc, and wanted to hear from you if this is acceptable.

If the symbol is needed, why not? The concern is that we must support it forever, with the same or compatible semantic. But then again, this is the question for the locale code maintenance.

In D41927#956266, @kib wrote:
In D41927#956261, @kib wrote:

I do not know much about runes/locales either.

I'm not that worried about the fix itself, more about adding the public symbol to libc, and wanted to hear from you if this is acceptable.

If the symbol is needed, why not? The concern is that we must support it forever, with the same or compatible semantic. But then again, this is the question for the locale code maintenance.

I *think* I have a better idea that does not need adding new symbols in D41985.