
printf(1): implement POSIX 2024 \u and \U escape sequences
Needs Review, Public

Authored by olivier on Sun, Jun 14, 7:09 PM.

Details

Summary

printf '\u2A7D\n' emitted "u2A7D" instead of ⩽. Add \uHHHH and \UHHHHHHHH handling to escape(), encoding via wctomb(3);
reject surrogates and values above 0x10FFFF. The /bin/sh builtin, which shares this source, is fixed too.
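
The parsing and validation described above could look roughly like the sketch below. It is not the code under review: `parse_ucn` is a hypothetical helper name, and accepting fewer than the maximum number of hex digits is an assumption (the summary does not say how short sequences are treated). The encoding step via wctomb(3) is left out here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 * s points just past "\u" or "\U"; maxdigits is 4 or 8.
 * On success, stores the code point in *cp and returns the number of
 * hex digits consumed. Returns 0 if there are no hex digits, if the
 * value is a surrogate (U+D800..U+DFFF), or if it exceeds U+10FFFF.
 */
static size_t
parse_ucn(const char *s, size_t maxdigits, uint32_t *cp)
{
	uint32_t value = 0;
	size_t n = 0;

	assert(s != NULL && cp != NULL && maxdigits <= 8);
	while (n < maxdigits && isxdigit((unsigned char)s[n])) {
		int c = (unsigned char)s[n];

		value = value * 16 +
		    (uint32_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
		n++;
	}
	if (n == 0)
		return (0);		/* "\u" with no digits */
	if (value >= 0xD800 && value <= 0xDFFF)
		return (0);		/* surrogate */
	if (value > 0x10FFFF)
		return (0);		/* outside the Unicode range */
	*cp = value;
	return (n);
}
```

With this sketch, `parse_ucn("2A7D", 4, &cp)` consumes four digits and yields 0x2A7D, while `"D800"` and `"00110000"` are rejected.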

Test Plan

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

Am I missing something or does POSIX at https://pubs.opengroup.org/onlinepubs/9799919799/utilities/printf.html not list this feature? This version of POSIX does list dollar-single quotes at https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_02_04 but even there without \u and \U. I'm not really opposed to this feature but we shouldn't say it's specified by POSIX when it isn't.

printf/printf.c
551–563

In most cases, POSIX allows messages to standard error ("diagnostic messages") only if the utility's exit status is also non-zero. For example, printf %@ writes a message to standard error and returns status 1, whereas all existing unrecognized or invalid backslash sequences are handled without a message.

557–569

The octal sequences above make sure that printf "\045" writes a percent sign without errors. I would say the same should apply to \u0025 and \U00000025, and that should also be tested.
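
One way to get that behaviour is sketched below, assuming the escaped string is later scanned for % conversions, so that a decoded percent sign has to be doubled to survive as a literal. `store_escaped` and its `percent` parameter are hypothetical names for illustration, not identifiers confirmed from the diff.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: store one decoded single-byte escape value.
 * When percent is non-zero, the buffer will later be interpreted as a
 * format string, so a decoded '%' is written as "%%" to keep it from
 * starting a conversion. Returns the number of bytes written.
 */
static size_t
store_escaped(char *store, int value, int percent)
{
	assert(store != NULL);
	if (percent && value == '%') {
		store[0] = '%';
		store[1] = '%';
		return (2);
	}
	store[0] = (char)value;
	return (1);
}
```

The shortest percent escape, \u25, occupies four bytes in the source, so the two-byte "%%" still fits if the buffer is rewritten in place.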

558–559

I'm curious what this does for historical locales like en_US.ISO8859-1 and en_US.ISO8859-15. For example, ideally, \u20AC should expand to a question mark in en_US.ISO8859-1 and to the byte 0xA4 in en_US.ISO8859-15.
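
A minimal sketch of the question-mark fallback, under a loud assumption: it treats the wchar_t passed to wctomb(3) as a Unicode code point, which C only guarantees when `__STDC_ISO_10646__` is defined. Whether \u20AC would actually come out as 0xA4 in an ISO8859-15 locale depends on that mapping and is not shown here. `encode_or_qmark` is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: encode wc in the current locale's encoding.
 * out must have room for MB_LEN_MAX bytes. If the locale cannot
 * represent wc, store a single '?' instead. Returns bytes written.
 * Assumes wc is meaningful to wctomb() in the active locale.
 */
static size_t
encode_or_qmark(char *out, wchar_t wc)
{
	int n;

	assert(out != NULL);
	n = wctomb(out, wc);
	if (n <= 0) {
		/* Not representable: fall back without a diagnostic. */
		out[0] = '?';
		return (1);
	}
	return ((size_t)n);
}
```

Falling back silently keeps the exit status at zero, in line with the point above about diagnostics.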

567–569

Perhaps use memcpy(store, mb, n); and store += n - 1; instead of a loop.
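
Spelled out as a small function, the suggestion looks like this. The sketch assumes `mb` is a separate local buffer (so the regions do not overlap) and that the enclosing loop advances `store` by one afterwards, which is why the pointer is left on the last byte written. `store_bytes` is a hypothetical wrapper for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around the suggested two statements.
 * Copies the n encoded bytes in mb to store and returns a pointer to
 * the last byte written, i.e. the effect of "store += n - 1".
 */
static char *
store_bytes(char *store, const char *mb, size_t n)
{
	assert(store != NULL && mb != NULL && n > 0);
	memcpy(store, mb, n);
	return (store + n - 1);
}
```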

567–569

With exotic character encodings, could something like \uA1 expand to more than four bytes? That would cause data corruption and/or a buffer overflow.
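
A possible guard is sketched below, assuming the string is rewritten in place so that the encoded output must never be longer than the escape it replaces. `encode_checked` and `consumed` are hypothetical names; `consumed` counts the backslash, the letter and the hex digits (4 for \uA1).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical guard: encode wc into a scratch buffer first and copy
 * it to store only if it fits in the `consumed` source bytes.
 * Returns the number of bytes written, or 0 if wc cannot be encoded
 * in the current locale or would not fit.
 */
static size_t
encode_checked(char *store, size_t consumed, wchar_t wc)
{
	char mb[MB_LEN_MAX];	/* large enough for any locale */
	int n;

	assert(store != NULL);
	n = wctomb(mb, wc);
	if (n <= 0 || (size_t)n > consumed)
		return (0);
	memcpy(store, mb, (size_t)n);
	return ((size_t)n);
}
```

Encoding into an MB_LEN_MAX scratch buffer first means an unexpectedly long sequence is detected before anything is written to the shared buffer.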

printf/tests/regress.sh
30

Apart from \u0025 and \U00000025 mentioned above, we should also test \U00002A7D, \U2A7D, \u alone, \U alone, \u25 and \U25.