printf '\u2A7D\n' emitted "u2A7D" instead of ⩽. Add \uHHHH and \UHHHHHHHH handling to escape(), encoding via wctomb(3);
reject surrogates and values above 0x10FFFF. The /bin/sh builtin, which shares this source, is fixed too.
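
A minimal sketch of the validation step described above, not the code from the patch. The helper name `parse_codepoint` is hypothetical, and the sketch assumes that exactly 4 digits (for `\u`) or 8 digits (for `\U`) are required; the actual change may accept shorter sequences.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the patch): parse exactly `ndigits`
 * hexadecimal digits starting at `s` and store the value in *cp.
 * Returns the number of characters consumed, or 0 if a digit is missing
 * or the value is a surrogate or lies above 0x10FFFF.
 */
static size_t
parse_codepoint(const char *s, size_t ndigits, unsigned long *cp)
{
	unsigned long v = 0;
	size_t i;

	for (i = 0; i < ndigits; i++) {
		unsigned char c = (unsigned char)s[i];

		/* Also stops safely at the terminating NUL. */
		if (!isxdigit(c))
			return (0);
		v = v * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
	}
	if (v >= 0xD800 && v <= 0xDFFF)	/* UTF-16 surrogate range */
		return (0);
	if (v > 0x10FFFF)		/* beyond the Unicode code space */
		return (0);
	*cp = v;
	return (ndigits);
}
```

A caller in escape() would presumably pass 4 after seeing `\u` and 8 after seeing `\U`, and leave the sequence untouched when the helper returns 0.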
Details
Regression test included (cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263210).
Diff Detail
- Lint: Lint Skipped
- Unit: Tests Skipped
Event Timeline
Am I missing something or does POSIX at https://pubs.opengroup.org/onlinepubs/9799919799/utilities/printf.html not list this feature? This version of POSIX does list dollar-single quotes at https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_02_04 but even there without \u and \U. I'm not really opposed to this feature but we shouldn't say it's specified by POSIX when it isn't.
| printf/printf.c | ||
|---|---|---|
| 551–563 | In most cases, POSIX allows messages to standard error ("diagnostic messages") only if the utility's exit status is also non-zero. For example, printf %@ writes a message to standard error and returns status 1, whereas all existing unrecognized or invalid backslash sequences are handled without a message. | |
| 557–569 | Octal sequences above make sure printf "\045" writes a percent sign without errors. I would say the same should apply to \u0025 and \U00000025 and that should also be tested. | |
| 558–559 | I'm curious what this does for historical locales like en_US.ISO8859-1 and en_US.ISO8859-15. For example, ideally, \u20AC should expand to a question mark in en_US.ISO8859-1 and to a byte A4 in en_US.ISO8859-15. | |
| 567–569 | Perhaps use memcpy(store, mb, n); and store += n - 1; instead of a loop. | |
| 567–569 | If you have exotic character encodings, could it be possible that something like \uA1 expands to more than four bytes? This would cause data corruption and/or buffer overflow. | |
| printf/tests/regress.sh | ||
| 30 | Apart from \u0025 and \U00000025 mentioned above, we should also test \U00002A7D, \U2A7D, \u alone, \U alone, \u25 and \U25. | |
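
The inline comments on lines 567–569 (use memcpy instead of a loop, and guard against a multibyte sequence longer than four bytes) could be addressed along the following lines. This is only a sketch: `store_wchar` and its `avail` parameter are hypothetical names, the `?` fallback follows the ISO8859-1 suggestion above rather than the patch, and the sketch assumes that `wchar_t` values are Unicode code points, which is not guaranteed in every locale.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert `wc` with wctomb(3) and copy the result to
 * `store`, which has `avail` bytes left. Returns the number of bytes
 * written, or 0 if the sequence would not fit.
 */
static size_t
store_wchar(char *store, size_t avail, wchar_t wc)
{
	char mb[MB_LEN_MAX];	/* large enough for any locale's encoding */
	int n;

	n = wctomb(mb, wc);
	if (n <= 0) {
		/* Not representable: reset the shift state, emit '?'. */
		(void)wctomb(NULL, L'\0');
		mb[0] = '?';
		n = 1;
	}
	if ((size_t)n > avail)	/* never write past the caller's buffer */
		return (0);
	memcpy(store, mb, (size_t)n);
	return ((size_t)n);
}
```

Sizing the temporary buffer with MB_LEN_MAX rather than a fixed four bytes removes the overflow concern regardless of the encoding, and the explicit `avail` check lets the caller decide how to handle a full buffer.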