month abbreviation can be of various length, hence it breaks the alignement of
ls -l
Pad abbreviated month with space to the maximum possible length of the
abbreviated month for the given locale.
This way for locale C the output is preserved.
Details
- Reviewers
jilles ache ed - Commits
- rS291959: Fix ls -l alignement with new locales
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Use wcwidth instead of wcswidth given we are testing only 1 wchar
if wcwidth fails (return -1) then tuncate the provided month to the
last valid characters found.
Use more efficient and readable method for padding with spaces
Rename max_width_month into max_month_width for consistency
Only populate abbreviated month if %b is actually used in the strftime format
If anything goes wrong with mbstowcs/wcstombs then fallback on non aligned
localized output
if wcwidth fails, fallback like when other multibyte functions fails.
Use wcsncat rather than wcslcat because we do not care about the size of the
buffer but rather care about the number of spaces we have to append
Hi Baptiste,
I hope these comments help. Be sure to get in touch if I haven't been clear enough. :-)
Ed
bin/ls/print.c | ||
---|---|---|
110 ↗ | (On Diff #10493) | As it is used for the array sizes, this should be called MAX_ABMON_SIZE instead; not _WIDTH. Most other limits usually use _MAX as a suffix (<limits.h>). Maybe ABMON_SIZE_MAX is better? That said, it may well be the case that implementing this functionality without any fixed limits does not require substantially more code... |
113 ↗ | (On Diff #10493) | It looks like this array is only an intermediate product of populate_abbreviated_month(). Maybe move it into that function instead? The multiplication by MB_LEN_MAX also looks a bit odd and should likely be removed. |
158 ↗ | (On Diff #10493) | My criticism on this function is that it seems overly complex. In theory, we only need to do two things:
In the end the only thing that you should theoretically yield is a table that says: for month x we need y additional spaces at the end. This function currently does a conversion back and forth, which isn't needed. You could just use the original strings that were returned by nl_langinfo(). Computing the visible width can be done without keeping track of a wide character string buffer. You could just call mbrtowc() and wcwidth() in a loop. Maybe put that in a separate function above called mbswidth()? Also, please don't do arithmetic on ABMON_1. Maybe you could consider adding a utility function like this: const char *get_abmon(int mon) { switch (mon) { case 0: return nl_langinfo(ABMON_1); ... The optimizer is smart enough to rewrite it to nl_langinfo(ABMON_1 + i) in case all twelve constants are consecutive. |
509 ↗ | (On Diff #10493) | Possible alternative: instead of storing strings with space padding at the end, you could also use printf() to do the padding for you: snprintf(...., "%s%*s", ab_months[tm->tm_mon], amount_of_padding_needed_for_month[tm->tm_mon]); |
513 ↗ | (On Diff #10493) | What happens if the name of the month contains an ampersand? Fortunately it's strftime(), so it won't try to fetch random garbage from the stack... More and more I get the feeling we're trying to solve a problem in ls(1) that should be solved in strftime(3), for example by adding a special modifier. But that may be too much work for now. |
I tried various locales and the output looks good to me. I found one minor nit in the code.
bin/ls/print.c | ||
---|---|---|
508 ↗ | (On Diff #10493) | This can be nested in the previous if, avoiding the extra posb != NULL check. |
Trying to use phabricator :)
bin/ls/print.c | ||
---|---|---|
509 ↗ | (On Diff #10493) | I agree that doing all padding directly in snprintf() will be better than prepare them earlier. It not only save space, but eliminate one array with really unknown element length - wab_months[]. Second array ab_months[] with unknown element length still exists and better should be eliminated too with malloc()ed elements. |
shorten get_abmon and return abort() is passed an argument higher than 12
Discussed with: @jilles
Fine by me, but please take a look at my comment. :-)
bin/ls/print.c | ||
---|---|---|
111 ↗ | (On Diff #10794) | Do we need to keep track of the maximum size explicitly? We currently only test against this to see whether we've already initialized. Maybe it would be better to replace it by a static bool padding_for_month_initialized or something? |
bin/ls/print.c | ||
---|---|---|
111 ↗ | (On Diff #10796) | Well we need 3 states: initialized, non initialized, fail to initialize (to not try using it in that case) |
bin/ls/print.c | ||
---|---|---|
500 ↗ | (On Diff #10796) | Maybe BUFSIZ is not the right limit here? Maybe NL_TEXTMAX? |
bin/ls/print.c | ||
---|---|---|
500 ↗ | (On Diff #10796) | NL_TEXTMAX surprisingly not for nl_* but for catgets() |
bin/ls/print.c | ||
---|---|---|
196 ↗ | (On Diff #10796) | You can just call wcswidth(get_abmon(i), INT_MAX) and drop its replacement. |
bin/ls/print.c | ||
---|---|---|
196 ↗ | (On Diff #10796) | No get_abmon(i) is not widechar |
bin/ls/print.c | ||
---|---|---|
196 ↗ | (On Diff #10796) | Of course with conversion. I am not sure what we win here right now, but potentially wcswidth() can be faster than wcwidth() wchar by wchar. |
bin/ls/print.c | ||
---|---|---|
111 ↗ | (On Diff #10796) | As discussed on IRC briefly: you can still treat it as two states. If we fail to initialize, you could just treat it as if the initialization succeeded, but that all padding values are set to 0. |
bin/ls/print.c | ||
---|---|---|
196 ↗ | (On Diff #10796) | Forget it. Conversion will need another buffer with unknown length. |
bin/ls/print.c | ||
---|---|---|
196 ↗ | (On Diff #10796) | yup, well I can malloc, but I have tested both and haven't seen any performance change. So are you ok with the current version of the patch? |
The fix itself seems good to me. It fixes alignment for the locales which abbreviated month are left aligned.
Unfortunately, abbreviated month of the CJK locales are right aligned. So, we still need existing left padding for the CJK locales.
Just FYI.