Page MenuHomeFreeBSD

Fix alignment in ls -l with cldr locales
ClosedPublic

Authored by bapt on Nov 21 2015, 12:34 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 19 2024, 6:14 AM
Unknown Object (File)
Mar 2 2024, 12:06 PM
Unknown Object (File)
Mar 1 2024, 1:30 AM
Unknown Object (File)
Feb 22 2024, 1:50 PM
Unknown Object (File)
Feb 11 2024, 3:24 PM
Unknown Object (File)
Feb 7 2024, 8:45 AM
Unknown Object (File)
Jan 28 2024, 4:27 AM
Unknown Object (File)
Jan 28 2024, 12:49 AM
Subscribers

Details

Summary

month abbreviation can be of various length, hence it breaks the alignement of
ls -l
Pad abbreviated month with space to the maximum possible length of the
abbreviated month for the given locale.
This way for locale C the output is preserved.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bapt retitled this revision from to Fix alignment in ls -l with cldr locales.
bapt updated this object.
bapt edited the test plan for this revision. (Show Details)
bapt added a reviewer: jilles.

Test multibyte functions return
ensure we do count visible columns and not the len

Really handle double width and zero width unicode

Fix with locale ja_JP.UTF-8

Use wcwidth instead of wcswidth given we are testing only 1 wchar
if wcwidth fails (return -1) then tuncate the provided month to the
last valid characters found.
Use more efficient and readable method for padding with spaces
Rename max_width_month into max_month_width for consistency
Only populate abbreviated month if %b is actually used in the strftime format
If anything goes wrong with mbstowcs/wcstombs then fallback on non aligned
localized output

Fix the fallback on mbstowcs

Fix the tuncate fallback (wcwidth == -1 case)

if wcwidth fails, fallback like when other multibyte functions fails.
Use wcsncat rather than wcslcat because we do not care about the size of the
buffer but rather care about the number of spaces we have to append

Hi Baptiste,

I hope these comments help. Be sure to get in touch if I haven't been clear enough. :-)

Ed

bin/ls/print.c
110 ↗(On Diff #10493)

As it is used for the array sizes, this should be called MAX_ABMON_SIZE instead; not _WIDTH. Most other limits usually use _MAX as a suffix (<limits.h>). Maybe ABMON_SIZE_MAX is better?

That said, it may well be the case that implementing this functionality without any fixed limits does not require substantially more code...

113 ↗(On Diff #10493)

It looks like this array is only an intermediate product of populate_abbreviated_month(). Maybe move it into that function instead? The multiplication by MB_LEN_MAX also looks a bit odd and should likely be removed.

158 ↗(On Diff #10493)

My criticism on this function is that it seems overly complex. In theory, we only need to do two things:

  • Compute the visible width of all the months.
  • Based on that, determine the amount of spaces we need to add to the end of the month names, so that they are all aligned.

In the end the only thing that you should theoretically yield is a table that says: for month x we need y additional spaces at the end. This function currently does a conversion back and forth, which isn't needed. You could just use the original strings that were returned by nl_langinfo().

Computing the visible width can be done without keeping track of a wide character string buffer. You could just call mbrtowc() and wcwidth() in a loop. Maybe put that in a separate function above called mbswidth()?

Also, please don't do arithmetic on ABMON_1. Maybe you could consider adding a utility function like this:

const char *get_abmon(int mon) {
  switch (mon) {
  case 0:
    return nl_langinfo(ABMON_1);
  ...

The optimizer is smart enough to rewrite it to nl_langinfo(ABMON_1 + i) in case all twelve constants are consecutive.

509 ↗(On Diff #10493)

Possible alternative: instead of storing strings with space padding at the end, you could also use printf() to do the padding for you:

snprintf(...., "%s%*s", ab_months[tm->tm_mon], amount_of_padding_needed_for_month[tm->tm_mon]);
513 ↗(On Diff #10493)

What happens if the name of the month contains an ampersand? Fortunately it's strftime(), so it won't try to fetch random garbage from the stack...

More and more I get the feeling we're trying to solve a problem in ls(1) that should be solved in strftime(3), for example by adding a special modifier. But that may be too much work for now.

I tried various locales and the output looks good to me. I found one minor nit in the code.

bin/ls/print.c
508 ↗(On Diff #10493)

This can be nested in the previous if, avoiding the extra posb != NULL check.

Trying to use phabricator :)

bin/ls/print.c
509 ↗(On Diff #10493)

I agree that doing all padding directly in snprintf() will be better than prepare them earlier. It not only save space, but eliminate one array with really unknown element length - wab_months[]. Second array ab_months[] with unknown element length still exists and better should be eliminated too with malloc()ed elements.

bapt edited edge metadata.

Complete change the approach to match @ed proposal

bapt marked 4 inline comments as done.Dec 5 2015, 2:18 PM

Rename populate_ to compute_

Fix error checking for mbrtowc

shorten get_abmon and return abort() is passed an argument higher than 12

Discussed with: @jilles

ache requested changes to this revision.Dec 5 2015, 3:37 PM
ache edited edge metadata.
ache added inline comments.
bin/ls/print.c
180 ↗(On Diff #10794)

wcwidth() can return negative here still. bugcheck is needed (return(-1))

504 ↗(On Diff #10794)

Debugging printf in the line above?

This revision now requires changes to proceed.Dec 5 2015, 3:37 PM
bapt edited edge metadata.

Remove debugging printf
Test return of wcwidth

ed edited edge metadata.

Fine by me, but please take a look at my comment. :-)

bin/ls/print.c
111 ↗(On Diff #10794)

Do we need to keep track of the maximum size explicitly? We currently only test against this to see whether we've already initialized. Maybe it would be better to replace it by a static bool padding_for_month_initialized or something?

bapt marked 2 inline comments as done.Dec 5 2015, 3:52 PM
bin/ls/print.c
111 ↗(On Diff #10796)

Well we need 3 states: initialized, non initialized, fail to initialize (to not try using it in that case)

bin/ls/print.c
500 ↗(On Diff #10796)

Maybe BUFSIZ is not the right limit here? Maybe NL_TEXTMAX?

bin/ls/print.c
500 ↗(On Diff #10796)

NL_TEXTMAX surprisingly not for nl_* but for catgets()

ache requested changes to this revision.Dec 5 2015, 3:59 PM
ache edited edge metadata.
ache added inline comments.
bin/ls/print.c
196 ↗(On Diff #10796)

You can just call wcswidth(get_abmon(i), INT_MAX) and drop its replacement.

This revision now requires changes to proceed.Dec 5 2015, 3:59 PM
bin/ls/print.c
196 ↗(On Diff #10796)

No get_abmon(i) is not widechar

bin/ls/print.c
196 ↗(On Diff #10796)

Of course with conversion. I am not sure what we win here right now, but potentially wcswidth() can be faster than wcwidth() wchar by wchar.

bin/ls/print.c
111 ↗(On Diff #10796)

As discussed on IRC briefly: you can still treat it as two states. If we fail to initialize, you could just treat it as if the initialization succeeded, but that all padding values are set to 0.

bin/ls/print.c
196 ↗(On Diff #10796)

Forget it. Conversion will need another buffer with unknown length.

bin/ls/print.c
196 ↗(On Diff #10796)

yup, well I can malloc, but I have tested both and haven't seen any performance change.

So are you ok with the current version of the patch?

This revision was automatically updated to reflect the committed changes.
In D4239#90029, @bapt wrote:

Fix with locale ja_JP.UTF-8

The fix itself seems good to me. It fixes alignment for the locales which abbreviated month are left aligned.
Unfortunately, abbreviated month of the CJK locales are right aligned. So, we still need existing left padding for the CJK locales.
Just FYI.