Page MenuHomeFreeBSD

humanize_number(3): fix edge case for rounding 999.5+
ClosedPublic

Authored by kaktus on Dec 21 2017, 9:30 PM.
Tags
Referenced Files
Unknown Object (File)
Wed, Feb 28, 10:46 PM
Unknown Object (File)
Feb 17 2024, 5:49 PM
Unknown Object (File)
Feb 1 2024, 10:34 PM
Unknown Object (File)
Feb 1 2024, 10:34 PM
Unknown Object (File)
Feb 1 2024, 10:33 PM
Unknown Object (File)
Jan 30 2024, 6:59 AM
Unknown Object (File)
Jan 30 2024, 6:59 AM
Unknown Object (File)
Jan 30 2024, 6:59 AM

Details

Summary

Fix for remainder overflow, when in rare cases adding remainder to divider exceeded 1 and turned the total to 1000 in final formatting, taking up the space for the unit character.

Issue was caused by the way the final value was calculated in snprintf call, where remainder and divisor/2 was added back to the divided number. If remainder + divisor/2 was larger than 1024, it added 1 to the final value. If the final value as already 999 (as in the example reported), that brought it to 1000. If the buffer length provided was 4+1 (as is the case with ls), that left no space for the unit character.
Same issue was also present for other numbers if too small buffer lengths where used.

The fix continues the division of the original number if the above case happens -- added the appropriate check to the for loop performing the division. This lowers the value shown, to make it fit into the buffer space provided (1.0M for 4+1 character buffer, as used by ls).

Add test case for the reported bug and extend test program to support providing buffer length (ls -lh uses 5, tests hard-coded 4).

Reported by: Masachika ISHIZUKA <ish@amail.plala.or.jp>
Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Sponsored by: Mysterious Code Ltd.
PR: 224498
MFC: 1 week

Test Plan

As per PR:

dd if=/dev/zero of=test-around-1mb bs=1023500 count=1

Return:
Old version:
-rw-r--r-- 1 kaktus kaktus 1000 Dec 21 21:26 test-around-1mb

New version:
-rw-r--r-- 1 kaktus kaktus 1.0M Dec 21 21:27 test-around-1mb

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I don't understand the actual algorithmic change, but given the strong test coverage, I think I'm ok with this. Some style nits below.

lib/libutil/humanize_number.c
148–150

I think the previous indentation was correct.

lib/libutil/tests/humanize_number_test.c
212

Extra space after HN_AUTOSCALE

446–449

This style change seems unrelated

Address @cem comments about style(9).

In D13578#283989, @cem wrote:

I don't understand the actual algorithmic change, but given the strong test coverage, I think I'm ok with this. Some style nits below.

Issue was caused by the way the final value was calculated in snprintf call, where remainder and divisor/2 was added back to the divided number. If remainder + divisor/2 was larger than 1024, it added 1 to the final value. If the final value as already 999 (as in the example reported), that brought it to 1000. If the buffer length provided was 4 (as is the case with ls), that left no space for the unit character.
Same issue was also present for other numbers if too small buffer lengths where used.

The fix continues the division of the original number if the above case happens -- added the appropriate check to the for loop performing the division. This lowers the value shown, to make it fit into the buffer space provided (1.0M for 4 character buffer, as used by ls).

Thank you for patch.
I apply this patch to 12.0-CURRENT r327074, the right value is shown with ls -lh.

In D13578#283995, @pawel.biernacki-gmail.com wrote:
In D13578#283989, @cem wrote:

I don't understand the actual algorithmic change, but given the strong test coverage, I think I'm ok with this. Some style nits below.

Issue was caused by the way the final value was calculated in snprintf call, where remainder and divisor/2 was added back to the divided number. If remainder + divisor/2 was larger than 1024, it added 1 to the final value. If the final value as already 999 (as in the example reported), that brought it to 1000. If the buffer length provided was 4 (as is the case with ls), that left no space for the unit character.
Same issue was also present for other numbers if too small buffer lengths where used.

The fix continues the division of the original number if the above case happens -- added the appropriate check to the for loop performing the division. This lowers the value shown, to make it fit into the buffer space provided (1.0M for 4 character buffer, as used by ls).

Please add this description to the proposed commit log. Looks good to me!

This revision is now accepted and ready to land.Dec 22 2017, 3:55 AM
In D13578#283995, @pawel.biernacki-gmail.com wrote:
In D13578#283989, @cem wrote:

I don't understand the actual algorithmic change, but given the strong test coverage, I think I'm ok with this. Some style nits below.

Issue was caused by the way the final value was calculated in snprintf call, where remainder and divisor/2 was added back to the divided number. If remainder + divisor/2 was larger than 1024, it added 1 to the final value. If the final value as already 999 (as in the example reported), that brought it to 1000. If the buffer length provided was 4 (as is the case with ls), that left no space for the unit character.
Same issue was also present for other numbers if too small buffer lengths where used.

The fix continues the division of the original number if the above case happens -- added the appropriate check to the for loop performing the division. This lowers the value shown, to make it fit into the buffer space provided (1.0M for 4 character buffer, as used by ls).

In one branch of the snprintf() use, we add divisor / 2 and remainder * 10, not remainder. Does this case need to be handled differently ?

In D13578#284349, @kib wrote:

In one branch of the snprintf() use, we add divisor / 2 and remainder * 10, not remainder. Does this case need to be handled differently ?

No, as the change only affects the number of times large values are divided - like in the ls example, 1000K is divided once more into 1.0MB to allow it to fit into a small buffer. If the function is executed with a bigger buffer (length 6, instead of ls default 5), the return value will go back to "1000K". The for loop never lets the value go below 9 (quotient >= max || (quotient == max - 1- and max is never lower than 10) - which is what the other branch with snprintf is expecting. I tested that values that trigger that branch (for example, 8499 = 8.3K) return correct results.

This fix does however affect how too short buffers are handled. If buffer length of 4 or less is provided for numbers larger than 999, those cannot be fitted into the 3 characters (3 + null = 4) and will always return incorrect results. For example, value 102298 with buffer length 4 in the old code returned "100" - "100K" cut to 3 characters. This cannot in anyway be fitted into 3 characters, as even dividing it further (which gives 0.09M) will not fit into 3 characters. The new code, does force the division to happen, so the return value changes from "100" to "0.1" - rounding up of 0.09 (when HN_DECIMAL is used) or "0M" - cutting the decimal part (when HN_DECIMAL is not used).
In all cases the return value is not correct (and cannot be correct), but after the fix it's just incorrect in another way. This is probably another bug that could be looked at - the function should maybe return -1 to report an error to the user if the buffer is not long enough to fit the number? The same happens with buffer lengths of 5 or less when HN_IEC_PREFIXES flag is used, as the unit takes away another character space.

kaktus edited the summary of this revision. (Show Details)
kaktus added a reviewer: manpages.

Add CAVEATS section to the man page to point out the rounding issue with too short buffers.

This revision now requires review to proceed.Dec 23 2017, 6:08 PM

Man: start a sentence from new line.

This revision is now accepted and ready to land.Dec 28 2017, 3:34 PM