Page MenuHomeFreeBSD

Avoid overflow in blist_create, remove swap pager checks before blist_create
ClosedPublic

Authored by dougm on Jul 19 2020, 10:44 PM.

Details

Summary

In blist.h, BLIST_BMAP_RADIX and BLIST_META_RADIX have the same
value. There's no longer a need for both names, so rename them both to
BLIST_RADIX. Simplify formulas that compute with both of them, and
rewrite comments that describe them.

Make a similar change for BLIST_BMAP_MASK and BLIST_META_MASK.

In blist_create, modify the use of the variable radix so that it takes
on values 1, 64, 4096, ... instead of 64, 4096, 262144, ..., in order
to avoid overflow when the next power of 64 greater than blocks is too
big to represent. The value of bl->bl_radix is similarly reduced by a
factor of BLIST_RADIX.

Change the blist_meta_* functions that take a radix parameter to
handle one reduced by a factor of BLIST_RADIX. When necessary, handle
overflows in masking off all but the first radix*BLIST_RADIX bits of a
number, when that's masking off all the bits.

Fix compiler warnings in blist test code. Add a test for blist_create
failure to the blist test code. Change the blist_test code size
parameter to allow for larger test cases.

Remove from swaponsomething the checks for overflow in the blist
code. The only failure I can think of is from a failed malloc.

Test Plan

I seek to avoid breaking anything, while fixing something I broke a while ago, when I inadvertently reduced max swap size from 256GB to 65536MB. I hope I've eliminated any restriction that the blist code imposes on max swap size.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm created this revision.
dougm added a subscriber: pho.

I ran a preliminary 2 hour test with D25736.74672.diff.

$ swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       70254592        0 70254592     0%
$ gpart show da0 | grep swap
  1027604514   140509184    4  freebsd-swap  (67G)
 $
sys/kern/subr_blist.c
40–41 ↗(On Diff #74672)

"... in a leaf node and the number of descendants of a meta (interior) node."

66 ↗(On Diff #74672)

"... lower-level ..."

72 ↗(On Diff #74672)

"... lower-level ..."

249 ↗(On Diff #74672)

for (radix = 1; ...?

913 ↗(On Diff #74672)

This is to handle overflow in the radix * BLIST_RADIX computation? If so, then say so in a comment.

1024 ↗(On Diff #74672)

Ditto.

dougm edited the summary of this revision. (Show Details)

Make suggested changes.

Transform a loop in blst_next_leaf_alloc from while to do..while.

Remove a redundant assignment before a loop. Change a while loop to a for loop in blst_stats.

Reformat a comment. Change a couple of while loops into for loops.

I can't provide a competent review of the blist changes on the fly, but the description makes sense. I'd offer to do more swap testing, but you've tested to about the same extent I meaningfully can. Do we have a tool which can meaningfully exercise swap for test purposes (maybe using mdconfig(8)?)? I have a ton of swap allocated, but don't touch it because I also have an abundance of RAM. Anything useful I can do? Thanks for digging in to this!!

I have started a full stress2 test, which will take two days to complete.

sys/kern/subr_blist.c
76 ↗(On Diff #74719)

BLIST_RADIX ?

I completed a full stress2 test with D25736.74719.diff on r363390.
No problems seen.

In D25736#570868, @pho wrote:

I completed a full stress2 test with D25736.74719.diff on r363390.
No problems seen.

How large the configured swap size was ? And how much of it was used (approximately) ?

I think this patch mostly needs Marius' test sort /dev/zero.

In D25736#570891, @kib wrote:
In D25736#570868, @pho wrote:

I completed a full stress2 test with D25736.74719.diff on r363390.
No problems seen.

How large the configured swap size was ? And how much of it was used (approximately) ?

$ ./swapused.sh 
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r363443M: Thu Jul 23 13:01:56 CEST 2020     pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO  amd64
swapinfo -h
Device              Size     Used    Avail Capacity
/dev/da0p4           67G       0B      67G     0%

swap disk
  1027604514   140509184    4  freebsd-swap  (67G)
13:08:48 0%
13:10:02 1%
13:10:09 2%
:
13:18:33 32%
13:18:55 33%

I think this patch mostly needs Marius' test sort /dev/zero.

This test is included in stress2 as sort.sh

In D25736#570896, @pho wrote:
In D25736#570891, @kib wrote:
In D25736#570868, @pho wrote:

I completed a full stress2 test with D25736.74719.diff on r363390.
No problems seen.

How large the configured swap size was ? And how much of it was used (approximately) ?

$ ./swapused.sh 
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r363443M: Thu Jul 23 13:01:56 CEST 2020     pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO  amd64
swapinfo -h
Device              Size     Used    Avail Capacity
/dev/da0p4           67G       0B      67G     0%

swap disk
  1027604514   140509184    4  freebsd-swap  (67G)
13:08:48 0%
13:10:02 1%
13:10:09 2%
:
13:18:33 32%
13:18:55 33%

I think this patch mostly needs Marius' test sort /dev/zero.

This test is included in stress2 as sort.sh

67G is on the edge of the previous max 64G. Could you try only the Marius' test with much larger swap, say 128G or even 512G ?

In D25736#570900, @kib wrote:
In D25736#570896, @pho wrote:
In D25736#570891, @kib wrote:
In D25736#570868, @pho wrote:

I completed a full stress2 test with D25736.74719.diff on r363390.
No problems seen.

How large the configured swap size was ? And how much of it was used (approximately) ?

$ ./swapused.sh 
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r363443M: Thu Jul 23 13:01:56 CEST 2020     pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO  amd64
swapinfo -h
Device              Size     Used    Avail Capacity
/dev/da0p4           67G       0B      67G     0%

swap disk
  1027604514   140509184    4  freebsd-swap  (67G)
13:08:48 0%
13:10:02 1%
13:10:09 2%
:
13:18:33 32%
13:18:55 33%

I think this patch mostly needs Marius' test sort /dev/zero.

This test is included in stress2 as sort.sh

67G is on the edge of the previous max 64G. Could you try only the Marius' test with much larger swap, say 128G or even 512G ?

swapinfo -h
Device              Size     Used    Avail Capacity
/dev/da0p4           67G       0B      67G     0%
/dev/ada0p3          64G       0B      64G     0%
/dev/da0p5          1.0G       0B     1.0G     0%
Total               132G       0B     132G     0%

14:42:22 0%
14:43:07 1%
14:43:35 2%
14:44:21 9%
:
14:47:19 20%
14:47:41 21%
14:47:51 22%

Sorry I did not stated that explicitly, the large swap should be a single volume. Blists are allocatd per swap device.

In D25736#570929, @kib wrote:

Sorry I did not stated that explicitly, the large swap should be a single volume. Blists are allocatd per swap device.

Ah, OK.

swapinfo -h
Device              Size     Used    Avail Capacity
/dev/ada0p2         224G       0B     224G     0%

15:32:42 0%
15:33:15 1%
15:33:23 2%
:
15:36:37 20%
15:36:59 22%
15:38:58 31%
alc added inline comments.
sys/kern/subr_blist.c
73 ↗(On Diff #74752)

Move to the end of the previous, now-shorter line?

581 ↗(On Diff #74752)

Indented by one space too many.

582 ↗(On Diff #74752)

Ditto, and I would put the semicolon on its own line for better visibility.

667 ↗(On Diff #74752)

Indentation is incorrect.

668 ↗(On Diff #74752)

Ditto.

sys/sys/blist.h
88 ↗(On Diff #74752)

Add spaces around '*'.

This revision is now accepted and ready to land.Jul 25 2020, 6:03 PM