Details

Reviewers

markj
kib
alc

Commits

rG52b35140528c: swap_pager: examine swblks with pctrie iterators

Summary

Avoid walking from the pctrie root while iterating over pctrie elements by using iterators.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

dougm requested review of this revision.Sep 10 2024, 6:24 AM

markj added inline comments.Sep 10 2024, 1:39 PM

sys/vm/swap_pager.c
1901	The control flow here would be less confusing to me if this `i == 0` case handling was moved out of the loop: restart: if (object->flags & OBJ_DEAD) { ... } swblk_iter_init(&blks, object); sb = swblk_start(&blks, pi); swp_pager_init_freerange(&range); for (i = 0;;) { ... } and anywhere the loop drops the VM object look, instead of `i = 0; continue;`, `goto restart;`.
2220	This assignment could be moved into the loop header.
2233–2234	Unrelated to this diff, but, how is it safe to dereference `sb->p` while srcobject is unlocked? That seems wrong. Similarly, I don't think it's safe to access `sb` after reacquiring the lock. Am I missing something?
2243	Would it be possible to have an explicit iter_invalidate operation which resets the path but leaves the current index and limit alone?

dougm added inline comments.Sep 10 2024, 6:02 PM

sys/vm/swap_pager.c
2233–2234	Where swap_pager_copy is called in vm_object_split, all the pages are xbusied first. Where swap_pager_copy is called in vm_object_collapse, vm_object_collapse_scan does a "check for busy page" and calls vm_object_collapse_scan_wait for non-busy objects, which does ... I have no idea. I can change this to save the sb values before releasing the lock, but I don't know that to be correct. I can change this to do a big goto restart after reacquiring the lock. But I'd just be making wild stabs in the dark.
2243	Sure.

Apply suggestions not related to race conditions.

Incorporate meta_transfer changes. Use blk.index for sp->p in several places. Compute 'start' more simply.

I wonder if we could combine swblk_iter_init() (and swblk_iter_limit_init()) and swblk_start(). Or perhaps introduce some macros similar to VM_MAP_ENTRY_FOREACH(). I think that would be straightforward but for swap_pager_find_least(), which expects the caller to initialize the iterator state. I made some inline comments which argue for keeping swap_pager_find_least() as it was for now.

sys/fs/tmpfs/tmpfs_vnops.c
2164	What we really want here is a swap pager operation which returns the next pindex for which there is no swap block. Absent that, I'm not sure it's really worth maintaining an iterator here. (I'm trying to argue here that swap_pager_find_least() should keep its old interface, since I think that permits some simplification of the iterator interface.)
sys/vm/swap_pager.c
1886	Can we have two nested loops instead?
1940	If `m == NULL` here, then the object lock was dropped and the iterator is invalid.
sys/vm/swap_pager.h
43	This should go into the `#ifdef _KERNEL` block, together with the functions that reference it.
sys/vm/vm_object.c
1706	The first part of this loop is searching for the smallest pindex >= `pi` for which there is a resident page or an assigned swap block. I wonder if it'd make sense to have a combined interface which searches for this pindex, since the vm_page scan will eventually also make use of iterators. (And even more radically, I wonder if it'd be possible and sensible to use the same trie for both resident pages and swap blocks.) In other words, I wonder if we should just leave this loop as it was before for now.

Accept as many suggestions as possible, but without defining a FOREACH macro.

dougm added a subscriber: alc.Sep 15 2024, 8:58 PM

dougm added inline comments.Sep 15 2024, 9:12 PM

sys/vm/vm_object.c
1706	@alc was suggesting this reuse just yesterday. He observed that some nodes would have both pages and swap blocks.

Incorporate changes from comments-only commit. Fix calculation of initial start value.

dougm added a reviewer: kib.Sep 21 2024, 10:54 PM

markj added inline comments.Sep 22 2024, 11:12 AM

sys/vm/swap_pager.c
1899	The indentation here is wrong.
1917	Is it safe to access `sb` here, after the object lock was dropped and reacquired? Perhaps the existence of an xbusy page in the object at that pindex is sufficient to provide that guarantee.
sys/vm/vm_object.c
1706	Indeed, we'd need to come up with some encoding for the keys which handles this overlap. I don't have any suggestions offhand. One comment is that instead of storing vm_page pointers, we could likely compress them into a (segid, offset) tuple which should in general require less than 64 bits.

Address @markj concerns.

sys/vm/swap_pager.c
1917	If it's safe now, it might still become unsafe in the future, or if not that, then remain a puzzle that someone else will ask about someday. So I'll rewrite to avoid the use of sb after lock release.

Undo the last non-whitespace change, since it breaks swap_pager_find_least by possibly creating an empty block.

alc added inline comments.Sep 22 2024, 5:29 PM

sys/vm/swap_pager.c
2268–2269	The comment does not match the code.

Update the patch to reflect the changes already checked in to main to address the last couple of comments.

kib accepted this revision.Sep 24 2024, 3:47 AM

This revision is now accepted and ready to land.Sep 24 2024, 3:47 AM

I don't see any correctness problems.

sys/vm/swap_pager.c
571–572	Can we group the iter functions together?
578
1905	The page is not dirtied here; as stated in a now deleted comment that happens after the restart.
1907	... and if (m == NULL) need only occur after vm_page_alloc()
1918–1919	There ought to be a comment explaining that we get here because of an object lock release.
1955–1962	The correctness of updating pi is subtle; it depends on us having already blocked allocation of swap blocks on the devise being shut down. Otherwise, a restart would always need to start from pi==0. I think this deserves a comment.
2205–2206	Why was this deleted?
2255	A comment motivating the reset would be good.

Address @alc feedback.

This revision now requires review to proceed.Sep 24 2024, 7:32 PM

dougm added inline comments.Sep 24 2024, 7:33 PM

sys/vm/swap_pager.c
571–572	Yes.
578	And swblk_next -> swblk_iter_next too.
1955–1962	Added a KASSERT to justify the comment.
2205–2206	A clerical error. Restored.

A performance test. With this patch {F96492700}, which computes stats on calls to swp_pager_meta_free, I ran all 8 stress2 swap tests, recording the stats immediately before and aofter the tests. The stats are 'cycles', 'calls', and 'pages' (freed). The results are:

Original:
vm.stats.swap.pages: 83867620
vm.stats.swap.cycles: 11894179396
vm.stats.swap.calls: 61589970

Modified:
vm.stats.swap.pages: 84177365
vm.stats.swap.cycles: 11623436808
vm.stats.swap.calls: 62086155

Even though, for whatever reason, the test with the modified code freed more pages and made more more function calls, the cycles spent on swp_pager_meta_free fell by 2.276%.

markj accepted this revision.Sep 26 2024, 2:37 PM

This revision is now accepted and ready to land.Sep 26 2024, 2:37 PM

With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:

x old
+ new
+------------------------------------------------------------------------------+
|    +   ++ +++ ++   ++ +                              x x x  x x  x x x   x  +|
||_____________M____A__________________|                  |_____A_____|        |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  12      11315490      11602387      11444362      11447090     88533.923
+  12      10575194      11651487      10713880      10792243     283272.81
Difference at 95.0% confidence
        -654847 +/- 177689
        -5.72064% +/- 1.54457%
        (Student's t, pooled s = 209859)

In D46620#1067113, @dougm wrote:

With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:

What exactly do you mean by "freeing"? I wouldn't expect this patch on its own to affect how fast we can free pages, or is this including your other iterator patches?

x old
+ new
+------------------------------------------------------------------------------+
|    +   ++ +++ ++   ++ +                              x x x  x x  x x x   x  +|
||_____________M____A__________________|                  |_____A_____|        |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  12      11315490      11602387      11444362      11447090     88533.923
+  12      10575194      11651487      10713880      10792243     283272.81
Difference at 95.0% confidence
        -654847 +/- 177689
        -5.72064% +/- 1.54457%
        (Student's t, pooled s = 209859)

What exactly do you mean by "freeing"? I wouldn't expect this patch on its own to affect how fast we can free pages, or is this including your other iterator patches?

I mean "calling swap_pager_meta_free" after constructing an object and filling it with blocks just for the opportunity to time the freeing of it. This patch does use iterators in swap_pager_meta_free. The timing results do not reflect changes to any file but swap_pager.c.

It's not easy to get a signoff from @alc. Maybe today will be the day. But he demands performance testing.

grahamperrin added a subscriber: grahamperrin.Sep 29 2024, 12:11 AM

alc added inline comments.Sep 29 2024, 6:59 AM

sys/vm/swap_pager.c
1894
1955–1962	Why is this before the call to swblk_iter_next()? Alternatively, does this variable need to exist? Couldn't restart do swblk_iter(&blks, object, blks.index);?
2256	This is not style(9).

Update to account for overlapping changes to swap_pager.c. Address comments from @alc. Drop variable 'pi'.

This revision now requires review to proceed.Sep 29 2024, 7:18 AM

In D46620#1067113, @dougm wrote:

With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:

x old
+ new
+------------------------------------------------------------------------------+
|    +   ++ +++ ++   ++ +                              x x x  x x  x x x   x  +|
||_____________M____A__________________|                  |_____A_____|        |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  12      11315490      11602387      11444362      11447090     88533.923
+  12      10575194      11651487      10713880      10792243     283272.81
Difference at 95.0% confidence
        -654847 +/- 177689
        -5.72064% +/- 1.54457%
        (Student's t, pooled s = 209859)

I was hoping for a greater gain, but this is perfectly okay. The way this test is constructed probably represents a best case for swap_pager in the sense of bringing out the difference between the old implementation versus the iterator. The reason being that the swap blocks are contiguous, and so there are fewer subr_blist calls and those calls are freeing large batches. So, the time spent in subr_blist is minimized.

I would use this test to try to optimize the pctrie code underlying swblk_iter_next().

sys/vm/swap_pager.c
2402	From a performance standpoint, I'm skeptical of using iterators in cases like this where there will be 0 or 1 calls to swblk_iter_next(). If there are 0 calls, the cost of setting up the iterator is wasted cycles. If there is 1 call, the savings from the old implementation, i.e., walking the trie again, which should be almost all dcache hits, will be offset by the icache misses for executing a new function, swblk_iter_next(). In contrast, the old implementation simply used the same function twice.

alc accepted this revision.Sep 29 2024, 7:56 AM

This revision is now accepted and ready to land.Sep 29 2024, 7:56 AM

dougm added inline comments.Sep 29 2024, 8:06 AM

sys/vm/swap_pager.c
2402	The two callers of swap_pager_find_least both call it from within loops, and an earlier version of this had those functions passing an iterator into this function, rather than having this function create it's own every time. But @markj rejected that usage.

alc added inline comments.Sep 29 2024, 8:06 AM

sys/vm/swap_pager.c
1918	Don't you need to reset the page iterator too?

Reset the page iterator too.

This revision now requires review to proceed.Sep 29 2024, 8:08 AM

alc accepted this revision.Sep 29 2024, 4:27 PM

alc added inline comments.

sys/vm/swap_pager.c
1920
2258

This revision is now accepted and ready to land.Sep 29 2024, 4:27 PM

Closed by commit rG52b35140528c: swap_pager: examine swblks with pctrie iterators (authored by dougm). · Explain WhySep 29 2024, 4:47 PM

This revision was automatically updated to reflect the committed changes.

dougm marked an inline comment as done.

dougm added a commit: rG52b35140528c: swap_pager: examine swblks with pctrie iterators.

Herald added a subscriber: imp. · View Herald TranscriptSep 29 2024, 4:47 PM

markj added inline comments.Sep 30 2024, 8:52 AM

sys/vm/swap_pager.c
2402	To be clear, my position was that we should try to avoid exposing the iterator interface directly and instead provide a function which can be used to interleave iteration of the swap block and VM object tries, since that's what those callers are doing.

swap_pager: use pctrie iterators
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 143361

sys/fs/tmpfs/tmpfs_vnops.c

sys/vm/swap_pager.h

sys/vm/swap_pager.c

sys/vm/vm_object.c

swap_pager: use pctrie iteratorsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 143361

sys/fs/tmpfs/tmpfs_vnops.c

sys/vm/swap_pager.h

sys/vm/swap_pager.c

sys/vm/vm_object.c

swap_pager: use pctrie iterators
ClosedPublic
Actions

Revision Contents
Changeset List