Avoid walking from the pctrie root while iterating over pctrie elements by using iterators.
Details
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/vm/swap_pager.c | ||
---|---|---|
1901 | The control flow here would be less confusing to me if this i == 0 case handling was moved out of the loop: restart: if (object->flags & OBJ_DEAD) { ... } swblk_iter_init(&blks, object); sb = swblk_start(&blks, pi); swp_pager_init_freerange(&range); for (i = 0;;) { ... } and anywhere the loop drops the VM object look, instead of i = 0; continue;, goto restart;. | |
2220 | This assignment could be moved into the loop header. | |
2233–2234 | Unrelated to this diff, but, how is it safe to dereference sb->p while srcobject is unlocked? That seems wrong. Similarly, I don't think it's safe to access sb after reacquiring the lock. Am I missing something? | |
2243 | Would it be possible to have an explicit iter_invalidate operation which resets the path but leaves the current index and limit alone? |
sys/vm/swap_pager.c | ||
---|---|---|
2233–2234 | Where swap_pager_copy is called in vm_object_split, all the pages are xbusied first. Where swap_pager_copy is called in vm_object_collapse, vm_object_collapse_scan does a "check for busy page" and calls vm_object_collapse_scan_wait for non-busy objects, which does ... I have no idea. I can change this to save the sb values before releasing the lock, but I don't know that to be correct. I can change this to do a big goto restart after reacquiring the lock. But I'd just be making wild stabs in the dark. | |
2243 | Sure. |
Incorporate meta_transfer changes. Use blk.index for sp->p in several places. Compute 'start' more simply.
I wonder if we could combine swblk_iter_init() (and swblk_iter_limit_init()) and swblk_start(). Or perhaps introduce some macros similar to VM_MAP_ENTRY_FOREACH(). I think that would be straightforward but for swap_pager_find_least(), which expects the caller to initialize the iterator state. I made some inline comments which argue for keeping swap_pager_find_least() as it was for now.
sys/fs/tmpfs/tmpfs_vnops.c | ||
---|---|---|
2164 | What we really want here is a swap pager operation which returns the next pindex for which there is no swap block. Absent that, I'm not sure it's really worth maintaining an iterator here. (I'm trying to argue here that swap_pager_find_least() should keep its old interface, since I think that permits some simplification of the iterator interface.) | |
sys/vm/swap_pager.c | ||
1886 | Can we have two nested loops instead? | |
1940 | If m == NULL here, then the object lock was dropped and the iterator is invalid. | |
sys/vm/swap_pager.h | ||
43 | This should go into the #ifdef _KERNEL block, together with the functions that reference it. | |
sys/vm/vm_object.c | ||
1706 | The first part of this loop is searching for the smallest pindex >= pi for which there is a resident page or an assigned swap block. I wonder if it'd make sense to have a combined interface which searches for this pindex, since the vm_page scan will eventually also make use of iterators. (And even more radically, I wonder if it'd be possible and sensible to use the same trie for both resident pages and swap blocks.) In other words, I wonder if we should just leave this loop as it was before for now. |
Incorporate changes from comments-only commit. Fix calculation of initial start value.
sys/vm/swap_pager.c | ||
---|---|---|
1899 | The indentation here is wrong. | |
1917 | Is it safe to access sb here, after the object lock was dropped and reacquired? Perhaps the existence of an xbusy page in the object at that pindex is sufficient to provide that guarantee. | |
sys/vm/vm_object.c | ||
1706 | Indeed, we'd need to come up with some encoding for the keys which handles this overlap. I don't have any suggestions offhand. One comment is that instead of storing vm_page pointers, we could likely compress them into a (segid, offset) tuple which should in general require less than 64 bits. |
Undo the last non-whitespace change, since it breaks swap_pager_find_least by possibly creating an empty block.
sys/vm/swap_pager.c | ||
---|---|---|
2268–2269 | The comment does not match the code. |
Update the patch to reflect the changes already checked in to main to address the last couple of comments.
I don't see any correctness problems.
sys/vm/swap_pager.c | ||
---|---|---|
571–572 | Can we group the iter functions together? | |
578 | ||
1905 | The page is not dirtied here; as stated in a now deleted comment that happens after the restart. | |
1907 | ... and if (m == NULL) need only occur after vm_page_alloc() | |
1918–1919 | There ought to be a comment explaining that we get here because of an object lock release. | |
1955–1962 | The correctness of updating pi is subtle; it depends on us having already blocked allocation of swap blocks on the devise being shut down. Otherwise, a restart would always need to start from pi==0. I think this deserves a comment. | |
2205–2206 | Why was this deleted? | |
2255 | A comment motivating the reset would be good. |
A performance test. With this patch {F96492700}, which computes stats on calls to swp_pager_meta_free, I ran all 8 stress2 swap tests, recording the stats immediately before and aofter the tests. The stats are 'cycles', 'calls', and 'pages' (freed). The results are:
Original:
vm.stats.swap.pages: 83867620
vm.stats.swap.cycles: 11894179396
vm.stats.swap.calls: 61589970
Modified:
vm.stats.swap.pages: 84177365
vm.stats.swap.cycles: 11623436808
vm.stats.swap.calls: 62086155
Even though, for whatever reason, the test with the modified code freed more pages and made more more function calls, the cycles spent on swp_pager_meta_free fell by 2.276%.
With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:
x old + new +------------------------------------------------------------------------------+ | + ++ +++ ++ ++ + x x x x x x x x x +| ||_____________M____A__________________| |_____A_____| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 12 11315490 11602387 11444362 11447090 88533.923 + 12 10575194 11651487 10713880 10792243 283272.81 Difference at 95.0% confidence -654847 +/- 177689 -5.72064% +/- 1.54457% (Student's t, pooled s = 209859)
What exactly do you mean by "freeing"? I wouldn't expect this patch on its own to affect how fast we can free pages, or is this including your other iterator patches?
x old + new +------------------------------------------------------------------------------+ | + ++ +++ ++ ++ + x x x x x x x x x +| ||_____________M____A__________________| |_____A_____| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 12 11315490 11602387 11444362 11447090 88533.923 + 12 10575194 11651487 10713880 10792243 283272.81 Difference at 95.0% confidence -654847 +/- 177689 -5.72064% +/- 1.54457% (Student's t, pooled s = 209859)
What exactly do you mean by "freeing"? I wouldn't expect this patch on its own to affect how fast we can free pages, or is this including your other iterator patches?
I mean "calling swap_pager_meta_free" after constructing an object and filling it with blocks just for the opportunity to time the freeing of it. This patch does use iterators in swap_pager_meta_free. The timing results do not reflect changes to any file but swap_pager.c.
It's not easy to get a signoff from @alc. Maybe today will be the day. But he demands performance testing.
Update to account for overlapping changes to swap_pager.c. Address comments from @alc. Drop variable 'pi'.
I was hoping for a greater gain, but this is perfectly okay. The way this test is constructed probably represents a best case for swap_pager in the sense of bringing out the difference between the old implementation versus the iterator. The reason being that the swap blocks are contiguous, and so there are fewer subr_blist calls and those calls are freeing large batches. So, the time spent in subr_blist is minimized.
I would use this test to try to optimize the pctrie code underlying swblk_iter_next().
sys/vm/swap_pager.c | ||
---|---|---|
2402 | From a performance standpoint, I'm skeptical of using iterators in cases like this where there will be 0 or 1 calls to swblk_iter_next(). If there are 0 calls, the cost of setting up the iterator is wasted cycles. If there is 1 call, the savings from the old implementation, i.e., walking the trie again, which should be almost all dcache hits, will be offset by the icache misses for executing a new function, swblk_iter_next(). In contrast, the old implementation simply used the same function twice. |
sys/vm/swap_pager.c | ||
---|---|---|
1918 | Don't you need to reset the page iterator too? |
sys/vm/swap_pager.c | ||
---|---|---|
2402 | To be clear, my position was that we should try to avoid exposing the iterator interface directly and instead provide a function which can be used to interleave iteration of the swap block and VM object tries, since that's what those callers are doing. |