Differential D42029

swap pager: Fix a race in swap_pager_swapoff_object()
ClosedPublic
Actions

Authored by markj on Oct 1 2023, 3:55 PM.

Details

Reviewers

alc
kib
dougm

Commits

rG6f35c2380737: swap_pager: Fix a race in swap_pager_swapoff_object()
rGd8cb6c173417: swap_pager: Fix a race in swap_pager_swapoff_object()
rGaa229a59adea: swap_pager: Fix a race in swap_pager_swapoff_object()
rGe61568aeeec7: swap_pager: Fix a race in swap_pager_swapoff_object()

Summary

When we disable swapping to a device, we scan the full VM object list
looking for objects with swap trie nodes that reference the device in
question. The pages corresponding to those nodes are paged in.

While paging in, we drop the VM object lock. Moreover, we do not hold a
reference for the object; swap_pager_swapoff_object() merely bumps the
paging-in-progress counter. vm_object_terminate() waits for this
counter to drain before proceeding and freeing pages.

However, swap_pager_swapoff_object() decrements the counter before
re-acquiring the VM object lock, which means that vm_object_terminate()
can race to acquire the lock and free the pages. Then,
swap_pager_swapoff_object() ends up unbusying a freed page. Fix the
problem by acquiring the lock before waking up sleepers.

PR: 273610
Reported by: Graham Perrin <grahamperrin@gmail.com>

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

markj created this revision.Oct 1 2023, 3:55 PM

Herald added a subscriber: imp. · View Herald TranscriptOct 1 2023, 3:55 PM

markj requested review of this revision.Oct 1 2023, 3:55 PM

Harbormaster completed remote builds in B53775: Diff 128029.Oct 1 2023, 3:55 PM

This is just my speculation as to the cause of bugzilla PR 273610, but I'm pretty sure it's a real bug.

kib accepted this revision.Oct 1 2023, 4:31 PM

This revision is now accepted and ready to land.Oct 1 2023, 4:31 PM

I suspect there is another race with swapoff. Imagine that some thread A started swap_pager_putpages(), and allocated some swap blocks. Then, another thread B marked the swap device where a block was allocated, as closing, and started scanning objects for swapoff page-ins. The thread A could block waiting for vm_object lock, which is owned by B, while holding allocated blocks. Thread B finishes scan and releases the object lock, allowing A to swap out on the closing device.

I think putpages() needs to re-check the device for closing after obtaining the object lock.

In D42029#958606, @kib wrote:

I suspect there is another race with swapoff. Imagine that some thread A started swap_pager_putpages(), and allocated some swap blocks. Then, another thread B marked the swap device where a block was allocated, as closing, and started scanning objects for swapoff page-ins. The thread A could block waiting for vm_object lock, which is owned by B, while holding allocated blocks. Thread B finishes scan and releases the object lock, allowing A to swap out on the closing device.

I think putpages() needs to re-check the device for closing after obtaining the object lock.

Isn't this handled by the check at the end of swap_pager_swapoff()? i.e., after scanning the VM object list, we check to see if any swap blocks are still allocated. If so, we retry the whole operation.

In D42029#958607, @markj wrote:

In D42029#958606, @kib wrote:

I suspect there is another race with swapoff. Imagine that some thread A started swap_pager_putpages(), and allocated some swap blocks. Then, another thread B marked the swap device where a block was allocated, as closing, and started scanning objects for swapoff page-ins. The thread A could block waiting for vm_object lock, which is owned by B, while holding allocated blocks. Thread B finishes scan and releases the object lock, allowing A to swap out on the closing device.

I think putpages() needs to re-check the device for closing after obtaining the object lock.

Isn't this handled by the check at the end of swap_pager_swapoff()? i.e., after scanning the VM object list, we check to see if any swap blocks are still allocated. If so, we retry the whole operation.