Page MenuHomeFreeBSD

A race in vm_page_busy_sleep(9).
ClosedPublic

Authored by kib on Oct 8 2016, 5:13 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 3:42 AM
Unknown Object (File)
Tue, Dec 24, 6:27 PM
Unknown Object (File)
Nov 22 2024, 11:37 AM
Unknown Object (File)
Nov 21 2024, 4:00 PM
Unknown Object (File)
Nov 21 2024, 3:31 PM
Unknown Object (File)
Nov 18 2024, 5:38 AM
Unknown Object (File)
Nov 17 2024, 6:55 PM
Unknown Object (File)
Nov 17 2024, 5:15 PM
Subscribers

Details

Summary

Suppose that we have an exclusively busy page, and a thread which can accept shared-busy page. In this case, typical code waiting for the page xbusy state to pass is

 		if (vm_page_xbusied(m)) {
                        vm_page_lock(m);
 			VM_OBJECT_WUNLOCK(object);    <---1
			vm_page_busy_sleep(p, "vmopax");
 			goto again;
 		}

Suppose that the xbusy state owner locked the object, unbusied the page and unlocked the object after we are at the line 1, but before we executed the load of the busy_lock word in vm_page_busy_sleep(). If it happens that there is still no waiters recorded for the busy state, the xbusy owner did not acquired the page lock, so it proceeded.

More, suppose that some other thread happen to share-busy the page after xbusy state was relinquished but before the m->busy_lock is read in vm_page_busy_sleep(). Again, that thread only need vm_object lock to proceed. Then, vm_page_busy_sleep() reads busy_lock value equal to the VPB_SHARERS_WORD(1)

In this case, all tests in vm_page_busy_sleep(9) pass and we are going to sleep, despite the page being share-busied.

Update check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9) to accept shared-busy state if we only wait for the xbusy state to pass.

Test Plan

Apparently current code does not share-busy pages from parallel threads, but I have plans for it.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib retitled this revision from to A race in vm_page_busy_sleep(9)..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added reviewers: alc, markj.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
markj edited edge metadata.
This revision is now accepted and ready to land.Oct 10 2016, 5:43 PM

I just wanted to confirm a couple things that aren't explicitly stated in the summary.

  1. When the xbusy owner reacquires and releases the object lock, it is relinquishing its xbusy claim on the page.
  1. Because there are no waiters at the time that the xbusy owner relinquishes its claim, it can do so with only an atomic cmpset. In particular, it doesn't have to acquire the page lock. Moreover, another thread can sbusy the page with only an atomic cmpset.
sys/vm/vm_page.c
752 ↗(On Diff #21184)

Since we're modifying the nearby code can we switch this assert to the newer form: vm_page_assert_locked(m)

kib edited edge metadata.
In D8196#170863, @alc wrote:

I just wanted to confirm a couple things that aren't explicitly stated in the summary.

  1. When the xbusy owner reacquires and releases the object lock, it is relinquishing its xbusy claim on the page.

By the 'the xbusy owner' you mean the thread which unbusies the xbusy state in my scenario ? Yes.

  1. Because there are no waiters at the time that the xbusy owner relinquishes its claim, it can do so with only an atomic cmpset. In particular, it doesn't have to acquire the page lock. Moreover, another thread can sbusy the page with only an atomic cmpset.

Again, yes, applied to the share-busy owner == "another thread" in my scenario.

I edited the description of the review to hopefully address you notes.

kib edited edge metadata.

Change the style of the page lock assert.

Merge sequential if()s with the same 'then' clause in vm_page_busy_sleep().

This revision now requires review to proceed.Oct 12 2016, 11:41 PM
alc edited edge metadata.
This revision is now accepted and ready to land.Oct 13 2016, 7:11 AM
This revision was automatically updated to reflect the committed changes.

As an aside, I think it would be worthwhile to add comments to the page busy code that describe how the page lock is (and isn't) used. For example, I think that a casual reader of the page busy code might think that the page lock is protecting against races like you've addressed in this change.