Page MenuHomeFreeBSD

(vm object 3) Add a shared object busy synchronization mechanism that blocks new page busy acquires while held.
Needs ReviewPublic

Authored by jeff on Tue, Sep 10, 7:58 PM.

Details

Summary

This is the least elegant of the object concurrency patches. The basic notion is that the object can become busy, after which new pages can not become busy but existing pages may persist in being busy. This permits the existing style of only checking, and not acquiring, busy for dedicated blocks of code.

Here with object busy we're synchronizing vm_fault_soft_fast against pmap_remove* or changes in valid bits on any constituent page in the super page set. We could simply busy every page in the set and then unbusy at the end as we would've for 512 individual mappings. This would be my preference, but there was concern over the cost. So instead we use the atomic object busy counter to emulate the old behavior of a busy check being sufficient.

This is the only consumer that still uses the pattern of checking busy without acquiring and so it is at present the only place that requires object busy. This is only synchronizing against other page busy consumers that don't have the object lock held. Callers which do have the object write lock will still be serialized. The object busy may allow us to do vm_fault_soft_fast without the object lock held across the pmap operation.

The implementation does allow for pages to transiently become busy, notice the object busy count, and drop busy. These will revert back to single page mappings if you lose the race. I don't see a reasonable way around this. I think it's likely that if the page is becoming busy it may drop out of the superpage set later anyway.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 26389
Build 24846: arc lint + arc unit

Event Timeline

jeff created this revision.Tue, Sep 10, 7:58 PM
jeff retitled this revision from Add a shared object busy synchronization mechanism that blocks new page busy acquires while held. to (vm object 3) Add a shared object busy synchronization mechanism that blocks new page busy acquires while held..Tue, Sep 10, 8:10 PM
jeff edited the summary of this revision. (Show Details)
jeff edited reviewers, added: alc, kib, markj, dougm; removed: manu.
jeff edited the summary of this revision. (Show Details)Tue, Sep 17, 9:55 PM
kib added inline comments.Wed, Sep 18, 6:24 AM
sys/amd64/amd64/pmap.c
5660

As was discussed somewhere else, this assert should stil check that the page is busied if PMAP_ENTER_NOSLEEP was specified. It may be just a replacement of VM_OBJECT_ASSERT_LOCKED() by (PMAP_ENTER_NOSLEEP && VM_PAGE_OBJECT_BUSY_ASSERT).

sys/vm/vm_fault.c
285

We wrote a value to fs->first_object->busy above, and there we read m->busy. Other thread could write the lock cookie to m->busy, and then rechecks first_object->busy.

What prevents us from seeing non-busies m, simultaneously with other thread seeing unbusy first_object ? In other words, it is typical (x=1; read y;) vs. (y = 1; read x;) race.

jeff added inline comments.Wed, Sep 18, 8:43 PM
sys/amd64/amd64/pmap.c
5660

I'm not sure I fully follow. We're already asserting that the page is busy in all cases. What is the additional assert?

sys/vm/vm_fault.c
285

The important thing is that when the object is busy you can transiently have a busy page but you can not have any modifications permitted by busying a page. So you are left with a harmless race and you skip superpage insertion.

So in your example the first thread may read the page as busy and operate as-if the page had been busy before the call to vm_object_busy(). The second thread will observe the object busy count in tryxbusy and unbusy the page before returning failure to the caller. So you do have this transient condition. I believe it is harmless and according to tests I have done it is quite rare.

The other obvious solution is to abandon this whole approach and simply busy/unbusy 512 pages.

markj added inline comments.Wed, Sep 18, 10:17 PM
sys/vm/vm_fault.c
281

Since the goal of these patches is to permit busying a page without the object lock held, how is the check for the busy state of m below going to work? I presume that at some point we must change this code to xbusy m, but that won't succeed if the object is busy.

285

Currently vm_page_xbusy() and vm_page_sbusy() panic if their attempts to update the page state fail. Right now we are protected from that by the object lock, but how would this work in the longer term?

sys/vm/vm_page.c
904

I think this deserves a comment. What exactly does the wiring protect us from here?

1112

m doesn't need to be parenthesized here.

jeff added inline comments.Wed, Sep 18, 10:25 PM
sys/vm/vm_fault.c
281

I feel that my comments must be failing to describe something fundamental. This is addressed in the patch description.

If the page was busied by another thread before the call to vm_object_busy then busied returns true and we fall back to the fully synchronized version of vm_fault. All of the code in soft_fault_fast is an optional fast path.

We do not busy the first page or any other page in the superpage set. We use the object busy to guarantee that the state remains stable across the operation by denying other threads from acquiring busy.

285

Those functions are to be eliminated at the end of the patchset. Nothing in the kernel calls them after #2. Only trybusy and busy_acquire() are used.

sys/vm/vm_page.c
904

Yes the notion is if you don't have the object locked you must be holding a reference count on the page otherwise it can disappear or change identity while you sleep and loop to acquire busy again.

markj added inline comments.Wed, Sep 18, 10:39 PM
sys/vm/vm_page.c
1351

Isn't obj != NULL guaranteed by the fact that we hold the object lock?

1384

Same here.

I think I've brought this up before, but I would like it if the VM had a generic per-2MB page structure. We already have several in vm_reserv and the pmap, and IMO it would be a good place to maintain a "compound" busy state, rather than in the object. I worry that a mechanism to block the busying of all pages in an object will inhibit concurrency and lead to transient latency spikes. I don't object to the current approach though.

sys/vm/vm_fault.c
281

I just confused myself while reading code.

jeff added a comment.Wed, Sep 18, 10:59 PM

I think I've brought this up before, but I would like it if the VM had a generic per-2MB page structure. We already have several in vm_reserv and the pmap, and IMO it would be a good place to maintain a "compound" busy state, rather than in the object. I worry that a mechanism to block the busying of all pages in an object will inhibit concurrency and lead to transient latency spikes. I don't object to the current approach though.

Right now the object lock is the mechanism that blocks busying of all pages and creates transient latency spikes. This at least narrows it a level and allows other object operations to proceed. If you had a per-superpage object you would have to be able to very quickly look it up and acquire in tryxbusy/sbusy.

Ultimately I would like to see a generic mechanism that can treat pages as groups with a single set of state. A variable page size.

kib added inline comments.Thu, Sep 19, 1:47 PM
sys/amd64/amd64/pmap.c
5660

I see, I forgot about the fact that vm_page_object_busy_assert() does check for busy (but not for xbusy, it seems).

sys/vm/vm_fault.c
285

But the impossible reads are relevant for the non-superpage case as well. We might map a busy page (4k), and other thread might mark and assume the page busy without noticing object->busy.

kib added a comment.EditedThu, Sep 19, 1:55 PM
In D21592#473571, @jeff wrote:

I think I've brought this up before, but I would like it if the VM had a generic per-2MB page structure. We already have several in vm_reserv and the pmap, and IMO it would be a good place to maintain a "compound" busy state, rather than in the object. I worry that a mechanism to block the busying of all pages in an object will inhibit concurrency and lead to transient latency spikes. I don't object to the current approach though.

Right now the object lock is the mechanism that blocks busying of all pages and creates transient latency spikes. This at least narrows it a level and allows other object operations to proceed. If you had a per-superpage object you would have to be able to very quickly look it up and acquire in tryxbusy/sbusy.
Ultimately I would like to see a generic mechanism that can treat pages as groups with a single set of state. A variable page size.

But isn't vm_reserv adequate object ? psind is set to 1 iff the reserv is fully populated. Why cannot we lock vm_reserv around pmap_enter() in psind==1 case for fast_soft ?

Update: when I wrote that, I thought about locking reserv and e.g. busy the first page in the range. Incompatible manipulation of any page in the superpage would need to break the reservation. But this is not enough, indeed.

jeff added a comment.Thu, Sep 19, 7:38 PM
In D21592#473738, @kib wrote:
In D21592#473571, @jeff wrote:

I think I've brought this up before, but I would like it if the VM had a generic per-2MB page structure. We already have several in vm_reserv and the pmap, and IMO it would be a good place to maintain a "compound" busy state, rather than in the object. I worry that a mechanism to block the busying of all pages in an object will inhibit concurrency and lead to transient latency spikes. I don't object to the current approach though.

Right now the object lock is the mechanism that blocks busying of all pages and creates transient latency spikes. This at least narrows it a level and allows other object operations to proceed. If you had a per-superpage object you would have to be able to very quickly look it up and acquire in tryxbusy/sbusy.
Ultimately I would like to see a generic mechanism that can treat pages as groups with a single set of state. A variable page size.

But isn't vm_reserv adequate object ? psind is set to 1 iff the reserv is fully populated. Why cannot we lock vm_reserv around pmap_enter() in psind==1 case for fast_soft ?
Update: when I wrote that, I thought about locking reserv and e.g. busy the first page in the range. Incompatible manipulation of any page in the superpage would need to break the reservation. But this is not enough, indeed.

You would need to handle busy like we did vm_page_lock() where it refers to a range of pages. Or you would need to do an optional tiered lock like object busy but with the reservation being the second lock. I hope that as we figure out how to aggregate pages into collections better this object busy goes away. For now it lets me go forward with a lot of object lock reduction.

sys/vm/vm_fault.c
285

I don't understand this example. How will another thread assume page is busy without noticing object busy?