vm_fault_hold: handle vm_page_rename failure
ClosedPublic
Actions

Authored by rlibby on Dec 1 2015, 4:13 AM.

Details

Reviewers

kib
alc
cem

Group Reviewers

Contributor Reviewers (ports)

Commits

rS291907: vm_fault_hold: handle vm_page_rename failure

Summary

On vm_page_rename failure, fix a missing object unlock and a double
free of a page.

First remove the old page, then rename into other page into
first_object, then free the old page. This avoids the problem on
rename failure. This is a little ugly but seems to be the most
straightforward solution.

This bug blocks further exploration of M_NOWAIT bugs with RADIX NODE
because of frequency (including D4146).

This is the patch currently applied internally at Isilon. A few other
solutions considered:

Inline pieces of unlock_and_deallocate. This is efficient in terms of the various locks but seems fragile, if the recipe for retrying the fault ever changes.
Patching unlock_and_deallocate. This seems okay in principle but would be easier to do if I understood vm_fault_hold better. I am nervous about e.g. putting an if (fs.first_m != NULL) guard there and possibly covering up problems at the other various call sites.
Enhance vm_page_replace and call that instead of vm_page_rename. It's arguable if that might be a good idea, and it can be explored later.

Test Plan

With the debugging from here:

https://github.com/freebsd/freebsd/compare/master...rlibby:mnowait-dbg

sysctl debug.fail_point.uma_zalloc_arg="1%return"
kyua test -k /usr/tests/sys/Kyuafile

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rlibby updated this revision to Diff 10618.Dec 1 2015, 4:13 AM

rlibby retitled this revision from to vm_fault_hold: handle vm_page_rename failure.

rlibby updated this object.

rlibby edited the test plan for this revision. (Show Details)

Herald added a reviewer: Contributor Reviewers (ports). · View Herald TranscriptDec 1 2015, 4:13 AM

Herald added a subscriber: imp. · View Herald Transcript

rlibby updated this object.Dec 1 2015, 4:17 AM

rlibby added reviewers: alc, kib, cem.

rlibby added a subscriber: markj.

kib accepted this revision.Dec 1 2015, 8:40 AM

kib edited edge metadata.

This revision is now accepted and ready to land.Dec 1 2015, 8:40 AM

How would you have to enhance vm_page_replace() to use it here? Unlike vm_page_rename(), it doesn't dirty the page. What else?

In D4326#91286, @alc wrote:

How would you have to enhance vm_page_replace() to use it here? Unlike vm_page_rename(), it doesn't dirty the page. What else?

It also doesn't remove the new page from its old object (if any). I
was also thinking of cleaning up what looks to me like some
double-lookup inefficiency and following up on a suggestion of
Attilio's for vm_page_replace not to require an xbusy'd page.

I wasn't going to dirty the page inside vm_page_replace, but explicitly
after calling it in vm_fault_hold. I think other callers might not
want it dirtied.

This route would eliminate the allocation here and obviate the failure
handling entirely.

Here is an experimental patch (not tested):
https://github.com/freebsd/freebsd/compare/master...rlibby:replace-vm_page_replace

@alc, is there anything more you're looking for from me?

In D4326#92343, @rlibby_gmail.com wrote:

@alc, is there anything more you're looking for from me?

Nope.

Closed by commit rS291907: vm_fault_hold: handle vm_page_rename failure (authored by cem). · Explain WhyDec 6 2015, 5:46 PM

This revision was automatically updated to reflect the committed changes.

In D4326#91307, @rlibby_gmail.com wrote:

In D4326#91286, @alc wrote:

How would you have to enhance vm_page_replace() to use it here? Unlike vm_page_rename(), it doesn't dirty the page. What else?

It also doesn't remove the new page from its old object (if any). I
was also thinking of cleaning up what looks to me like some
double-lookup inefficiency and following up on a suggestion of
Attilio's for vm_page_replace not to require an xbusy'd page.

Strictly speaking, I don't think that either of these changes are needed to apply vm_page_replace() to vm_fault(). The page "mold"/"fs.first_m" will be xbusy and not in a paging queue. Moreover, even with the double lookup, I believe that there will be one less radix tree operation performed.

I wasn't going to dirty the page inside vm_page_replace, but explicitly
after calling it in vm_fault_hold. I think other callers might not
want it dirtied.

Yes, the current callers don't need the page to be dirtied. I agree with making the caller responsible for dirtying the page.

For the same reason, I would also make the caller responsible for removing mnew from its current object, and instead add a "KASSERT(mnew->object == NULL, ..." to vm_page_replace().

This route would eliminate the allocation here and obviate the failure
handling entirely.

Yes, which is exactly why I prefer to use this solution.

Here is an experimental patch (not tested):
https://github.com/freebsd/freebsd/compare/master...rlibby:replace-vm_page_replace

In D4326#93303, @alc wrote:

In D4326#91307, @rlibby_gmail.com wrote:

In D4326#91286, @alc wrote:

How would you have to enhance vm_page_replace() to use it here? Unlike vm_page_rename(), it doesn't dirty the page. What else?

It also doesn't remove the new page from its old object (if any). I
was also thinking of cleaning up what looks to me like some
double-lookup inefficiency and following up on a suggestion of
Attilio's for vm_page_replace not to require an xbusy'd page.

Strictly speaking, I don't think that either of these changes are needed to apply vm_page_replace() to vm_fault(). The page "mold"/"fs.first_m" will be xbusy and not in a paging queue. Moreover, even with the double lookup, I believe that there will be one less radix tree operation performed.

If the old page (fs.first_m) is guaranteed to be xbusy'd, then I agree
that no changes to vm_page_replace are necessary. The minimal change
in that case would be to vm_page_remove(fs.m) instead, and replace the
vm_page_rename with vm_page_replace and vm_page_dirty. (Then
vm_page_free(fs.first_m) afterward as just committed above in
rS291907.)

It is not obvious to me that the fs.first_m will be xbusy'd but I am
not very familiar with this code. I see explicit xbusy'ing of fs.m and
an assertion to that effect, but not for fs.first_m.

I wasn't going to dirty the page inside vm_page_replace, but explicitly
after calling it in vm_fault_hold. I think other callers might not
want it dirtied.

Yes, the current callers don't need the page to be dirtied. I agree with making the caller responsible for dirtying the page.

For the same reason, I would also make the caller responsible for removing mnew from its current object, and instead add a "KASSERT(mnew->object == NULL, ..." to vm_page_replace().

Yeah, that seems fine. My observation here would be that currently
vm_page_replace is operating without page locks. If one considered
changing a page's object without the page lock problematic (vm_page
comments claim object is "(O,P)", but maybe O is enough in reality?),
then we should add page locking. Then if we are already locking the
page, doing the vm_page_remove(mnew) doesn't seem like a burden and
possibly optimizes a lock/unlock.

This route would eliminate the allocation here and obviate the failure
handling entirely.

Yes, which is exactly why I prefer to use this solution.

I am happy to keep driving this if there's interest. What would you
like to do here? I can open a new diff with some or all of the above.

In D4326#93357, @rlibby_gmail.com wrote:

In D4326#93303, @alc wrote:

In D4326#91307, @rlibby_gmail.com wrote:

In D4326#91286, @alc wrote:

How would you have to enhance vm_page_replace() to use it here? Unlike vm_page_rename(), it doesn't dirty the page. What else?

It also doesn't remove the new page from its old object (if any). I
was also thinking of cleaning up what looks to me like some
double-lookup inefficiency and following up on a suggestion of
Attilio's for vm_page_replace not to require an xbusy'd page.

Strictly speaking, I don't think that either of these changes are needed to apply vm_page_replace() to vm_fault(). The page "mold"/"fs.first_m" will be xbusy and not in a paging queue. Moreover, even with the double lookup, I believe that there will be one less radix tree operation performed.

If the old page (fs.first_m) is guaranteed to be xbusy'd, then I agree
that no changes to vm_page_replace are necessary. The minimal change
in that case would be to vm_page_remove(fs.m) instead, and replace the
vm_page_rename with vm_page_replace and vm_page_dirty. (Then
vm_page_free(fs.first_m) afterward as just committed above in
rS291907.)

It is not obvious to me that the fs.first_m will be xbusy'd but I am
not very familiar with this code. I see explicit xbusy'ing of fs.m and
an assertion to that effect, but not for fs.first_m.

Typically, fs.first_m starts life as fs.m. It may also help to know that by default vm_page_alloc() xbusy's the returned page. Kostik, am I forgetting anything?

I wasn't going to dirty the page inside vm_page_replace, but explicitly
after calling it in vm_fault_hold. I think other callers might not
want it dirtied.

Yes, the current callers don't need the page to be dirtied. I agree with making the caller responsible for dirtying the page.

For the same reason, I would also make the caller responsible for removing mnew from its current object, and instead add a "KASSERT(mnew->object == NULL, ..." to vm_page_replace().

Yeah, that seems fine. My observation here would be that currently
vm_page_replace is operating without page locks. If one considered
changing a page's object without the page lock problematic (vm_page
comments claim object is "(O,P)", but maybe O is enough in reality?),
then we should add page locking. Then if we are already locking the
page, doing the vm_page_remove(mnew) doesn't seem like a burden and
possibly optimizes a lock/unlock.

I need to think about the page locking. Yes, we may want to introduce page locking on the page "mold", but that is going to be a different page lock than the page "mnew". So, I don't see that as a good argument for bringing the vm_page_remove(mnew) into the function vm_page_replace().

This route would eliminate the allocation here and obviate the failure
handling entirely.

Yes, which is exactly why I prefer to use this solution.

I am happy to keep driving this if there's interest. What would you
like to do here? I can open a new diff with some or all of the above.

Do it.

I would just ask that you separate the changes to vm_fault() from those to vm_page_replace(). I'm ready rubber stamp the change that eliminates the unnecessary radix tree operation from vm_page_replace, but I want to think about the xbusy change since that affects the interface.

In D4326#93602, @alc wrote:

In D4326#93357, @rlibby_gmail.com wrote:

If the old page (fs.first_m) is guaranteed to be xbusy'd, then I agree
that no changes to vm_page_replace are necessary. The minimal change
in that case would be to vm_page_remove(fs.m) instead, and replace the
vm_page_rename with vm_page_replace and vm_page_dirty. (Then
vm_page_free(fs.first_m) afterward as just committed above in
rS291907.)

It is not obvious to me that the fs.first_m will be xbusy'd but I am
not very familiar with this code. I see explicit xbusy'ing of fs.m and
an assertion to that effect, but not for fs.first_m.

Typically, fs.first_m starts life as fs.m. It may also help to know that by default vm_page_alloc() xbusy's the returned page. Kostik, am I forgetting anything?

I cannot add anything. The fs.first_m is always coming from fs.m. The code either busies fs.m explicitely, or allocates it by calling vm_page_alloc() without VM_ALLOC_NOBUSY flag. In either case, the page must be busy since we drop the object lock to call pmap_enter() in the slow path.

rlibby mentioned this in D4471: vm_page_replace: remove redundant radix lookup.Dec 9 2015, 4:30 PM

rlibby mentioned this in D4478: vm_fault_hold: use vm_page_replace instead of vm_page_rename.Dec 9 2015, 8:38 PM