Page MenuHomeFreeBSD

Fix one more case of vnode_pager I/O errors
AcceptedPublic

Authored by chs on Wed, Jun 24, 6:04 PM.

Details

Summary

My recent change in r361855 didn't correctly handle read ahead/behind pages.
We current still call vm_page_readahead_finish() for such pages even if
the read failed and the pages are not marked valid, but that function
asserts that the page is valid since apparently we don't want invalid pages
on the paging queues.

This change essentially reverts r361855 and instead handles read ahead/behind
pages separately in the loop when the I/O has failed. We still need to unbusy
those pages even though we can't put them on a paging queue.

Test Plan

I wrote some atf tests for sendfile that I'll be submitting shortly.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 31990
Build 29536: arc lint + arc unit

Event Timeline

chs created this revision.Wed, Jun 24, 6:04 PM
chs requested review of this revision.Wed, Jun 24, 6:04 PM
markj added a reviewer: alc.Wed, Jun 24, 6:33 PM
markj added a comment.Wed, Jun 24, 6:35 PM

Hmm. There is nothing actually wrong with having invalid pages on the paging queues. I think the assertion is trying to express that it doesn't make a lot of sense to keep the page at all if an I/O error. In fact, before r292373 vm_page_readahead_finish() handling invalid pages by freeing them. But, to do that you need the object write lock.

So with this diff it looks like we are effectively leaking the invalid readahead/behind pages. I believe we could use error != 0 as a test for whether to grab the object lock in read or write mode, and simply free the pages in the latter case.

chs added a comment.Fri, Jun 26, 8:21 PM

Yea, I didn't know what the problem having invalid pages on the paging queues would be either.

And yea, this patch does leave these invalid pages in an odd state, where the pagedaemon can't reclaim them (though if the vnode is reclaimed then all of these invalid pages will be freed at that point). I wasn't going to worry about that right now since the sendfile code already can leave pages in this state after an I/O error, and kib thought that was ok in a review of a previous fix for sendfile I/O error handling. But if you'd like this fix to avoid creating this situation in vnode_pager_generic_getpages_done(), we can do that too.

While I was looking into vm_page_readahead_finish(), I noticed that swp_pager_async_iodone() also leaves pages in this state.

vm_page_grab_valid() (which is the last of the three callers of vm_page_readahead_finish()) does have code to free pages which it allocated that couldn't be read, but it also has a check for these pages being wired, and if the pages are wired then we only unbusy them rather than freeing them. That sounds wrong to me... the page will still be attached to its object, so when the thread that wired the page eventually unwires it, the page won't be freed and will end up in the same invalid and not on a paging queue state. Though I'm not sure that pages that are xbusied and invalid can really be wired by another thread anyway... I don't know what the reason to do that would be. Do you know if the page can really become wired while it's in this state? If it can be wired when we want to free it, how can we handle that? ... I found the comment in vm_page_release_locked() which says that it is actually legal to wire an xbusied page, so it looks like we do need to handle that. Note that vm_page_release_locked() has the same logic as vm_page_grab_valid(), where the page is freed only if it is not wired, and merely unbusied if it is wired, and so this likely needs the same fix as vm_page_grab_valid(), whatever that turns out to be. ... I think the operation that we want for this is "remove this page from its object, and free the page if it is not wired, or unbusy the page if it is wired." If the page is wired by someone else then the page will be freed when that other thread unwires. Does that sound right? I looked for an existing vm_page_* function that does that and I don't see it. It also feels like there would be a race between this new operation and another thread releasing the last wire reference, but I haven't thought about that very much yet.

At first I thought that taking the object write lock in iodone context (as we are in vnode_pager_generic_getpages_done() and swp_pager_async_iodone()) sounded like it could lead to deadlock, but looking again I see that we already take the object lock in read mode in vnode_pager_generic_getpages_done(), so taking the object lock in write mode there is probably ok too.

I'll update the diff here to have vnode_pager_generic_getpages_done() free the pages as you suggest, and that version passes the sendfile atf test that I also put out for review. I won't try to do anything yet about the case where the page is wired and thus cannot be freed though.

It would be great if we could have an assertion or collection of assertions that make sure that we don't put a page into this state where it is invalid and not busy and not wired and not on a paging queue. I think such an assertion would look something like "m->object == NULL || vm_page_queue(m) != PQ_NONE || vm_page_wired(m)", and it would need to go in several places: vm_page_unwire(), vm_page_release(), vm_page_drop(), and probably some subset of the vm_page_pqstate_* functions. Does that sound right? Are there any other functions that can change the page state to this one that we want to be illegal? (These assertions would ideally be written to check the new state of the page via the new value that is installed by the atomic op that changes the state eg. in the ref_count field, and would be skipped in cases where the page is actually freed. But you get the idea.)

chs updated this revision to Diff 73711.Fri, Jun 26, 8:21 PM

redo diff as markj suggested.

kib accepted this revision.Sat, Jun 27, 2:07 AM
This revision is now accepted and ready to land.Sat, Jun 27, 2:07 AM
alc accepted this revision.Sat, Jun 27, 6:37 PM
markj added a comment.Sun, Jun 28, 9:08 PM
In D25430#562442, @chs wrote:

And yea, this patch does leave these invalid pages in an odd state, where the pagedaemon can't reclaim them (though if the vnode is reclaimed then all of these invalid pages will be freed at that point). I wasn't going to worry about that right now since the sendfile code already can leave pages in this state after an I/O error, and kib thought that was ok in a review of a previous fix for sendfile I/O error handling. But if you'd like this fix to avoid creating this situation in vnode_pager_generic_getpages_done(), we can do that too.

While I was looking into vm_page_readahead_finish(), I noticed that swp_pager_async_iodone() also leaves pages in this state.

I see. That looks like a bug to me.

vm_page_grab_valid() (which is the last of the three callers of vm_page_readahead_finish()) does have code to free pages which it allocated that couldn't be read, but it also has a check for these pages being wired, and if the pages are wired then we only unbusy them rather than freeing them. That sounds wrong to me... the page will still be attached to its object, so when the thread that wired the page eventually unwires it, the page won't be freed and will end up in the same invalid and not on a paging queue state.

Typically, when a thread unwires a page, it uses vm_page_unwire(), which places the page in a queue. There is also vm_page_unwire_noq(), which must be used more carefully to avoid the problem that you pointed out. (Almost all of its current uses in the tree are for "unmanaged" pages that will never be on a page queue, like page table pages. We should probably have a distinct function for unwiring such pages.)

Though I'm not sure that pages that are xbusied and invalid can really be wired by another thread anyway... I don't know what the reason to do that would be. Do you know if the page can really become wired while it's in this state?

In the current locking scheme it is legal to wire a page while holding only its object lock. I suspect that right now, since the scope of the busy lock has expanded, it's not possible for that to happen, but there is no rule against it.

If it can be wired when we want to free it, how can we handle that? ... I found the comment in vm_page_release_locked() which says that it is actually legal to wire an xbusied page, so it looks like we do need to handle that. Note that vm_page_release_locked() has the same logic as vm_page_grab_valid(), where the page is freed only if it is not wired, and merely unbusied if it is wired, and so this likely needs the same fix as vm_page_grab_valid(), whatever that turns out to be. ...

I think vm_page_grab_valid() is actually ok. The key is in the comment above vm_page_wired(): if a page's object is write-locked, the page is xbusy, and it is not mapped, then its wire count is guaranteed not to increase. An invalid page is never mapped. So code like this:

if (!vm_page_wired(m))
    vm_page_free(m);

need only consider a race with an unwiring thread. If vm_page_wired() returns true, then the unwiring thread is responsible for ensuring that it ends up in a paging queue (or freeing it, but in this case there is no race because the object is locked). If vm_page_wired() returns false, then it is safe to free the page because we know that the wiring count cannot increase while the page is locked as described above.

I think the operation that we want for this is "remove this page from its object, and free the page if it is not wired, or unbusy the page if it is wired." If the page is wired by someone else then the page will be freed when that other thread unwires. Does that sound right? I looked for an existing vm_page_* function that does that and I don't see it. It also feels like there would be a race between this new operation and another thread releasing the last wire reference, but I haven't thought about that very much yet.

vm_page_remove() returns a value which indicates whether the page is safe to be freed. When removing a page, it must be xbusy and its object must be locked, and it must not be mapped.

I'll update the diff here to have vnode_pager_generic_getpages_done() free the pages as you suggest, and that version passes the sendfile atf test that I also put out for review. I won't try to do anything yet about the case where the page is wired and thus cannot be freed though.

Per my comments above, I believe it is correct to handle this exactly the same way as vm_page_grab_valid(): i.e., check for wirings and only free the page if none are present.

It would be great if we could have an assertion or collection of assertions that make sure that we don't put a page into this state where it is invalid and not busy and not wired and not on a paging queue. I think such an assertion would look something like "m->object == NULL || vm_page_queue(m) != PQ_NONE || vm_page_wired(m)", and it would need to go in several places: vm_page_unwire(), vm_page_release(), vm_page_drop(), and probably some subset of the vm_page_pqstate_* functions. Does that sound right? Are there any other functions that can change the page state to this one that we want to be illegal? (These assertions would ideally be written to check the new state of the page via the new value that is installed by the atomic op that changes the state eg. in the ref_count field, and would be skipped in cases where the page is actually freed. But you get the idea.)

The tricky part is that all of these fields are synchronized using different mechanisms, so it is hard to write assertions that aren't racy. I agree with the implication of your comments that this stuff is too subtle and somewhat under-asserted.

sys/vm/vnode_pager.c
1150

Per my comments below, we should check for wirings before freeing. Again, the wire count will not increase while:

  • the page's object is write-locked,
  • the page is xbusy, and
  • the page is unmapped.

Since this is a read-ahead page, it was allocated by the getpages method and must be invalid, which implies that it is unmapped. We could reasonably assert that these pages are invalid as well.