Page MenuHomeFreeBSD

Fix one more case of vnode_pager I/O errors
ClosedPublic

Authored by chs on Jun 24 2020, 6:04 PM.

Details

Summary

My recent change in r361855 didn't correctly handle read ahead/behind pages.
We current still call vm_page_readahead_finish() for such pages even if
the read failed and the pages are not marked valid, but that function
asserts that the page is valid since apparently we don't want invalid pages
on the paging queues.

This change essentially reverts r361855 and instead handles read ahead/behind
pages separately in the loop when the I/O has failed. We still need to unbusy
those pages even though we can't put them on a paging queue.

Test Plan

I wrote some atf tests for sendfile that I'll be submitting shortly.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

chs requested review of this revision.Jun 24 2020, 6:04 PM

Hmm. There is nothing actually wrong with having invalid pages on the paging queues. I think the assertion is trying to express that it doesn't make a lot of sense to keep the page at all if an I/O error. In fact, before r292373 vm_page_readahead_finish() handling invalid pages by freeing them. But, to do that you need the object write lock.

So with this diff it looks like we are effectively leaking the invalid readahead/behind pages. I believe we could use error != 0 as a test for whether to grab the object lock in read or write mode, and simply free the pages in the latter case.

Yea, I didn't know what the problem having invalid pages on the paging queues would be either.

And yea, this patch does leave these invalid pages in an odd state, where the pagedaemon can't reclaim them (though if the vnode is reclaimed then all of these invalid pages will be freed at that point). I wasn't going to worry about that right now since the sendfile code already can leave pages in this state after an I/O error, and kib thought that was ok in a review of a previous fix for sendfile I/O error handling. But if you'd like this fix to avoid creating this situation in vnode_pager_generic_getpages_done(), we can do that too.

While I was looking into vm_page_readahead_finish(), I noticed that swp_pager_async_iodone() also leaves pages in this state.

vm_page_grab_valid() (which is the last of the three callers of vm_page_readahead_finish()) does have code to free pages which it allocated that couldn't be read, but it also has a check for these pages being wired, and if the pages are wired then we only unbusy them rather than freeing them. That sounds wrong to me... the page will still be attached to its object, so when the thread that wired the page eventually unwires it, the page won't be freed and will end up in the same invalid and not on a paging queue state. Though I'm not sure that pages that are xbusied and invalid can really be wired by another thread anyway... I don't know what the reason to do that would be. Do you know if the page can really become wired while it's in this state? If it can be wired when we want to free it, how can we handle that? ... I found the comment in vm_page_release_locked() which says that it is actually legal to wire an xbusied page, so it looks like we do need to handle that. Note that vm_page_release_locked() has the same logic as vm_page_grab_valid(), where the page is freed only if it is not wired, and merely unbusied if it is wired, and so this likely needs the same fix as vm_page_grab_valid(), whatever that turns out to be. ... I think the operation that we want for this is "remove this page from its object, and free the page if it is not wired, or unbusy the page if it is wired." If the page is wired by someone else then the page will be freed when that other thread unwires. Does that sound right? I looked for an existing vm_page_* function that does that and I don't see it. It also feels like there would be a race between this new operation and another thread releasing the last wire reference, but I haven't thought about that very much yet.

At first I thought that taking the object write lock in iodone context (as we are in vnode_pager_generic_getpages_done() and swp_pager_async_iodone()) sounded like it could lead to deadlock, but looking again I see that we already take the object lock in read mode in vnode_pager_generic_getpages_done(), so taking the object lock in write mode there is probably ok too.

I'll update the diff here to have vnode_pager_generic_getpages_done() free the pages as you suggest, and that version passes the sendfile atf test that I also put out for review. I won't try to do anything yet about the case where the page is wired and thus cannot be freed though.

It would be great if we could have an assertion or collection of assertions that make sure that we don't put a page into this state where it is invalid and not busy and not wired and not on a paging queue. I think such an assertion would look something like "m->object == NULL || vm_page_queue(m) != PQ_NONE || vm_page_wired(m)", and it would need to go in several places: vm_page_unwire(), vm_page_release(), vm_page_drop(), and probably some subset of the vm_page_pqstate_* functions. Does that sound right? Are there any other functions that can change the page state to this one that we want to be illegal? (These assertions would ideally be written to check the new state of the page via the new value that is installed by the atomic op that changes the state eg. in the ref_count field, and would be skipped in cases where the page is actually freed. But you get the idea.)

redo diff as markj suggested.

This revision is now accepted and ready to land.Jun 27 2020, 2:07 AM
In D25430#562442, @chs wrote:

And yea, this patch does leave these invalid pages in an odd state, where the pagedaemon can't reclaim them (though if the vnode is reclaimed then all of these invalid pages will be freed at that point). I wasn't going to worry about that right now since the sendfile code already can leave pages in this state after an I/O error, and kib thought that was ok in a review of a previous fix for sendfile I/O error handling. But if you'd like this fix to avoid creating this situation in vnode_pager_generic_getpages_done(), we can do that too.

While I was looking into vm_page_readahead_finish(), I noticed that swp_pager_async_iodone() also leaves pages in this state.

I see. That looks like a bug to me.

vm_page_grab_valid() (which is the last of the three callers of vm_page_readahead_finish()) does have code to free pages which it allocated that couldn't be read, but it also has a check for these pages being wired, and if the pages are wired then we only unbusy them rather than freeing them. That sounds wrong to me... the page will still be attached to its object, so when the thread that wired the page eventually unwires it, the page won't be freed and will end up in the same invalid and not on a paging queue state.

Typically, when a thread unwires a page, it uses vm_page_unwire(), which places the page in a queue. There is also vm_page_unwire_noq(), which must be used more carefully to avoid the problem that you pointed out. (Almost all of its current uses in the tree are for "unmanaged" pages that will never be on a page queue, like page table pages. We should probably have a distinct function for unwiring such pages.)

Though I'm not sure that pages that are xbusied and invalid can really be wired by another thread anyway... I don't know what the reason to do that would be. Do you know if the page can really become wired while it's in this state?

In the current locking scheme it is legal to wire a page while holding only its object lock. I suspect that right now, since the scope of the busy lock has expanded, it's not possible for that to happen, but there is no rule against it.

If it can be wired when we want to free it, how can we handle that? ... I found the comment in vm_page_release_locked() which says that it is actually legal to wire an xbusied page, so it looks like we do need to handle that. Note that vm_page_release_locked() has the same logic as vm_page_grab_valid(), where the page is freed only if it is not wired, and merely unbusied if it is wired, and so this likely needs the same fix as vm_page_grab_valid(), whatever that turns out to be. ...

I think vm_page_grab_valid() is actually ok. The key is in the comment above vm_page_wired(): if a page's object is write-locked, the page is xbusy, and it is not mapped, then its wire count is guaranteed not to increase. An invalid page is never mapped. So code like this:

if (!vm_page_wired(m))
    vm_page_free(m);

need only consider a race with an unwiring thread. If vm_page_wired() returns true, then the unwiring thread is responsible for ensuring that it ends up in a paging queue (or freeing it, but in this case there is no race because the object is locked). If vm_page_wired() returns false, then it is safe to free the page because we know that the wiring count cannot increase while the page is locked as described above.

I think the operation that we want for this is "remove this page from its object, and free the page if it is not wired, or unbusy the page if it is wired." If the page is wired by someone else then the page will be freed when that other thread unwires. Does that sound right? I looked for an existing vm_page_* function that does that and I don't see it. It also feels like there would be a race between this new operation and another thread releasing the last wire reference, but I haven't thought about that very much yet.

vm_page_remove() returns a value which indicates whether the page is safe to be freed. When removing a page, it must be xbusy and its object must be locked, and it must not be mapped.

I'll update the diff here to have vnode_pager_generic_getpages_done() free the pages as you suggest, and that version passes the sendfile atf test that I also put out for review. I won't try to do anything yet about the case where the page is wired and thus cannot be freed though.

Per my comments above, I believe it is correct to handle this exactly the same way as vm_page_grab_valid(): i.e., check for wirings and only free the page if none are present.

It would be great if we could have an assertion or collection of assertions that make sure that we don't put a page into this state where it is invalid and not busy and not wired and not on a paging queue. I think such an assertion would look something like "m->object == NULL || vm_page_queue(m) != PQ_NONE || vm_page_wired(m)", and it would need to go in several places: vm_page_unwire(), vm_page_release(), vm_page_drop(), and probably some subset of the vm_page_pqstate_* functions. Does that sound right? Are there any other functions that can change the page state to this one that we want to be illegal? (These assertions would ideally be written to check the new state of the page via the new value that is installed by the atomic op that changes the state eg. in the ref_count field, and would be skipped in cases where the page is actually freed. But you get the idea.)

The tricky part is that all of these fields are synchronized using different mechanisms, so it is hard to write assertions that aren't racy. I agree with the implication of your comments that this stuff is too subtle and somewhat under-asserted.

sys/vm/vnode_pager.c
1150 ↗(On Diff #73711)

Per my comments below, we should check for wirings before freeing. Again, the wire count will not increase while:

  • the page's object is write-locked,
  • the page is xbusy, and
  • the page is unmapped.

Since this is a read-ahead page, it was allocated by the getpages method and must be invalid, which implies that it is unmapped. We could reasonably assert that these pages are invalid as well.

In D25430#562442, @chs wrote:

And yea, this patch does leave these invalid pages in an odd state, where the pagedaemon can't reclaim them (though if the vnode is reclaimed then all of these invalid pages will be freed at that point). I wasn't going to worry about that right now since the sendfile code already can leave pages in this state after an I/O error, and kib thought that was ok in a review of a previous fix for sendfile I/O error handling. But if you'd like this fix to avoid creating this situation in vnode_pager_generic_getpages_done(), we can do that too.

While I was looking into vm_page_readahead_finish(), I noticed that swp_pager_async_iodone() also leaves pages in this state.

I see. That looks like a bug to me.

Ok, I'll follow up on that separately.

vm_page_grab_valid() (which is the last of the three callers of vm_page_readahead_finish()) does have code to free pages which it allocated that couldn't be read, but it also has a check for these pages being wired, and if the pages are wired then we only unbusy them rather than freeing them. That sounds wrong to me... the page will still be attached to its object, so when the thread that wired the page eventually unwires it, the page won't be freed and will end up in the same invalid and not on a paging queue state.

Typically, when a thread unwires a page, it uses vm_page_unwire(), which places the page in a queue. There is also vm_page_unwire_noq(), which must be used more carefully to avoid the problem that you pointed out. (Almost all of its current uses in the tree are for "unmanaged" pages that will never be on a page queue, like page table pages. We should probably have a distinct function for unwiring such pages.)

I thought you were saying that we don't want to place these still-invalid pages in a queue, so then vm_page_unwire() would be doing the wrong thing. If we end up deciding that it's ok to put invalid pages on a queue then we can just remove that assertion from vm_page_readahead_finish(). But it seems better to free these invalid pages immediately, and I thought that's what you were arguing for doing.

Though I'm not sure that pages that are xbusied and invalid can really be wired by another thread anyway... I don't know what the reason to do that would be. Do you know if the page can really become wired while it's in this state?

In the current locking scheme it is legal to wire a page while holding only its object lock. I suspect that right now, since the scope of the busy lock has expanded, it's not possible for that to happen, but there is no rule against it.

If it can be wired when we want to free it, how can we handle that? ... I found the comment in vm_page_release_locked() which says that it is actually legal to wire an xbusied page, so it looks like we do need to handle that. Note that vm_page_release_locked() has the same logic as vm_page_grab_valid(), where the page is freed only if it is not wired, and merely unbusied if it is wired, and so this likely needs the same fix as vm_page_grab_valid(), whatever that turns out to be. ...

I think vm_page_grab_valid() is actually ok. The key is in the comment above vm_page_wired(): if a page's object is write-locked, the page is xbusy, and it is not mapped, then its wire count is guaranteed not to increase. An invalid page is never mapped. So code like this:

if (!vm_page_wired(m))
    vm_page_free(m);

need only consider a race with an unwiring thread. If vm_page_wired() returns true, then the unwiring thread is responsible for ensuring that it ends up in a paging queue (or freeing it, but in this case there is no race because the object is locked). If vm_page_wired() returns false, then it is safe to free the page because we know that the wiring count cannot increase while the page is locked as described above.

I think the operation that we want for this is "remove this page from its object, and free the page if it is not wired, or unbusy the page if it is wired." If the page is wired by someone else then the page will be freed when that other thread unwires. Does that sound right? I looked for an existing vm_page_* function that does that and I don't see it. It also feels like there would be a race between this new operation and another thread releasing the last wire reference, but I haven't thought about that very much yet.

vm_page_remove() returns a value which indicates whether the page is safe to be freed. When removing a page, it must be xbusy and its object must be locked, and it must not be mapped.

Ahh, this does look like what we want here. I'll update the diff to use that on these read ahead/behind pages.

I'll update the diff here to have vnode_pager_generic_getpages_done() free the pages as you suggest, and that version passes the sendfile atf test that I also put out for review. I won't try to do anything yet about the case where the page is wired and thus cannot be freed though.

Per my comments above, I believe it is correct to handle this exactly the same way as vm_page_grab_valid(): i.e., check for wirings and only free the page if none are present.

But if we only xunbusy the read ahead/behind pages in vnode_pager_generic_getpages_done() and vm_page_grab_valid() when the read failed and the page is now wired, the thread that unwires has no way to know that it should free the page. We need to do vm_page_remove() so that the unwire thread will know to free the page.

I looked through all of the callers of vm_pager_get_pages() to see how they handle errors, and most of them don't correctly handle the case where the page is now wired. If we continue to allow pages to be wired while holding only the page's object's lock, then we'll need to fix all of those too. Is there any compelling reason to continue to allow wiring a page while holding only the page's object's lock? If we change the rule to require a page to be busy in order to increase the wire count then that would be a lot simpler.

I also see that the locking assertion in vm_page_wire() is:

if (!vm_page_busied(m) && !vm_object_busied(m->object))
        VM_OBJECT_ASSERT_LOCKED(m->object);

so according to this, it's also ok to wire a page if you have the page's object busy. However, the only caller of vm_object_busy() (which is vm_fault_soft_fast()) also has the page's object rlocked, so it would make sense to change this locking assertion in vm_page_wire() to no longer allow wiring with only the page's object busy.

It would be great if we could have an assertion or collection of assertions that make sure that we don't put a page into this state where it is invalid and not busy and not wired and not on a paging queue. I think such an assertion would look something like "m->object == NULL || vm_page_queue(m) != PQ_NONE || vm_page_wired(m)", and it would need to go in several places: vm_page_unwire(), vm_page_release(), vm_page_drop(), and probably some subset of the vm_page_pqstate_* functions. Does that sound right? Are there any other functions that can change the page state to this one that we want to be illegal? (These assertions would ideally be written to check the new state of the page via the new value that is installed by the atomic op that changes the state eg. in the ref_count field, and would be skipped in cases where the page is actually freed. But you get the idea.)

The tricky part is that all of these fields are synchronized using different mechanisms, so it is hard to write assertions that aren't racy. I agree with the implication of your comments that this stuff is too subtle and somewhat under-asserted.

If such assertions about the locking requirements for these various operations can't be written without races then the design probably has fundamental races preventing correctness too.

updated diff for markj's latest comments.

This revision now requires review to proceed.Jul 6 2020, 10:50 PM
In D25430#565790, @chs wrote:

vm_page_grab_valid() (which is the last of the three callers of vm_page_readahead_finish()) does have code to free pages which it allocated that couldn't be read, but it also has a check for these pages being wired, and if the pages are wired then we only unbusy them rather than freeing them. That sounds wrong to me... the page will still be attached to its object, so when the thread that wired the page eventually unwires it, the page won't be freed and will end up in the same invalid and not on a paging queue state.

Typically, when a thread unwires a page, it uses vm_page_unwire(), which places the page in a queue. There is also vm_page_unwire_noq(), which must be used more carefully to avoid the problem that you pointed out. (Almost all of its current uses in the tree are for "unmanaged" pages that will never be on a page queue, like page table pages. We should probably have a distinct function for unwiring such pages.)

I thought you were saying that we don't want to place these still-invalid pages in a queue, so then vm_page_unwire() would be doing the wrong thing. If we end up deciding that it's ok to put invalid pages on a queue then we can just remove that assertion from vm_page_readahead_finish(). But it seems better to free these invalid pages immediately, and I thought that's what you were arguing for doing.

Well, it *is* ok to put invalid pages on a queue. It makes more sense to free the page immediately if the object write lock is held, or if it can be acquired easily, but that might not always be the case. If the page is wired then it certainly cannot be freed, but as long as the unwiring thread ensures that the page ends up in a queue, we can be sure that the page is not leaked. Since it is quite unlikely that an invalid readahead page ends up being wired, then I'm satisfied with the approach of checking for wirings before freeing an invalid readahead page.

The approach of removing the page from the object should be ok too and provides more consistent behaviour, so I'm in favour of that.

I'll update the diff here to have vnode_pager_generic_getpages_done() free the pages as you suggest, and that version passes the sendfile atf test that I also put out for review. I won't try to do anything yet about the case where the page is wired and thus cannot be freed though.

Per my comments above, I believe it is correct to handle this exactly the same way as vm_page_grab_valid(): i.e., check for wirings and only free the page if none are present.

But if we only xunbusy the read ahead/behind pages in vnode_pager_generic_getpages_done() and vm_page_grab_valid() when the read failed and the page is now wired, the thread that unwires has no way to know that it should free the page. We need to do vm_page_remove() so that the unwire thread will know to free the page.

I looked through all of the callers of vm_pager_get_pages() to see how they handle errors, and most of them don't correctly handle the case where the page is now wired. If we continue to allow pages to be wired while holding only the page's object's lock, then we'll need to fix all of those too. Is there any compelling reason to continue to allow wiring a page while holding only the page's object's lock? If we change the rule to require a page to be busy in order to increase the wire count then that would be a lot simpler.

In at least some of those cases (like md(4)), it should be impossible for the page in question to acquire wirings. The tmpfs and shmem cases seem potentially problematic.

I can't see any lingering cases where we rely on the object lock to provide synchronization when wiring a page, so I think your proposal is a good idea. In the past it wouldn't really have been possible but the scope of the busy lock has increased.

I also see that the locking assertion in vm_page_wire() is:

if (!vm_page_busied(m) && !vm_object_busied(m->object))
        VM_OBJECT_ASSERT_LOCKED(m->object);

so according to this, it's also ok to wire a page if you have the page's object busy. However, the only caller of vm_object_busy() (which is vm_fault_soft_fast()) also has the page's object rlocked, so it would make sense to change this locking assertion in vm_page_wire() to no longer allow wiring with only the page's object busy.

If the page is unbusied, and the object is busied, then the page is effectively xbusied because a busied object blocks vm_page_xbusy(). So the object busy is not really a separate case. That said, I don't see a reason to allow it here if it makes the code harder still to reason about.

The tricky part is that all of these fields are synchronized using different mechanisms, so it is hard to write assertions that aren't racy. I agree with the implication of your comments that this stuff is too subtle and somewhat under-asserted.

If such assertions about the locking requirements for these various operations can't be written without races then the design probably has fundamental races preventing correctness too.

Assertions about locking requirements for various operations can be and are written without races. I'm just saying that each vm_page is part of several state machines with different synchronization protocols, so it's hard to assert on the combined state when it can't be loaded atomically from memory.

This revision is now accepted and ready to land.Jul 6 2020, 11:43 PM
In D25430#565790, @chs wrote:

I looked through all of the callers of vm_pager_get_pages() to see how they handle errors, and most of them don't correctly handle the case where the page is now wired. If we continue to allow pages to be wired while holding only the page's object's lock, then we'll need to fix all of those too. Is there any compelling reason to continue to allow wiring a page while holding only the page's object's lock? If we change the rule to require a page to be busy in order to increase the wire count then that would be a lot simpler.

In at least some of those cases (like md(4)), it should be impossible for the page in question to acquire wirings. The tmpfs and shmem cases seem potentially problematic.

I can't see any lingering cases where we rely on the object lock to provide synchronization when wiring a page, so I think your proposal is a good idea. In the past it wouldn't really have been possible but the scope of the busy lock has increased.

The proposal means that it is impossible to wire the page in non-sleepable context, unless we own xbusy on the page in advance. I believe this is no-go.

In D25430#565838, @kib wrote:
In D25430#565790, @chs wrote:

I looked through all of the callers of vm_pager_get_pages() to see how they handle errors, and most of them don't correctly handle the case where the page is now wired. If we continue to allow pages to be wired while holding only the page's object's lock, then we'll need to fix all of those too. Is there any compelling reason to continue to allow wiring a page while holding only the page's object's lock? If we change the rule to require a page to be busy in order to increase the wire count then that would be a lot simpler.

In at least some of those cases (like md(4)), it should be impossible for the page in question to acquire wirings. The tmpfs and shmem cases seem potentially problematic.

I can't see any lingering cases where we rely on the object lock to provide synchronization when wiring a page, so I think your proposal is a good idea. In the past it wouldn't really have been possible but the scope of the busy lock has increased.

The proposal means that it is impossible to wire the page in non-sleepable context, unless we own xbusy on the page in advance. I believe this is no-go.

It is unusual for code to wire managed pages without using the busy lock to provide some consistency. vm_page_grab(), etc., provide mechanisms to avoid blocking, so callers can block and retry at a point of their choosing. In fact I cannot see any places in the tree today where a managed wiring is created without the busy lock.

In D25430#565838, @kib wrote:

The proposal means that it is impossible to wire the page in non-sleepable context, unless we own xbusy on the page in advance. I believe this is no-go.

It is unusual for code to wire managed pages without using the busy lock to provide some consistency. vm_page_grab(), etc., provide mechanisms to avoid blocking, so callers can block and retry at a point of their choosing. In fact I cannot see any places in the tree today where a managed wiring is created without the busy lock.

I would agree that wiring could obey to this rule, but I believe we must provide sleep-less mechanism to prevent page from reuse. Some time ago it was hold, since then hold count was merged with wire count, thus wire count should provide such functionality.

In D25430#565910, @kib wrote:
In D25430#565838, @kib wrote:

The proposal means that it is impossible to wire the page in non-sleepable context, unless we own xbusy on the page in advance. I believe this is no-go.

It is unusual for code to wire managed pages without using the busy lock to provide some consistency. vm_page_grab(), etc., provide mechanisms to avoid blocking, so callers can block and retry at a point of their choosing. In fact I cannot see any places in the tree today where a managed wiring is created without the busy lock.

I would agree that wiring could obey to this rule, but I believe we must provide sleep-less mechanism to prevent page from reuse. Some time ago it was hold, since then hold count was merged with wire count, thus wire count should provide such functionality.

Let's keep it for now then. In the meantime, we might generalize vm_page_readahead_free() to handle non-readahead pages. For example, fault_page_free() could also remove the page from its object. This function could also be used in tmpfs_reg_resize() and shm_dotruncate_locked().

In D25430#565910, @kib wrote:
In D25430#565838, @kib wrote:

The proposal means that it is impossible to wire the page in non-sleepable context, unless we own xbusy on the page in advance. I believe this is no-go.

It is unusual for code to wire managed pages without using the busy lock to provide some consistency. vm_page_grab(), etc., provide mechanisms to avoid blocking, so callers can block and retry at a point of their choosing. In fact I cannot see any places in the tree today where a managed wiring is created without the busy lock.

I would agree that wiring could obey to this rule, but I believe we must provide sleep-less mechanism to prevent page from reuse. Some time ago it was hold, since then hold count was merged with wire count, thus wire count should provide such functionality.

Let's keep it for now then. In the meantime, we might generalize vm_page_readahead_free() to handle non-readahead pages. For example, fault_page_free() could also remove the page from its object. This function could also be used in tmpfs_reg_resize() and shm_dotruncate_locked().

What about vm_page_readahead_free() would need to change to handle non-readahead pages, other than changing the name? Is there a name that you would prefer?

fault_page_free() does do more or less the same thing, so it would make sense to change it to use the new common function, whatever we end up naming it. Those tmpfs and shm functions would benefit from using this new function as well. But I'd like to commit all those other changes (as well as the vm_page_grab_valid() change in the most recent diff here) separately so that I can fix the vnode_pager_getpages_done() bug that I introduced fixed sooner, after which I can also commit the new tests for sendfile that trigger that bug. Hopefully that's ok.

In D25430#567733, @chs wrote:
In D25430#565910, @kib wrote:
In D25430#565838, @kib wrote:

The proposal means that it is impossible to wire the page in non-sleepable context, unless we own xbusy on the page in advance. I believe this is no-go.

It is unusual for code to wire managed pages without using the busy lock to provide some consistency. vm_page_grab(), etc., provide mechanisms to avoid blocking, so callers can block and retry at a point of their choosing. In fact I cannot see any places in the tree today where a managed wiring is created without the busy lock.

I would agree that wiring could obey to this rule, but I believe we must provide sleep-less mechanism to prevent page from reuse. Some time ago it was hold, since then hold count was merged with wire count, thus wire count should provide such functionality.

Let's keep it for now then. In the meantime, we might generalize vm_page_readahead_free() to handle non-readahead pages. For example, fault_page_free() could also remove the page from its object. This function could also be used in tmpfs_reg_resize() and shm_dotruncate_locked().

What about vm_page_readahead_free() would need to change to handle non-readahead pages, other than changing the name? Is there a name that you would prefer?

I don't think the implementation needs to change. vm_page_remove_invalid() might be a reasonable name.

fault_page_free() does do more or less the same thing, so it would make sense to change it to use the new common function, whatever we end up naming it. Those tmpfs and shm functions would benefit from using this new function as well. But I'd like to commit all those other changes (as well as the vm_page_grab_valid() change in the most recent diff here) separately so that I can fix the vnode_pager_getpages_done() bug that I introduced fixed sooner, after which I can also commit the new tests for sendfile that trigger that bug. Hopefully that's ok.

That is ok with me.

rename the new function and limit this diff to only fixing vnode_pager_generic_getpages_done().

This revision now requires review to proceed.Jul 15 2020, 11:51 PM
sys/vm/vm_page.c
1369 ↗(On Diff #74502)

I would call this vm_page_free_invalid()

1375 ↗(On Diff #74502)

I think it is that rare case when it is useful to assert that m->object != NULL.

update diff for kib's latest comments.

This revision is now accepted and ready to land.Jul 17 2020, 8:20 PM