
vm_page: Fix PG_ZERO handling in vm_page_grab_*()
Needs Review · Public

Authored by markj on Thu, Apr 16, 8:51 PM.
Details

Reviewers
alc
kib
Summary

vm_page_grab_pages() and vm_page_grab_pages_unlocked() were honouring
PG_ZERO for resident, invalid pages, but this isn't really sound:
PG_ZERO should only be considered at allocation time. Unconditionally
zero already-resident, invalid pages if VM_ALLOC_ZERO was specified.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj requested review of this revision. Thu, Apr 16, 8:51 PM
sys/vm/vm_page.c:5235–5236

Just as a matter of style, for consistency, could you swap the order of the tests (vm_page_none_valid(m) and (allocflags & VM_ALLOC_ZERO) != 0) either here or in vm_page_grab_pages_unlocked()?

markj marked an inline comment as done.

Make tests more consistent.

Do we have any caller using vm_page_grab_pages(_unlocked)(VM_ALLOC_ZERO)? I did not find any.

I thought about what a use for such a call could be, and I really do not see how it can be useful. Both vnode and swap objects might have the page content on the volume, so an invalid page found in the queue cannot be safely zeroed without consulting the pager first. In the end, the decision to zero such a page belongs to the caller.

In D56432#1292906, @kib wrote:
> Do we have any caller using vm_page_grab_pages(_unlocked)(VM_ALLOC_ZERO)? I did not find any.

kern_kexec.c calls vm_page_grab_pages_unlocked(VM_ALLOC_ZERO). It looks like it could just use vm_page_alloc() though.

> I thought about what a use for such a call could be, and I really do not see how it can be useful. Both vnode and swap objects might have the page content on the volume, so an invalid page found in the queue cannot be safely zeroed without consulting the pager first. In the end, the decision to zero such a page belongs to the caller.

Effectively, you agree that we can drop the support and disable passing VM_ALLOC_ZERO to vm_page_grab_unlocked?

In D56432#1294556, @kib wrote:
> Effectively, you agree that we can drop the support and disable passing VM_ALLOC_ZERO to vm_page_grab_unlocked?

After kern_kexec is fixed, yes. But now I am trying to understand what that code is doing, and failing.

In D56432#1292906, @kib wrote:
> I thought about what a use for such a call could be, and I really do not see how it can be useful. Both vnode and swap objects might have the page content on the volume, so an invalid page found in the queue cannot be safely zeroed without consulting the pager first. In the end, the decision to zero such a page belongs to the caller.

For other object types, like OBJT_PHYS in kern_kexec.c, it is still somewhat useful, since we do not have a vm_page_alloc_pages(), so pages have to be allocated one-by-one. But yes, note also that vm_page_grab_valid() does not handle VM_ALLOC_ZERO.

> For other object types, like OBJT_PHYS in kern_kexec.c, it is still somewhat useful, since we do not have a vm_page_alloc_pages(), so pages have to be allocated one-by-one. But yes, note also that vm_page_grab_valid() does not handle VM_ALLOC_ZERO.

Even so, using the unlocked variant is utterly pointless, since there can't possibly be any existing pages in the object.

In D56432#1294614, @alc wrote:
> Even so, using the unlocked variant is utterly pointless, since there can't possibly be any existing pages in the object.

Yes, though I believe kib's question applies to the locked vm_page_grab_pages() as well.

> After kern_kexec is fixed, yes. But now I am trying to understand what that code is doing, and failing.

I had the same issue. It can happen that the allocation provides a page whose physical address already belongs to the final location of the segment. In that case, the page must not be copied through the trampoline; that would not work, since its content could be overwritten before the current data is copied to the final location. Instead, the code puts the page at the correct index of the containing object.

In D56432#1294689, @kib wrote:
> I had the same issue. It can happen that the allocation provides a page whose physical address already belongs to the final location of the segment. In that case, the page must not be copied through the trampoline; that would not work, since its content could be overwritten before the current data is copied to the final location. Instead, the code puts the page at the correct index of the containing object.

I think I understand. In any case, I think we can remove the use of the VM_ALLOC_ZERO flag there, but I would do that separately. My immediate goal is to avoid trusting PG_ZERO for managed page allocations, since that has led to several severe bugs and does not provide much benefit (unlike for noobj allocations; see my comments in D56234).

> I think I understand. In any case, I think we can remove the use of the VM_ALLOC_ZERO flag there, but I would do that separately. My immediate goal is to avoid trusting PG_ZERO for managed page allocations, since that has led to several severe bugs and does not provide much benefit (unlike for noobj allocations; see my comments in D56234).

In my opinion, removing VM_ALLOC_ZERO (formal) support from vm_page_grab* is the best step there.