vm_fault: Reset m_needs_zeroing in vm_fault_busy_sleep()
ClosedPublic

Authored by markj on Fri, Apr 3, 2:45 PM.
Details

Summary

vm_fault_next() relies upon vm_fault_busy_sleep() to clean up fault
state, and this was not accounted for in commit cff67bc43df1.

Fixes: cff67bc43df1 ("vm_fault: only rely on PG_ZERO when the page was newly allocated")

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj requested review of this revision. Fri, Apr 3, 2:45 PM

Assuming that this change is right, I'd like to go further and remove the whole PG_ZERO mechanism: it adds non-trivial complexity, I've never seen workloads where it elides a significant number (say, more than 0.1%) of page zero operations, and page-zeroing is cheaper now than it used to be. Any thoughts?

The mere fact that we slept on the page somewhere in the shadow hierarchy does not mean that first_m should be zeroed.
But, if we go this route, then IMO m_needs_zero should be reset to true right at the RetryFault, and be done with it.

WRT removing PG_ZERO, do you propose zeroing the page in vm_page_alloc(VM_ALLOC_ZERO)? A long time ago we had a background thread that zeroed pages, and a separate queue for zeroed pages. With the addition of object reservations, it is indeed quite unlikely that the page at a specific object pindex happens to be zero because it was returned from something like the page table page allocator.

Later note: removing the PG_ZERO late zeroing there would still not remove the need to explicitly zero the page, in case the page was found invalid on the object queue.

In D56234#1287011, @kib wrote:

The mere fact that we slept on the page somewhere in the shadow hierarchy does not mean that first_m should be zeroed.
But, if we go this route, then IMO m_needs_zero should be reset to true right at the RetryFault, and be done with it.

WRT removing PG_ZERO, do you propose zeroing the page in vm_page_alloc(VM_ALLOC_ZERO)? A long time ago we had a background thread that zeroed pages, and a separate queue for zeroed pages.

Yes. I removed the background thread, as it was disabled by default and did not work properly with the vm_reserv subsystem (IIRC because free pages belonging to a reservation are counted in v_free_count but are not present in the vm_phys buddy queues, and the background thread did not handle this discrepancy). I also made vm_page_alloc_noobj(VM_ALLOC_ZERO) zero the page, so it is somewhat strange that vm_page_alloc(VM_ALLOC_ZERO) does not.

With the addition of object reservations, it is indeed quite unlikely that the page at a specific object pindex happens to be zero because it was returned from something like the page table page allocator.

Also because page table pages are freed to VM_FREEPOOL_DIRECT, whereas fault pages are allocated from VM_FREEPOOL_DEFAULT.

Later note: removing the PG_ZERO late zeroing there would still not remove the need to explicitly zero the page, in case the page was found invalid on the object queue.

Yes, but IMO we should simply zero it unconditionally. That would remove the need for this extra state tracking in m_needs_zeroing. It would also simplify the page allocator itself.

Reset m_needs_zeroing after the RetryFault label.

This revision is now accepted and ready to land. Sat, Apr 4, 10:47 AM

We might call vm_fault_allocate() more than once while handling a given page fault: suppose a top-level object shadows an OBJT_SWAP object with some swap space assigned. We'll allocate the top-level page, set m_needs_zeroing, then allocate a page in the backing object, set m_needs_zeroing again, then try to page in. If the pager doesn't have the page, we'll end up zeroing the top-level page using the wrong value for the flag.

Set m_needs_zeroing only for the top-level page, since that's what
vm_fault_zerofill() operates on.

This revision now requires review to proceed. Mon, Apr 6, 4:02 AM

This is probably fine, but then resetting m_needs_zeroing seems somewhat unneeded. I have started thinking that m_needs_zeroing should be removed, and that when we find an invalid page on the shadowing object's queue, we should zero it immediately.

This revision is now accepted and ready to land. Mon, Apr 6, 7:05 AM
In D56234#1287417, @kib wrote:

This is probably fine, but then resetting m_needs_zeroing seems somewhat unneeded. I have started thinking that m_needs_zeroing should be removed, and that when we find an invalid page on the shadowing object's queue, we should zero it immediately.

Yes, I think that would be better. But, since we need to release another EN for the bug, I prefer to first commit a minimal patch.

Assuming that this change is right, I'd like to go further and remove the whole PG_ZERO mechanism: it adds non-trivial complexity, I've never seen workloads where it elides a significant number (say, more than 0.1%) of page zero operations, and page-zeroing is cheaper now than it used to be. Any thoughts?

I'm using this dtrace script to see how often the page allocator returns pre-zeroed pages:

dtrace -n 'vm_page_alloc_domain_iter:return {@a[args[1]->flags & 4] = count();} vm_page_alloc_noobj_domain:return {@b[args[1]->flags & 4] = count();}'

The two aggregations effectively capture the number of pre-zeroed pages allocated from the default and direct freepools, respectively. When I run builds, for the default freepool, the count is effectively zero, but for the direct freepool we usually get >50% pre-zeroed pages. This intuitively makes sense, since during a build most of the activity in the direct freepool will come from pmap.c allocating and freeing PTPs, and pmap.c is the main (the only?) source of pre-zeroed pages.

So rather than removing PG_ZERO outright, IMO we should instead ignore it when allocating from the default free pool. That is:

  1. Make vm_page_alloc_* handle VM_ALLOC_ZERO by zeroing the page unconditionally.
  2. Change the fault handler to ignore PG_ZERO and always zero a new top-level page if there is no backing page.
  3. Change the fault handler to immediately zero a resident top-level page if it is invalid.

This may introduce some unnecessary zeroing, but I believe that would be pretty minimal, and I think it's worth it to avoid these kinds of bugs.

I'm using this dtrace script to see how often the page allocator returns pre-zeroed pages:

dtrace -n 'vm_page_alloc_domain_iter:return {@a[args[1]->flags & 4] = count();} vm_page_alloc_noobj_domain:return {@b[args[1]->flags & 4] = count();}'

The two aggregations effectively capture the number of pre-zeroed pages allocated from the default and direct freepools, respectively. When I run builds, for the default freepool, the count is effectively zero, but for the direct freepool we usually get >50% pre-zeroed pages. This intuitively makes sense, since during a build most of the activity in the direct freepool will come from pmap.c allocating and freeing PTPs, and pmap.c is the main (the only?) source of pre-zeroed pages.

On my laptop, a clean buildkernel yields:

default freepool (vm_page_alloc_domain_iter):
  0         59295715
  4              525

direct freepool (vm_page_alloc_noobj_domain):
  0          1907011
  4          2960739

so for the default freepool, we only allocated 525 pre-zeroed pages in total, whereas something like 3/5ths of the pages allocated from the direct freepool were prezeroed.

This revision was automatically updated to reflect the committed changes.