Page MenuHomeFreeBSD

vm: Handle VM_ALLOC_ZERO in the page allocator
AcceptedPublic

Authored by markj on Fri, Feb 19, 8:18 PM.

Details

Reviewers
alc
kib
jeff
Group Reviewers
manpages
Summary

Currently VM_ALLOC_ZERO is advisory and callers must remember to zero
the pages themselves if necessary, which is most of the time. This is
error-prone, especially considering that vm_page_alloc() and
vm_page_grab() interpret VM_ALLOC_ZERO differently, and it complicates
callers. The main benefit appears to be an optimization for the fault
handler such that it can release the object lock before zeroing the
page. It also provides a minor improvement for platforms without a
direct map, since if they map the returned page anyway they can zero the
returned memory without having to use pmap_zero_page().

I propose zeroing pages in the allocator and handling
performance-sensitive cases more surgically. A follow-up commit will
modify consumers.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 37209
Build 34098: arc lint + arc unit

Event Timeline

sys/vm/vm_page.c
2393–2394

Wouldn't it be useful to set PG_ZERO if VM_ALLOC_ZERO was set? At least the work we did would not be lost.

markj marked an inline comment as done.

Set PG_ZERO if radix insertion fails and we zeroed the page.

If the page was already zeroed and VM_ALLOC_ZERO was not specified,
then we will not restore PG_ZERO. It seems like too much work to
handle this rare case, but if others disagree I will handle it.

This revision is now accepted and ready to land.Sat, Feb 20, 12:49 PM
sys/vm/vm_page.c
1960

This is going to change the meaning of this counter from zero-fill on page faults (which is where the "d" for "demand" as in "demand paging" comes from) to zero-fill on any allocation.

1985

Why the new "blank line" here? These are all optional flags, and there is no similar blank line added in similar comments below.

2372

Just an observation: Per table 5 (https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf), we should really stop zeroing one page at a time. On machines with a direct map, pmap_zero_page_area() would already "do the right thing". We would just need to modify the implementation on the other architectures.

2496

"pages" -> "page", "their" -> "its".

I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist(). Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool. Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.

In D28805#645337, @alc wrote:

I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist().
Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool. Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.

So to be clear, your proposal is to add a vm_page_alloc_anon() (or _noobj()?) which accepts a VM_ALLOC_ZERO flag and implements it by zeroing the page, whereas vm_page_alloc(_contig)() should stop taking VM_ALLOC_ZERO and instead return PG_ZERO unchanged? I like the idea of splitting the allocator functions and preserving PG_ZERO. It feels a bit odd to have inconsistent handling with respect to VM_ALLOC_ZERO though. There are situations where allocations are rare enough that zeroing under the object lock is not a problem (or it is required), and splitting the allocator entry points would make it easier to spot calls where it is likely to be a problem. The pti_obj object used to manage userspace-visible "holes" in the kernel address space is an example of this.

sys/vm/vm_page.c
1960

That is true, I'm shoehorning them in here. My weak argument is that the vmstat -s descriptions are vague enough to be correct under this new usage. However with your proposal implemented I would move these increments back to their original location.

2372

I think we'd want to additionally widen the types of the size and off to size_t.