Page MenuHomeFreeBSD

Provide accounting of user-wired pages.
ClosedPublic

Authored by markj on Apr 14 2019, 8:27 AM.

Details

Summary

This change adds a new vm counter, v_user_wire_count, which counts the
number of user-wired virtual pages. v_wire_count continues to count
user-wired physical pages. The motivation is to implement a system
limit on user-wired pages, replacing the existing limit applied to
mlock(2).

Like per-process RLIMIT_MEMLOCK, accounting is done at the VM map layer,
so pages wired into multiple vm_map_entries are counted multiple times.
I initially tried to provide accounting at the physical page layer, but
this turns out to be quite complicated because there is no reliable
method of counting the number of user-wired mappings of a given physical
page. In particular, there are several mechanisms that cause a
user-wired page to be unmapped:

  • msync(MS_INVALIDATE)
  • mprotect(PROT_NONE)
  • truncation of the backing vnode or shm object

Thus, when unwiring a physical page, it is not really possible to
determine whether other user wirings exist without some additional
information in the vm_page structure. As a step toward merging
hold_count and wire_count, I therefore followed kib's suggestion of
simply counting virtual pages.

With this change the vm_page_max_wired limit is renamed to
vm_page_max_user_wired and the check is moved into vm_map_wire(). This
ensures that the limit is applied to all user wirings; previously it was
not applied to, e.g., mlockall(2).

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

markj created this revision.Apr 14 2019, 8:27 AM
kib added inline comments.Apr 14 2019, 9:12 AM
sys/sys/vmmeter.h
132 ↗(On Diff #56192)

Why not declare it u_long from tne start ? vm_page_array_size is long.

sys/vm/vm_pageout.c
199 ↗(On Diff #56192)

bde' preferred style is to have CTLFLAG_XXX on the previous line, and the variable address starting the first continuation line.

mjg added a subscriber: mjg.Apr 14 2019, 1:02 PM
mjg added inline comments.
sys/sys/vmmeter.h
132 ↗(On Diff #56192)

Any reason to put this in vmmeter? Currently the struct is read-only with everything initialized at boot time (also vm_cnt is annotated with read_mostly which "discourages" writing). Perhaps a user-visible vmmeter struct can be constructed instead and the kernel for its own uses would keep the new atomic counter elsewhere (e.g. a global annotated with exclusive_cache_line).

markj marked 2 inline comments as done.Apr 14 2019, 1:04 PM
markj updated this revision to Diff 56199.

Handle kib's notes.

markj added inline comments.Apr 14 2019, 1:06 PM
sys/sys/vmmeter.h
132 ↗(On Diff #56192)

I don't think there's any problem with moving it out, though I expect that it will be updated fairly rarely in general (so I think using a global atomic counter is fine). I'll try moving it out.

markj marked an inline comment as done.Apr 20 2019, 10:35 AM
markj updated this revision to Diff 56421.

Move v_user_wire_count out of vmmeter.

kib added inline comments.Apr 20 2019, 10:49 AM
sys/vm/vm_pageout.h
78 ↗(On Diff #56421)

It is strange to keep this limit int, while the counter is long.

markj updated this revision to Diff 56422.Apr 20 2019, 11:06 AM

Define vm_max_user_wired_pages as a u_long.

kib accepted this revision.Apr 20 2019, 11:12 AM
This revision is now accepted and ready to land.Apr 20 2019, 11:12 AM
markj updated this revision to Diff 56685.Apr 26 2019, 6:00 AM

Annotate the user wire counter with __exclusive_cache_line, requested by mjg.

This revision now requires review to proceed.Apr 26 2019, 6:00 AM
rgrimes added a subscriber: rgrimes.May 1 2019, 5:21 PM

Do we have to add _user_ to the variables, I only see a rename here, and not a second value used in place of where _users_ went in, or am I totally missing something?

rgrimes accepted this revision.May 1 2019, 5:23 PM

Do we have to add _user_ to the variables, I only see a rename here, and not a second value used in place of where _users_ went in, or am I totally missing something?

NVM, I did totally miss something, your still using the old variable to count something else. Ignore my comment.

This revision is now accepted and ready to land.May 1 2019, 5:23 PM
markj updated this revision to Diff 56966.May 2 2019, 3:27 PM
  • Fix a format string in vmstat.
  • Return ENOMEM if vmm fails to wire a bhyve VM because it hit the user-wiring limit.
  • Update userspace tests.
  • Fix various races and error-handling issues.
    • Many vm_map_wire() callers did not check for errors, so MAP_WIREFUTURE wirings could fail silently.
    • Some code performed a racy check for MAP_WIREFUTURE, so a concurrent munlockall() could return without having fully unwired the address space.
This revision now requires review to proceed.May 2 2019, 3:27 PM
ngie accepted this revision.May 7 2019, 12:49 AM

The user space and test changes all look great. I’ll defer to someone else (like kib@) about the VM changes.

Curious: why was the quantity changed from an int to a u_long quantity? Could a comment be added to note why the change is being made, and can it optionally be committed as a separate change, if the reasoning is for modern architecture scaling?

This revision is now accepted and ready to land.May 7 2019, 12:49 AM
kib accepted this revision.May 7 2019, 1:36 AM
markj added a comment.May 13 2019, 4:22 PM
In D19908#434693, @ngie wrote:

The user space and test changes all look great. I’ll defer to someone else (like kib@) about the VM changes.
Curious: why was the quantity changed from an int to a u_long quantity? Could a comment be added to note why the change is being made, and can it optionally be committed as a separate change, if the reasoning is for modern architecture scaling?

Sorry, I forgot to reply.

Right now the kernel typically uses u_int for counting pages. This stops working well on systems with close to PAGE_SIZE * 2^32 = 2^44 = 16TB of physical memory. To support that much RAM we will have to convert page counters to u_long. Since I am adding a new counter, it makes sense to preemptively use u_long, and because the new counter is compared with the user wiring limit, the limit's value should also become u_long. There is some sense in committing that part of the change separately to make MFCs easier, but I still prefer to commit the single diff. There are other considerations that prevent a straightforward MFC, so some care needs to be taken regardless.

This revision was automatically updated to reflect the committed changes.