Page MenuHomeFreeBSD

Provide accounting of user-wired pages.
ClosedPublic

Authored by markj on Apr 14 2019, 8:27 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 5, 12:50 PM
Unknown Object (File)
Sun, Jan 5, 12:33 PM
Unknown Object (File)
Fri, Jan 3, 9:00 PM
Unknown Object (File)
Fri, Jan 3, 1:26 PM
Unknown Object (File)
Dec 6 2024, 6:05 PM
Unknown Object (File)
Nov 26 2024, 1:46 AM
Unknown Object (File)
Nov 24 2024, 9:56 PM
Unknown Object (File)
Nov 19 2024, 5:34 PM

Details

Summary

This change adds a new vm counter, v_user_wire_count, which counts the
number of user-wired virtual pages. v_wire_count continues to count
user-wired physical pages. The motivation is to implement a system
limit on user-wired pages, replacing the existing limit applied to
mlock(2).

Like per-process RLIMIT_MEMLOCK, accounting is done at the VM map layer,
so pages wired into multiple vm_map_entries are counted multiple times.
I initially tried to provide accounting at the physical page layer, but
this turns out to be quite complicated because there is no reliable
method of counting the number of user-wired mappings of a given physical
page. In particular, there are several mechanisms that cause a
user-wired page to be unmapped:

  • msync(MS_INVALIDATE)
  • mprotect(PROT_NONE)
  • truncation of the backing vnode or shm object

Thus, when unwiring a physical page, it is not really possible to
determine whether other user wirings exist without some additional
information in the vm_page structure. As a step toward merging
hold_count and wire_count, I therefore followed kib's suggestion of
simply counting virtual pages.

With this change the vm_page_max_wired limit is renamed to
vm_page_max_user_wired and the check is moved into vm_map_wire(). This
ensures that the limit is applied to all user wirings; previously it was
not applied to, e.g., mlockall(2).

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 23785
Build 22731: arc lint + arc unit

Event Timeline

sys/sys/vmmeter.h
131

Why not declare it u_long from tne start ? vm_page_array_size is long.

sys/vm/vm_pageout.c
199

bde' preferred style is to have CTLFLAG_XXX on the previous line, and the variable address starting the first continuation line.

mjg added inline comments.
sys/sys/vmmeter.h
131

Any reason to put this in vmmeter? Currently the struct is read-only with everything initialized at boot time (also vm_cnt is annotated with read_mostly which "discourages" writing). Perhaps a user-visible vmmeter struct can be constructed instead and the kernel for its own uses would keep the new atomic counter elsewhere (e.g. a global annotated with exclusive_cache_line).

markj marked 2 inline comments as done.

Handle kib's notes.

sys/sys/vmmeter.h
131

I don't think there's any problem with moving it out, though I expect that it will be updated fairly rarely in general (so I think using a global atomic counter is fine). I'll try moving it out.

markj marked an inline comment as done.

Move v_user_wire_count out of vmmeter.

sys/vm/vm_pageout.h
78

It is strange to keep this limit int, while the counter is long.

Define vm_max_user_wired_pages as a u_long.

This revision is now accepted and ready to land.Apr 20 2019, 11:12 AM

Annotate the user wire counter with __exclusive_cache_line, requested by mjg.

This revision now requires review to proceed.Apr 26 2019, 6:00 AM

Do we have to add _user_ to the variables, I only see a rename here, and not a second value used in place of where _users_ went in, or am I totally missing something?

Do we have to add _user_ to the variables, I only see a rename here, and not a second value used in place of where _users_ went in, or am I totally missing something?

NVM, I did totally miss something, your still using the old variable to count something else. Ignore my comment.

This revision is now accepted and ready to land.May 1 2019, 5:23 PM
  • Fix a format string in vmstat.
  • Return ENOMEM if vmm fails to wire a bhyve VM because it hit the user-wiring limit.
  • Update userspace tests.
  • Fix various races and error-handling issues.
    • Many vm_map_wire() callers did not check for errors, so MAP_WIREFUTURE wirings could fail silently.
    • Some code performed a racy check for MAP_WIREFUTURE, so a concurrent munlockall() could return without having fully unwired the address space.
This revision now requires review to proceed.May 2 2019, 3:27 PM

The user space and test changes all look great. I’ll defer to someone else (like kib@) about the VM changes.

Curious: why was the quantity changed from an int to a u_long quantity? Could a comment be added to note why the change is being made, and can it optionally be committed as a separate change, if the reasoning is for modern architecture scaling?

This revision is now accepted and ready to land.May 7 2019, 12:49 AM
In D19908#434693, @ngie wrote:

The user space and test changes all look great. I’ll defer to someone else (like kib@) about the VM changes.

Curious: why was the quantity changed from an int to a u_long quantity? Could a comment be added to note why the change is being made, and can it optionally be committed as a separate change, if the reasoning is for modern architecture scaling?

Sorry, I forgot to reply.

Right now the kernel typically uses u_int for counting pages. This stops working well on systems with close to PAGE_SIZE * 2^32 = 2^44 = 16TB of physical memory. To support that much RAM we will have to convert page counters to u_long. Since I am adding a new counter, it makes sense to preemptively use u_long, and because the new counter is compared with the user wiring limit, the limit's value should also become u_long. There is some sense in committing that part of the change separately to make MFCs easier, but I still prefer to commit the single diff. There are other considerations that prevent a straightforward MFC, so some care needs to be taken regardless.

This revision was automatically updated to reflect the committed changes.