Pages are user-wired by mlock(2) and indirectly by mlockall(2).
User-wired pages carry a reference in m->wire_count to ensure that they
are not freed by the page daemon; unlike kernel-wired pages they also
have the PG_USER_WIRED flag set in m->flags.
The main motivation for the change is to provide accounting of
user-wired pages. mlock(2) currently fails if the total number of
wired pages exceeds vm_page_wired_max; with this change only the number
of user-wired pages is compared against this limit. I also "fixed"
mlockall(2) to respect this limit, as documented in its man page. In
particular, mlockall(MCL_CURRENT) uses a racy check to determine if the
corresponding wiring would exceed the user-wired limit, and wirings
triggered by the MAP_WIREFUTURE flag are subject to the same limit.
The changes to make mlockall(2) respect the limit should perhaps be
committed separately since they have the potential to introduce
regressions. In this change, old_mlock is extended to disable the
global limit as well.
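To make the change in accounting concrete, here is a toy model of the two policies. It is a sketch, not code from the diff: the struct, its field names, and the exact form of the comparison are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counters; these names are illustrative, not the kernel's. */
struct wire_stats {
	size_t kernel_wired;	/* pages wired by the kernel */
	size_t user_wired;	/* pages wired via mlock(2)/mlockall(2) */
	size_t limit;		/* stand-in for the global wired-page limit */
};

/* Old policy: every wired page counts against the limit. */
static bool
old_check(const struct wire_stats *s, size_t npages)
{
	assert(s != NULL);
	return (s->kernel_wired + s->user_wired + npages <= s->limit);
}

/* New policy: only user-wired pages count against the limit. */
static bool
new_check(const struct wire_stats *s, size_t npages)
{
	assert(s != NULL);
	return (s->user_wired + npages <= s->limit);
}
```

With 900 kernel-wired pages, 50 user-wired pages and a limit of 1000, a request to wire 100 more pages is refused under the old policy but allowed under the new one.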
Only managed physical pages are counted in the user wire count, for two
reasons. First, unmanaged or fictitious pages are unevictable
regardless of whether they are user-wired, so logically they should not
be counted against the limit. Second, we use pmap_page_wired_mappings()
to determine whether all user wirings have been removed before clearing
PG_USER_WIRED; this does not work for unmanaged pages.
A couple of new KPIs are introduced: vm_page_wire_user() and
vm_page_unwire_user(). These respectively set and clear PG_USER_WIRED.
An alternative would be to account user wirings in the pmap layer, but
that approach is more complicated and I don't see any real benefits.
There are some corner cases in this diff:
- In sys_mlockall() we compare map->size with the global limit, but this check is racy since the map size may change before we call vm_map_wire(). The per-process RLIMIT_MEMLOCK check has the same race.
- Suppose a range of VAs is user-wired, and then kernel-wired, e.g., by vslock(). Suppose then that the range is user-unwired. For m in the range, pmap_page_wired_mappings(m) > 0 even though we removed the last user wiring, so v_user_wire_count will not be decremented until the kernel wiring is removed.
- As mentioned above, unmanaged and fictitious pages are not counted towards the total number of user-wired pages. However, when checking the size of a mapping against the system limit, we do not exclude unmanaged mappings.
I believe these cases are not likely to be problematic in practice.
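The second corner case can be reproduced in a small userland model. This is only one reading of the behaviour described above, not the diff's implementation: toy_page, toy_wire(), toy_unwire() and the flag value are hypothetical, and a single mapping with a wiring count stands in for the map entry and pmap_page_wired_mappings().

```c
#include <assert.h>
#include <stdbool.h>

#define PG_USER_WIRED	0x1	/* illustrative value only */

struct toy_page {
	bool managed;		/* unmanaged pages are never counted */
	int flags;
	int entry_wirings;	/* wirings (user or kernel) on the one mapping */
};

static long user_wire_count;	/* stand-in for v_user_wire_count */

/* Stand-in for pmap_page_wired_mappings(): 1 while the mapping is wired. */
static int
toy_wired_mappings(const struct toy_page *m)
{
	return (m->entry_wirings > 0 ? 1 : 0);
}

/* Wire the mapping; a user wiring also marks and counts a managed page. */
static void
toy_wire(struct toy_page *m, bool user)
{
	m->entry_wirings++;
	if (user && m->managed && (m->flags & PG_USER_WIRED) == 0) {
		m->flags |= PG_USER_WIRED;
		user_wire_count++;
	}
}

/*
 * Drop one wiring.  The flag is cleared, and the count decremented, only
 * once no wired mappings remain, whoever owned the last wiring.
 */
static void
toy_unwire(struct toy_page *m)
{
	assert(m->entry_wirings > 0);
	m->entry_wirings--;
	if ((m->flags & PG_USER_WIRED) != 0 && toy_wired_mappings(m) == 0) {
		m->flags &= ~PG_USER_WIRED;
		user_wire_count--;
	}
}
```

Calling toy_wire(m, true), then toy_wire(m, false), then toy_unwire(m) models mlock(2), vslock() and munlock(2) in that order: the count stays at 1 after the user unwire and drops to 0 only on the second toy_unwire(m), which models the kernel unwiring.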
The diff does not update any documentation yet; I will work on that if
there are no major objections to the approach taken here.