Page MenuHomeFreeBSD

Merge hold_count into wire_count.
ClosedPublic

Authored by markj on Feb 19 2019, 5:48 PM.

Details

Summary

Remove the vm_page_(un)hold() KPIs. Some KPIs still have "hold" in the
name, e.g., pmap_extract_and_hold(). I'm not sure how best to handle
these. I think "hold" is a better name than "wire" for a general
reference counter, so I'm not really happy renaming them to
s/hold/wire/.

In some places, after unholding we now have to check whether the page has
been removed from its object, in which case we free the page if its
wire_count has dropped to 0. In a future change I would prefer to move
this logic into vm_page_unwire_noq() to avoid memory leaks caused by
missing m->object == NULL checks.

Test Plan

Peter reported no problems with the patch.

The cxgbe changes will need some scrutiny, but I will see if there is any general feedback first.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 25207
Build 23887: arc lint + arc unit

Event Timeline

markj edited reviewers, added: kib, alc, jeff; removed: manu.

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

sys/vm/vm_fault.c
1613–1615

Isn't it a good moment to introduce a VM KPI to wrap this op into something like vm_page_disown() ? (I do not insist on the name).

Linuxkpi put_page() is yet another place.

In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

That is my next step, yes. I am not certain that the changes must be committed together since the user wire limit is already quite broken: it is typically exceeded for anyone using the ZFS ARC, and mlockall() does not check the user wire limit.

sys/vm/vm_fault.c
1613–1615

In the review description I suggested doing this in a future diff, but I can try it here. Why not handle it in vm_page_unwire_noq(), instead of introducing a new KPI? Note that vm_page_unwire() already checks whether m->object == NULL.

In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

That is my next step, yes. I am not certain that the changes must be committed together since the user wire limit is already quite broken: it is typically exceeded for anyone using the ZFS ARC, and mlockall() does not check the user wire limit.

My point is that it might start breaking configurations which worked before, e.g. UFS/tmpfs-only installs.

sys/vm/vm_fault.c
1613–1615

But how would you detect that it is fine to free the page ? Checking VPO_UNMANAGED == 0 and object == NULL ? I am not sure that this would not break any caller.

In D19247#414032, @kib wrote:
In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

That is my next step, yes. I am not certain that the changes must be committed together since the user wire limit is already quite broken: it is typically exceeded for anyone using the ZFS ARC, and mlockall() does not check the user wire limit.

My point is that it might start breaking configurations which worked before, e.g. UFS/tmpfs-only installs.

I will complete the kernel/user wiring split patch before attempting to commit this diff, then.

sys/vm/vm_fault.c
1613–1615

Yes, in that case I think it should be safe to free the page. Typically, upon the last unwire it is also the caller's responsibility to enqueue the unwired page. But this must not be done if m->object == NULL, as the page daemon does not handle it.

Some auditing is required, but I think it is the right direction. I would like struct vm_page to have a general-purpose reference count with KPIs that implicitly handle freeing the page. Moreover, such a change would help avoid bugs like that fixed by r329216.

  • Rebase after removal of sys/dev/drm
  • Remove vm_page_(un)hold() prototypes for now
In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

This is implemented by r347532.

@np and/or @jhb do you have any thoughts on the changes to cxgbe? To summarize, I wish to eliminate vm_page_hold() and use wiring instead. In the past, the main distinction has been that hold/unhold operations have no effect on the page's position in the LRU queues, so those operations were significantly cheaper. However, wire/unwire operations are now cheaper than they used to be: wired pages are lazily dequeued, so vm_page_wire() never touches the page queue lock, and page queue operation batching means that unwiring is typically cheaper as well. If this is still significant enough to be a concern, I would be willing to add a function to unwire a page without requeuing it. Instead, the function would set PGA_DEQUEUE on the page, so that the page daemon requeues it if it is encountered during a page queue scan.

sys/i386/i386/pmap.c
5913–5914

I think this is dead code. Remove it as pre-cleanup ?

sys/vm/vm_glue.c
226–227

Pass VM_ALLOC_WIRE ?

240–241

The this wiring is not needed, but unwire in error case is due.

sys/vm/vm_page.h
814–815

May be kill this function ? It is quite confusing to leave it IMO. It is around dozen users of it, all in sys/vm.

The cxgbe changes are ok.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
458

It would seem more consistent to rename these functions as well now.

sys/vm/vm_page.h
814–815

I concur on killing it and/or renaming it to 'vm_page_wired'

markj marked 5 inline comments as done.
  • Address feedback.
sys/i386/i386/pmap.c
5913–5914

I think the whole routine is effectively dead code. It does not declare a return value. It certainly does not compile after the 4/4 split. Do you think it is worth preserving at all? If so I will remove the dead code, fix style and compilation issues.

sys/vm/vm_fault.c
1613–1615

I was being dismissive here, and the question requires more thought. I will explain why I chose to add the object == NULL checks.

Suppose we unhold a page with hold_count == 1. If PG_UNHOLDFREE is set, we must have m->object == NULL. The inverse statement is almost always true as well. Looking through the vm_page_remove() callers:

  • SGX and TTM drivers, cdev_pager_free_page(). The pages are fictitious and their lifetime must be managed by the driver.
  • We temporarily remove the page in the "optimized" CoW fault handler. This case represents a bug in my diff.
  • vm_object_collapse_scan(). A wired page is removed from the object; the unwiring thread is responsible for freeing it. Same semantics as PG_UNHOLDFREE.
  • vm_page_remove(). The page is moved between two objects with the page lock held, so the update to m->object is atomic with respect to the page lock.
  • vm_page_reclaim_run(). We require that m->wire_count == 0 and m->hold_count == 0 in order for reclamation to proceed.
  • vm_page_free_prep(). Sets PG_UNHOLDFREE if m->hold_count > 0.

Except for the CoW case mentioned above, I believe it is also true that if m->object == NULL in vm_page_unhold(), then PG_UNHOLDFREE must be set, so we will free the page. I will think about how to solve that one case.

sys/vm/vm_page.h
208

The reordering is to preserve 32-bit alignment of aflags.

814–815

I renamed it to vm_page_wired(). IMO it is helpful to have the routine; might be that we want to use a few bits of wire_count for other purposes. There are many places where we still check m->wire_count inline. I will update them in a separate diff to avoid making this one too big and hard to read.

Getting way above my pay grade

sys/i386/i386/pmap.c
5913–5914

Yes, exactly, I mean pmap_pid_dump() as a whole. It is covered by PMAP_DEBUG which does not appear anywhere else.

sys/vm/vm_fault.c
1613–1615

So why not keep PG_UNHOLDFREE around, perhaps renamed ? This would fix the CoW case and removes very obscure detail of the wiring.

sys/i386/i386/pmap.c
5913–5914

Ah, I thought you were referring only to the VM_MIN_KERNEL_ADDRESS fragment. I will simply remove this function, then.

sys/vm/vm_fault.c
1613–1615

Currently vm_page_free_prep() panics if the page is wired. Are you suggesting we change that semantic?

FWIW, I think PG_UNHOLDFREE is kind of hacky and complicates locking. Following this diff I wish to begin manipulating wire_count using atomics when the page is managed, as part of an effort to reduce the scope of the page lock.

Wire the page during an optimized copy-on-write fault. This ensures
that a thread concurrently unwiring the page will not attempt to
free it while it is temporarily removed from its object. Since this
also prevents the page daemon from scanning the page, we no longer
need to dequeue the page. (See r331128 and r331131.)

sys/vm/vm_fault.c
1613–1615

If you merge hold and wire, then the semantic should be mostly a superposition of both. We allow freeing held pages, which means that it is reasonable to allow free wired pages after the merge.

sys/vm/vm_fault.c
1613–1615

Well, my desired semantic is to permit vm_page_unwire() to free the page if there are no more references, similar to PG_UNHOLDFREE. However, my plan is to count the object reference in wire_count and have vm_page_unwire() free the page when the last reference is released, since this simplifies synchronization.

I will post some more patches following this direction, to make discussion easier.

sys/vm/vm_fault.c
1613–1615

See D20486. It is a large diff and still needs a bit of work, but it is stable for me. I will try to continue splitting off pieces into separate reviews; D20485 can be committed separately, for instance.

Basically, with that change, vm_page_wire() will free the page upon the last reference. In particular, it has essentially the same semantics as PG_UNHOLDFREE.

  • Remove vm_page_hold.9.
  • Rebase.

Are all of the prerequisite changes for this patch now committed?

In D19247#451188, @alc wrote:

Are all of the prerequisite changes for this patch now committed?

Yes, this applies cleanly to head.

BTW I am still slowly pulling pieces of D20486 out into separate patches. I've also found some races in that change which are not yet fixed in the revision that's in phabricator.

I will be very happy to see this change committed. It's something that I've wanted to see happen for many years.

sys/vm/vm_fault.c
1147–1155

We don't typically put a blank line before non-trivial comments that open a block.

This revision is now accepted and ready to land.Jul 4 2019, 7:54 PM
markj marked an inline comment as done.
  • Remove a blank line.
  • Rebase after r349649.
This revision now requires review to proceed.Jul 4 2019, 8:50 PM
This revision is now accepted and ready to land.Jul 4 2019, 8:55 PM