Page MenuHomeFreeBSD

Merge hold_count into wire_count.
ClosedPublic

Authored by markj on Feb 19 2019, 5:48 PM.

Details

Summary

Remove the vm_page_(un)hold() KPIs. Some KPIs still have "hold" in the
name, e.g., pmap_extract_and_hold(). I'm not sure how best to handle
these. I think "hold" is a better name than "wire" for a general
reference counter, so I'm not really happy renaming them to
s/hold/wire/.

In some places, after unholding we now have to check whether the page has
been removed from its object, in which case we free the page if its
wire_count has dropped to 0. In a future change I would prefer to move
this logic into vm_page_unwire_noq() to avoid memory leaks caused by
missing m->object == NULL checks.

Test Plan

Peter reported no problems with the patch.

The cxgbe changes will need some scrutiny, but I will see if there is any general feedback first.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 25207
Build 23887: arc lint + arc unit

Event Timeline

markj created this revision.Feb 19 2019, 5:48 PM
markj edited the test plan for this revision. (Show Details)Feb 19 2019, 5:49 PM
markj edited reviewers, added: kib, alc, jeff; removed: manu.
kib added a comment.Feb 24 2019, 9:35 AM

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

sys/vm/vm_fault.c
1613–1615

Isn't it a good moment to introduce a VM KPI to wrap this op into something like vm_page_disown() ? (I do not insist on the name).

Linuxkpi put_page() is yet another place.

markj added a comment.Feb 25 2019, 3:47 PM
In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

That is my next step, yes. I am not certain that the changes must be committed together since the user wire limit is already quite broken: it is typically exceeded for anyone using the ZFS ARC, and mlockall() does not check the user wire limit.

sys/vm/vm_fault.c
1613–1615

In the review description I suggested doing this in a future diff, but I can try it here. Why not handle it in vm_page_unwire_noq(), instead of introducing a new KPI? Note that vm_page_unwire() already checks whether m->object == NULL.

markj edited the test plan for this revision. (Show Details)Feb 25 2019, 3:48 PM
kib added a comment.Feb 25 2019, 4:14 PM
In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

That is my next step, yes. I am not certain that the changes must be committed together since the user wire limit is already quite broken: it is typically exceeded for anyone using the ZFS ARC, and mlockall() does not check the user wire limit.

My point is that it might start breaking configurations which worked before, e.g. UFS/tmpfs-only installs.

sys/vm/vm_fault.c
1613–1615

But how would you detect that it is fine to free the page ? Checking VPO_UNMANAGED == 0 and object == NULL ? I am not sure that this would not break any caller.

markj added a comment.Feb 25 2019, 4:24 PM
In D19247#414032, @kib wrote:
In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

That is my next step, yes. I am not certain that the changes must be committed together since the user wire limit is already quite broken: it is typically exceeded for anyone using the ZFS ARC, and mlockall() does not check the user wire limit.

My point is that it might start breaking configurations which worked before, e.g. UFS/tmpfs-only installs.

I will complete the kernel/user wiring split patch before attempting to commit this diff, then.

sys/vm/vm_fault.c
1613–1615

Yes, in that case I think it should be safe to free the page. Typically, upon the last unwire it is also the caller's responsibility to enqueue the unwired page. But this must not be done if m->object == NULL, as the page daemon does not handle it.

Some auditing is required, but I think it is the right direction. I would like struct vm_page to have a general-purpose reference count with KPIs that implicitly handle freeing the page. Moreover, such a change would help avoid bugs like that fixed by r329216.

markj updated this revision to Diff 54487.Feb 27 2019, 5:13 PM
  • Rebase after removal of sys/dev/drm
  • Remove vm_page_(un)hold() prototypes for now
markj updated this revision to Diff 57381.May 14 2019, 2:27 AM

Rebase.

markj added subscribers: np, jhb.May 14 2019, 2:37 AM
In D19247#413465, @kib wrote:

Don't we need to split the kernel wire/user wire limits for this to be feasible, otherwise intensive io holding pages would deny user wiring requests.

This is implemented by r347532.

@np and/or @jhb do you have any thoughts on the changes to cxgbe? To summarize, I wish to eliminate vm_page_hold() and use wiring instead. In the past, the main distinction has been that hold/unhold operations have no effect on the page's position in the LRU queues, so those operations were significantly cheaper. However, wire/unwire operations are now cheaper than they used to be: wired pages are lazily dequeued, so vm_page_wire() never touches the page queue lock, and page queue operation batching means that unwiring is typically cheaper as well. If this is still significant enough to be a concern, I would be willing to add a function to unwire a page without requeuing it. Instead, the function would set PGA_DEQUEUE on the page, so that the page daemon requeues it if it is encountered during a page queue scan.

kib added inline comments.May 20 2019, 8:17 PM
sys/i386/i386/pmap.c
5913–5914

I think this is dead code. Remove it as pre-cleanup ?

sys/vm/vm_glue.c
226–227

Pass VM_ALLOC_WIRE ?

240–241

The this wiring is not needed, but unwire in error case is due.

sys/vm/vm_page.h
814–815

May be kill this function ? It is quite confusing to leave it IMO. It is around dozen users of it, all in sys/vm.

jhb added a comment.May 20 2019, 10:15 PM

The cxgbe changes are ok.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
458

It would seem more consistent to rename these functions as well now.

sys/vm/vm_page.h
814–815

I concur on killing it and/or renaming it to 'vm_page_wired'

markj updated this revision to Diff 57872.May 24 2019, 11:05 PM
markj marked 5 inline comments as done.
  • Address feedback.
markj added inline comments.May 24 2019, 11:05 PM
sys/i386/i386/pmap.c
5913–5914

I think the whole routine is effectively dead code. It does not declare a return value. It certainly does not compile after the 4/4 split. Do you think it is worth preserving at all? If so I will remove the dead code, fix style and compilation issues.

sys/vm/vm_fault.c
1613–1615

I was being dismissive here, and the question requires more thought. I will explain why I chose to add the object == NULL checks.

Suppose we unhold a page with hold_count == 1. If PG_UNHOLDFREE is set, we must have m->object == NULL. The inverse statement is almost always true as well. Looking through the vm_page_remove() callers:

  • SGX and TTM drivers, cdev_pager_free_page(). The pages are fictitious and their lifetime must be managed by the driver.
  • We temporarily remove the page in the "optimized" CoW fault handler. This case represents a bug in my diff.
  • vm_object_collapse_scan(). A wired page is removed from the object; the unwiring thread is responsible for freeing it. Same semantics as PG_UNHOLDFREE.
  • vm_page_remove(). The page is moved between two objects with the page lock held, so the update to m->object is atomic with respect to the page lock.
  • vm_page_reclaim_run(). We require that m->wire_count == 0 and m->hold_count == 0 in order for reclamation to proceed.
  • vm_page_free_prep(). Sets PG_UNHOLDFREE if m->hold_count > 0.

Except for the CoW case mentioned above, I believe it is also true that if m->object == NULL in vm_page_unhold(), then PG_UNHOLDFREE must be set, so we will free the page. I will think about how to solve that one case.

sys/vm/vm_page.h
209

The reordering is to preserve 32-bit alignment of aflags.

814–815

I renamed it to vm_page_wired(). IMO it is helpful to have the routine; might be that we want to use a few bits of wire_count for other purposes. There are many places where we still check m->wire_count inline. I will update them in a separate diff to avoid making this one too big and hard to read.

rgrimes resigned from this revision.May 24 2019, 11:32 PM

Getting way above my pay grade

kib added inline comments.May 25 2019, 12:25 PM
sys/i386/i386/pmap.c
5913–5914

Yes, exactly, I mean pmap_pid_dump() as a whole. It is covered by PMAP_DEBUG which does not appear anywhere else.

sys/vm/vm_fault.c
1613–1615

So why not keep PG_UNHOLDFREE around, perhaps renamed ? This would fix the CoW case and removes very obscure detail of the wiring.

markj added inline comments.May 25 2019, 11:23 PM
sys/i386/i386/pmap.c
5913–5914

Ah, I thought you were referring only to the VM_MIN_KERNEL_ADDRESS fragment. I will simply remove this function, then.

sys/vm/vm_fault.c
1613–1615

Currently vm_page_free_prep() panics if the page is wired. Are you suggesting we change that semantic?

FWIW, I think PG_UNHOLDFREE is kind of hacky and complicates locking. Following this diff I wish to begin manipulating wire_count using atomics when the page is managed, as part of an effort to reduce the scope of the page lock.

markj updated this revision to Diff 57904.May 26 2019, 12:16 AM

Wire the page during an optimized copy-on-write fault. This ensures
that a thread concurrently unwiring the page will not attempt to
free it while it is temporarily removed from its object. Since this
also prevents the page daemon from scanning the page, we no longer
need to dequeue the page. (See r331128 and r331131.)

markj updated this revision to Diff 57905.May 26 2019, 12:18 AM

Rebase.

kib added inline comments.May 26 2019, 9:20 AM
sys/vm/vm_fault.c
1613–1615

If you merge hold and wire, then the semantic should be mostly a superposition of both. We allow freeing held pages, which means that it is reasonable to allow free wired pages after the merge.

markj added inline comments.May 26 2019, 3:42 PM
sys/vm/vm_fault.c
1613–1615

Well, my desired semantic is to permit vm_page_unwire() to free the page if there are no more references, similar to PG_UNHOLDFREE. However, my plan is to count the object reference in wire_count and have vm_page_unwire() free the page when the last reference is released, since this simplifies synchronization.

I will post some more patches following this direction, to make discussion easier.

markj added inline comments.May 31 2019, 9:25 PM
sys/vm/vm_fault.c
1613–1615

See D20486. It is a large diff and still needs a bit of work, but it is stable for me. I will try to continue splitting off pieces into separate reviews; D20485 can be committed separately, for instance.

Basically, with that change, vm_page_wire() will free the page upon the last reference. In particular, it has essentially the same semantics as PG_UNHOLDFREE.

markj updated this revision to Diff 58171.Jun 2 2019, 1:57 AM

Rebase.

markj updated this revision to Diff 58386.Jun 7 2019, 10:15 PM

Rebase.

markj updated this revision to Diff 59011.Jun 25 2019, 4:06 PM
  • Remove vm_page_hold.9.
  • Rebase.
alc added a comment.Jul 2 2019, 7:25 PM

Are all of the prerequisite changes for this patch now committed?

markj added a comment.Jul 2 2019, 7:27 PM
In D19247#451188, @alc wrote:

Are all of the prerequisite changes for this patch now committed?

Yes, this applies cleanly to head.

BTW I am still slowly pulling pieces of D20486 out into separate patches. I've also found some races in that change which are not yet fixed in the revision that's in phabricator.

alc accepted this revision.Jul 4 2019, 7:54 PM

I will be very happy to see this change committed. It's something that I've wanted to see happen for many years.

sys/vm/vm_fault.c
1147–1155

We don't typically put a blank line before non-trivial comments that open a block.

This revision is now accepted and ready to land.Jul 4 2019, 7:54 PM
markj updated this revision to Diff 59425.Jul 4 2019, 8:50 PM
markj marked an inline comment as done.
  • Remove a blank line.
  • Rebase after r349649.
This revision now requires review to proceed.Jul 4 2019, 8:50 PM
alc accepted this revision.Jul 4 2019, 8:55 PM
This revision is now accepted and ready to land.Jul 4 2019, 8:55 PM
kib accepted this revision.Jul 5 2019, 7:16 PM
alc added a comment.Jul 19 2019, 5:52 PM

Shouldn't this be closed?

markj closed this revision.Jul 19 2019, 6:08 PM

Committed in r349846.