
Use a single kernel stack object.
ClosedPublic

Authored by markj on Apr 17 2020, 5:26 PM.

Details

Summary

Previously we allocated a separate VM object for each kernel stack. Now
that we cache fully constructed kernel stacks, I can't see a reason for
keeping it that way: kernel stack allocation and freeing will avoid locking
the object most of the time, and kernel stack swapout/swapin are done by
dedicated threads, so there is very little concurrent access.

Add a global kstack_object, and use the stack KVA address to index into
the object.
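
A minimal sketch of the idea, with illustrative names (vm_kstack_pindex is a hypothetical helper, and the review below refines the pindex base to be relative to VM_MIN_KERNEL_ADDRESS, matching kernel_object):

/* One global object backs every kernel stack. */
vm_object_t kstack_object;

/*
 * Sketch: a stack's KVA determines its page indices in the object,
 * so no per-stack object or extra bookkeeping is needed.
 */
static vm_pindex_t
vm_kstack_pindex(vm_offset_t ks)
{
	return (atop(ks));
}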

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

sys/vm/vm_glue.c
324 ↗(On Diff #70702)

If we allocate KVA from the system, isn't it safe to assume that it is already unmapped? Are we making that assumption elsewhere?

sys/vm/vm_swapout.c
574 ↗(On Diff #70702)

If we're swapping in a stack, I think we can assume that all of the pages are invalid.

sys/vm/vm_glue.c
324 ↗(On Diff #70702)

I think it might be some kind of leftover from previous code for caching.

Do we have a way to assert that the address is unmapped in KVA? I cannot think of one.

sys/vm/vm_swapout.c
574 ↗(On Diff #70702)

Swapout unwires the pages; everything else is up to the pagedaemon. If the pagedaemon has not reused a page, it can still be valid.

sys/vm/vm_glue.c
324 ↗(On Diff #70702)

Can't we just call pmap_kextract() on each page in the guard page range and assert that the return value is 0?
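
For concreteness, a sketch of that assertion (hypothetical code; kib's objection below explains why a zero return is not a reliable "unmapped" indicator):

/* Hypothetical check over the guard range; not portable, see below. */
for (i = 0; i < KSTACK_GUARD_PAGES; i++)
	KASSERT(pmap_kextract(ks + ptoa(i)) == 0,
	    ("kstack guard page %d at %#jx is mapped", i,
	    (uintmax_t)(ks + ptoa(i))));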

sys/vm/vm_swapout.c
574 ↗(On Diff #70702)

Thanks, somehow I missed that case.

  • Remove XXX comments.
  • Assert that guard pages are already unmapped.
kib added inline comments.
sys/vm/vm_glue.c
324 ↗(On Diff #70702)

That assumes that physical address 0 is invalid for general use. For instance, on amd64 that was not true until the L1TF mitigation was added several months ago.

This revision is now accepted and ready to land. Apr 20 2020, 2:43 PM
sys/vm/vm_glue.c
467 ↗(On Diff #70803)

This might as well be defined as OBJT_SWAP.

sys/vm/vm_glue.c
343 ↗(On Diff #70803)

Given the defined size of the object the second argument should be "atop(ks - VM_MIN_KERNEL_ADDRESS) + i"
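
Combining this with the OBJT_SWAP suggestion above, a rough sketch of the intended sizing and indexing (assumed shapes, not the exact diff):

/* Size the object to span all kernel VA, as kernel_object does. */
kstack_object = vm_object_allocate(OBJT_SWAP,
    atop(VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS));

/* ...and index pages relative to VM_MIN_KERNEL_ADDRESS, not to 0. */
m = vm_page_grab(kstack_object, atop(ks - VM_MIN_KERNEL_ADDRESS) + i,
    VM_ALLOC_NOBUSY | VM_ALLOC_WIRED);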

  • Make kstack_object a swap object.
  • Fix kstack_object indexing by computing indices relative to VM_MIN_KERNEL_ADDRESS instead of 0. This is consistent with kernel_object.
  • Restore the code that explicitly unmaps guard pages (sketched below). We cannot use pmap_kextract() for now: kib notes that 0 is technically a valid physical address (though some implementations return 0 when no mapping is found), and some implementations panic if the address is not mapped.
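
A sketch of the restored unmap, assuming the guard pages sit at the base of the allocated range:

ks = kva_alloc((kstack_pages + KSTACK_GUARD_PAGES) * PAGE_SIZE);
if (KSTACK_GUARD_PAGES != 0) {
	/* Remove any stale mappings rather than asserting on them. */
	pmap_qremove(ks, KSTACK_GUARD_PAGES);
	ks += KSTACK_GUARD_PAGES * PAGE_SIZE;
}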
This revision now requires review to proceed. Apr 20 2020, 10:28 PM

Have you thought about whether there are any side-effects in swap behavior from using the same object? Might we run into clustering behavior?

sys/vm/vm_glue.c
428 ↗(On Diff #70823)

WAITFAIL is to get back to the preferred domain? This seems like it will be incredibly rare. Is it even worth handling?

The overload of kstack policy is a bit odd but I guess that's the only way to do it at this layer.

In D24473#539261, @jeff wrote:

Have you thought about whether there are any side-effects in swap behavior from using the same object? Might we run into clustering behavior?

We place a gap of unmapped KVA at the "base" (i.e., the end) of each stack. This maps to holes at each end of each stack in the object's pindex space, which I think will prevent clustering of pageouts. That's not obviously a good thing: if multiple stacks are in the laundry queue, then we want to swap them all out, and I don't see a disadvantage in clustering that. With respect to pageins, I think vm_thread_swapin() ensures that the swap pager will only read in the requested stack.

sys/vm/vm_glue.c
428 ↗(On Diff #70823)

At first I did not bother, but it is not really much extra work. It is probably rare in general, but if we are swapping out kernel stacks, swapins might be happening concurrently with a memory shortage.

I agree that specifying the policy this way is kind of awkward, I just didn't see a nicer way to do it.
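
To illustrate the shape under discussion (a sketch with hypothetical locals, not the exact diff): vm_page_grab_pages() returns the number of pages it grabbed, and VM_ALLOC_WAITFAIL makes a failing allocation sleep once and then return, so the caller can retry and re-apply the thread's kstack domain policy instead of blocking in one domain:

VM_OBJECT_WLOCK(kstack_object);
n = vm_page_grab_pages(kstack_object, pindex,
    req_class | VM_ALLOC_WIRED | VM_ALLOC_WAITFAIL, ma, npages);
VM_OBJECT_WUNLOCK(kstack_object);
if (n < npages) {
	/* The sleep already happened inside the grab; back out any
	 * partial progress and retry, re-selecting the domain. */
}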

431 ↗(On Diff #70823)

Ideally vm_page_grab_pages() would try to ensure that the request is satisfied by contiguous pages, but we don't yet have a good way of specifying that. Reservations don't help here because of the guard page. The individual page allocations will go to the per-CPU caches, which don't do a good job of returning contiguous memory. In practice, with plenty of free memory we do end up with many contiguous stacks, but small amounts of fragmentation are common, e.g.:

0x8a2a95000: vnode
0x8a2a96000: anonymous
0x8a2a97000: kernel-stack
0x8a2a98000: vnode
0x8a2a99000: kernel-stack
0x8a2a9a000: vnode
0x8a2a9b000: kernel-stack
0x8a2a9c000: kernel-stack
0x8a2a9d000: vnode
0x8a2a9e000: vnode
0x8a2a9f000: anonymous
0x8a2aa0000: kernel-stack

D14376 added a function that would be of use in this scenario.

This revision is now accepted and ready to land. Apr 21 2020, 3:56 AM
sys/vm/vm_glue.c
431 ↗(On Diff #70823)

It's not really correct to say that reservations don't help; it's just that they would never become fully populated.

I think reservations could be used if we introduce a layer of indirection in the stackaddr->obj pindex mapping. Rather than using atop(stackaddr) as the pindex, we could use a unr allocator or a vmem arena or whatever to allocate virtual indices within kstack_object. The virtual index for a stack mapped at address a would be PHYS_TO_VM_PAGE(pmap_kextract(a))->pindex / kstack_pages. Then there would be no holes in the pindex space, so the use of reservations for kstack_object wouldn't suffer from the problem above. This works especially well when kstack_pages is a power of 2, which it is by default.
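
A sketch of that indirection (not part of this revision; kstack_arena and both helpers are hypothetical):

static vmem_t *kstack_arena;	/* hypothetical: one index per stack */

/* Allocate a virtual stack index; page i lives at vidx * kstack_pages + i. */
static vm_pindex_t
kstack_vidx_alloc(void)
{
	vmem_addr_t vidx;

	if (vmem_alloc(kstack_arena, 1, M_BESTFIT | M_WAITOK, &vidx) != 0)
		return ((vm_pindex_t)-1);
	return ((vm_pindex_t)vidx);
}

/* Recover the virtual index of the stack mapped at address a. */
static vm_pindex_t
kstack_vidx(vm_offset_t a)
{
	return (PHYS_TO_VM_PAGE(pmap_kextract(a))->pindex / kstack_pages);
}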

  • Fix up the pindex calculation in vm_thread_swapout().
This revision now requires review to proceed. Apr 23 2020, 6:33 PM
This revision was not accepted when it landed; it landed in state Needs Review. Apr 26 2020, 8:09 PM
This revision was automatically updated to reflect the committed changes.