
sysctl memlock: add a knob to disable and make interruptible
Abandoned (Public)

Authored by rlibby on Apr 1 2021, 10:28 PM.

Details

Reviewers
jhb
mjg
Summary

Some systems, in particular those without swap, do not benefit from
limiting wiring of user pages for sysctl requests, and instead only
experience contention. Add a knob so that those systems may disable the
limit.

Also, make the lock request interruptible.

Test Plan

While toggling the new sysctl kern.sysctl_memlock a few times:

jot 0 | xargs -P $(sysctl -n hw.ncpu) -I %% sysctl -ax > /dev/null

and

jot 0 | xargs -P $(sysctl -n hw.ncpu) -I %% sysctl -x kern.malloc_stats > /dev/null

and confirmed with lockstat whether the lock was being taken.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

I disagree with the patch.

The current mechanism is crap and should probably be retired, anyone interested in ramping up kernel memory use has plenty of other ways to do it.

The limit is also weirdly small -- 4 times the page size is not particularly high.

That said, if the limit is to be kept, I think the following steps should be performed:

  1. Bump the threshold higher than that; it may happen to clear the contention for whatever sysctl you found problematic.
  2. Replace the lock with an atomic counter: introduce a multi-megabyte upper limit and atomic_fetchadd into a global. It can use sleepq to block and wait for wakeups when over the limit.
  3. Should the kernel get a facility for scalable inexact counting, the above will be ready to be converted to use it.
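A userland sketch of step 2 may help make the idea concrete. The cap value and all names here are illustrative, and C11 atomics plus a pthread mutex/condvar stand in for the kernel's atomic_fetchadd and sleepq; this is not proposed code, just the shape of the scheme:

```c
#include <pthread.h>
#include <stdatomic.h>

#define MEMLOCK_LIMIT (8L * 1024 * 1024)   /* hypothetical multi-MB cap */

static _Atomic long memlock_bytes;          /* global wired-byte count */
static pthread_mutex_t memlock_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  memlock_cv  = PTHREAD_COND_INITIALIZER;

/*
 * Reserve "req" bytes, sleeping while the cap is exceeded.  The fast
 * path is a single atomic add with no lock taken; the slow path waits
 * on the condvar (sleepq in the kernel version).  Concurrent fast-path
 * callers can transiently overshoot the cap; the limit is inexact by
 * design, as in the proposal above.
 */
static void
memlock_acquire(long req)
{
	long old;

	old = atomic_fetch_add(&memlock_bytes, req);
	if (old + req <= MEMLOCK_LIMIT)
		return;				/* fast path */
	/* Over the cap: undo the add and wait for room. */
	atomic_fetch_sub(&memlock_bytes, req);
	pthread_mutex_lock(&memlock_mtx);
	while (atomic_load(&memlock_bytes) + req > MEMLOCK_LIMIT)
		pthread_cond_wait(&memlock_cv, &memlock_mtx);
	atomic_fetch_add(&memlock_bytes, req);
	pthread_mutex_unlock(&memlock_mtx);
}

/* Release "req" bytes and wake any waiters. */
static void
memlock_release(long req)
{
	atomic_fetch_sub(&memlock_bytes, req);
	pthread_mutex_lock(&memlock_mtx);
	pthread_cond_broadcast(&memlock_cv);
	pthread_mutex_unlock(&memlock_mtx);
}
```

Uncontended requests under the cap never touch the mutex, which is the point of the proposal: the common case becomes a single atomic op instead of a shared lock.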

Huh. The sysctl at hand produces almost 15MB of data on my test box, that's unsustainable as it is.

For this sucker, I think the code should be reworked to stop pre-wiring all that memory and instead output this in chunks.
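The chunking idea can be sketched in isolation. This is a userland illustration, not kernel code: emit() stands in for a SYSCTL_OUT-style copy-out, the chunk size is arbitrary, and all names are made up for the example:

```c
#include <stddef.h>

#define CHUNK 4096		/* bound on memory staged at once */

typedef int (*emit_fn)(const char *buf, size_t len, void *arg);

/*
 * Push "len" bytes of "src" through emit() a bounded chunk at a time,
 * instead of staging (and wiring) the whole report in one large buffer.
 * Stops on the first error from the sink.
 */
static int
emit_chunked(const char *src, size_t len, emit_fn emit, void *arg)
{
	size_t n;
	int error;

	while (len > 0) {
		n = len < CHUNK ? len : CHUNK;
		error = emit(src, n, arg);
		if (error != 0)
			return (error);
		src += n;
		len -= n;
	}
	return (0);
}

/* Example sink: just counts chunks and total bytes delivered. */
static size_t total_bytes, total_chunks;

static int
count_emit(const char *buf, size_t len, void *arg)
{
	(void)buf;
	(void)arg;
	total_bytes += len;
	total_chunks++;
	return (0);
}
```

With this shape, peak memory held per request is bounded by CHUNK regardless of how large the sysctl's total output grows, which is what makes a 15MB report sustainable.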

In D29542#662307, @mjg wrote:

I disagree with the patch.

The current mechanism is crap and should probably be retired, anyone interested in ramping up kernel memory use has plenty of other ways to do it.

I don't disagree that the current mechanism is outdated.

The limit is also weirdly small -- 4 times the page size is not particularly high.

Well, I read the ~4 pages (formerly ~1 page) as an amount of wiring we assume we can afford across all threads simultaneously. So, always freely granting a small number of pages makes sense to me, even with a better wiring limit mechanism.

That said, if the limit is to be kept, I think the following steps should be performed:

  1. Bump the threshold higher than that; it may happen to clear the contention for whatever sysctl you found problematic.

We ran into this with net.inet.tcp.pcblist, but we have other sysctls (out of tree) with large output; kern.malloc_stats is an in-tree example. For systems without swap, there's no threshold that really makes sense -- they simply don't care about the limit.

  2. Replace the lock with an atomic counter: introduce a multi-megabyte upper limit and atomic_fetchadd into a global. It can use sleepq to block and wait for wakeups when over the limit.
  3. Should the kernel get a facility for scalable inexact counting, the above will be ready to be converted to use it.

I have a plan for this, but it's not ready yet. In short, it is to extract the semaphore that uma uses (zone_alloc_limit) and make it generally available.

However, even then, we won't care for the limit on a system without swap.

We can certainly keep our change private if you're against this.

Will work on providing a better semaphore first.