
sysctlmemlock: add a knob to disable and make interruptible
Needs Review (Public)

Authored by rlibby on Thu, Apr 1, 10:28 PM.

Details

Reviewers
jhb
mjg
Summary

Some systems, in particular those without swap, do not benefit from
limiting wiring of user pages for sysctl requests, and instead only
experience contention. Add a knob so that those systems may disable the
limit.

Also, make the lock request interruptible.
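As a rough illustration of the knob described above, here is a userspace sketch, not the patch itself. It assumes the current behavior is that only requests with an output buffer larger than about four pages serialize on the lock (the figure mentioned in the discussion), and it models the knob as a plain flag. The names `memlock_enabled` and `needs_memlock`, and the fixed 4096-byte page size, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/* Hypothetical stand-in for the new knob; true keeps the limit in place. */
static bool memlock_enabled = true;

/*
 * Decide whether a request should serialize on the wiring lock.
 * Requests with no output buffer, or a small one, never take it.
 * With the knob off, no request takes it.
 */
static bool
needs_memlock(const void *oldptr, size_t oldlen)
{
	if (!memlock_enabled)
		return (false);
	return (oldptr != NULL && oldlen > 4 * SKETCH_PAGE_SIZE);
}
```

For the interruptible half, a kernel implementation could presumably use a signal-aware acquire such as sx_xlock_sig(9) and return its error to the caller; whether the patch does exactly that is not shown on this page.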

Test Plan

While toggling the new sysctl kern.sysctl_memlock a few times, ran:

jot 0 | xargs -P $(sysctl -n hw.ncpu) -I %% sysctl -ax > /dev/null

and

jot 0 | xargs -P $(sysctl -n hw.ncpu) -I %% sysctl -x kern.malloc_stats > /dev/null

and confirmed with lockstat whether the lock was seen.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 38263
Build 35152: arc lint + arc unit

Event Timeline

I disagree with the patch.

The current mechanism is crap and should probably be retired; anyone interested in ramping up kernel memory use has plenty of other ways to do it.

The limit is also weirdly small -- 4 times the page size is not particularly high.

That said, if the limit is to be kept, I think the following steps should be performed:

  1. Bump the threshold higher than that. It may happen to clear contention for whatever sysctl you found problematic.
  2. Replace the lock with an atomic counter -- to that end, introduce a multi-megabyte upper limit and atomic_fetchadd into a global. It can use sleepq to block/wait for wakeups if over the limit.
  3. Should the kernel get a facility to provide scalable inexact counting, the above will be ready to be converted to use it.
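Step 2 above can be sketched in userspace with C11 atomics. This is only an illustration of the fetch-add-against-a-cap idea, not kernel code: `WIRE_LIMIT`, `wired_bytes`, `wire_reserve`, and `wire_release` are hypothetical names, the 8 MiB cap is an arbitrary stand-in for "multi-megabyte", and blocking via sleepq is left out (a failed reserve simply returns false where a kernel version would sleep).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define WIRE_LIMIT ((size_t)8 << 20)	/* hypothetical 8 MiB global cap */

static atomic_size_t wired_bytes;	/* bytes currently reserved */

/*
 * Try to reserve len bytes against the global cap.
 * Returns true on success. On failure the counter is restored and
 * the caller would block until another thread releases.
 */
static bool
wire_reserve(size_t len)
{
	size_t prev;

	if (len > WIRE_LIMIT)
		return (false);
	prev = atomic_fetch_add(&wired_bytes, len);
	if (prev + len > WIRE_LIMIT) {
		/* Over the cap: undo the optimistic add. */
		atomic_fetch_sub(&wired_bytes, len);
		return (false);
	}
	return (true);
}

/* Give back a previous reservation of len bytes. */
static void
wire_release(size_t len)
{
	atomic_fetch_sub(&wired_bytes, len);
	/* A kernel version would wake any sleepers here. */
}
```

The optimistic add keeps the uncontended path to a single atomic operation; the cost is that concurrent over-limit attempts can briefly inflate the counter and make an otherwise valid reservation fail spuriously, which a blocking version would handle by retrying after wakeup.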

Huh. The sysctl at hand produces almost 15MB of data on my test box; that's unsustainable as it is.

For this sucker, I think the code should be reworked to stop pre-wiring all that memory and instead output the data in chunks.
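The chunked-output idea can be illustrated with a small userspace sketch. It is not how any particular sysctl handler is written; `emit_chunked`, `chunk_sink_t`, `tally_sink`, and the 4096-byte `CHUNK_SIZE` are all hypothetical. The point is only that the producer hands data to the consumer through a fixed-size buffer, so the amount of destination memory that must be resident at once is bounded by the chunk size rather than by the total output.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096u	/* hypothetical bounce-buffer size */

/* Hypothetical sink: returns 0 on success, nonzero to abort. */
typedef int (*chunk_sink_t)(void *arg, const void *buf, size_t len);

/*
 * Emit len bytes from src through a fixed-size bounce buffer,
 * calling sink once per chunk of at most CHUNK_SIZE bytes.
 */
static int
emit_chunked(const char *src, size_t len, chunk_sink_t sink, void *arg)
{
	char bounce[CHUNK_SIZE];
	size_t off, n;
	int error;

	for (off = 0; off < len; off += n) {
		n = len - off;
		if (n > CHUNK_SIZE)
			n = CHUNK_SIZE;
		memcpy(bounce, src + off, n);
		error = sink(arg, bounce, n);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Example sink that tallies calls, bytes, and the largest chunk seen. */
struct tally {
	size_t calls;
	size_t bytes;
	size_t max_len;
};

static int
tally_sink(void *arg, const void *buf, size_t len)
{
	struct tally *t = arg;

	(void)buf;
	t->calls++;
	t->bytes += len;
	if (len > t->max_len)
		t->max_len = len;
	return (0);
}
```

In the kernel, the sink's role would presumably be played by a per-chunk copyout to the user buffer, which can fault pages in as needed instead of requiring the whole buffer to be wired up front.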

In D29542#662307, @mjg wrote:

I disagree with the patch.

The current mechanism is crap and should probably be retired; anyone interested in ramping up kernel memory use has plenty of other ways to do it.

I don't disagree that the current mechanism is outdated.

The limit is also weirdly small -- 4 times the page size is not particularly high.

Well, I read the ~4 pages (formerly ~1 page) as an amount of wiring we assume we can afford across all threads simultaneously. So, always freely granting a small number of pages makes sense to me, even with a better wiring limit mechanism.

That said, if the limit is to be kept, I think the following steps should be performed:

  1. Bump the threshold higher than that. It may happen to clear contention for whatever sysctl you found problematic.

We ran into this with net.inet.tcp.pcblist, but we have other sysctls (out of tree) that have a lot of output. kern.malloc_stats is an in-tree sysctl that has a lot of output. For systems without swap, there's not really a threshold that makes sense -- they simply don't care.

  2. Replace the lock with an atomic counter -- to that end, introduce a multi-megabyte upper limit and atomic_fetchadd into a global. It can use sleepq to block/wait for wakeups if over the limit.
  3. Should the kernel get a facility to provide scalable inexact counting, the above will be ready to be converted to use it.

I have a plan for this, but it's not ready yet. In short, the plan is to extract the semaphore UMA uses (zone_alloc_limit) and make it generally available.

However, even then, we won't care about the limit on a system without swap.

We can certainly keep our change private if you're against this.