Page MenuHomeFreeBSD

sysctl: Do not serialize requests when running as root
ClosedPublic

Authored by markj on Nov 29 2024, 5:11 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 15, 10:53 PM
Unknown Object (File)
Fri, Dec 27, 8:38 AM
Unknown Object (File)
Fri, Dec 27, 5:36 AM
Unknown Object (File)
Thu, Dec 26, 12:39 PM
Unknown Object (File)
Thu, Dec 26, 11:44 AM
Unknown Object (File)
Dec 21 2024, 7:27 PM
Unknown Object (File)
Dec 19 2024, 6:08 AM
Unknown Object (File)
Dec 11 2024, 7:45 AM
Subscribers

Details

Summary

Bugs or unexpected behaviour can cause a user thread to block in a
sysctl handler for a long time. "procstat -kka" is the most useful tool
to see why this might happen, but it can block on sysctlmemlock too.

Since the purpose of this lock is merely to ensure userspace can't wire
too much memory, don't require it for requests from privileged threads.

PR: 282994

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.Nov 29 2024, 5:11 PM

May be a per-uid sysctl memlock limit is better.
BTW, why don't we account sysctl wired memory into the user wire limit?

This revision is now accepted and ready to land.Nov 29 2024, 5:32 PM
In D47842#1090650, @kib wrote:

May be a per-uid sysctl memlock limit is better.
BTW, why don't we account sysctl wired memory into the user wire limit?

It used to be that all user sysctl requests were serialized by a global lock, so it probably didn't matter very much since the wiring is always transient. Later, the sysctlmemlock was added, and we started serializing only those requests where oldsz > PAGE_SIZE. Perhaps we should modify vslock() to start counting these wirings, but I am not sure it is useful.

In D47842#1090650, @kib wrote:

May be a per-uid sysctl memlock limit is better.
BTW, why don't we account sysctl wired memory into the user wire limit?

It used to be that all user sysctl requests were serialized by a global lock, so it probably didn't matter very much since the wiring is always transient. Later, the sysctlmemlock was added, and we started serializing only those requests where oldsz > PAGE_SIZE. Perhaps we should modify vslock() to start counting these wirings, but I am not sure it is useful.

If we start counting, we can get rid of the global sysctl lock then?

The thing that makes this a bit odd in terms of the memory limit is that userspace isn't the one requesting to wire the memory. The kernel does by calling sysctl_wire_old_buffer in an individual handler. It would be nice to be smarter about this, e.g. don't acquire the lock until sysctl_wire_old_buffer is actually called? OTOH, one might ask why we really need the lock at all? I guess the worry is that userspace can DOS the kernel by making lots of requests with large buffers for nodes that are known to use sysctl_wire_old_buffer? One could implement a per-user limit of sorts by using a mtxpool-like thing but with sx locks and hash the uid to pick the specific sx lock to use if we really cared. Exempting root is probably sufficient in practice.

In D47842#1090897, @jhb wrote:

The thing that makes this a bit odd in terms of the memory limit is that userspace isn't the one requesting to wire the memory. The kernel does by calling sysctl_wire_old_buffer in an individual handler. It would be nice to be smarter about this, e.g. don't acquire the lock until sysctl_wire_old_buffer is actually called? OTOH, one might ask why we really need the lock at all? I guess the worry is that userspace can DOS the kernel by making lots of requests with large buffers for nodes that are known to use sysctl_wire_old_buffer?

I think that was the intent, yes. See commit 3e829b18d6d3f. The intent was to use a shared lock to synchronize sysctl tree lookups, so as a hedge against a DOS, "large" requests continued to be serialized. At some point it was found that PAGE_SIZE was too small, so the threshold was bumped.

One could implement a per-user limit of sorts by using a mtxpool-like thing but with sx locks and hash the uid to pick the specific sx lock to use if we really cared. Exempting root is probably sufficient in practice.

Yes, I'd rather keep things simple for now, absent some use-case which demonstrates that the current mechanism is too primitive.