
kern_resource.c: Track per-UID resource limits in 'struct uidinfo'
Needs Revision, Public

Authored by bnovkov on Feb 1 2026, 5:22 PM.

Details

Reviewers
markj
olce
jilles
Summary

Certain getrlimit(2) resource limits are meant to limit resource
consumption on a per-UID basis and use the 'uidinfo' structure
to track the total resource consumption for a given UID.
However, the limits themselves are still stored in 'struct proc'
which makes it impossible to propagate a newly modified limit value to
all processes owned by the UID whose limit we wish to change.

This change addresses this issue by adding a new 'struct plimit'
member to 'struct uidinfo' and uses it to store resource limit values
for per-UID limits. The 'lim_rlimit' function will now distinguish
between per-process and per-UID limits and return the limit value
from the appropriate location.
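
The lookup described above might be sketched roughly as follows. This is a self-contained userland model, not the kernel code from the diff: the `*_model` types, the `ui_limits` member name, the `is_per_uid()` helper, and the choice of which resources count as per-UID are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum { RES_CPU, RES_NOFILE, RES_NPROC, RES_SBSIZE, RES_NLIMITS };

struct rlim {
	uint64_t cur;
	uint64_t max;
};

struct plimit_model {
	struct rlim pl_rlimit[RES_NLIMITS];
};

struct uidinfo_model {
	struct plimit_model ui_limits;		/* hypothetical new member */
};

struct proc_model {
	struct plimit_model p_limit;		/* per-process limits */
	struct uidinfo_model *p_uidinfo;	/* shared by all processes of a UID */
};

/* Assumption: only NPROC and SBSIZE are per-UID in this model. */
static bool
is_per_uid(int which)
{
	return (which == RES_NPROC || which == RES_SBSIZE);
}

/* Return the limit from the per-UID or the per-process store. */
static struct rlim
lim_rlimit_model(const struct proc_model *p, int which)
{
	assert(which >= 0 && which < RES_NLIMITS);
	if (is_per_uid(which))
		return (p->p_uidinfo->ui_limits.pl_rlimit[which]);
	return (p->p_limit.pl_rlimit[which]);
}
```

In this model, two processes that point at the same `uidinfo_model` see an updated per-UID limit on their next lookup, while per-process limits such as `RES_NOFILE` stay independent.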

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

des added inline comments.
sys/sys/resourcevar.h
133

Does this protect the pointer or the content of the structure? I don't see a need for the former, and if the latter, why is it not inside the struct and why do other users of struct plimit not use a lock?

sys/sys/resourcevar.h
133

This protects the structure's contents.
IIRC struct plimit is only used as part of struct proc, and the regular setrlimit(2) syscall uses PROC_LOCK to protect the limits.

sys/sys/resourcevar.h
133

The struct plimit in struct proc does not need locking as it is only ever accessed by the thread it applies to. I believe this is also the case here since lim_rlimit() asserts that its thread argument is curthread.

With the existing rlimit functionality specified by POSIX, I can, as a user, defend against a forkbomb or prevent forks/kqueues entirely in a particular process subtree. A system administrator can also set resource limits which will be applied at login or service start. It seems that setting limits on a particular process subtree will no longer work with this change.

Also, can you point at more rationale (e.g. mailing list posts) why that functionality is not good enough?

With the existing rlimit functionality specified by POSIX, I can, as a user, defend against a forkbomb or prevent forks/kqueues entirely in a particular process subtree. A system administrator can also set resource limits which will be applied at login or service start.

Per-UID resource limits are a FreeBSD-specific, non-POSIX extension that is meant to track and limit total resource usage on a per-UID basis. However, at the moment only the total resource usage is actually tracked per-UID while the limits themselves remain tied to the per-process structure.
This puts the whole feature into an awkward spot since the setrlimit(2) manual page implies that such limits apply to all processes owned by a specific UID.
In reality the limit actually remains tied to the individual processes and changing it won't apply the new limit to other processes owned by that uid, which is what one would expect after reading the manpage.
This ambiguity has been causing issues in other parts of the kernel as well. Please take a look at the XXX comment in kern/kern_prot.c's _proc_set_cred function for more details.
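
The per-process scoping described above can be observed from userland on a system without the proposed change. The following is a minimal sketch using only getrlimit(2), setrlimit(2), fork(2) and waitpid(2); the function name and the fallback value of 4096 are arbitrary choices for illustration.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent's RLIMIT_NPROC is unchanged after a child
 * owned by the same UID lowers its own soft limit, 0 if it changed,
 * and -1 on error.
 */
static int
parent_limit_unaffected_by_child(void)
{
	struct rlimit before, after;
	pid_t pid;
	int status;

	if (getrlimit(RLIMIT_NPROC, &before) != 0)
		return (-1);

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: lower only the soft limit, which needs no privilege. */
		struct rlimit rl = before;

		if (rl.rlim_cur == RLIM_INFINITY)
			rl.rlim_cur = 4096;
		else if (rl.rlim_cur > 1)
			rl.rlim_cur--;
		_exit(setrlimit(RLIMIT_NPROC, &rl) == 0 ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return (-1);

	/* Parent: re-read its own limit after the child changed its copy. */
	if (getrlimit(RLIMIT_NPROC, &after) != 0)
		return (-1);
	return (after.rlim_cur == before.rlim_cur &&
	    after.rlim_max == before.rlim_max);
}
```

The child's setrlimit(2) call only modifies the child's copy of the limits, so the parent keeps its original value even though both processes are charged against the same UID.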

jilles requested changes to this revision. Sun, May 17, 2:25 PM

With the existing rlimit functionality specified by POSIX, I can, as a user, defend against a forkbomb or prevent forks/kqueues entirely in a particular process subtree. A system administrator can also set resource limits which will be applied at login or service start.

Per-UID resource limits are a FreeBSD-specific, non-POSIX extension that is meant to track and limit total resource usage on a per-UID basis. However, at the moment only the total resource usage is actually tracked per-UID while the limits themselves remain tied to the per-process structure.

These limits are indeed non-POSIX, but Linux has similar behaviour for the RLIMIT_MEMLOCK, RLIMIT_MSGQUEUE, RLIMIT_NPROC and RLIMIT_SIGPENDING limits.

This puts the whole feature into an awkward spot since the setrlimit(2) manual page implies that such limits apply to all processes owned by a specific UID.
In reality the limit actually remains tied to the individual processes and changing it won't apply the new limit to other processes owned by that uid, which is what one would expect after reading the manpage.

I would say this should be fixed by clarifying the man page. For me, it's clear that all rlimits only apply to the current process and its children, while the tracking is per real UID for some of the limits.

This ambiguity has been causing issues in other parts of the kernel as well. Please take a look at the XXX comment in kern/kern_prot.c's _proc_set_cred function for more details.

Right, it's probably appropriate to prevent, for example, users from logging in when their real UID's RLIMIT_NPROC is being exceeded or would be exceeded by the new process. However, I think this should be done in such a way that the API to userland does not change. For example, Linux sets a flag on setuid(2) when the real UID's RLIMIT_NPROC is exceeded, and if the flag is set and the user is still above the limit, execve(2) fails.
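
The deferred check described above can be modelled in a few lines. This is a simplified sketch, not Linux source: the `task_model` struct, its field names, and the exact comparison in `over_limit()` are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a task; field names are illustrative only. */
struct task_model {
	unsigned long uid_nproc;	/* processes owned by the real UID */
	unsigned long nproc_limit;	/* RLIMIT_NPROC soft limit */
	bool nproc_exceeded;		/* set by setuid, checked by execve */
};

/* Assumption: "over the limit" means strictly more processes than allowed. */
static bool
over_limit(const struct task_model *t)
{
	return (t->uid_nproc > t->nproc_limit);
}

/* setuid does not fail on the limit; it only records that it was exceeded. */
static int
setuid_model(struct task_model *t)
{
	t->nproc_exceeded = over_limit(t);
	return (0);
}

/* execve fails only if the flag is set and the UID is still over the limit. */
static int
execve_model(struct task_model *t)
{
	if (t->nproc_exceeded && over_limit(t))
		return (EAGAIN);
	t->nproc_exceeded = false;
	return (0);
}
```

The point of this design is that setuid(2) keeps its existing userland contract, and the limit is enforced at the next execve(2) instead.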

I'm surprised that this is proposed after all those years, without a clear objective such as changes in the general computing environment or Linux compatibility.

There is also rctl(8) which already exists and can enforce various limits in the way being proposed here. Also, it is much more extensive than traditional rlimits.

This revision now requires changes to proceed. Sun, May 17, 2:25 PM

I would say this should be fixed by clarifying the man page. For me, it's clear that all rlimits only apply to the current process and its children, while the tracking is per real UID for some of the limits.

I disagree; the man page can only paper over the fundamental problem that arises from mixing limits with different scopes in a system call that only operates on one of them.
Let's say that you're faced with a situation where you suddenly need to drastically lower a per-UID limit for a user and have it take effect immediately. How are you going to accomplish this with the way we currently handle per-UID limits?
Changes to the limit value in login.conf will only apply to new sessions, which is a problem and, IMO, a big flaw in our current implementation.
Having a mechanism for enforcing per-UID limits but not being able to change the limit's value and have it take effect for that UID immediately makes no sense.

This ambiguity has been causing issues in other parts of the kernel as well. Please take a look at the XXX comment in kern/kern_prot.c's _proc_set_cred function for more details.

Right, it's probably appropriate to prevent, for example, users from logging in when their real UID's RLIMIT_NPROC is being exceeded or would be exceeded by the new process.

I agree.

However, I think this should be done in such a way that the API to userland does not change. For example, Linux sets a flag on setuid(2) when the real UID's RLIMIT_NPROC is exceeded, and if the flag is set and the user is still above the limit, execve(2) fails.

That seems like a very roundabout way of accomplishing the same thing that I'm proposing here and in D55039. Furthermore, taking this approach would mean that each kernel subsystem using a certain limit would have to be changed in a similar manner, which is a lot of extra work.

There is also rctl(8) which already exists and can enforce various limits in the way being proposed here. Also, it is much more extensive than traditional rlimits.

rctl(4) is an optional driver that can be omitted from the system or disabled with kern.racct.enable, in which case we are left without a mechanism that prevents a bad actor from exhausting kernel memory by repeatedly allocating kernel objects through a system call or other user-facing APIs.
I assume that is also the reason why these per-UID limits exist in setrlimit(2) in the first place.

I suggest discussing this on a mailing list.

I would say this should be fixed by clarifying the man page. For me, it's clear that all rlimits only apply to the current process and its children, while the tracking is per real UID for some of the limits.

I disagree; the man page can only paper over the fundamental problem that arises from mixing limits with different scopes in a system call that only operates on one of them.
Let's say that you're faced with a situation where you suddenly need to drastically lower a per-UID limit for a user and have it take effect immediately. How are you going to accomplish this with the way we currently handle per-UID limits?
Changes to the limit value in login.conf will only apply to new sessions, which is a problem and, IMO, a big flaw in our current implementation.
Having a mechanism for enforcing per-UID limits but not being able to change the limit's value and have it take effect for that UID immediately makes no sense.

Personally, I think this need is somewhat unusual, and sysctls like kern.maxprocperuid and rctl(4) can deal with it. Running many different applications on a single OS image is becoming less common. These would typically go in different containers/jails. The rctl facility can limit by jail as well, which this change can't do.

This ambiguity has been causing issues in other parts of the kernel as well. Please take a look at the XXX comment in kern/kern_prot.c's _proc_set_cred function for more details.

Right, it's probably appropriate to prevent, for example, users from logging in when their real UID's RLIMIT_NPROC is being exceeded or would be exceeded by the new process.

I agree.

However, I think this should be done in such a way that the API to userland does not change. For example, Linux sets a flag on setuid(2) when the real UID's RLIMIT_NPROC is exceeded, and if the flag is set and the user is still above the limit, execve(2) fails.

That seems like a very roundabout way of accomplishing the same thing that I'm proposing here and in D55039. Furthermore, taking this approach would mean that each kernel subsystem using a certain limit would have to be changed in a similar manner, which is a lot of extra work.

Backwards compatibility and keeping existing models intact are worth something.

Also, most of the per-UID rlimits need not do any such magic because it is uncommon to allocate a resource and then change/set real UID. Only the process and perhaps the pseudo-terminal are allocated by the privileged login machinery and then transferred to the logged in user.

There is also rctl(8) which already exists and can enforce various limits in the way being proposed here. Also, it is much more extensive than traditional rlimits.

rctl(4) is an optional driver that can be omitted from the system or disabled with kern.racct.enable, in which case we are left without a mechanism that prevents a bad actor from exhausting kernel memory by repeatedly allocating kernel objects through a system call or other user-facing APIs.
I assume that is also the reason why these per-UID limits exist in setrlimit(2) in the first place.

Clearly, the per-UID limits were added to setrlimit(2) to allow preventing a runaway program or bad actor from exhausting kernel memory. To make these rlimits fit into the existing model, it was accepted that changes can't take effect immediately. To fix this and add more features in this area, rctl was created. I don't understand why we're creating a new feature that replicates part of rctl when rctl already exists, especially if it breaks how setrlimit(2) currently works (it looks like the changes will make setrlimit(2) do nothing for per-UID limits).