Details

Reviewers

brooks
markj
emaste
olce

Commits

rG5ac9320ddf50: procstat(1): add rlimitusage subcommand
rG6126f4ea646a: libprocstat: add procstat_getrlimitusage()
rGc85d3064c454: sysctl: add KERN_PROC_RLIMIT_USAGE
rG9b29fc89ae20: Userspace enablement for getrlimitusage(2)
rG9c3e516ad081: Add proc_nfiles(9)
rGf028f44ef3cb: Add getrlimitusage(2)

Summary

... which are limited by RLIMIT.

This was extremely useful for me when I looked at the pipebuf limits bugs.

[No manpages ATM, will write after feedback on the interfaces. Overall, it is trivial and clear IMO].

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Sep 22 2024, 1:11 PM

Herald added a reviewer: olce. · View Herald TranscriptSep 22 2024, 1:11 PM

Herald added subscribers: olce, imp. · View Herald Transcript

kib requested review of this revision.Sep 22 2024, 1:11 PM

IMO it would be better to add this to procstat rlimit output, rather than adding a new utility.

Why not implement this interface with a sysctl, similar to sysctl_kern_proc_rlimit()?

In D46747#1065703, @markj wrote:

IMO it would be better to add this to procstat rlimit output, rather than adding a new utility.

Why not implement this interface with a sysctl, similar to sysctl_kern_proc_rlimit()?

This is somewhat ortogonal. The syscall is supposed to be used by the process itself to query its usage of the resources. Same as getrlimit(2) queries the limit, despite the presence of sysctl.

For instance, this syscall might finally give the answer to question how much usable anon memory can be allocated, assuming the limits are enforced.

In D46747#1065718, @kib wrote:

In D46747#1065703, @markj wrote:

IMO it would be better to add this to procstat rlimit output, rather than adding a new utility.

Why not implement this interface with a sysctl, similar to sysctl_kern_proc_rlimit()?

This is somewhat ortogonal. The syscall is supposed to be used by the process itself to query its usage of the resources. Same as getrlimit(2) queries the limit, despite the presence of sysctl.

But it takes a PID as a parameter. Having a simple syscall interface is nice I suppose, IMO it is nice to follow the precedent of:

have getrlimitusage(2) operate on the current process
provide a sysctl interface to let (lib)procstat etc. query the limit usage of other processes

For instance, this syscall might finally give the answer to question how much usable anon memory can be allocated, assuming the limits are enforced.

Remove pid from the syscall args.
Add sysctl.
Add libprocstat function.
Add procstat limresusage command.

Any advice how to properly use libxo would be appreciated.

markj added inline comments.Sep 24 2024, 8:29 AM

lib/libsys/Symbol.sys.map
383 ↗	(On Diff #143625)	This should sort before kcmp.
sys/kern/kern_resource.c
810	`td` is passed in but not used, it is used as a local variable in the RLIMIT_CPU case below.
825	I don't think it's possible right now, but perhaps we should check for `vm == NULL` as a seatbelt.
1754	Is it possible to support a mode where the user picks a specific rlimit to fetch?
1775	Suppose RLIM_NLIMITS is bumped in the future. Then old callers will get an error since their output buffer is too small. Should we handle that and just copy out as many limits as requested?
sys/sys/resource.h
194	Since we have getrlimit() and setrlimit(), why not call it getrlimitusage() instead of abbreviating "rlimit" here?
sys/sys/sysctl.h
1043 ↗	(On Diff #143625)	Call it RLIMIT_USAGE, for consistency with the existing KERN_PROC_RLIMIT?
usr.bin/procstat/procstat.c
108 ↗	(On Diff #143625)	As below, I'd prefer to call it rlimitusage.
usr.bin/procstat/procstat_limresusage.c
50 ↗	(On Diff #143625)	procstat_getrlimitusage() would be more consistent with similar libprocstat functions.

kib marked 9 inline comments as done.Sep 24 2024, 4:34 PM

kib added inline comments.

sys/kern/kern_resource.c
825	It is ingrained into a lot of places that use vmspace_acq_ref() (but not all).
1775	Callers are supposed to call with *len == 0 initially, to get the size of the response. I did it this way in libprocstat. But changed.

Handle review.

The code looks ok to me. I didn't try testing it yet (I am away from home right now), but will try it within a day.

sys/kern/kern_resource.c
1780	I think `req->oldlen` will not be truncated, but this looks suspicious anyway.

kib added inline comments.Sep 25 2024, 8:21 AM

sys/kern/kern_resource.c
1780	I am not sure I understand your point. req->oldlen must be capped to avoid memory exposure.

In some inline comments, I refer to a proposed diff. Here it is, to be applied in addition to this revision's diff:

D46747-olce.patch3 KBDownload

sys/kern/kern_descrip.c
4349–4350 ↗	(On Diff #143679)	This function is now orphaned. Remove?
sys/kern/kern_resource.c
819–823	Locking can be avoided for the current process. See my proposed diff.
824–825	There are no cases where both `ui` and `vm` are needed at once, and there are cases where none is needed. So I'd propose to split the big `switch` into mostly independent parts, which IMO makes things generally more readable, at the expense of testing for `which` twice (or even three times if the compiler is not smart enough or decides saving on this is not worth it). See my proposed diff.
834	AFAICT, `p_rux.rux_runtime` needs to be accessed under the proc stat mtx (this matters for 32-bit arches). On the other hand, cpu_tickrate() needs no lock. See my proposed diff.
usr.bin/procstat/procstat_rlimitusage.c
66 ↗	(On Diff #143679)
70 ↗	(On Diff #143679)	Inconsistency with padding in the header.
72 ↗	(On Diff #143679)

olce added inline comments.Sep 25 2024, 4:43 PM

sys/kern/kern_descrip.c
4349–4350 ↗	(On Diff #143679)	Oops, it isn't.

olce added inline comments.Sep 25 2024, 6:48 PM

sys/kern/kern_resource.c
1763–1768	The `sysctl()` machinery already takes care of not overflowing the passed buffer on output.
1780	Same as above.

kib marked 9 inline comments as done.Sep 25 2024, 11:18 PM

kib added inline comments.

sys/kern/kern_resource.c
819–823	This syscall is not perf-critical. I know that curproc can be handled directly, but just prefer less code to the complication.
824–825	Where is the diff?
usr.bin/procstat/procstat_rlimitusage.c
66 ↗	(On Diff #143679)	Why?
72 ↗	(On Diff #143679)	Again, why?

Lock rux access. Tweak len calculation for sysctl.

olce attached a referenced file: F96600764: D46747-olce.patch. (Show Details)Sep 26 2024, 5:34 AM

olce added inline comments.Sep 26 2024, 7:38 AM

sys/kern/kern_resource.c
819–823	Yes, but maybe some other more critical operations also compete for that lock? The proc lock protects lots of things already.
824–825	Forgot that Phab doesn't allow to see an uploaded file if it hasn't been explicitly attached (and displays no warning/reminder about that). You should be able to access the file now (click on the icon in the main comment).
824–825	So I'd propose to split the big switch into mostly independent parts, which IMO makes things generally more readable, at the expense of testing for which twice (snip). Having separate (sub-) functions would be even clearer than what is in the attached diff.
829–836	I now think I've been overcautious here. `p->p_rux.rux_runtime` will be read atomically, so it doesn't really matter if under the lock, the only thing that can happen is that `p->p_rux.rux_runtime` is updated in the meantime, and I don't see it as a problem. So I'd rather see setting `*res` out of the locks.
usr.bin/procstat/procstat_rlimitusage.c
66 ↗	(On Diff #143679)	This is what `xo_open_list(3)` mandates: The name given to all calls must be identical, (...) and what its example code effectively does. Haven't looked up that code, but I suspect this may be in connection with the fact that, depending on the output, the list of objects appears as a list or just successive elements, printing only the list name in the first case, and only each element's name in the second, and for consistency they are expected to be identical.
72 ↗	(On Diff #143679)	See above.

The kernel bits look ok to me.

sys/kern/kern_resource.c
829–836	How can that value be read atomically on 32-bit platforms? I'd expect the proc lock to be required.
usr.bin/procstat/procstat_rlimitusage.c
56 ↗	(On Diff #143732)	I don't really understand the desired formatting here: root@freebsd:~ # procstat rlimitusage $$ PID RESOURCE ID USAGE 938 cpu 0 0 938 fsize 1 -1 938 data 2 12288 938 stack 3 131072 938 core 4 -1 938 rss 5 3657728 938 memlock 6 0 938 nproc 7 29 938 nofile 8 4 938 sbsize 9 348160 938 vmem 10 14032896 938 npts 11 0 938 swap 12 55404 938 kqueues 13 1 938 umtx 14 0 938 pipebuf 15 40960 Probably there should be an extra space preceding all values in the ID column, and we can remove the "18" in the header string?
71 ↗	(On Diff #143732)

olce added inline comments.Sep 26 2024, 1:55 PM

sys/kern/kern_resource.c
829–836	Yes, ignore my last comment on this topic, I had forgotten that `rux_runtime` is also 64 bits on 32-bit platforms. Instead, the proposal in the attached diff still stands (`cpu_tickrate()` doesn't have to be called under the locks).
usr.bin/procstat/procstat_rlimitusage.c
56 ↗	(On Diff #143732)	All these problems should be fixed after applying my suggestions below.
68 ↗	(On Diff #143732)
71 ↗	(On Diff #143732)	Includes @markj's fix.

To @olce I did considered doing the separate switches to ref proc or vmspace before the operation. I rejected it because it doubles the amount of code for non-perf critical syscall. Similarly I considered doing a table of case attributes, like I did for procctl(2), and decided that it is not worth it for the same reason.

Really, this should be as compact as straightforward as possible, not micro-optimized.

In D46747#1066964, @kib wrote:

Really, this should be as compact as straightforward as possible, not micro-optimized.

I'm not suggesting to micro-optimize, except for the proc lock as it is used for lots (too many?) things already.

I'm instead suggesting to separate different cases that have different requirements and to factorize boilerplate code. In the current version, e.g., you have to test whether vm == NULL for each case where vm must be used. By separating cases where vm is needed from those where ui is, you can apply boilerplate code (alloc, checks, dealloc) once and for all and simplify control flow. This also has the (nice) side effect that you then don't need to lookup ui when it is vm that is needed.

The only drawback doing this is duplicating the case XXX: in two switches. But then it is almost instantaneous to grasp what is going on and why the code is globally correct (then, we can concentrate on each particular case's specific code). With the current state, both are intermingled, boilerplate code is duplicated, and we have to check more carefully for control flow.

This function is not *that* long, so I don't insist, but I'll hope you'll keep considering doing that in other cases.

Move division out of the locks.
More fighting with libxo.

root@r-freeb44:~ #procstat rlimitusage $$
    PID     RESOURCE   ID              USAGE
   2303          cpu   0                  0
    2303        fsize   1                 -1
    2303         data   2              12288
    2303        stack   3             131072
    2303         core   4                 -1
    2303          rss   5            3649536
    2303      memlock   6                  0
    2303        nproc   7                 26
    2303       nofile   8                  4
    2303       sbsize   9             920528
    2303         vmem  10           14036992
    2303         npts  11                  2
    2303         swap  12              71441
    2303      kqueues  13                  1
    2303         umtx  14                  0
    2303      pipebuf  15                  0

No idea about the 'eaten' space in the cpu line.

In D46747#1067001, @kib wrote:

No idea about the 'eaten' space in the cpu line.

I think it's rather that the other lines have an extra space. There are still two changes missing, I've highlighted them with two inline comments. With these, I think the display should be OK.

usr.bin/procstat/procstat_rlimitusage.c
71 ↗	(On Diff #143762)	`\n` is still inside the brackets (the "field description"). And the space after it is then superfluous.
68 ↗	(On Diff #143732)	This one still needs to be applied (a space is missing after `{:resource/%12s}`).