Page MenuHomeFreeBSD

getrlimitusage(2): a syscall and a tool to query current resource usage
ClosedPublic

Authored by kib on Sep 22 2024, 1:11 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 11, 5:51 AM
Unknown Object (File)
Sun, Nov 10, 7:48 PM
Unknown Object (File)
Sun, Nov 10, 7:46 PM
Unknown Object (File)
Sun, Nov 10, 5:07 PM
Unknown Object (File)
Sun, Nov 10, 3:24 PM
Unknown Object (File)
Wed, Nov 6, 11:39 AM
Unknown Object (File)
Wed, Nov 6, 3:18 AM
Unknown Object (File)
Mon, Nov 4, 7:38 PM
Subscribers

Details

Summary

... which are limited by RLIMIT.

This was extremely useful for me when I looked at the pipebuf limits bugs.

[No manpages ATM, will write after feedback on the interfaces. Overall, it is trivial and clear IMO].

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Sep 22 2024, 1:11 PM

IMO it would be better to add this to procstat rlimit output, rather than adding a new utility.

Why not implement this interface with a sysctl, similar to sysctl_kern_proc_rlimit()?

IMO it would be better to add this to procstat rlimit output, rather than adding a new utility.

Why not implement this interface with a sysctl, similar to sysctl_kern_proc_rlimit()?

This is somewhat ortogonal. The syscall is supposed to be used by the process itself to query its usage of the resources. Same as getrlimit(2) queries the limit, despite the presence of sysctl.

For instance, this syscall might finally give the answer to question how much usable anon memory can be allocated, assuming the limits are enforced.

In D46747#1065718, @kib wrote:

IMO it would be better to add this to procstat rlimit output, rather than adding a new utility.

Why not implement this interface with a sysctl, similar to sysctl_kern_proc_rlimit()?

This is somewhat ortogonal. The syscall is supposed to be used by the process itself to query its usage of the resources. Same as getrlimit(2) queries the limit, despite the presence of sysctl.

But it takes a PID as a parameter. Having a simple syscall interface is nice I suppose, IMO it is nice to follow the precedent of:

  • have getrlimitusage(2) operate on the current process
  • provide a sysctl interface to let (lib)procstat etc. query the limit usage of other processes

For instance, this syscall might finally give the answer to question how much usable anon memory can be allocated, assuming the limits are enforced.

Remove pid from the syscall args.
Add sysctl.
Add libprocstat function.
Add procstat limresusage command.

Any advice how to properly use libxo would be appreciated.

lib/libsys/Symbol.sys.map
383 ↗(On Diff #143625)

This should sort before kcmp.

sys/kern/kern_resource.c
810

td is passed in but not used, it is used as a local variable in the RLIMIT_CPU case below.

825

I don't think it's possible right now, but perhaps we should check for vm == NULL as a seatbelt.

1754

Is it possible to support a mode where the user picks a specific rlimit to fetch?

1775

Suppose RLIM_NLIMITS is bumped in the future. Then old callers will get an error since their output buffer is too small. Should we handle that and just copy out as many limits as requested?

sys/sys/resource.h
194

Since we have getrlimit() and setrlimit(), why not call it getrlimitusage() instead of abbreviating "rlimit" here?

sys/sys/sysctl.h
1043 ↗(On Diff #143625)

Call it RLIMIT_USAGE, for consistency with the existing KERN_PROC_RLIMIT?

usr.bin/procstat/procstat.c
108 ↗(On Diff #143625)

As below, I'd prefer to call it rlimitusage.

usr.bin/procstat/procstat_limresusage.c
50 ↗(On Diff #143625)

procstat_getrlimitusage() would be more consistent with similar libprocstat functions.

kib marked 9 inline comments as done.Sep 24 2024, 4:34 PM
kib added inline comments.
sys/kern/kern_resource.c
825

It is ingrained into a lot of places that use vmspace_acq_ref() (but not all).

1775

Callers are supposed to call with *len == 0 initially, to get the size of the response. I did it this way in libprocstat. But changed.

kib marked 2 inline comments as done.
kib retitled this revision from getrlimusage(2): a syscall and a tool to query current resource usage to getrlimitusage(2): a syscall and a tool to query current resource usage.

Handle review.

The code looks ok to me. I didn't try testing it yet (I am away from home right now), but will try it within a day.

sys/kern/kern_resource.c
1780

I think req->oldlen will not be truncated, but this looks suspicious anyway.

sys/kern/kern_resource.c
1780

I am not sure I understand your point. req->oldlen must be capped to avoid memory exposure.

In some inline comments, I refer to a proposed diff. Here it is, to be applied in addition to this revision's diff:

sys/kern/kern_descrip.c
4349–4350 ↗(On Diff #143679)

This function is now orphaned. Remove?

sys/kern/kern_resource.c
819–823

Locking can be avoided for the current process.

See my proposed diff.

824–825

There are no cases where both ui and vm are needed at once, and there are cases where none is needed. So I'd propose to split the big switch into mostly independent parts, which IMO makes things generally more readable, at the expense of testing for which twice (or even three times if the compiler is not smart enough or decides saving on this is not worth it).

See my proposed diff.

834

AFAICT, p_rux.rux_runtime needs to be accessed under the proc stat mtx (this matters for 32-bit arches). On the other hand, cpu_tickrate() needs no lock.

See my proposed diff.

usr.bin/procstat/procstat_rlimitusage.c
66 ↗(On Diff #143679)
70 ↗(On Diff #143679)

Inconsistency with padding in the header.

72 ↗(On Diff #143679)
sys/kern/kern_descrip.c
4349–4350 ↗(On Diff #143679)

Oops, it isn't.

sys/kern/kern_resource.c
1763–1768

The sysctl() machinery already takes care of not overflowing the passed buffer on output.

1780

Same as above.

kib marked 9 inline comments as done.Sep 25 2024, 11:18 PM
kib added inline comments.
sys/kern/kern_resource.c
819–823

This syscall is not perf-critical. I know that curproc can be handled directly, but just prefer less code to the complication.

824–825

Where is the diff?

usr.bin/procstat/procstat_rlimitusage.c
66 ↗(On Diff #143679)

Why?

72 ↗(On Diff #143679)

Again, why?

kib marked 3 inline comments as done.

Lock rux access. Tweak len calculation for sysctl.

sys/kern/kern_resource.c
819–823

Yes, but maybe some other more critical operations also compete for that lock? The proc lock protects lots of things already.

824–825

Forgot that Phab doesn't allow to see an uploaded file if it hasn't been explicitly attached (and displays no warning/reminder about that). You should be able to access the file now (click on the icon in the main comment).

824–825

So I'd propose to split the big switch into mostly independent parts, which IMO makes things generally more readable, at the expense of testing for which twice (snip).

Having separate (sub-) functions would be even clearer than what is in the attached diff.

829–836

I now think I've been overcautious here. p->p_rux.rux_runtime will be read atomically, so it doesn't really matter if under the lock, the only thing that can happen is that p->p_rux.rux_runtime is updated in the meantime, and I don't see it as a problem. So I'd rather see setting *res out of the locks.

usr.bin/procstat/procstat_rlimitusage.c
66 ↗(On Diff #143679)

This is what xo_open_list(3) mandates:

The name given to all calls must be identical, (...)

and what its example code effectively does.

Haven't looked up that code, but I suspect this may be in connection with the fact that, depending on the output, the list of objects appears as a list or just successive elements, printing only the list name in the first case, and only each element's name in the second, and for consistency they are expected to be identical.

72 ↗(On Diff #143679)

See above.

The kernel bits look ok to me.

sys/kern/kern_resource.c
829–836

How can that value be read atomically on 32-bit platforms? I'd expect the proc lock to be required.

usr.bin/procstat/procstat_rlimitusage.c
56 ↗(On Diff #143732)

I don't really understand the desired formatting here:

root@freebsd:~ # procstat rlimitusage $$
    PID     RESOURCE   ID              USAGE
    938          cpu   0 0
    938        fsize   1 -1    
    938         data   2 12288   
    938        stack   3 131072
    938         core   4 -1   
    938          rss   5 3657728
    938      memlock   6 0
    938        nproc   7 29   
    938       nofile   8 4
    938       sbsize   9 348160
    938         vmem  10 14032896
    938         npts  11 0
    938         swap  12 55404
    938      kqueues  13 1
    938         umtx  14 0
    938      pipebuf  15 40960

Probably there should be an extra space preceding all values in the ID column, and we can remove the "18" in the header string?

71 ↗(On Diff #143732)
sys/kern/kern_resource.c
829–836

Yes, ignore my last comment on this topic, I had forgotten that rux_runtime is also 64 bits on 32-bit platforms.

Instead, the proposal in the attached diff still stands (cpu_tickrate() doesn't have to be called under the locks).

usr.bin/procstat/procstat_rlimitusage.c
56 ↗(On Diff #143732)

All these problems should be fixed after applying my suggestions below.

68 ↗(On Diff #143732)
71 ↗(On Diff #143732)

Includes @markj's fix.

To @olce I did considered doing the separate switches to ref proc or vmspace before the operation. I rejected it because it doubles the amount of code for non-perf critical syscall. Similarly I considered doing a table of case attributes, like I did for procctl(2), and decided that it is not worth it for the same reason.

Really, this should be as compact as straightforward as possible, not micro-optimized.

In D46747#1066964, @kib wrote:

Really, this should be as compact as straightforward as possible, not micro-optimized.

I'm not suggesting to micro-optimize, except for the proc lock as it is used for lots (too many?) things already.

I'm instead suggesting to separate different cases that have different requirements and to factorize boilerplate code. In the current version, e.g., you have to test whether vm == NULL for each case where vm must be used. By separating cases where vm is needed from those where ui is, you can apply boilerplate code (alloc, checks, dealloc) once and for all and simplify control flow. This also has the (nice) side effect that you then don't need to lookup ui when it is vm that is needed.

The only drawback doing this is duplicating the case XXX: in two switches. But then it is almost instantaneous to grasp what is going on and why the code is globally correct (then, we can concentrate on each particular case's specific code). With the current state, both are intermingled, boilerplate code is duplicated, and we have to check more carefully for control flow.

This function is not *that* long, so I don't insist, but I'll hope you'll keep considering doing that in other cases.

Move division out of the locks.
More fighting with libxo.

root@r-freeb44:~ #procstat rlimitusage $$
    PID     RESOURCE   ID              USAGE
   2303          cpu   0                  0
    2303        fsize   1                 -1
    2303         data   2              12288
    2303        stack   3             131072
    2303         core   4                 -1
    2303          rss   5            3649536
    2303      memlock   6                  0
    2303        nproc   7                 26
    2303       nofile   8                  4
    2303       sbsize   9             920528
    2303         vmem  10           14036992
    2303         npts  11                  2
    2303         swap  12              71441
    2303      kqueues  13                  1
    2303         umtx  14                  0
    2303      pipebuf  15                  0

No idea about the 'eaten' space in the cpu line.

In D46747#1067001, @kib wrote:

No idea about the 'eaten' space in the cpu line.

I think it's rather that the other lines have an extra space. There are still two changes missing, I've highlighted them with two inline comments. With these, I think the display should be OK.

usr.bin/procstat/procstat_rlimitusage.c
71 ↗(On Diff #143762)

\n is still inside the brackets (the "field description"). And the space after it is then superfluous.

68 ↗(On Diff #143732)

This one still needs to be applied (a space is missing after {:resource/%12s}).

kib marked 6 inline comments as done.

More tweaks for libxo.

This revision is now accepted and ready to land.Sep 26 2024, 8:19 PM