
getrlimit.2: Document RSS, AS/VMEM limit behavior more clearly
ClosedPublic

Authored by cem on Aug 19 2015, 6:31 PM.
Tags
None

Details

Summary

Alphabetize the RLIMIT_ list while here.

Test Plan

igor clean.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

cem retitled this revision to getrlimit.2: Document RSS, AS/VMEM limit behavior more clearly.
cem updated this object.
cem edited the test plan for this revision. (Show Details)
cem added reviewers: jilles, kib, markj.
jilles requested changes to this revision. Aug 19 2015, 7:17 PM
jilles edited edge metadata.
jilles added inline comments.
lib/libc/sys/getrlimit.2
96 ↗(On Diff #8068)

Unmodified mmap(2) pages may be discarded even when there is no swap, so there may still be an effect from RLIMIT_RSS.

187 ↗(On Diff #8068)

This paragraph should be below or combined with the one about stack space and brk(2), since those may fail because of RLIMIT_AS as well.

RLIMIT_VMEM should not be mentioned again here; it just clutters the text.

This revision now requires changes to proceed. Aug 19 2015, 7:17 PM
lib/libc/sys/getrlimit.2
178 ↗(On Diff #8068)

"When mmap(2) would..."?

189 ↗(On Diff #8068)

Maybe, "Processes that exceed the RLIMIT_RSS limit are not signaled..."?

The phrasing suggests that an rlimit can signal or kill a process.

192 ↗(On Diff #8068)

I would avoid mentioning swap or the vm daemon: the vm daemon actually just deactivates pages until the RLIMIT_RSS limit is reached. This makes them candidates for eviction by the page daemon and doesn't necessarily involve swapping, for example if the corresponding vm object is backed by a vnode. I could be missing something though.

Why not group this paragraph with the description of RLIMIT_RSS above?

lib/libc/sys/getrlimit.2
96 ↗(On Diff #8068)

Really? How? The only use of RLIMIT_RSS is in vm_pageout(), which is compiled conditional on #if !defined(NO_SWAPPING) and only kicked under two conditions:

        if (vm_swap_enabled && page_shortage > 0)
                vm_req_vmdaemon(VM_SWAP_NORMAL);

or

        if (vm_swap_idle_enabled) {
                static long lsec;
                if (time_second != lsec) {
                        vm_req_vmdaemon(VM_SWAP_IDLE);

Clearly it cannot have an effect if -DNO_SWAPPING, and even if that is not defined, it can still be disabled with tunables/sysctls.

189 ↗(On Diff #8068)

The earlier wording *does* suggest that an rlimit can signal or kill a process, and for some rlimits it can. But, not all rlimits produce signals.

lib/libc/sys/getrlimit.2
189 ↗(On Diff #8068)

Sorry, I understand what you mean, it's just that the text doesn't make sense when read literally. An rlimit is a number, it can't _do_ anything.

cem edited edge metadata.

Address some review feedback.

Combine the RSS sections under the RSS bullet. Reword to make it clear the
numerical rlimit cannot raise a signal by itself.

Amend the broad soft/hard limit descriptions to point out that exceptions are
expected.

cem marked 5 inline comments as done. Aug 19 2015, 9:11 PM
cem added inline comments.
lib/libc/sys/getrlimit.2
185 ↗(On Diff #8071)

Reworded to just "an operation." As jilles@ points out, this could be mmap(2), brk(2), or really anything that allocates VM (new thread, stack growth, ...).

lib/libc/sys/getrlimit.2
96 ↗(On Diff #8071)

Oh, right. That code is indeed very much related to swapping entire processes out, which doesn't really make sense if you don't have a swap device. RLIMIT_RSS is implemented as a more general case of swapping in that some number of pages are not deactivated.

I just guessed instead of reading the code.

189 ↗(On Diff #8071)

This is incorrect for stack extension. If stack extension fails, SIGSEGV is raised and it can only be caught by a handler using the signal stack. As with [ENOMEM] errors, the handler could raise the soft limit and return.

Apart from the already mentioned RLIMIT_DATA, RLIMIT_STACK and RLIMIT_AS, it is also possible for RLIMIT_MEMLOCK to cause memory allocation to fail: if mlockall(MCL_FUTURE) is in effect. The effect is similar to a failure caused by RLIMIT_AS.

wblock added inline comments.
lib/libc/sys/getrlimit.2
78 ↗(On Diff #8071)

Seems like there should be an "allowed" in here to show that this is a limit. Maybe

The maximum number of kqueues this user id is allowed to create.
89 ↗(On Diff #8071)

Similar to above:

The maximum number of pseudo-terminals this user id is allowed to create.
92 ↗(On Diff #8071)

"resident set size may maintain in memory" is unclear. "maintain" has several meanings, probably just "keep" or "manage" here. But "size" does not really apply to those. Winging it with what I think it might mean:

When there is memory pressure and swap is available, limit the resident set size of a process to this amount (in bytes).
96 ↗(On Diff #8071)

How about:

When memory is not under pressure, this limit is effectively ignored.
Even when there is memory pressure, the amount of available swap space and
some tunable settings like
.Xr something-some-reference-to-tunables
can also affect what happens to processes that have exceeded this size.
100 ↗(On Diff #8071)

Don't need to repeat "limit", it's part of the name. Not sure about "nominal". "Set" or "declared" or "configured" might be better.
Avoid the semicolon when at all possible, it's a vague thing somewhere between a comma and a colon, and usually is used to splice sentences anyway. "Simply" has multiple meanings. So maybe:

Processes that exceed their set
.Dv RLIMIT_RSS
are not signalled or halted.
The limit is merely a hint to the VM daemon
101 ↗(On Diff #8071)

It might not be clear what "any such process" is, be specific:

to prefer to deactivate pages from processes that have exceeded their
.Dv RLIMIT_RSS .
129 ↗(On Diff #8071)

This sentence requires a lot of backtracking and context switching because of the asides. Maybe move the examples to separate sentences afterwards? Also, need a comma after "exceeded".

When a soft limit is exceeded, a process might or might not receive a signal.
For example, signals are generated when the CPU time or file size is exceeded, but not if the address
space soft limit or either RSS limit is exceeded.
A program that exceeds the soft limit is allowed to continue execution until it reaches the hard limit,
or modifies its own resource limits.
131 ↗(On Diff #8071)

Split these sentences:

Even reaching the hard limit does not necessarily halt a process.
For example, when the RSS hard limit is exceeded, nothing happens.
188 ↗(On Diff #8071)

Argh, semicolon again. Just break the sentence there:

no signal is raised.
However, the operation fails with
cem marked an inline comment as done. Aug 19 2015, 10:06 PM
cem added inline comments.
lib/libc/sys/getrlimit.2
189 ↗(On Diff #8071)

Can you point to the code for userspace stack extension? It looks to me as if T_STKFLT from userspace is treated as SIGBUS and no attempt is made to allocate AS for more stack (besides, how would the kernel know how/where to do so?).

Userspace that is simply faulting in already mapped stack pages does not use additional AS and cannot run into RLIMIT_AS.

lib/libc/sys/getrlimit.2
78 ↗(On Diff #8071)

Sure. This was just a re-shuffle, but the proposed change seems straightforward.

92 ↗(On Diff #8071)

The proposed rewording is still flawed. The RSS is never actually limited. The name RLIMIT_RSS is misleading, unfortunately.
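
A rough sketch of what "never actually limited" looks like from userspace. The helper name survives_rss_overrun and its parameters are hypothetical. It only shows that a process touching more memory than its soft RLIMIT_RSS keeps running; whether any pages are deactivated depends on memory pressure and cannot be observed from this code.

```c
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: set the soft RLIMIT_RSS to `limit` bytes, then
 * allocate and touch `len` bytes. Returns 1 if every page was touched
 * and the process is still running, 0 if malloc(3) failed, -1 if the
 * rlimit could not be read or changed.
 */
int
survives_rss_overrun(rlim_t limit, size_t len)
{
	struct rlimit old, lowered;
	volatile unsigned char *p;
	size_t i;
	int ok = 0;

	if (getrlimit(RLIMIT_RSS, &old) != 0)
		return (-1);
	lowered = old;
	if (limit > old.rlim_max)
		limit = old.rlim_max;	/* the soft limit cannot exceed the hard one */
	lowered.rlim_cur = limit;
	if (setrlimit(RLIMIT_RSS, &lowered) != 0)
		return (-1);

	p = malloc(len);
	if (p != NULL) {
		/* A 4096-byte stride is assumed to be no larger than a page. */
		for (i = 0; i < len; i += 4096)
			p[i] = 1;	/* fault the page in */
		ok = 1;		/* no signal arrived, no call failed */
		free((void *)p);
	}

	(void)setrlimit(RLIMIT_RSS, &old);	/* restore the previous soft limit */
	return (ok);
}
```

For example, survives_rss_overrun((rlim_t)1 << 20, (size_t)8 << 20) touches 8 MiB under a 1 MiB soft RLIMIT_RSS.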

96 ↗(On Diff #8071)

Sure. I'd replace 'limit' with 'rlimit'. It's an rlimit, but not a 'limit' by any conventional meaning of the English word.

100 ↗(On Diff #8071)

The proposed text looks fine.

Avoid the semicolon when at all possible, it's a vague thing somewhere between a comma and a colon, and usually is used to splice sentences anyway.

I disagree with this writing advice.

"Simply" has multiple meanings.

What doesn't? ¯\_(ツ)_/¯

129 ↗(On Diff #8071)

Works for me. I tried to limit my changes to the original text, but this is clearer.

188 ↗(On Diff #8071)
.Dv RLIMIT_AS ,
the operation fails with ENOMEM and no signal is raised.
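
As an illustration of this wording, the sketch below lowers the soft RLIMIT_AS and attempts an anonymous mmap(2) larger than the limit. The helper name try_mmap_under_as_limit is hypothetical. The failure comes back as an error return with errno set, not as a signal.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANON on glibc; harmless on FreeBSD */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lower the soft RLIMIT_AS to `limit` bytes, try to
 * map `len` bytes of anonymous memory, then restore the old soft limit.
 * Returns 0 if the mapping succeeded, the errno value from mmap(2) if it
 * failed, or -1 if the rlimit could not be read or changed.
 */
int
try_mmap_under_as_limit(rlim_t limit, size_t len)
{
	struct rlimit old, lowered;
	void *p;
	int error = 0;

	if (getrlimit(RLIMIT_AS, &old) != 0)
		return (-1);
	lowered = old;
	if (limit > old.rlim_max)
		limit = old.rlim_max;	/* the soft limit cannot exceed the hard one */
	lowered.rlim_cur = limit;
	if (setrlimit(RLIMIT_AS, &lowered) != 0)
		return (-1);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
	    -1, 0);
	if (p == MAP_FAILED)
		error = errno;	/* the caller sees an error, not a signal */
	else
		(void)munmap(p, len);

	(void)setrlimit(RLIMIT_AS, &old);	/* raise the soft limit back */
	return (error);
}
```

With a 64 MiB limit and a 1 GiB request, try_mmap_under_as_limit((rlim_t)64 << 20, (size_t)1 << 30) is expected to return ENOMEM.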
cem marked 9 inline comments as done.
cem edited edge metadata.

Address wblock's feedback.

lib/libc/sys/getrlimit.2
97 ↗(On Diff #8073)

Avoid the semicolon when at all possible, it's a vague thing somewhere between a comma and a colon, and usually is used to splice sentences anyway.

As long as we're nit-picking, the quoted sentence is a great example of where semi-colons ought to be used. :/

markj edited edge metadata.
markj added inline comments.
lib/libc/sys/getrlimit.2
132 ↗(On Diff #8073)

"either" seems to be misplaced here.

lib/libc/sys/getrlimit.2
132 ↗(On Diff #8073)

It is intentional (to refer to both the soft and hard limits). Instead, how about just dropping the 'soft' from the AS description?

For example, signals are generated when the cpu time or file size is exceeded,
but not if the address space or RSS limit is exceeded.
lib/libc/sys/getrlimit.2
132 ↗(On Diff #8073)

Oops, I see. I think it's ok either way, I just didn't parse it properly the first time and assumed it was an error. Up to you.

cem edited edge metadata.

One last minor cleanup; drop confusing "either" language (MarkJ's feedback)

cem marked an inline comment as done. Aug 19 2015, 11:13 PM
cem added inline comments.
lib/libc/sys/getrlimit.2
132 ↗(On Diff #8074)

Fixed to be more clear.

This revision was automatically updated to reflect the committed changes.
cem marked an inline comment as done.
lib/libc/sys/getrlimit.2
100 ↗(On Diff #8071)

It's not just an esthetic thing. Shorter and simpler sentences are easier to understand. For documentation, that is the main goal. Other types of writing can effectively use more complicated language to good effect.

lib/libc/sys/getrlimit.2
100 ↗(On Diff #8071)

Yes, you're right. Sorry.

lib/libc/sys/getrlimit.2
189 ↗(On Diff #8071)

vm_map_growstack() in sys/vm/vm_map.c, called from vm_fault_hold() in sys/vm/vm_fault.c. This uses sysctl kern.sgrowsiz.

T_STKFLT is generally unrelated to stack overflows. It occurs when segment-related limits are violated in implied stack references, where other references generate a general protection fault. On amd64, this typically occurs when an address with %rsp or %rbp as base is outside the permitted 2**48 byte space (top 17 bits not all 1 or all 0).

lib/libc/sys/getrlimit.2
189 ↗(On Diff #8071)

Thanks!

Wow, vm_map_growstack is some black magic code.

I guess we should also document that RLIMIT_STACK only applies to the main process thread? The man page could be read as suggesting that any thread's stack extension is subject to RLIMIT_STACK, where it is not.

lib/libc/sys/getrlimit.2
189 ↗(On Diff #8071)

Yeah, vm_map_growstack is sort of a second level of overcommit. I guess it saves page tables and related structures for memory that is very unlikely to be used.

It should probably be documented that RLIMIT_STACK only applies to the main thread but this is not consistent across systems (in Linux it applies to all thread stacks).