Page MenuHomeFreeBSD

setrlimit: Take stack gap into account.
ClosedPublic

Authored by dgr_semihalf.com on Aug 12 2021, 8:31 AM.

Details

Summary

Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

dgr_semihalf.com created this revision.

I have reverted (locally) the NTP mitigation commits. This patch totally resolves the NTP issues. The code looks good. Good to go from this perspective.

Secondly, firefox still segfaults (unless procctl (proccontrol -m aslr -s disable firefox) is used. I think this might be a juxtaposed issue.

This revision is now accepted and ready to land.Aug 16 2021, 5:02 PM
In D31516#711576, @cy wrote:

I have reverted (locally) the NTP mitigation commits. This patch totally resolves the NTP issues. The code looks good. Good to go from this perspective.

Secondly, firefox still segfaults (unless procctl (proccontrol -m aslr -s disable firefox) is used. I think this might be a juxtaposed issue.

Yeah, firefox still segfaults. Both the firefox issue and ntpd issue are caused by the fact that the kernel assumes that stack starts at a constant address. This is not the case with stack gap anymore. There are more places in which the kernel assumes that stack starts at a constant address, and that's why firefox still doesn't work. I'll send a patch which should fix the rest of the issues this week.

Sorry, but I do not agree with this approach. We have to start stack at USRSTACK (and I had to provide knob to shrink LA57 to LA48 in part due to this issue), and IMO we should keep the traditional interpretation of RLIMIT_STACK. In particular, RLIMIT_STACK use by programs that to mprotect(MLOCK_ALL) is to limit the amount of wired stack memory.

You can try something less intrusive if you want to make this programs work without elf note flag, for instance you could auto-adjust soft RLIMIT_STACK to not be less than gap + new value, but still in hard limit of RLIMIT_STACK. The vm_stkgap part of your patch would be reused, with a small addition in kern_resource.c. But I would not do even that: just set a flag.

dgr_semihalf.com edited the summary of this revision. (Show Details)

Updated setrlimit to automatically adjust stack rlim_cur value by the stack gap size. This also works, however now the rlim_max is properly respected. Ntpd also doesn't crash.

This revision now requires review to proceed.Aug 26 2021, 1:54 PM

Hi @kib,

Do you have any remarks about this version of the diff (and also D31692)?

Personally I don't think that disabling stack gap for every misbehaving program is the right approach.

sys/kern/imgact_elf.c
2698

I suggest to move the assignment from there to exec_stackgap().

Both to have the set up/cleaning in the same place, and to make vm_stkgap initialization non depended on specific image activator.

sys/kern/kern_exec.c
1576

Make sv_stackgap method return the gap value, and assign it to vm_stkgap there.

sys/kern/kern_resource.c
675

This needs to be clamped against hard limit.

sys/vm/vm_map.h
296

We have vm_size_t type. Also please note in the comment that the gap size is in bytes.

  • Made sv_stackgap return the size of the stack gap.
  • Moved vm_stkgap assignment into exec_stackgap().
  • Changed vm_stkgap type to vm_size_t
  • Modified a comment to mention that stack gap size is in bytes.

Hi, do you have any further comments or remarks? If there are no objections, I'm going to merge this patch by EOW.

sys/kern/kern_exec.c
1576

I suggest to add local struct proc *p var, initialized with imgp->proc. Then you can simplify at least three lines in the function.

Added struct proc *p helper variable.

sys/kern/kern_resource.c
675

You are clamping against the hard limit passed by userspace? This is wrong. You need to clamp it against process rlim_max.

Hmm, I think this is already done at line 692. I am sorry for not noting it initially.

sys/kern/kern_resource.c
675

Yeah, looking at this more closely, this is already being done for every resource limit slightly later in this function, so we don't need it here.

This revision is now accepted and ready to land.Oct 14 2021, 2:26 PM
This revision was automatically updated to reflect the committed changes.
markj added inline comments.
sys/kern/kern_resource.c
690

If userspace does rlim_cur = rlim_max = <value>, which IMHO is reasonable, then the stack gap is not taken into account, and the vm_map_protect() call below can clobber the existing stack if the gap is big enough. Why don't we adjust the hard limit by the gap as well?

sys/kern/kern_resource.c
690

The original patch did something similar (although it didn't adjust the rlimit structure, so it was invisible for the user). I adjusted this to be less intrusive according to kib's suggestion.

sys/kern/kern_resource.c
690

We must not allow user to increase hard limit, unless user has appropriate privilege. Then, this automatic tweaking should not depend on the user uid.

If there is an issue with vm_map_protect() over-extending its region, it should be clamped accordingly.