Details

Reviewers

markj
mjg

Commits

rS350199: Check and avoid overflow when incrementing fp->f_count in

Summary

I believe that f_count can be legitimately very large, e.g. malicious code can dup same fd up to the per-process filedescriptors limit, and then fork as much as it can. On some machine, I see
kern.maxfilesperproc: 939132
kern.maxprocperuid: 34203
which already overflows u_int. More, the malicious code can create transient references by sending fds over unix sockets.

I realized that this check is missed after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Jul 13 2019, 9:08 PM

Herald added a subscriber: imp. · View Herald TranscriptJul 13 2019, 9:08 PM

Harbormaster completed remote builds in B25334: Diff 59732.Jul 13 2019, 9:08 PM

kib added inline comments.Jul 13 2019, 9:09 PM

sys/kern/kern_descrip.c
2660 ↗	(On Diff #59732)	This is obviously unrelated style cleanup.

I don't think this is the right fix. There is other code bumping the count (with fhold()) and that is still unchecked.

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

In D20947#454095, @mjg wrote:

I don't think this is the right fix. There is other code bumping the count (with fhold()) and that is still unchecked.

Indeed.

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Long would not help on ILP32 arches.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

I think we should add refcount_acquire_checked() and propagate the error up.

In D20947#454096, @kib wrote:

In D20947#454095, @mjg wrote:

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Long would not help on ILP32 arches.

Do they need this to be 64 bit in their case though? Their effective limits may be small enough not to get into trouble.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

I think we should add refcount_acquire_checked() and propagate the error up.

I think wider refcounts (alongside ints) are inevitable and I don't think it's a problem to start introducing them now. We will eventually run into a place which needs to bump the count and where error recovery is impossible.

It seems that there are two related problems:

Refcount overflows triggered by a kernel bug, such as the one dissected in the link.
Refcount overflows triggered by malicious or buggy userspace, for instance by allocating 2^32 descriptors. (Note, at present this requires sizeof(filedescent) * (1 << 32) = 206GB of memory on amd64, including for descriptors internalized in unix socket buffers.)

I believe that we should unconditionally mitigate 1), either by widening the refcount type or by checking for signed overflow in all refcount operations like Linux does on x86. It is difficult to know whether a given refcount overflow is exploitable or not, so it seems prudent to accept the small cost of checking all operations and panic if an overflow is detected. 2) should not be permitted to trigger a kernel panic, so that suggests that we have a separate KPI that allows the caller to handle the overflow policy. I would therefore propose having refcount.h intrinsics panic on overflow conditions, and provide a bool refcount_acquire_notchecked(u_int *); function that allows the caller to detect overflow and do whatever they wish to mitigate the problem.

32-bit refcounts are used in some frequently allocated structures like struct file, vm_object, vm_page, etc., so I don't really like widening the field just to cope with kernel bugs.

Handle overflow in fhold().

Harbormaster completed remote builds in B25338: Diff 59741.Jul 14 2019, 6:14 PM

kib retitled this revision from Check and avoid overflow in fget_unlocked(). to Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold()..Jul 14 2019, 6:15 PM

In D20947#454137, @markj wrote:

It seems that there are two related problems:

Refcount overflows triggered by a kernel bug, such as the one dissected in the link.

Refcount overflows triggered by malicious or buggy userspace, for instance by allocating 2^32 descriptors. (Note, at present this requires sizeof(filedescent) * (1 << 32) = 206GB of memory on amd64, including for descriptors internalized in unix socket buffers.)

I believe that we should unconditionally mitigate 1), either by widening the refcount type or by checking for signed overflow in all refcount operations like Linux does on x86. It is difficult to know whether a given refcount overflow is exploitable or not, so it seems prudent to accept the small cost of checking all operations and panic if an overflow is detected. 2) should not be permitted to trigger a kernel panic, so that suggests that we have a separate KPI that allows the caller to handle the overflow policy. I would therefore propose having refcount.h intrinsics panic on overflow conditions, and provide a bool refcount_acquire_notchecked(u_int *); function that allows the caller to detect overflow and do whatever they wish to mitigate the problem.

In principle I agree, but let's postpone the generic refcounting KPI change to later pass.

I already named the function refcount_acquire_checked().

32-bit refcounts are used in some frequently allocated structures like struct file, vm_object, vm_page, etc., so I don't really like widening the field just to cope with kernel bugs.

I agree, and even if we wanted to expand refcount to uint64_t (long cannot be used on ILP32), not all 32bit arches provide 64bit atomics still.

Note I did not propose converting refcount to long (or 64-bit). I proposed adding a long (or 64-bit) variant.

My main point is that sooner or later there will be a case where a 32-bit counter can overflow and one of places which bump it can't do error recovery. So if playing with refcounts because of struct file, perhaps the extra variant should be added instead of error recovery in several places. The need to switch the type for files will show up sooner than later and widening the struct is not substantial.

In D20947#454166, @mjg wrote:

Note I did not propose converting refcount to long (or 64-bit). I proposed adding a long (or 64-bit) variant.

My main point is that sooner or later there will be a case where a 32-bit counter can overflow and one of places which bump it can't do error recovery. So if playing with refcounts because of struct file, perhaps the extra variant should be added instead of error recovery in several places. The need to switch the type for files will show up sooner than later and widening the struct is not substantial.

And I still do not understand how would it work or help. long == int on ILP32, so changing f_count to long is not too helpful even if we take into the account Mark' note about filedesc size. Then, changing to uint64_t does not work because not all arches have 64bit atomics.

Architectures problematic in this regard definitely don't have anywhere near enough resources for a 32-bit overflow to be realistic, so they can stick to that size.

Note I don't think this is blocking. The new patch works for me if the above concerns can't be easily resolved (although would be nice to grow something like __must_check for the new primitive).

kib added a subscriber: pho.Jul 18 2019, 9:42 AM

IMO fhold() and refcount_acquire_checked() should have the __result_use_check annotation.

sys/kern/kern_descrip.c
2684 ↗	(On Diff #59741)	Should we add a refcount KPI for this check+atomic operation?

markj added inline comments.Jul 18 2019, 9:01 PM

sys/sys/file.h
289 ↗	(On Diff #59741)	Style, missing newline.

kib added inline comments.Jul 18 2019, 9:36 PM

sys/kern/kern_descrip.c
2684 ↗	(On Diff #59741)	Could you explain your proposal in more details ? refcount_acquire_checked() cannot be used there due to need of reload fdt in the loop.
sys/sys/file.h
289 ↗	(On Diff #59741)	After r348594 the blank line is optional.

Annotate with __result_use_check.

Harbormaster completed remote builds in B25411: Diff 59899.Jul 18 2019, 9:37 PM

markj added inline comments.Jul 18 2019, 10:09 PM

sys/kern/kern_descrip.c
2684 ↗	(On Diff #59741)	I was thinking of something like: static inline int refcount_acquire_if_equal_checked(volatile u_int countp, u_int valp) { u_int nval; nval = *valp; if (nval + 1 < nval) return (-1); return (atomic_fcmpset(countp, valp, nval + 1)); } Then return EBADF if the return value is -1. This is all somewhat awkward, but I dislike that we are otherwise manipulating the refcount value outside of the refcount KPI. Anyway, it is just an idea.
sys/kern/uipc_usrreq.c
2219 ↗	(On Diff #59899)	This can be leaked.
sys/sys/file.h
287 ↗	(On Diff #59899)	Oh, it may not be necessary to add it to fhold() too if __result_use_check is transitive.

kib marked an inline comment as done.Jul 18 2019, 10:25 PM

kib added inline comments.

sys/sys/file.h
287 ↗	(On Diff #59899)	I think it is easier to add it to fhold() instead of relying on too much smartness in the compiler.

Reorder to avoid leak.

Harbormaster completed remote builds in B25415: Diff 59904.Jul 18 2019, 10:26 PM

markj accepted this revision.Jul 18 2019, 10:32 PM

This revision is now accepted and ready to land.Jul 18 2019, 10:32 PM

mjg accepted this revision.Jul 18 2019, 10:46 PM

mjg added inline comments.

sys/kern/kern_descrip.c
1912 ↗	(On Diff #59904)	this can be done prior to taking the lock to shorten the hold time
sys/sys/refcount.h
67 ↗	(On Diff #59904)	these could use prediction hints - the overflow almost never happens and fcmpset almost always succeeds