Page MenuHomeFreeBSD

Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold().
ClosedPublic

Authored by kib on Jul 13 2019, 9:08 PM.

Details

Summary

I believe that f_count can be legitimately very large, e.g. malicious code can dup same fd up to the per-process filedescriptors limit, and then fork as much as it can. On some machine, I see
kern.maxfilesperproc: 939132
kern.maxprocperuid: 34203
which already overflows u_int. More, the malicious code can create transient references by sending fds over unix sockets.

I realized that this check is missed after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision.Jul 13 2019, 9:08 PM
kib added inline comments.Jul 13 2019, 9:09 PM
sys/kern/kern_descrip.c
2660 ↗(On Diff #59732)

This is obviously unrelated style cleanup.

mjg added a comment.EditedJul 13 2019, 9:47 PM

I don't think this is the right fix. There is other code bumping the count (with fhold()) and that is still unchecked.

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

kib added a comment.Jul 13 2019, 10:11 PM
In D20947#454095, @mjg wrote:

I don't think this is the right fix. There is other code bumping the count (with fhold()) and that is still unchecked.

Indeed.

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Long would not help on ILP32 arches.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

I think we should add refcount_acquire_checked() and propagate the error up.

mjg added a comment.EditedJul 14 2019, 12:16 PM
In D20947#454096, @kib wrote:
In D20947#454095, @mjg wrote:

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Long would not help on ILP32 arches.

Do they need this to be 64 bit in their case though? Their effective limits may be small enough not to get into trouble.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

I think we should add refcount_acquire_checked() and propagate the error up.

I think wider refcounts (alongside ints) are inevitable and I don't think it's a problem to start introducing them now. We will eventually run into a place which needs to bump the count and where error recovery is impossible.

markj added a comment.Jul 14 2019, 5:03 PM

It seems that there are two related problems:

  1. Refcount overflows triggered by a kernel bug, such as the one dissected in the link.
  2. Refcount overflows triggered by malicious or buggy userspace, for instance by allocating 2^32 descriptors. (Note, at present this requires sizeof(filedescent) * (1 << 32) = 206GB of memory on amd64, including for descriptors internalized in unix socket buffers.)

I believe that we should unconditionally mitigate 1), either by widening the refcount type or by checking for signed overflow in all refcount operations like Linux does on x86. It is difficult to know whether a given refcount overflow is exploitable or not, so it seems prudent to accept the small cost of checking all operations and panic if an overflow is detected. 2) should not be permitted to trigger a kernel panic, so that suggests that we have a separate KPI that allows the caller to handle the overflow policy. I would therefore propose having refcount.h intrinsics panic on overflow conditions, and provide a bool refcount_acquire_notchecked(u_int *); function that allows the caller to detect overflow and do whatever they wish to mitigate the problem.

32-bit refcounts are used in some frequently allocated structures like struct file, vm_object, vm_page, etc., so I don't really like widening the field just to cope with kernel bugs.

kib updated this revision to Diff 59741.Jul 14 2019, 6:14 PM

Handle overflow in fhold().

kib retitled this revision from Check and avoid overflow in fget_unlocked(). to Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold()..Jul 14 2019, 6:15 PM
kib added a comment.Jul 14 2019, 6:17 PM

It seems that there are two related problems:

  1. Refcount overflows triggered by a kernel bug, such as the one dissected in the link.
  2. Refcount overflows triggered by malicious or buggy userspace, for instance by allocating 2^32 descriptors. (Note, at present this requires sizeof(filedescent) * (1 << 32) = 206GB of memory on amd64, including for descriptors internalized in unix socket buffers.)

I believe that we should unconditionally mitigate 1), either by widening the refcount type or by checking for signed overflow in all refcount operations like Linux does on x86. It is difficult to know whether a given refcount overflow is exploitable or not, so it seems prudent to accept the small cost of checking all operations and panic if an overflow is detected. 2) should not be permitted to trigger a kernel panic, so that suggests that we have a separate KPI that allows the caller to handle the overflow policy. I would therefore propose having refcount.h intrinsics panic on overflow conditions, and provide a bool refcount_acquire_notchecked(u_int *); function that allows the caller to detect overflow and do whatever they wish to mitigate the problem.

In principle I agree, but let's postpone the generic refcounting KPI change to later pass.

I already named the function refcount_acquire_checked().

32-bit refcounts are used in some frequently allocated structures like struct file, vm_object, vm_page, etc., so I don't really like widening the field just to cope with kernel bugs.

I agree, and even if we wanted to expand refcount to uint64_t (long cannot be used on ILP32), not all 32bit arches provide 64bit atomics still.

mjg added a comment.EditedJul 14 2019, 6:53 PM

Note I did not propose converting refcount to long (or 64-bit). I proposed adding a long (or 64-bit) variant.

My main point is that sooner or later there will be a case where a 32-bit counter can overflow and one of places which bump it can't do error recovery. So if playing with refcounts because of struct file, perhaps the extra variant should be added instead of error recovery in several places. The need to switch the type for files will show up sooner than later and widening the struct is not substantial.

kib added a comment.Jul 14 2019, 7:04 PM
In D20947#454166, @mjg wrote:

Note I did not propose converting refcount to long (or 64-bit). I proposed adding a long (or 64-bit) variant.
My main point is that sooner or later there will be a case where a 32-bit counter can overflow and one of places which bump it can't do error recovery. So if playing with refcounts because of struct file, perhaps the extra variant should be added instead of error recovery in several places. The need to switch the type for files will show up sooner than later and widening the struct is not substantial.

And I still do not understand how would it work or help. long == int on ILP32, so changing f_count to long is not too helpful even if we take into the account Mark' note about filedesc size. Then, changing to uint64_t does not work because not all arches have 64bit atomics.

mjg added a comment.EditedJul 14 2019, 7:14 PM

Architectures problematic in this regard definitely don't have anywhere near enough resources for a 32-bit overflow to be realistic, so they can stick to that size.

Note I don't think this is blocking. The new patch works for me if the above concerns can't be easily resolved (although would be nice to grow something like __must_check for the new primitive).

kib added a subscriber: pho.Jul 18 2019, 9:42 AM
markj added a comment.Jul 18 2019, 8:59 PM

IMO fhold() and refcount_acquire_checked() should have the __result_use_check annotation.

sys/kern/kern_descrip.c
2684 ↗(On Diff #59741)

Should we add a refcount KPI for this check+atomic operation?

markj added inline comments.Jul 18 2019, 9:01 PM
sys/sys/file.h
289 ↗(On Diff #59741)

Style, missing newline.

kib added inline comments.Jul 18 2019, 9:36 PM
sys/kern/kern_descrip.c
2684 ↗(On Diff #59741)

Could you explain your proposal in more details ? refcount_acquire_checked() cannot be used there due to need of reload fdt in the loop.

sys/sys/file.h
289 ↗(On Diff #59741)

After r348594 the blank line is optional.

kib updated this revision to Diff 59899.Jul 18 2019, 9:37 PM

Annotate with __result_use_check.

markj added inline comments.Jul 18 2019, 10:09 PM
sys/kern/kern_descrip.c
2684 ↗(On Diff #59741)

I was thinking of something like:

static inline int
refcount_acquire_if_equal_checked(volatile u_int *countp, u_int *valp)
{
    u_int nval;

    nval = *valp;
    if (nval + 1 < nval)
        return (-1);
    return (atomic_fcmpset(countp, valp, nval + 1));
}

Then return EBADF if the return value is -1. This is all somewhat awkward, but I dislike that we are otherwise manipulating the refcount value outside of the refcount KPI. Anyway, it is just an idea.

sys/kern/uipc_usrreq.c
2219 ↗(On Diff #59899)

This can be leaked.

sys/sys/file.h
287 ↗(On Diff #59899)

Oh, it may not be necessary to add it to fhold() too if __result_use_check is transitive.

kib marked an inline comment as done.Jul 18 2019, 10:25 PM
kib added inline comments.
sys/sys/file.h
287 ↗(On Diff #59899)

I think it is easier to add it to fhold() instead of relying on too much smartness in the compiler.

kib updated this revision to Diff 59904.Jul 18 2019, 10:26 PM

Reorder to avoid leak.

markj accepted this revision.Jul 18 2019, 10:32 PM
This revision is now accepted and ready to land.Jul 18 2019, 10:32 PM
mjg accepted this revision.Jul 18 2019, 10:46 PM
mjg added inline comments.
sys/kern/kern_descrip.c
1912 ↗(On Diff #59904)

this can be done prior to taking the lock to shorten the hold time

sys/sys/refcount.h
67 ↗(On Diff #59904)

these could use prediction hints - the overflow almost never happens and fcmpset almost always succeeds

kib updated this revision to Diff 59912.Jul 19 2019, 12:50 PM
kib marked 2 inline comments as done.

Move code out of locked region.
Add __prefict_true/false.

This revision now requires review to proceed.Jul 19 2019, 12:50 PM
mjg accepted this revision.Jul 19 2019, 12:51 PM
This revision is now accepted and ready to land.Jul 19 2019, 12:51 PM
markj accepted this revision.Jul 19 2019, 3:02 PM
kib updated this revision to Diff 59944.Jul 19 2019, 8:59 PM

Update after r350156.

This revision now requires review to proceed.Jul 19 2019, 8:59 PM
pho added a comment.Jul 21 2019, 9:48 AM

I completed a full stress2 test with D20947.59944.diff on amd64. No problems seen.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 21 2019, 3:07 PM
This revision was automatically updated to reflect the committed changes.