Page MenuHomeFreeBSD

Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold().
ClosedPublic

Authored by kib on Jul 13 2019, 9:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 14 2024, 7:07 AM
Unknown Object (File)
Dec 20 2023, 12:55 AM
Unknown Object (File)
Nov 28 2023, 8:17 AM
Unknown Object (File)
Nov 11 2023, 2:36 PM
Unknown Object (File)
Nov 10 2023, 11:07 AM
Unknown Object (File)
Nov 9 2023, 6:00 PM
Unknown Object (File)
Nov 7 2023, 6:00 PM
Unknown Object (File)
Oct 10 2023, 1:35 PM
Subscribers

Details

Summary

I believe that f_count can be legitimately very large, e.g. malicious code can dup same fd up to the per-process filedescriptors limit, and then fork as much as it can. On some machine, I see
kern.maxfilesperproc: 939132
kern.maxprocperuid: 34203
which already overflows u_int. More, the malicious code can create transient references by sending fds over unix sockets.

I realized that this check is missed after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 25419

Event Timeline

sys/kern/kern_descrip.c
2672

This is obviously unrelated style cleanup.

I don't think this is the right fix. There is other code bumping the count (with fhold()) and that is still unchecked.

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

In D20947#454095, @mjg wrote:

I don't think this is the right fix. There is other code bumping the count (with fhold()) and that is still unchecked.

Indeed.

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Long would not help on ILP32 arches.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

I think we should add refcount_acquire_checked() and propagate the error up.

In D20947#454096, @kib wrote:
In D20947#454095, @mjg wrote:

Looks like the type needs to be grown to an unsigned long instead. size-wise the struct already exceeds one cache line and I failed to find a way to shrink it.

Long would not help on ILP32 arches.

Do they need this to be 64 bit in their case though? Their effective limits may be small enough not to get into trouble.

Not sure what to do about refcount_* stuff. perhaps it should grow a refcount_long (or ref_long?) variant.

I think we should add refcount_acquire_checked() and propagate the error up.

I think wider refcounts (alongside ints) are inevitable and I don't think it's a problem to start introducing them now. We will eventually run into a place which needs to bump the count and where error recovery is impossible.

It seems that there are two related problems:

  1. Refcount overflows triggered by a kernel bug, such as the one dissected in the link.
  2. Refcount overflows triggered by malicious or buggy userspace, for instance by allocating 2^32 descriptors. (Note, at present this requires sizeof(filedescent) * (1 << 32) = 206GB of memory on amd64, including for descriptors internalized in unix socket buffers.)

I believe that we should unconditionally mitigate 1), either by widening the refcount type or by checking for signed overflow in all refcount operations like Linux does on x86. It is difficult to know whether a given refcount overflow is exploitable or not, so it seems prudent to accept the small cost of checking all operations and panic if an overflow is detected. 2) should not be permitted to trigger a kernel panic, so that suggests that we have a separate KPI that allows the caller to handle the overflow policy. I would therefore propose having refcount.h intrinsics panic on overflow conditions, and provide a bool refcount_acquire_notchecked(u_int *); function that allows the caller to detect overflow and do whatever they wish to mitigate the problem.

32-bit refcounts are used in some frequently allocated structures like struct file, vm_object, vm_page, etc., so I don't really like widening the field just to cope with kernel bugs.

Handle overflow in fhold().

kib retitled this revision from Check and avoid overflow in fget_unlocked(). to Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold()..Jul 14 2019, 6:15 PM

It seems that there are two related problems:

  1. Refcount overflows triggered by a kernel bug, such as the one dissected in the link.
  2. Refcount overflows triggered by malicious or buggy userspace, for instance by allocating 2^32 descriptors. (Note, at present this requires sizeof(filedescent) * (1 << 32) = 206GB of memory on amd64, including for descriptors internalized in unix socket buffers.)

I believe that we should unconditionally mitigate 1), either by widening the refcount type or by checking for signed overflow in all refcount operations like Linux does on x86. It is difficult to know whether a given refcount overflow is exploitable or not, so it seems prudent to accept the small cost of checking all operations and panic if an overflow is detected. 2) should not be permitted to trigger a kernel panic, so that suggests that we have a separate KPI that allows the caller to handle the overflow policy. I would therefore propose having refcount.h intrinsics panic on overflow conditions, and provide a bool refcount_acquire_notchecked(u_int *); function that allows the caller to detect overflow and do whatever they wish to mitigate the problem.

In principle I agree, but let's postpone the generic refcounting KPI change to later pass.

I already named the function refcount_acquire_checked().

32-bit refcounts are used in some frequently allocated structures like struct file, vm_object, vm_page, etc., so I don't really like widening the field just to cope with kernel bugs.

I agree, and even if we wanted to expand refcount to uint64_t (long cannot be used on ILP32), not all 32bit arches provide 64bit atomics still.

Note I did not propose converting refcount to long (or 64-bit). I proposed adding a long (or 64-bit) variant.

My main point is that sooner or later there will be a case where a 32-bit counter can overflow and one of places which bump it can't do error recovery. So if playing with refcounts because of struct file, perhaps the extra variant should be added instead of error recovery in several places. The need to switch the type for files will show up sooner than later and widening the struct is not substantial.

In D20947#454166, @mjg wrote:

Note I did not propose converting refcount to long (or 64-bit). I proposed adding a long (or 64-bit) variant.

My main point is that sooner or later there will be a case where a 32-bit counter can overflow and one of places which bump it can't do error recovery. So if playing with refcounts because of struct file, perhaps the extra variant should be added instead of error recovery in several places. The need to switch the type for files will show up sooner than later and widening the struct is not substantial.

And I still do not understand how would it work or help. long == int on ILP32, so changing f_count to long is not too helpful even if we take into the account Mark' note about filedesc size. Then, changing to uint64_t does not work because not all arches have 64bit atomics.

Architectures problematic in this regard definitely don't have anywhere near enough resources for a 32-bit overflow to be realistic, so they can stick to that size.

Note I don't think this is blocking. The new patch works for me if the above concerns can't be easily resolved (although would be nice to grow something like __must_check for the new primitive).

IMO fhold() and refcount_acquire_checked() should have the __result_use_check annotation.

sys/kern/kern_descrip.c
2682–2683

Should we add a refcount KPI for this check+atomic operation?

sys/sys/file.h
289

Style, missing newline.

sys/kern/kern_descrip.c
2682–2683

Could you explain your proposal in more details ? refcount_acquire_checked() cannot be used there due to need of reload fdt in the loop.

sys/sys/file.h
289

After r348594 the blank line is optional.

Annotate with __result_use_check.

sys/kern/kern_descrip.c
2682–2683

I was thinking of something like:

static inline int
refcount_acquire_if_equal_checked(volatile u_int *countp, u_int *valp)
{
    u_int nval;

    nval = *valp;
    if (nval + 1 < nval)
        return (-1);
    return (atomic_fcmpset(countp, valp, nval + 1));
}

Then return EBADF if the return value is -1. This is all somewhat awkward, but I dislike that we are otherwise manipulating the refcount value outside of the refcount KPI. Anyway, it is just an idea.

sys/kern/uipc_usrreq.c
2232

This can be leaked.

sys/sys/file.h
287–292

Oh, it may not be necessary to add it to fhold() too if __result_use_check is transitive.

kib marked an inline comment as done.Jul 18 2019, 10:25 PM
kib added inline comments.
sys/sys/file.h
287–292

I think it is easier to add it to fhold() instead of relying on too much smartness in the compiler.

This revision is now accepted and ready to land.Jul 18 2019, 10:32 PM
mjg added inline comments.
sys/kern/kern_descrip.c
1914

this can be done prior to taking the lock to shorten the hold time

sys/sys/refcount.h
67

these could use prediction hints - the overflow almost never happens and fcmpset almost always succeeds

kib marked 2 inline comments as done.

Move code out of locked region.
Add __prefict_true/false.

This revision now requires review to proceed.Jul 19 2019, 12:50 PM
This revision is now accepted and ready to land.Jul 19 2019, 12:51 PM
This revision now requires review to proceed.Jul 19 2019, 8:59 PM

I completed a full stress2 test with D20947.59944.diff on amd64. No problems seen.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 21 2019, 3:07 PM
This revision was automatically updated to reflect the committed changes.