Page MenuHomeFreeBSD

fix a use after free in nfscl_cleanupkext()
ClosedPublic

Authored by rmacklem on Feb 22 2022, 12:21 AM.
Tags
None
Referenced Files
F106067569: D34334.id103209.diff
Tue, Dec 24, 8:30 PM
F106067093: D34334.id103129.diff
Tue, Dec 24, 8:19 PM
F106045739: D34334.id103073.diff
Tue, Dec 24, 10:38 AM
Unknown Object (File)
Fri, Dec 20, 7:58 AM
Unknown Object (File)
Fri, Dec 20, 2:07 AM
Unknown Object (File)
Fri, Dec 13, 6:38 AM
Unknown Object (File)
Nov 25 2024, 4:11 AM
Unknown Object (File)
Nov 24 2024, 7:56 AM
Subscribers

Details

Summary

ler@, mark2 reported a use after free in nfscl_cleanupkext().
They also provided two possible causes:

  1. In nfscl_cleanup_common(), "own" is the owner string owp->nfsow_owner. If we free that particular owner structure, than in subsequent comparisons "own" will point to freed memory.
  1. nfscl_cleanup_common() can apparently free more than one owner, so isn't the use of LIST_FOREACH_SAFE() in nfscl_cleanupkext() insufficient? That is, it looks like nfscl_cleanup_common() could free both "owp" and "nowp".

I also believe there is a 3rd, if nfscl_freeopenowner() or nfscl_freelockowner()
is called without the NFSCLSTATE mutex held. This could happen when
the exclusive lock is held on the client, such as when delegations are being
returned.

This patch fixes them as follows:
1 - Copy the owner string to a local variable before the nfscl_cleanup_common() call.
2 - Modify nfscl_cleanup_common() to return whether or not a free was done.

When a free was done, do a goto to restart the loop, instead of using FOREACH_SAFE,
which was not safe in this case.

3 - Acquire the NFSCLSTATE mutex in nfscl_freeopenowner() and nfscl_freelockowner(),

if it not already held. This serializes all of these calls with the ones done in
nfscl_cleanup_common().
Test Plan

I am testing it and monitoring the number of openowners and
lockowners allocated in the client.

Hopefully ler@ can test it for their environment, to see if the
use after free still occurs?

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/fs/nfsclient/nfs_clstate.c
1895

Why is a restart needed in this case?

sys/fs/nfsclient/nfs_clstate.c
1895

It is not. I just thought it would be weird to
return true when freeing open owners, but not
when freeing lock owners.

I can change the function to only return true
when open owners are free'd, if you think that
is appropriate? (And comment it as such.)

markj added inline comments.
sys/fs/nfsclient/nfs_clstate.c
1895

I think it's logically fine as-is, but I do wonder if the extra restarts might make nfscl_cleanupkext() a lot more expensive in the face of a large number of delegations. Consider that nfscl_cleanupkext() holds all of the PID hash locks. I don't have much intuition for whether this is likely to be a problem in practice, though.

Hmm, actually, isn't a restart indeed required for the second loop in nfscl_cleanupkext()? That is, nfscl_cleanup_common() can potentially free multiple lock owners for a given delegation.

This revision is now accepted and ready to land.Feb 22 2022, 3:29 PM

Instead of returning a boolean, nfscl_cleanup_common()
returns flag bits for which of open/lock owner(s) have
been free'd.

This way, the loops only restart when they need to, fixing
the inefficiency caused by returning true for both cases.

As suggested (or at least hinted) by markj@.

This revision now requires review to proceed.Feb 22 2022, 4:57 PM
This revision is now accepted and ready to land.Feb 22 2022, 5:22 PM

cy@ reported via email that he had a problem when
running with the previous patch. He observed the mount
"come to a grinding halt" when under heavy load.
(He was doing "make -j16" builds.)
The CPU was busy, which would have indicated that
the renew thread was very busy.
The "goto tryagain(2)" was only done when an entry
was deleted, so it wasn't exactly an infinite loop, but
it appears that the overhead of repeating the loops
from the beginning was excessive when there were many
open owners.

I do not believe that there should ever be multiple elements
within a list with the same open/lock owner. There can be
multiple elements with the same open/lock owner (which
represents a process on the client), but they should always
be in different lsist.
--> As such, #2 should be ok with the FOREACH_SAFE loops

in nfscl_cleanupkext().

This version of the patch retains the original code that uses
FOREACH_SAFE loops, but has "break"s added, to ensure that
nfscl_cleanup_common() only removes the first element that
matches.

This provides a "safety belt (and optimization) to ensure
that nfscl_cleanup_common() only frees the first matching element.
If there ever is an additional element in the list (due to a bug?), it will
be handled on a subsequent call to nfscl_cleanupkext().

cy@ has confirmed that this version of the patch works fine
for his test case.

The previous patch has been reverted from main.

sys/fs/nfsclient/nfs_clstate.c
1901

These three lines and the last line of the loop could just become a LIST_FOREACH(), I believe.

sys/fs/nfsclient/nfs_clstate.c
1895

Yes, the second restart loop is for lock owners, so a return of "true" is
needed for the free lock owner case.

There is a certain amount of inefficiency caused by a boolean re

1901

Yes, there are still places in the NFS code where FOREACH_SAFE is not used.
(This code is so old, some of it was written before FOREACH_SAFE
existed in all the BSDen I was maintaining a port to.)

I have replaced a bunch of them with FOREACH_SAFE and will
probably do this one with a FOREACH soon (as you noted, with
"break" added, it does not need to be FOREACH_SAFE), since I
noticed this while doing the patch.
However, changing it is just code cleanup, which I would rather
do separately and later. (No need to get this into 13.1.)