Page MenuHomeFreeBSD

fix the NFSv4 client so that it doesn't call pfind() with a mutex held
ClosedPublic

Authored by rmacklem on Apr 12 2019, 3:00 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Apr 18, 11:50 AM
Unknown Object (File)
Tue, Apr 16, 8:09 AM
Unknown Object (File)
Dec 29 2023, 8:20 AM
Unknown Object (File)
Dec 24 2023, 3:20 AM
Unknown Object (File)
Dec 23 2023, 1:23 AM
Unknown Object (File)
Nov 6 2023, 3:03 PM
Unknown Object (File)
Nov 6 2023, 8:58 AM
Unknown Object (File)
Sep 1 2023, 11:40 PM
Subscribers

Details

Summary

r340744 removed pfind_locked() and replaced the pfind_locked() call in the NFSv4 client with pfind().
This is incorrect, since the call in the NFSv4 client must be done with a mutex held and _pfind() currently
uses an "sx" lock for the pid hash list.

Although mjg@ has suggested other ways to implement this without using any veriant of pfind(), those
suggestions will take time to work out (and may not be practicable). As such, this patch replaces pfind()
with an NFSv4 client custom version that does not do any locking and replaces the acquisition of the "allproc"
lock with the acquisition of all the pidhashtbl_locks.
Since this is only done once/sec, acquiring all the locks shouldn't be a serious performance problem.

Test Plan

I have tested this for some NFSv4 mounts and the cleanup of the lock/open owners seems to be working.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 23641

Event Timeline

sys/fs/nfsclient/nfs_clport.c
1150

This check is an optimization to avoid locking in certain cases. It is harmful for this consumer where the check will always be false and locks are taken anyway.

1154

The process may be a zombie in which case you probably also want to fail to find it.

sys/kern/kern_proc.c
198

If this is to appease WITNESS you can add SX_DUPOK instead. No opinion one way or the other, but most of the time these kind of locks are not numbered.

I would greatly prefer that kern_proc.c provided the services that nfscl used, instead of open-coding. E.g., it would be the 'lock all pidhash chains', and then pfind_locked(). The later can also assert that the hash chain it accesses is locked.

This version of the patch brings in pfind_locked() from stable/12, but with the
sx_assert() changed.
Since the code the inline comments were applied to has been removed from this patch,
I'll try to address mjg@'s comments here:

  • Yes, the "optimization" I left in the code wouldn't have ever happened. I just left it in the code so that it looked more like _pfind(). Since that code isn't in pfind_locked(), it is gone.
  • The old pfind_locked() found zombie processes. I know that works ok and I'm not sure that the code would work correctly if zombie processes aren't found. Some of these open/lock_owners refer to opens, where a child of the zombie process might still have the file open. Delaying the cleanup of it until the children die only delays the garbage collection, so all that happens is a little malloc()d memory doesn't get released as soon. Since freeing the openowner prematurely would break things badly, I'd rather leave it finding zombie processes. (The code should delay the cleanup until all opens are closed, but...)

I didn't replace the "naming" patch with SX_DUPOK, since I thought that allowing duplicates
just to make WITNESS happy seemed like "cheating" to me. However, if you guys think it
is the better way to deal with it, I can easily change the patch.

Ok, the inline comments were still there, so I have just responded to them.

sys/fs/nfsclient/nfs_clport.c
1150

Gone, since the stable/12 pfind_locked() doesn't have it.

1154

As explained in the main comment, the old code (which is what is now
back in kern_proc.c) does fine zombie processes and I think that is safe to do.

sys/kern/kern_proc.c
198

I don't have a strong opinion either, but this way broken code that
does acquire the same lock multiple times will be detected.
(Or is that caught by the check for a recursive lock acquisition anyhow?)
If the only use for SX_DUPOK is to allow the name to be duplicated, then
I should probably change the patch to do that?

I just read the man page (I know, never read the documentation;-) and it seems that
the recursion check is for exclusive locks, so adding SX_DUPOK would allow a process
to acquire a shared lock that it already has.
So, maybe, sticking with the current patch that applies a unique name to each element
is preferred?

I just read the man page (I know, never read the documentation;-) and it seems that
the recursion check is for exclusive locks, so adding SX_DUPOK would allow a process
to acquire a shared lock that it already has.
So, maybe, sticking with the current patch that applies a unique name to each element
is preferred?

I do not think it is a good idea to unload that many lock (names) to WITNESS, it would blow the size of the n^2 tracking table.

sys/fs/nfsclient/nfs_clstate.c
1800

When I mean that the primitives should be encapsulated and provided by kern_proc.c, I also mean the helpers which lock and unlock all pidhash chains.

This version of the patch uses SX_DUPOK instead of a separate name for each
lock. It also adds a couple of small functions to kern_proc.c to acquire/release
all the pid hash locks, so that the NFS code can call those instead of doing it directly.

I have included responses to the inline comments.

kib added inline comments.
sys/kern/kern_proc.c
381

The condition can be written as i <= pidhashlock

This revision is now accepted and ready to land.Apr 13 2019, 9:50 PM

Replied to inline comment.

sys/kern/kern_proc.c
381

I just copied the line in procinit() that initializes them, so that the code is
consistent with what was aready there.

sys/kern/kern_proc.c
398

But now pfind and pfind_locked have different semantics with the latter not returning zombies. At the very least this needs a comment. Or you can rename this to pfind_any_locked and you are set.

pfind_locked() has been renamed pfind_any_locked() and a comment
indicating that it finds zombies has been added, per mjg@'s request.

This revision now requires review to proceed.Apr 14 2019, 8:41 PM

Updated patch renames pfind_locked() to pfind_any_locked(), per mjg@'s request.

This revision is now accepted and ready to land.Apr 14 2019, 9:03 PM
This revision was automatically updated to reflect the committed changes.