
fix for NFS mount hang on "mntref" caused by r353150
Closed (Public)

Authored by rmacklem on Mar 11 2020, 5:02 AM.
Tags
None
Referenced Files
F106105039: D24022.diff

Details

Summary

r353150 added mnt_rootvnode, and this seems to have broken NFS mounts when the
VFS_STATFS() call made just after VFS_MOUNT() returns an error.
The code then calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY.
The thread then gets stuck sleeping on "mntref" in vfs_mount_destroy().

This patch seems to fix the problem.

Test Plan

I have run several cycles of a test where the NFS server returns EACCES to the RPC issued
by the NFS client's VFS_STATFS().
Without the patch, the mount thread gets stuck sleeping on "mntref".
With the patch, the mount attempts fail as expected.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

pho reported a similar problem some time ago, also with NFS. I think the real fix would be to fold both the statfs and root calls into mount so that the fs can handle them however it sees fit, but that's beyond the scope of this review.

thank you for fixing

This revision is now accepted and ready to land. Mar 11 2020, 5:16 AM

Stuffing VFS_STATFS() inside VFS_MOUNT() would be useful for NFS.
The ideal for NFS is that the nmount(2) syscall never does any RPCs,
since if the NFS server has crashed or become network partitioned just as
the nmount(2) syscall is made, the result will be a stuck mount_nfs process.

Once nmount(2) has completed, "umount -N" can be used to get rid of
a mount against an unresponsive server, so I think it is preferable for
the first RPC after nmount(2) to get stuck on the unresponsive server
rather than nmount(2) itself.

Of course, since mount_nfs has already done RPC(s) against the server, the
time window for a failure during nmount(2) is short, so such failures are
not common in practice.

As you note, this is well beyond the scope of this bug/patch.