Page MenuHomeFreeBSD

Add an NFS forced dismount option to umount(8)
ClosedPublic

Authored by rmacklem on Jul 25 2017, 10:34 PM.
Tags
None
Referenced Files
F106792845: D11735.id31233.diff
Sun, Jan 5, 11:54 AM
Unknown Object (File)
Fri, Jan 3, 12:04 PM
Unknown Object (File)
Nov 6 2024, 10:47 AM
Unknown Object (File)
Nov 6 2024, 3:58 AM
Unknown Object (File)
Nov 6 2024, 12:21 AM
Unknown Object (File)
Nov 5 2024, 11:20 PM
Unknown Object (File)
Nov 5 2024, 12:36 AM
Unknown Object (File)
Nov 4 2024, 1:04 PM
Subscribers

Details

Summary

If an NFS mount point is hung due to an unresponsive NFS server, a "umount -f" will unmount
the mount point, if the "umount" executes the NFS VFS_UNMOUNT() call.
Unfortunately, this often won't happen. Typically this fails when a process, such as "df" or
a "umount" without "-f" is hung on the mountpoint while holding a lock, such as the vnode
lock for the mounted-on vnode.

This patch adds a new option to umount called "-N" which does the forced dismount but
bypasses the checking in umount.c (which often gets hung as above) and by doing an nfssvc()
syscall to ensure that any process hung on the mount point fails, so any locks get released.
It can then reliably do the forced dismount.
Unfortunately, since it doesn't do any checking, it only works if the mounted-on path is
specified exactly as it was at mount time (and is stored in mnt_stat.f_mntonname).

Test Plan

It has been tested on mount points when the NFS server is network partitioned, including cases
where a "df" or "umount" without "-f" is hung on the mount.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/fs/nfsclient/nfs_clport.c
1399 ↗(On Diff #31206)

I believe this is unsafe. Other, possibly stuck, unmount process might be active in parallel. In particular, the MNTK_UNMOUNTF flag might be already set. More, the parallel unmount might be unstuck, in which case newnfs_nmcancelreqs() might operate on freed memory since mnt_data is released.

Traditionally, we weed out other unmounts by vfs_busy(9). I understand that using vfs_busy(9) there would contradict the goal of the patch, but what you do is dangerous.

I'm not sure if I see exactly why setting MNTK_UNMOUNTF was unsafe, but I believe you and
will note it was a layering violation. I do believe there was a race between a stuck "umount"
that might continue when a network partition heals and the "umount -N" kernel code, which
assumed that mnt_data would remain valid.

This patch defines a new NFS specific flag NFSMNTP_FORCEDISM, which it sets instead of
MNTK_UNMOUNTF. I also added a handshake between nfs_unmount (VFS_UNMOUNT() for NFS)
and the NFSSVC_FORCEDISM code, so that the mnt_data will remain valid.
(Depending on which code sequence wins the race, the flag NFSMNTP_CANCELRPCS gets set
until the NFSSVC_FORCEDISM code completes and nfs_unmount() waits for that
OR mnt_data is NULL and the NFSSVC_FORCEDISM code does nothing.)

Unfortunately, the patch got a lot bigger, since there are a lot of places it checked
MNTK_UNMOUNTF and that check now needs to also check NFSMNTP_FORCEDISM.

This patch defines a new NFS specific flag NFSMNTP_FORCEDISM, which it sets instead of
MNTK_UNMOUNTF. I also added a handshake between nfs_unmount (VFS_UNMOUNT() for NFS)
and the NFSSVC_FORCEDISM code, so that the mnt_data will remain valid.
(Depending on which code sequence wins the race, the flag NFSMNTP_CANCELRPCS gets set
until the NFSSVC_FORCEDISM code completes and nfs_unmount() waits for that
OR mnt_data is NULL and the NFSSVC_FORCEDISM code does nothing.)

Perhaps. This sounds feasible. I suggest to ask Peter Holm to test the new code for races with the forced unmount.

sys/fs/nfs/nfs_commonkrpc.c
514 ↗(On Diff #31233)

You might define some macro like NFSMNT_UNMOUNTF(nmp) which would incapsulate the flag checks, I even suggest to commit a version which only checks the MNTK_UNMOUNTF for now, in advance. Then the patch for review shrinks back to the original size.

sys/fs/nfsclient/nfs_clvfsops.c
1724 ↗(On Diff #31233)

Do not use PZERO, use either 0 or perhaps PVFS.

As suggested by kib@, I committed a patch that replaces the checks for MNTK_UNMOUNTF
with a macro called NFSCL_FORCEDISM(). As such, this patch shrinks back to about the original
size.
Other than changing PZERO->PVFS, the semantics are the same as the previous one.
(I will be contacting Peter Holms w.r.t. testing.)

Peter Holm has done some testing of umounts and has not found a problem.
(Kostik, can I consider this patch reviewed by you now?)

This revision is now accepted and ready to land.Jul 29 2017, 9:12 AM
This revision was automatically updated to reflect the committed changes.