Page MenuHomeFreeBSD

VFS: add retry limit and delay for failed recursive unmounts
ClosedPublic

Authored by jah on Aug 7 2021, 6:34 AM.

Details

Summary

A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount. In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.

Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up. Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount. These values can be tuned through the
vfs.deferred_unmount_retries and vfs.deferred_unmount_retry_delay_hz
sysctls, respectively.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jah requested review of this revision.Aug 7 2021, 6:34 AM
kib added inline comments.
sys/kern/vfs_mount.c
105

Did you considered adding a node vfs.deferred_unmount and putting retries and delay_hz under it? It might be interesting to put the total number of failed retries for the whole system lifetime there, as well.

2089

This is arguably a separate change.

This revision is now accepted and ready to land.Aug 7 2021, 12:03 PM
sys/kern/vfs_mount.c
105

I didn't consider that, but I like the idea.

2089

I probably should have used PCATCH from the beginning, but it seems even more necessary to avoid an unkillable thread now that we can abandon a recursive unmount attempt, so I decided to do it as part of this change.

Add sysctl node for managing deferred unmount behavior

This revision now requires review to proceed.Aug 8 2021, 2:31 AM
sys/kern/vfs_mount.c
2089

I mean that this should be a separate commit.

Split PCATCH into a separate commit

sys/kern/vfs_mount.c
2089

Yes, I know. I was explaining why I didn't make it a separate commit to begin with.
I've split it into a separate commit locally now, which I guess doesn't show up in Phabricator. It seemed like a good opportunity to learn how to use 'git add -i'.

Clean up error handling logic, properly release the mount on error in the blocking case

This revision is now accepted and ready to land.Aug 10 2021, 10:40 AM

Overall looks good. One suggestion, one possible nit.

sys/kern/vfs_mount.c
108

The tick rate can vary between machines and is generally not known. I suggest that you make this in some time units. I suggest seconds would be appropriate.

1968–1984

I think that this statement should be done inside the MNT_ILOCK(mp);

sys/kern/vfs_mount.c
108

The 'hz' expression guarantees 1 second timeout for taskqueue_enqueue_timeout()

sys/kern/vfs_mount.c
1968–1984

The deferred_unmount thread is the only thread that will update these fields, so they won't need locking or atomics. Perhaps a comment to that effect would be better instead?

(In the dounmount() code below, I do check the retry count while holding a mount interlock, but only because that makes the code to continue the loop slightly cleaner. It also wouldn't be the "right" interlock for synchronization purposes, since the lower mount's interlock is held but the upper mount's retry count is being checked.)

A couple more comments.

sys/kern/vfs_mount.c
108

This is a user-settable variable. If I want to change the default from one second to two seconds, I need to know the hz value to do so. The variable should be in seconds and the variable multiplied by hz where it is used in deferred_unmount_enqueue (where timeout_ticks should be called jjust timeout or perhaps timeout_seconds).

1968–1984

I concur with your argument, though rather than adding an explanation of why bit does not need to be under the lock, it might be simpler to just move it up one line so that it is under the lock (i.e., there is not extra cost since you already take/free the lock).

sys/kern/vfs_mount.c
108

IMO making the variable be an integer number of seconds would be too coarse. I could delineate the timeout in milliseconds and convert to hz, or instead use taskqueue_enqueue_timeout_sbt(), but to be honest both of those seem like overkill. I think that if a user reaches the point of wanting to tweak this variable, it's probably very easy for them to figure out that they should check kern.hz. I can further ease that discovery by mentioning kern.hz in the sysctl description.

1968–1984

I don't think that would buy anything though. Immediately below this line is a similar non-atomic update of a global variable, and it wouldn't make any sense to move that under a per-mount lock. An explanation would still be useful for that line.

Add comment on (lack of) synchronization for counters, clarify sysctl description

This revision now requires review to proceed.Aug 15 2021, 1:07 AM

Sorry for the delay in responding, been vacationing this week. Looks good to go.

This revision is now accepted and ready to land.Aug 20 2021, 12:38 AM