Stop sync restarts on vget failure due to doomed vnode
AbandonedPublic
Actions

Authored by mjg on Nov 8 2022, 12:04 AM.

Details

Reviewers

Summary

commit a0a4dd9dcb2c947aa0fb46f0f8641b13c8b9c694
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Mon Nov 7 23:43:30 2022 +0000

    Stop sync restarts on vget failure due to doomed vnode
    
    All implementations look copy-pasted from ufs.
    
    Restart was there already in the initial BSD 4.4 Lite import, most
    likely used to damage control broken linkage should the target vnode
    "change identity".
    
    This definitely is not a problem.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

mjg created this revision.Nov 8 2022, 12:04 AM

Herald added a subscriber: imp. · View Herald TranscriptNov 8 2022, 12:04 AM

mjg requested review of this revision.Nov 8 2022, 12:04 AM

I do not think it is correct/intended to return from ffs_sync() if we saw and skipped a doomed vnode. It might be still dirty, and some other thread works on finishing syncing it. If we return ignoring the vnode, the VFS_SYNC contract is broken.

but dooming keeps the vnode locked the entire time, thus by that time vget succeds and sees ENOENT, there is nothing to do there.

I also note the loop can trivially race against a just remove vnode.

If you are worried about htel ock be temporarily dropped somewhere, the code can explicitly check for VBAD which is only set at the end.

In D37306#856338, @mjg wrote:

but dooming keeps the vnode locked the entire time, thus by that time vget succeds and sees ENOENT, there is nothing to do there.

I also note the loop can trivially race against a just remove vnode.

If you are worried about htel ock be temporarily dropped somewhere, the code can explicitly check for VBAD which is only set at the end.

The point is that, if you do not wait for unlock, dirty data might not reach the storage when ffs_sync() returns.

In D37306#856371, @kib wrote:

In D37306#856338, @mjg wrote:

but dooming keeps the vnode locked the entire time, thus by that time vget succeds and sees ENOENT, there is nothing to do there.

I also note the loop can trivially race against a just remove vnode.

If you are worried about htel ock be temporarily dropped somewhere, the code can explicitly check for VBAD which is only set at the end.

The point is that, if you do not wait for unlock, dirty data might not reach the storage when ffs_sync() returns.

But the code *is* waiting for unlock with and without the patch. The difference is that the patched variant does not loop when it happens.

Excluding variants which pass LK_NOWAIT, in which case this does not matter.

Yes, I think we need to ensure that doomed vnode is completely reclaimed, before returning from the VFS_SYNC() implementations. Yes, at least UFS reclaim might drop the vnode lock, e.g. due to inactive->truncate->fsync->softdep locking dvp

mjg abandoned this revision.Mar 12 2024, 10:33 PM

Herald added a subscriber: olce. · View Herald TranscriptMar 12 2024, 10:33 PM