Page MenuHomeFreeBSD

Stop sync restarts on vget failure due to doomed vnode
Needs ReviewPublic

Authored by mjg on Nov 8 2022, 12:04 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 8, 6:59 AM
Unknown Object (File)
Sat, Jan 7, 7:39 AM
Unknown Object (File)
Dec 14 2022, 3:00 AM
Subscribers

Details

Reviewers
kib
Summary
commit a0a4dd9dcb2c947aa0fb46f0f8641b13c8b9c694
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Mon Nov 7 23:43:30 2022 +0000

    Stop sync restarts on vget failure due to doomed vnode
    
    All implementations look copy-pasted from ufs.
    
    Restart was there already in the initial BSD 4.4 Lite import, most
    likely used to damage control broken linkage should the target vnode
    "change identity".
    
    This definitely is not a problem.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

mjg requested review of this revision.Nov 8 2022, 12:04 AM

I do not think it is correct/intended to return from ffs_sync() if we saw and skipped a doomed vnode. It might be still dirty, and some other thread works on finishing syncing it. If we return ignoring the vnode, the VFS_SYNC contract is broken.

but dooming keeps the vnode locked the entire time, thus by that time vget succeds and sees ENOENT, there is nothing to do there.

I also note the loop can trivially race against a just remove vnode.

If you are worried about htel ock be temporarily dropped somewhere, the code can explicitly check for VBAD which is only set at the end.

In D37306#856338, @mjg wrote:

but dooming keeps the vnode locked the entire time, thus by that time vget succeds and sees ENOENT, there is nothing to do there.

I also note the loop can trivially race against a just remove vnode.

If you are worried about htel ock be temporarily dropped somewhere, the code can explicitly check for VBAD which is only set at the end.

The point is that, if you do not wait for unlock, dirty data might not reach the storage when ffs_sync() returns.

In D37306#856371, @kib wrote:
In D37306#856338, @mjg wrote:

but dooming keeps the vnode locked the entire time, thus by that time vget succeds and sees ENOENT, there is nothing to do there.

I also note the loop can trivially race against a just remove vnode.

If you are worried about htel ock be temporarily dropped somewhere, the code can explicitly check for VBAD which is only set at the end.

The point is that, if you do not wait for unlock, dirty data might not reach the storage when ffs_sync() returns.

But the code *is* waiting for unlock with and without the patch. The difference is that the patched variant does not loop when it happens.

Excluding variants which pass LK_NOWAIT, in which case this does not matter.

Yes, I think we need to ensure that doomed vnode is completely reclaimed, before returning from the VFS_SYNC() implementations. Yes, at least UFS reclaim might drop the vnode lock, e.g. due to inactive->truncate->fsync->softdep locking dvp