Page MenuHomeFreeBSD

unionfs: Implement VOP_GETLOWVNODE and employ it for basicmount-time deadlock detection
AcceptedPublic

Authored by jah on Sat, Nov 29, 9:29 PM.

Details

Reviewers
olce
kib
markj
pho
Summary

vnode_if.src: fix function name in locking annotation
getwritevnode->getlowvnode

unionfs: Implement VOP_GETLOWVNODE

This function returns the vnode that will be used to resolve the
access type specified in the 'flags' argument, and is useful for
optimal behavior of vn_copy_file_range(). While most filesystems
can simply use the default implementation which returns the passed-
in vnode, unionfs (like nullfs) ideally should resolve the access
request to whichever base layer vnode will be used for the I/O.

For unionfs, write accesses must be resolved through the upper vnode,
while read accesses will be resolved through the upper vnode if
present or the lower vnode otherwise. In the case of write access
when an upper vnode has not yet been created, the unionfs vnode
itself should be treated as the resolving vnode so that unionfs can
handle the required copy when processing the write. Provide a simple
unionfs_getlowvnode() implementation that reflects this policy.

unionfs: detect common deadlock-producing mount misconfigurations

When creating a unionfs mount, it's fairly easy to shoot oneself
in the foot by specifying upper and lower file hierarchies that
resolve back to the same vnodes. This is fairly easy to do if
the sameness is not obvious due to aliasing through nullfs or other
unionfs mounts (as in the associated PR), and will produce either
deadlock or failed locking assertions on any attempt to use the
resulting unionfs mount.

Leverage VOP_GETLOWVNODE() to detect the most common cases of
foot-shooting at mount time and fail the mount with EDEADLK.
This is not meant to be an exhaustive check for all possible
deadlock-producing scenarios, but it is an extremely cheap and
simple approach that, unlike previous proposed fixes, also works
in the presence of nullfs aliases.

PR: 172334
Reported by: ngie, Karlo Miličević <karlo98.m@gmail.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 68981
Build 65864: arc lint + arc unit

Event Timeline

jah requested review of this revision.Sat, Nov 29, 9:29 PM
jah retitled this revision from unionfs: Implement VOP_GETLOWVNODE and employ it for basic mount-time deadlock detection to unionfs: Implement VOP_GETLOWVNODE and employ it for basicmount-time deadlock detection.Sat, Nov 29, 9:30 PM
jah added reviewers: kib, markj, pho.
sys/fs/unionfs/union_vnops.c
3086

I know I don't technically need this since vop_default is specified above, but I prefer to make it clear that we're intentionally using the default implementation.

sys/fs/unionfs/union_vnops.c
2137

Don't you have to do the instantiation of the upper vnode there? Otherwise, vn_copy_file_range() malfunctions.

It is wrong right now without unionfs specialization for the VOP, but having the VOP implemented but wrong is IMO worse.

sys/fs/unionfs/union_vnops.c
2137

See the comment block I added above this.

In fact, I had written an earlier version of the patch which did exactly that, but it ended up being fairly messy. The requirements of calling unionfs_copyfile() here mean that we have to do additional locking and rechecking of the vnode, and we also have to change VOP_GETLOWVNODE() to pass the caller creds so that the upper file can be created correctly. All of that is doable, but rather ugly and IMO not something we should be doing in a fairly simple vnode accessor if we can avoid it.

Instead, the idea here is that if the unionfs vnode has not yet instantiated an upper vnode, then we simply fall back to the default implementation which returns the unionfs vnode itself instead of a base-layer vnode. That is no different that what already happens today, and in that case vn_copy_file_range() should end up falling back to vn_generic_copy_file_range() (Even if both the source and destination vnodes were unionfs vnodes, it would still end up calling vn_generic_copy_file_range() because unionfs does not specialize VOP_COPY_FILE_RANGE). Since the 'outvp' to vn_generic_copy_file_range() will then be a unionfs vnode, the write issued against it will result in the upper vnode being instantiated at that point.

It's possible I missed something, but I think this should work?

I ran the uniofs tests in a loop for 5 hours without observing any issues.

sys/fs/unionfs/union_vnops.c
2137

Actually perhaps I do need to go back to the messier implementation. I'd been thinking that vn_[generic]_copy_file_range() opened the target vnode for write access, which is where the copy-up actually happens in unionfs. But looking back over the code, it looks as though I misremembered and there is no open there.

That said, it looks as though vn_copy_file_range() should only be called in cases in which the target vnode has already been opened for write? If we can guarantee that, then I should be able to do something even simpler here and return EBADF if no upper vnode is present.

sys/fs/unionfs/union_vnops.c
2137

I am not sure that e.g. NFS is required to call VOP_OPEN() before doing block copy. And generally, you are adding the new requirement: VOP_OPEN(for write) must be done before VOP_GETLOWVNODE(FWRITE) is ever called.

I would not object if you enforce the later for unionfs. You might indeed check for upper vnode, and return something. EBADF does not sound right, it is for different situation.

I thought about using v_writecount to assert that open for write was done, but there is no vnode to check on, or when the vnode is there, it is already fine.

sys/fs/unionfs/union_vnops.c
2137

I didn't think the NFS code uses vn_copy_file_range(), does it? That of course doesn't mean that NFS (or something else) might not start making use of VOP_GETLOWVNODE() at some point in the future. Also anything involving the interaction between NFS and unionfs is really its own can of worms since there are quite a few known deficiencies there.

Interestingly, unionfs_write() issues VOP_WRITE() against the lower vnode if the upper vnode is not present. I'm not sure if it does that because of NFS or some other case that doesn't first open the vnode, but it's been that way since at least 2006. That behavior seems quite wrong for a filesystem that's supposed to have CoW semantics, but that is a problem for another day.

My preference would definitely be to just return an error in this case (EACCES?). I did save off the original bigger patch that called unionfs_copyfile(), so it wouldn't be too hard to bring it back if we ended up needing it for some reason.

sys/fs/unionfs/union_vnops.c
2137

I am completely fine with returning an error in this case.

Might be, ENXIO or (somewhat strange but often used for strange purposes EDADLK) look somewhat better for me. But selecting the error code should not be a stopper there.

Return error if no upper vnode present

This revision is now accepted and ready to land.Tue, Dec 2, 1:13 AM