Page MenuHomeFreeBSD

lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK)
ClosedPublic

Authored by kib on Feb 16 2021, 1:37 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 21, 3:32 PM
Unknown Object (File)
Thu, Nov 21, 1:34 PM
Unknown Object (File)
Fri, Nov 15, 10:46 PM
Unknown Object (File)
Fri, Nov 15, 5:45 PM
Unknown Object (File)
Thu, Nov 14, 7:05 PM
Unknown Object (File)
Thu, Nov 14, 2:27 PM
Unknown Object (File)
Wed, Nov 13, 10:51 PM
Unknown Object (File)
Sun, Nov 10, 12:18 AM
Subscribers

Details

Summary

or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK request to make a progress until the first open finishes.

NB. Perhaps this should be enabled only for local filesystems.

Test Plan

Peter has a test written to expose the problem.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Feb 16 2021, 1:37 AM
kib created this revision.
sys/kern/vfs_vnops.c
288

So VOP_VPUT_PAIR may update vp, so that ndp->ni_vp != vp. Is it intentional?

331

Why do you use both ndp->ni_vp and vp here?

sys/sys/vnode.h
257
kib marked 3 inline comments as done.Feb 16 2021, 2:45 PM
kib added inline comments.
sys/kern/vfs_vnops.c
288

Except for sillyness in clearing VI_FOPENING that you pointed out, I do not think that code uses ndp->ni_vp at all in this path after assignment to vp.

Other pathes do use ni_vp, but not the path where VOP_VPUT_PAIR() is executed.

Am I missing something?

331

It was thinko from copy/paste.

kib marked 2 inline comments as done.

Fix use of nd_vp instead of vp in clearing of the first_open flag.
Grammar fix in comment.

sys/kern/vfs_default.c
436

So what about filesystems that do not use the default implementation? Should this be done in a vop_advlock_pre() routine?

sys/kern/vfs_vnops.c
288

I think it is ok now, I was just trying to see if VI_FOPENING may be leaked somehow.

kib marked an inline comment as done.Feb 16 2021, 4:02 PM
kib added inline comments.
sys/kern/vfs_default.c
436

No I do not think so. The idea of atomicity for O_EXCL|O_EXLOCK does not make sense for remote filesystems, because we cannot enforce anything on protocol, and they are exactly the filesystems that use non-default vop_advlock implementation.

This is why the comment about local filesystems in the commit message.
I.e. I would rather add the check for MNT_LOCAL and do nothing if not set, but so far it seems to be not needed.

markj added inline comments.
sys/kern/vfs_default.c
436

There is also FUSE. The latter falls back to stdadvlock in some cases, not sure if it coincides exactly with remote vs. local.

This revision is now accepted and ready to land.Feb 16 2021, 4:15 PM

Overall looks good.

sys/kern/vfs_default.c
436

I would suggest either putting a KASSERT that MNT_LOCAL is set or add the MNT_LOCAL test if FUSE may use it on a non-local filesystem.

kib marked an inline comment as done.

Only wait for VI_FOPENING for local filesystems.

This revision now requires review to proceed.Feb 17 2021, 8:29 AM
sys/kern/vfs_default.c
436

Is this an unlocked load?

kib marked an inline comment as done.Feb 17 2021, 3:42 PM
kib added inline comments.
sys/kern/vfs_default.c
436

Yes, it is fine. The flag is never changed after mount.

markj added inline comments.
sys/kern/vfs_default.c
436

I meant, is it guaranteed that the reference to mp itself is safe? Mount points are type-stable at least, but the flags value may be stale, unless I misunderstood. It looks harmless but might deserve a comment, again assuming I did not misunderstand.

This revision is now accepted and ready to land.Feb 17 2021, 4:13 PM
kib marked an inline comment as done.Feb 17 2021, 4:44 PM
kib added inline comments.
sys/kern/vfs_default.c
436

In the worst case of unmount/momental mp reuse, we might see undeserved MNT_LOCAL and wait for the passing open when we should not.

Or we might not see MNT_LOCAL and then at unmount it would happen that lock is obtained while open is still processing. But in this case, open(2) would return a file descriptor referencing reclaimed vnode, otherwise mp cannot be reused. So no future operations with it is possible.