Page MenuHomeFreeBSD

Add renameat(2) and AT_RENAME_NOREPLACE flag for it
AcceptedPublic

Authored by kib on Thu, Feb 26, 7:17 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Mar 2, 7:25 PM
Unknown Object (File)
Mon, Mar 2, 2:10 PM
Unknown Object (File)
Mon, Mar 2, 11:17 AM
Unknown Object (File)
Mon, Mar 2, 8:13 AM
Unknown Object (File)
Mon, Mar 2, 7:45 AM
Unknown Object (File)
Mon, Mar 2, 4:55 AM
Unknown Object (File)
Mon, Mar 2, 4:44 AM
Unknown Object (File)
Mon, Mar 2, 3:57 AM

Details

Reviewers
brooks
olce
markj

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Thu, Feb 26, 7:17 PM

Improve cleanup for failed AT_RENAME_NOREPLACE.
List EOPNOTSUPP in the man page.

Syscall bits look good. A few comments.

lib/libsys/rename.2
135 ↗(On Diff #172807)

Or perhaps "call operates identically to"

334–338 ↗(On Diff #172807)

Or alternatively something like "...returned by the renameat system call, the renameat2 system call..."

sys/fs/tmpfs/tmpfs_vnops.c
1024

I'm surprised by ~(AT_RENAME_NOREPLACE) here

kib marked 3 inline comments as done.
kib added a reviewer: markj.

Fix grammar in the man page.
Fix thinko in tmpfs check for the flag.
Add comment explaining the race in kern_renameat() for fs that does not implement RENAME_NOREPLACE.

Add a comment in sys/fcntl.h, same as for other AT_ constants.

Is my understanding correct that it is not supposed to work with ZFS yet?

Is my understanding correct that it is not supposed to work with ZFS yet?

It looks like it. I think the flag could be correctly handled by checking for tzp != NULL in zfs_rename_relock_lookup() or in its caller.

sys/kern/vfs_syscalls.c
3829

I don't quite follow: VOP implementations return EOPNOTSUPP when they do not handle some flag(s). e.g., zfs returns EOPNOTSUPP if any flags are specified, and ufs returns EOPNOTSUPP if any unrecognized flags are passed, and EEXIST if the target exists.

I do not see how EOPNOTSUPP can be returned due to a race.

sys/ufs/ufs/ufs_vnops.c
1297

Why does each VOP need to check this? It seems more natural to assert that no unknown flags are set in vop_rename_pre().

kib marked 2 inline comments as done.Sun, Mar 1, 1:31 PM
kib added inline comments.
sys/kern/vfs_syscalls.c
3829

Suppose that a filesystem does not implement the AT_RENAME_NOREPLACE flag. Next, suppose that the flag is specified and that the check in kern_renameat() sees tvp != NULL. The result is EEXIST. Now, suppose that kern_renameat() sees tvp == NULL. Then the result, for a filesystem that does not implement the flag, is EOPNOTSUPP. So depending on the moment when tvp is created, we might report different error codes from the syscall for such fs.

Also, in tbis case, if filesystem does not add the check for unsupported flags in VOP_RENAME(), the target will be overwritten by the 'from' file. This is user-visible race.

Another point for this design, some renameat2() flags might be too hard to implement in any reasonable scope for some filesystems, for instance, AT_RENAME_EXCHANGE, for UFS (it most likely requires new type of the journal entry, which means a lot of changes to SU code, and to fsck journal reply). Or AT_RENAME_NOREPLACE for zfs, where I am not confident in my reading of zfs code and do not want to spend a lot of time on it.

AT_RENAME flags must be introduced in a way that allows filesystems to refuse to implement them, reliably. Might be the comment is somewhat confusingly formulated, I will try to improve that.

sys/ufs/ufs/ufs_vnops.c
1297

How could the _pre method know if the filesystem below implements specific flag? I do not want to add MNT-level flags, it is too much plumbing for this.

kib marked 2 inline comments as done.

Compiled-only implementation of RENAME_NOREPLACE for zfs.

lib/libsys/rename.2
343 ↗(On Diff #172964)

"A file exists at the path specified by .Fa to ." sounds more right to me.

347 ↗(On Diff #172964)
sys/compat/linux/linux_file.c
855

We still return EINVAL here.

sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
5531

These two conditions can be joined with &&.

sys/kern/vfs_syscalls.c
3829

Suppose the flag is specified and kern_renameat() sees tvp == NULL. Then, a filesystem which does not implement the flag will always return EOPNOTSUPP, even if there is no race. In particular, I don't really understand "while most of the time this check works." Or I am missing something.

sys/kern/vfs_syscalls.c
3829

Let me put it this way: there is a race, yes, but if the filesystem does not support the flag, then renameat2(AT_RENAME_NOREPLACE) will always fail, either with EEXIST or EOPNOTSUPP. Indeed, there is some non-determinism if the syscall races with creation or removal of the target file, but I'm not sure why it is significant if the syscall always fails anyway.

Or, if it is possible that application developers will care, then it should perhaps be mentioned in the man page.

The last revision worked for me on ZFS.

I patched Qt to make use of this syscall and verified that QFile::rename implementation now uses the renameat2 code path. What's even more important is that calling QFile::rename() now results in paired IN_MOVED_FROM and IN_MOVED_TO inotify events instead of disconnected IN_CREATE and IN_DELETE. This is a great improvement!

What's worrying me is how the previous revision failed for ZFS. The syscall returned EINVAL and after that the test program entered an uninterruptible busyloop. It was not possible to stop it with kill -KILL and running cat somefile afterwards also resulted in busy-looping. The system quickly degraded into unusable state. It feels like something in kernel does not properly handle the case when AT_RENAME_NOREPLACE is not implemented for a given FS.

kib marked 6 inline comments as done.Sun, Mar 1, 10:42 PM

The last revision worked for me on ZFS.

I patched Qt to make use of this syscall and verified that QFile::rename implementation now uses the renameat2 code path. What's even more important is that calling QFile::rename() now results in paired IN_MOVED_FROM and IN_MOVED_TO inotify events instead of disconnected IN_CREATE and IN_DELETE. This is a great improvement!

What's worrying me is how the previous revision failed for ZFS. The syscall returned EINVAL and after that the test program entered an uninterruptible busyloop. It was not possible to stop it with kill -KILL and running cat somefile afterwards also resulted in busy-looping. The system quickly degraded into unusable state. It feels like something in kernel does not properly handle the case when AT_RENAME_NOREPLACE is not implemented for a given FS.

Are you sure that it was EINVAL? EINVAL path should not result in such condition.

I found a case where EEXISTS could do it, and hopefully fixed.

sys/kern/vfs_syscalls.c
3829

I tried to add the formulation.

I believe that the app writers do care, because the way to report that 'noreplace' failed is EEXISTS, vs. something else.

kib marked an inline comment as done.

Man page updates.
Do not clear mp and tmp in the case of EEXISTS in kern_renameat(), they must be unbusied/unlocked as needed.

markj added inline comments.
lib/libsys/rename.2
352 ↗(On Diff #172988)

or "does not implement the .Dv AT_RENAME_NOREPLACE flag"

355 ↗(On Diff #172988)
This revision is now accepted and ready to land.Mon, Mar 2, 9:55 AM
kib marked 2 inline comments as done.

Man page: grammar.

This revision now requires review to proceed.Mon, Mar 2, 10:50 AM
markj added inline comments.
lib/libsys/rename.2
128 ↗(On Diff #173000)
This revision is now accepted and ready to land.Wed, Mar 4, 12:56 AM
kib marked an inline comment as done.Wed, Mar 4, 1:15 AM