Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Fix grammar in the man page.
Fix thinko in tmpfs check for the flag.
Add comment explaining the race in kern_renameat() for fs that does not implement RENAME_NOREPLACE.
It looks like it. I think the flag could be correctly handled by checking for tzp != NULL in zfs_rename_relock_lookup() or in its caller.
| sys/kern/vfs_syscalls.c | ||
|---|---|---|
| 3829 | I don't quite follow: VOP implementations return EOPNOTSUPP when they do not handle some flag(s). e.g., zfs returns EOPNOTSUPP if any flags are specified, and ufs returns EOPNOTSUPP if any unrecognized flags are passed, and EEXIST if the target exists. I do not see how EOPNOTSUPP can be returned due to a race. | |
| sys/ufs/ufs/ufs_vnops.c | ||
| 1297 | Why does each VOP need to check this? It seems more natural to assert that no unknown flags are set in vop_rename_pre(). | |
| sys/kern/vfs_syscalls.c | ||
|---|---|---|
| 3829 | Suppose that a filesystem does not implement the AT_RENAME_NOREPLACE flag. Next, suppose that the flag is specified and that the check in kern_renameat() sees tvp != NULL. The result is EEXIST. Now, suppose that kern_renameat() sees tvp == NULL. Then the result, for a filesystem that does not implement the flag, is EOPNOTSUPP. So depending on the moment when tvp is created, we might report different error codes from the syscall for such fs. Also, in tbis case, if filesystem does not add the check for unsupported flags in VOP_RENAME(), the target will be overwritten by the 'from' file. This is user-visible race. Another point for this design, some renameat2() flags might be too hard to implement in any reasonable scope for some filesystems, for instance, AT_RENAME_EXCHANGE, for UFS (it most likely requires new type of the journal entry, which means a lot of changes to SU code, and to fsck journal reply). Or AT_RENAME_NOREPLACE for zfs, where I am not confident in my reading of zfs code and do not want to spend a lot of time on it. AT_RENAME flags must be introduced in a way that allows filesystems to refuse to implement them, reliably. Might be the comment is somewhat confusingly formulated, I will try to improve that. | |
| sys/ufs/ufs/ufs_vnops.c | ||
| 1297 | How could the _pre method know if the filesystem below implements specific flag? I do not want to add MNT-level flags, it is too much plumbing for this. | |
| lib/libsys/rename.2 | ||
|---|---|---|
| 343 | "A file exists at the path specified by .Fa to ." sounds more right to me. | |
| 347 | ||
| sys/compat/linux/linux_file.c | ||
| 863 | We still return EINVAL here. | |
| sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c | ||
| 5540 | These two conditions can be joined with &&. | |
| sys/kern/vfs_syscalls.c | ||
| 3829 | Suppose the flag is specified and kern_renameat() sees tvp == NULL. Then, a filesystem which does not implement the flag will always return EOPNOTSUPP, even if there is no race. In particular, I don't really understand "while most of the time this check works." Or I am missing something. | |
| sys/kern/vfs_syscalls.c | ||
|---|---|---|
| 3829 | Let me put it this way: there is a race, yes, but if the filesystem does not support the flag, then renameat2(AT_RENAME_NOREPLACE) will always fail, either with EEXIST or EOPNOTSUPP. Indeed, there is some non-determinism if the syscall races with creation or removal of the target file, but I'm not sure why it is significant if the syscall always fails anyway. Or, if it is possible that application developers will care, then it should perhaps be mentioned in the man page. | |
The last revision worked for me on ZFS.
I patched Qt to make use of this syscall and verified that QFile::rename implementation now uses the renameat2 code path. What's even more important is that calling QFile::rename() now results in paired IN_MOVED_FROM and IN_MOVED_TO inotify events instead of disconnected IN_CREATE and IN_DELETE. This is a great improvement!
What's worrying me is how the previous revision failed for ZFS. The syscall returned EINVAL and after that the test program entered an uninterruptible busyloop. It was not possible to stop it with kill -KILL and running cat somefile afterwards also resulted in busy-looping. The system quickly degraded into unusable state. It feels like something in kernel does not properly handle the case when AT_RENAME_NOREPLACE is not implemented for a given FS.
Are you sure that it was EINVAL? EINVAL path should not result in such condition.
I found a case where EEXISTS could do it, and hopefully fixed.
| sys/kern/vfs_syscalls.c | ||
|---|---|---|
| 3829 | I tried to add the formulation. I believe that the app writers do care, because the way to report that 'noreplace' failed is EEXISTS, vs. something else. | |
Man page updates.
Do not clear mp and tmp in the case of EEXISTS in kern_renameat(), they must be unbusied/unlocked as needed.
| lib/libsys/rename.2 | ||
|---|---|---|
| 128 | ||
I retried and it is actually not EINVAL nor EEXISTS - I'm getting EOPNOTSUPP.
I also tried removing the tmp = mp = NULL; line from vfs_syscalls.c, but it didn't fix the problem.
I put my Qt changes to https://github.com/arrowd/freebsd-ports/tree/qt-rename-pufb together with a reproducer program. If you want to try reproducing it yourself:
- make -C devel/qt6-base install
- c++ -fPIC -lQt6Core -L/usr/local/lib/qt6 -I/usr/local/include/qt6 -I/usr/local/include/qt6/QtCore devel/qt6-base/t.cpp
- touch /tmp/a
- ./a.out