vfs_syscalls: Call VFS_SYNC with MNT_WAIT argument.
Needs Revision, Public

Authored by dgr_semihalf.com on Jul 9 2020, 2:01 PM.

Details

Reviewers

mw
imp
kib
mckusick

Summary

Calling sync(2) did not cause changes in the root filesystem to be
written to disk. Changing kern_sync() to call VFS_SYNC with the MNT_WAIT
option makes the syscall wait for I/O operations to complete, which
fixes the problem.

The bug can be reproduced easily by writing to a file in the root
filesystem, calling sync and then forcefully powering off the machine
(e.g. by pulling out the power cable). After rebooting, the newly
written file is absent.
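
The write-then-sync step of this reproduction can be sketched in user space roughly as follows. This is only an illustration: the function name write_then_sync is hypothetical, and the forced power-off that follows still has to be done by hand.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, write `msg` into it, close it,
 * and then call sync(2). Returns 0 on success, -1 on failure.
 */
int
write_then_sync(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return (-1);
	if (write(fd, msg, len) != (ssize_t)len) {
		close(fd);
		return (-1);
	}
	if (close(fd) != 0)
		return (-1);
	/* sync(2) returns no status; it only asks the kernel to flush. */
	sync();
	return (0);
}
```

Calling it with a path on the root filesystem and cutting power right after it returns corresponds to the scenario described above.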

During testing, the bug was observed only on UFS partitions mounted
as rootfs. ZFS was not affected, and UFS filesystems mounted in other
directories did not seem to be affected either. Powering off normally
did sync the data properly.

The bug was observed on aarch64 and amd64 but probably also affects
other architectures.

After some further testing it seems that the problem was introduced
somewhere between 8.0 and 9.0, as sync works properly in 8.0
but fails in 9.0.

Diff Detail

Lint: Lint Skipped
Unit: Tests Skipped

Event Timeline

dgr_semihalf.com requested review of this revision. Jul 9 2020, 2:01 PM

dgr_semihalf.com created this revision.

sync(2) was never supposed to guarantee write-out of the caches to the storage. If you look at the man page for sync(2), section BUGS, it is written explicitly.

Your change somewhat improves the situation but is definitely not sufficient. An obvious problem is that VFS_SYNC() could be simply not called for any filesystem in the list, if the mount point is under (non-forced) unmount, or if write suspension is performed on it.
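
The skipping behaviour described here can be illustrated with a small user-space model. It is not the kernel code: struct model_mount, its fields and model_sync_pass are hypothetical stand-ins, and the real conditions in kern_sync() are expressed through the kernel's own busy and write-start checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mounted filesystem. */
struct model_mount {
	const char *name;
	bool unmounting;	/* non-forced unmount in progress */
	bool suspended;		/* write suspension in effect */
	bool read_only;
	bool synced;		/* set when the model "calls VFS_SYNC" */
};

/*
 * Walk the mount list the way a non-blocking sync pass would: any
 * mount that cannot be used right now is skipped rather than waited
 * for. Returns the number of mounts that were actually synced.
 */
size_t
model_sync_pass(struct model_mount *mounts, size_t n)
{
	size_t synced = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_mount *mp = &mounts[i];

		if (mp->unmounting)
			continue;	/* being unmounted: skipped */
		if (mp->read_only || mp->suspended)
			continue;	/* no writes possible right now */
		mp->synced = true;	/* stands in for VFS_SYNC(mp, ...) */
		synced++;
	}
	return (synced);
}
```

In this model the wait flag never comes into play for a skipped mount, which is why passing MNT_WAIT alone cannot give a guarantee for every filesystem.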

The traditional Unix incantation was to run sync(8) at least twice, on the reasoning that the second sync(2) cannot start until the first sync(2) completes (actually it was thrice, but I am not aware of any rational explanation for that). With SMP VFS locking, our sync(2) (or rather ffs_sync()) no longer blocks waiting for a parallel sync to finish.

kib added a reviewer: mckusick. Jul 9 2020, 2:36 PM

kd added a subscriber: kd. Jul 10 2020, 7:10 AM

If data is being written to the filesystem faster than the disk can write it then a sync with MNT_WAIT will never finish. The only safe way to use sync with MNT_WAIT is to first suspend the filesystem to create a finite number of write operations that need to be done.
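
A toy model makes the argument concrete. It is a sketch under stated assumptions, not filesystem code: model_wait_sync and its parameters are hypothetical, time advances in discrete ticks, and max_ticks exists only so that the model itself terminates.

```c
#include <stdbool.h>

/*
 * Toy model of a waiting sync. Each tick, writers add `write_rate`
 * dirty blocks (unless the filesystem is suspended) and the disk then
 * retires up to `disk_rate` of them. Returns the number of ticks
 * until nothing is dirty, or -1 if `max_ticks` pass without draining.
 */
long
model_wait_sync(long dirty, long write_rate, long disk_rate,
    bool suspended, long max_ticks)
{
	for (long tick = 0; tick < max_ticks; tick++) {
		if (dirty == 0)
			return (tick);
		if (!suspended)
			dirty += write_rate;	/* new dirty data arrives */
		dirty -= (dirty < disk_rate) ? dirty : disk_rate;
	}
	return (dirty == 0 ? max_ticks : -1);
}
```

With 100 dirty blocks, a write rate of 20 and a disk rate of 10, the unsuspended case never drains and the model returns -1, while the suspended case finishes after 10 ticks because the amount of outstanding work is finite.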

You can achieve this effect by doing an `umount' on the filesystem. The umount command (at least for UFS) does a suspend and then a sync with MNT_WAIT. If there are any open descriptors on the filesystem, the umount will fail, but all data that was dirty at the time that the umount was initiated will have been written.

As kib noted, the sync command has never promised to have everything written before returning.

This should not be done as written as it can cause a denial of service (infinite loop) as described in my previous comment.

This revision now requires changes to proceed. Jul 13 2020, 11:10 PM

Revision Contents

Path: sys/kern/vfs_syscalls.c
Size: 2 lines

Diff 74235
