
Add man page for VOP_FDATASYNC(9)
ClosedPublic

Authored by asomers on Mar 22 2019, 2:18 PM.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 23251
Build 22290: arc lint + arc unit

Event Timeline

share/man/man9/VOP_FSYNC.9
46

It should state that both user data and the filesystem metadata required to access the user data (e.g., the inode itself, all directory entries created for accessing the file, and any other filesystem data) are synced. In other words, it should guarantee that the user can open and read the data that was written to the file after a successful fsync(2), regardless of a possible abrupt system stop.

82

I do not think that mentioning 'waitfor' is called for. IMO if you remove 'Instead, ' from the start of the next sentence, it would sound better.

88

The vnode should be exclusively locked on entry, and stays locked on return.

share/man/man9/VOP_FSYNC.9
46

I don't think that's 100% true. VOP_FSYNC doesn't require that a file's directory entry is synced. That's why you often have to call fsync(2) on a newly created file's parent directory. I'm not aware of any different requirement for VOP_FDATASYNC.

share/man/man9/VOP_FSYNC.9
46

Of course it is not 100% because not all filesystems implement it. But e.g. UFS in SU mode, and msdosfs do.

After fsync(2), caller expects to be able to find his data later, and if all new dirents leading to the inode are not written out, knowing that the data is somewhere on disk is not useful. This is the only reasonable way to provide the required functionality.

Note that fsyncing the dirfd of the directory containing the file name is typically not enough.

share/man/man9/VOP_FSYNC.9
46

How can it be a requirement if not all filesystems implement it?

After fsync(2), the caller expects to be able to find his data later, but only the caller knows which of possibly many directory entries he'll need to use. Plus, the filesystem may not even be able to find an inode's dirents, depending on the filesystem implementation. That's why the caller should always fsync the containing directory, if the directory entry may have been changed since the last sync.

share/man/man9/VOP_FSYNC.9
46

Caller cannot know what it takes for a directory block to be writeable, so filesystem has to track all that data anyway. SU tracks all the stuff so the directory page write depends on the inode write.

Msdosfs writes out whole devvp dirty buffers AFAIR, which takes care of both FAT and directory content.

share/man/man9/VOP_FSYNC.9
46

Ok, so UFS+SU is really good, and msdosfs is really bad. I accept those statements. But that doesn't mean that syncing directory entries is required by VOP_FSYNC; it just means that those filesystems go beyond the minimum requirements, right?

share/man/man9/VOP_FSYNC.9
46

I do not see what makes you think that msdosfs is 'bad'.

Both UFS and msdosfs try to be useful in their handling of fsync(2), and I say that any other filesystem, to have useful fsync(2), should do the same.

share/man/man9/VOP_FSYNC.9
46

I say that msdosfs is bad, because it has to sync potentially many blocks of the FAT. It's "bad" in a performance sense. But that also means that it goes further than fsync(2) strictly requires.

My point is that syncing directory entries, while allowed, isn't a requirement of VOP_FSYNC. Neither fsync(2) nor OpenGroup says anything about directory entries, and if we make that a requirement, then other filesystems (ZFS and ext2fs, IIRC) will suddenly be non-compliant.

share/man/man9/VOP_FSYNC.9
46

It is not (only) about the directory entry, I am saying that fsync(2) should write out all metadata required to access the data.

And I highly suspect that ZFS is already compliant because otherwise it would require an analog of fsck to find leaks. I suspect that this is implemented by means of transactions.

Ext2 is almost certainly non-compliant.

share/man/man9/VOP_FSYNC.9
46

Ok, that language makes sense. I thought you were asking me to add language specifically about directory entries.

BTW, when ZFS fsync()s it doesn't sync a transaction, because that would require syncing the entire storage pool. Instead, it writes the file's dirty records to the ZIL, a kind of journal. On import, it replays the ZIL. The records' space usage isn't actually recorded in the metaslab until ZIL replay or txg sync, whichever comes first.

BTW, does UFS+SU handle a situation like this?

  1. Create and open a file
  2. Make a hardlink to the file in a different directory
  3. Unlink the original directory entry
  4. fsync()
  5. Crash

Would UFS automatically fsync the directory with the link created in step 2?

Add more detail about recoverability

share/man/man9/VOP_FSYNC.9
46

Hardlinking the file increases the inode ref count, which should create a dependency between the increment and the update of the directory page where the new dirent is located. Similarly, removing the name creates a dependency between writing out the directory page and the later decrement. A similar dependency chain exists for rename, where the link count is incremented -> new name created -> old name removed -> link count returned to its original value.
This tracking of dependencies allows fsync() to flush all of them, ensuring that the rename is completed on stable storage before fsync(2) returns.

80

s/and/but not/ ?

a little more explicit about attributes

This revision is now accepted and ready to land. Mar 22 2019, 6:40 PM
This revision was automatically updated to reflect the committed changes.