Page MenuHomeFreeBSD

vnode: move write cluster support data to inodes.
ClosedPublic

Authored by kib on Feb 15 2021, 4:40 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM
Unknown Object (File)
Mon, Mar 11, 6:20 PM

Details

Summary

The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support.

ext2fs change to reset clusterw on truncation should be a bug fix.

Remove #define _KERNEL hacks from libprocstat

Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in userspace, assuming that the consumer has an idea that it is for. Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h, sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h being unusable in userspace, where it override struct buf with its own definition. Instead, provide struct m_buf and struct m_vnode and adapt code to use local variants.

This is the diff of the whole branch, individual commits can be seen at
https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/vn_clusterw

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Feb 15 2021, 4:40 AM
kib created this revision.

The fuse parts look good. I don't know what's going on with ext2, though, and it's not clear from the commit message.

sys/sys/buf.h
519

Odd place for a blank line.

This revision is now accepted and ready to land.Feb 15 2021, 5:30 AM
kib marked an inline comment as done.

cosmetics: fix two blank line/spaces issues.

This revision now requires review to proceed.Feb 15 2021, 5:51 AM

The fuse parts look good. I don't know what's going on with ext2, though, and it's not clear from the commit message.

ext2 missed reset of the cluster gathering data on truncation. Basically, look for added cluster_init_vn() call in ext2_ext_truncate(). I am not sure about fuse.

this does not buildworld, for example:

usr/obj/usr/src/amd64.amd64/tmp/usr/include/sys/buf.h:430:7: error: expression result unused [-Werror,-Wunused-value]
            ("bstrategy: no bop_strategy bp=%p", bp));
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/obj/usr/src/amd64.amd64/tmp/usr/include/sys/buf.h:437:2: error: implicit declaration of function 'KASSERT' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        KASSERT((bp->b_flags & B_IOSTARTED) == 0,
        ^
/usr/obj/usr/src/amd64.amd64/tmp/usr/include/sys/buf.h:438:7: error: expression result unused [-Werror,-Wunused-value]
            ("recursed buf_start %p", bp));

Make userspace compile.

I have no competence in this area. Maybe you wanted to add @fsu to the reviewers? He was working on improving ext2/3/4 support.

kib removed a subscriber: fsu.

Overall looks good. A few inline comments.

sys/fs/ext2fs/inode.h
123

Add comment /* Buffer clustering information */

sys/fs/msdosfs/denode.h
147

Add comment /* Buffer clustering information */

sys/sys/_buf_cluster.h
45 ↗(On Diff #83935)

Why create a whole new header file? We did not do this when clustering was part of vnode.h. Clustering is simply one of the features that is available when using buffers. We already define cluster_init_vn() in buf.h so it seems like the obvious place to put the definition of vn_clusterw.

sys/ufs/ufs/inode.h
119

Add comment /* Buffer clustering information */

kib marked 4 inline comments as done.Feb 17 2021, 8:27 AM
kib added inline comments.
sys/sys/_buf_cluster.h
45 ↗(On Diff #83935)

The cause is that misc. inode.h files are used by userspace by tools like fstat/lsof and similar. buf.h is not prepared to be used in the non-kernel compilation environment, and quick conclusion was that it is too large change for it to be prepared.

First version of the patch did just that, defined vn_clusterw structure in buf.h and included sys/buf.h into inode.h from filesystems, but it was reported that buildworld is broken.

kib marked an inline comment as done.

Add comments.

@kib, you mentioned in SUMMARY:
"ext2fs change to reset clusterw on truncation should be a bug fix."

Could you please provide more information, like bug fix for which bug?

sys/fs/ext2fs/ext2_inode.c
539

If you want to invalidate inode clustering information in all cases of truncation, you need to place this cluster_init_vn() call in the end of ext2_truncate(), I think.
Currently clustering information will be invalidated only in case of extents (IN_E4EXTENTS inode flag).

In D28679#643111, @fsu wrote:

@kib, you mentioned in SUMMARY:
"ext2fs change to reset clusterw on truncation should be a bug fix."

Could you please provide more information, like bug fix for which bug?

i_clusterw tracks information that allows buffer code to eventually combine a contiguous series of the delayed-write vnode buffers into single, up to MAXPHYS, write of the fabricated buffer consisting of the combined runs of the pages from vnode buffers. This is done when cluster_write() is called by fs: cluster_write() checks if the supplied buffer continues the contiguous run of previous calls to cluster_write(), and if yes the buffer is marked for delayed write with bdwrite() while also updating the tracking info in i_clusterw.

Now consider the case when inode is truncated. Then potentially i_clusterw contains the reference to the indexes that are no longer valid. I believe that cluster code in this situation does a lot of unnecessary attempts to get hold of the buffers that would form the recorded cluster, and then it skips several opportunities to start growing new cluster until it completely clears the failing state.

I.e., it is not a functional bug, but a performance issue.

This would be a separate commit from the main part of moving the cluster data to inodes.

Move ext2 cluster_init_vn() into more appropriate place.

Looks good from ext2fs side.

This revision is now accepted and ready to land.Feb 17 2021, 1:47 PM

To avoid the _buf_cluster.h file, how about making the inclusion of buf.h in ufs/inode.h and ext2fs/inode.h conditional on #ifdef KERNEL? Both vfs_cluster.c and fuse_node.h already include buf.h so are not an issue. I don't know if msdosfs/denode.h is used outside the kernel, but if so could make inclusion of buf.h conditional on #ifdef KERNEL.

To avoid the _buf_cluster.h file, how about making the inclusion of buf.h in ufs/inode.h and ext2fs/inode.h conditional on #ifdef KERNEL? Both vfs_cluster.c and fuse_node.h already include buf.h so are not an issue. I don't know if msdosfs/denode.h is used outside the kernel, but if so could make inclusion of buf.h conditional on #ifdef KERNEL.

Note that struct inode embeds struct i_clusterw, not a pointer to it. So avoiding inclusion of sys/buf.h into userspace would still break the compilation, if vn_clusterw is defined in sys/buf.h.

The problem is in lib/libprocstat that wants to read inodes out of the kernel so needs to know their size. So, I agree avoiding _buf_cluster.h is hard. I reluctantly agree with that solution.

Kill sys/_buf_clusterw.h. Instead, make it possible to include sys/buf.h into userspace. Remove #define _KERNEL hacks from libprocstat.

This revision now requires review to proceed.Feb 18 2021, 1:49 PM
kib added a reviewer: emaste.

Getting rid of _buf_cluster.h was a lot more work than I expected, but is definitely cleaner and has the added bonus of cleaning up some other cruft as well.

This revision is now accepted and ready to land.Feb 18 2021, 10:49 PM

I note that lib/libprocstat/libprocstat.c includes ufs/ufs/inode.h but in fact does not need to do so.

As noted by Kirk, ufs includes are not needed for libprocstat.c.

This revision now requires review to proceed.Feb 19 2021, 3:31 PM
This revision is now accepted and ready to land.Feb 20 2021, 1:03 AM

I ran the 13 msdosfs tests I have two times followed by one write-intensive test in a loop for one hour.
No problems seen.

This broke the cross-build bootstrap. It also means the MSDOSFS code now uses the definition from sys/buf.h instead of the local ffs/buf.h.

In file included from /local/scratch/alr48/cheri/freebsd/usr.sbin/makefs/msdos.c:61:
/local/scratch/alr48/cheri/build/freebsd-amd64-build/local/scratch/alr48/cheri/freebsd/amd64.amd64/tmp/legacy/usr/include/fs/msdosfs/denode.h:55:10: fatal error: 'sys/buf.h' file not found
#include <sys/buf.h>

I've posted D28835 as a potential fix.

Note: it's not just Linux/macOS that's affected, also any FreeBSD version prior to this commit after D27619