Details

Reviewers

Commits

rS333576: Detect and optimize reads from the hole on UFS.

Summary

If UFS file is read from the hole, detect it and optimize by avoiding instantiating the buffer, allocating pages and bmap-ing it.

TODO: calculate the number of blocks used by fully populated file of the given size and use it to cheaply avoid bmap.

Suggested by: jeff

Test Plan

Peter, could you, please, test this change ?

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Mar 31 2018, 11:31 AM

Herald added a subscriber: imp. · View Herald TranscriptMar 31 2018, 11:31 AM

Harbormaster completed remote builds in B15907: Diff 40939.Mar 31 2018, 11:31 AM

jeff added inline comments.Mar 31 2018, 1:20 PM

sys/ufs/ffs/ffs_vnops.c
572–576 ↗	(On Diff #40939)	The problem with this is paying the cost of bmap on every read which is very expensive. Ideally we'd bmap before allocating pages but after looking up the buf. Maybe the buffer cache could detect this and we could use a new GB_ flag.

kib added inline comments.Mar 31 2018, 1:36 PM

sys/ufs/ffs/ffs_vnops.c
572–576 ↗	(On Diff #40939)	If indirect blocks are already read, then the cost is mostly the call and the walk over the pointers. I do not see how would the buffer cache detect the hole without doing VOP_BMAP() as well. I thought about adding GB flag and failing bread() with specific error code, but this only changed the place where VOP_BMAP() is done. I will look some more at this. The comment indicates how the normal reads of non-sparse files can be optimized.

For what it's worth, I did test this with my sparse file dd test and we well exceed the performance of linux at this benchmark now that we're using the same technique. Unfortunately it defeats a convenient way to create a lot of paging traffic.

sys/ufs/ffs/ffs_vnops.c
572–576 ↗	(On Diff #40939)	The cost is going to be the getblk() call for every level of indirection. At high data rates getblk()/brelse() consumes a large fraction of the time. bread could see if the data is cached. If not it will have to do bmap in any event. So you could lookup the buf, lookup the pages, if they don't exist, bmap, then vm_page_grab if it is not sparse. If it is sparse, you could create a buf that holds pointers to zero pages which have to be fixed on write. Or you could return an error code.

In D14917#313652, @jeff wrote:

For what it's worth, I did test this with my sparse file dd test and we well exceed the performance of linux at this benchmark now that we're using the same technique. Unfortunately it defeats a convenient way to create a lot of paging traffic.

Are you saying that the patch from the review helps, or we outperform linux in the test already, without the patch ?

This is another variant, which short-circuits ufs_strategy() right after bmap is done. As result, we deallocate the fresh buffer for hole instead of leaving it to buffer daemon, and do not instantiate file blocks.

This version is much slower than the other version but still much faster than head. I think there is some bug though because after a while it started consuming space.

I think if we can save the page allocation by doing bmap earlier as I described and make a uiozero this will be as fast as the first version if not faster.

sys/kern/vfs_bio.c
3892 ↗	(On Diff #40943)	I would prefer we keep with the sparse terminology.
4105 ↗	(On Diff #40943)	Some prefix indicating not to return sparse would be better. NOSPARSE, FAILSPARSE, etc.
sys/ufs/ffs/ffs_vnops.c
474 ↗	(On Diff #40943)	Do we not have a uiozero()? If not we need one. It will save a memory fetch and cache pollution.
622–623 ↗	(On Diff #40943)	This part I like much better overall. No impact to the normal path. We could do the VOP_BMAP before strategy though to avoid even more work. If the filesystem passes in a parameter saying it doesn't want sparse bufs we can assume VOP_BMAP will work and do it right away in getblk(). It saves us from needing the B_FLAG to preserve an argument to getblk(). Ideally this could be done before pages are allocated. That should work.

Rename _HOLE to _NOSPARSE.
Fix bug where vn_io_fault() holds were ignored.

Really commit and regenerate diff for flags renames.

Rework the patch to not even create buffer and to not allocate page for holes, by doing BMAP early. Also, the BMAP result is saved and the block number is not recalculated in the strategy.

We should think about what other filesystems could be trivially converted to this interface.

sys/kern/vfs_bio.c
2149–2152 ↗	(On Diff #41211)	The only thing I don't really like in this version of the patch is the error handling. This is good to test though. We could make an internal version of getblk that returns an error and takes a buf **. It may be useful in other circumstances.
sys/ufs/ffs/ffs_vnops.c
474 ↗	(On Diff #40943)	again is there no uiozero? Can we do that?

swills added a subscriber: swills.Apr 9 2018, 2:07 PM

kib added inline comments.Apr 9 2018, 2:52 PM

sys/ufs/ffs/ffs_vnops.c
474 ↗	(On Diff #40943)	It is easy to implement, but where do you intend to use uio_zero() ? I can think about /dev/zero zero_read() and uiomove_object_page(). Note that ffs_read_hole() is not a suitable user of the function, due to vn_io_fault(). Also, what do you mean by the memory fetch save ?

Add getblkx().

One moment which I am unsure about are the error returns from cluster_read() and bread(). Now that getblkx() can return a wide spectrum of errors, I keep the leaking out of the read functions. I think it is useful to provide the detailed classification of errors from getblkx(). I can translate all of them except EJUSTRETURN back to EBUSY, as it was before, but I would try to leave as is.

I investigated some limited number of callers, and I believe that it is fine. If testing reveals something, I will convert to EBUSY in cluster_read() and breadn_flags().

Do not optimize UIO_NOCOPY. In this case, the pages are already allocated.

kib added a child revision: D15055: Map constant zero page on read faults which touch non-existing anon page..Apr 27 2018, 4:36 PM

kib mentioned this in D15055: Map constant zero page on read faults which touch non-existing anon page..Apr 28 2018, 5:40 PM

It would be nice to implement it in other filesystems that support sparse files.

This revision is now accepted and ready to land.May 11 2018, 9:23 AM

In D14917#324431, @jeff wrote:

It would be nice to implement it in other filesystems that support sparse files.

Which ? The only other significant writeable filesystem which uses buffer cache is msdosfs, where we cannot have a hole on FAT. Support for holes read optimization could be useful for zfs and tmpfs. For tmpfs we already have the short-circuit to avoid instantiate the backing page on read. ZFS is out of my interest.

kib edited the test plan for this revision. (Show Details)May 11 2018, 9:57 AM

kib added a subscriber: pho.

Tested with all of the stress2 tests + a buildworld / installworld.
No problems seen.

Closed by commit rS333576: Detect and optimize reads from the hole on UFS. (authored by kib). · Explain WhyMay 13 2018, 9:50 AM

This revision was automatically updated to reflect the committed changes.

Detect reads from the hole.
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 42471

head/sys/kern/vfs_bio.c

head/sys/kern/vfs_cluster.c

head/sys/sys/buf.h

head/sys/ufs/ffs/ffs_vnops.c

Detect reads from the hole.ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 42471

head/sys/kern/vfs_bio.c

head/sys/kern/vfs_cluster.c

head/sys/sys/buf.h

head/sys/ufs/ffs/ffs_vnops.c

Detect reads from the hole.
ClosedPublic
Actions

Revision Contents
Changeset List