Page MenuHomeFreeBSD

Do not overwrite clean blocks on pageout.
ClosedPublic

Authored by kib on Oct 14 2017, 12:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 17, 8:53 AM
Unknown Object (File)
Sep 27 2024, 1:16 PM
Unknown Object (File)
Sep 24 2024, 8:49 AM
Unknown Object (File)
Sep 23 2024, 4:31 AM
Unknown Object (File)
Sep 22 2024, 5:26 PM
Unknown Object (File)
Sep 20 2024, 3:02 PM
Unknown Object (File)
Sep 17 2024, 9:01 PM
Unknown Object (File)
Sep 9 2024, 1:13 AM
Subscribers

Details

Summary

If filesystem block size is less than the page size, it is possible that the page-out run contains partially clean pages. E.g., the chunk of the page might be bdwrite()-ed, or some thread performed bwrite() on a buffer which references a chunk of the paged out page. As result, the assertion added in r319975, which checked that all pages in the run are dirty, does not hold on such filesystems.

One solution is to remove the assert. I do not like it, because we overwrite the valid on-disk content. I cannot provide a scenario where such write would corrupt the file data, but I do not like it on principle. Another, in my opinion proper, solution is to only write parts of the pages still marked dirty. The patch implements this, it skips clean blocks and only writes the dirty block runs.

Note that due to clustering, write one page might clean other pages in the run, so the next write range must be calculated only after the current range is written out.

More, due to a possible invalidation, and the fact that the object lock is dropped and reacquired before the checks, it is possible that the whole page-out pages run appears to consist of only clean pages. For this reason, it is impossible to assert that there is some work for the pageout method to do (i.e. I cannot assert that there is at least one dirty page in the run). But such clearing can only occur due to invalidation, and not due to a parallel write, because we own the vnode lock exclusive.

I did not decided on the following question yet: should the patch keep the existing vnode_generic_putpages() as is, and create special function vnode_smallfs_putpages(), which would be used when bsize < PAGE_SIZE only. The advantage is that less calculations are performed for the typical case of UFS with its fragments always >= PAGE_SIZE, the disadvantage is the signficant code duplication.

Test Plan

Patch was tested by Peter.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 12043

Event Timeline

Update the patch to the current set of the bug fixed.

kib added reviewers: alc, markj.
sys/vm/vnode_pager.c
1392

Missing periods in some of the comments.

1423

How can this case ever be true? To get to this point, we must have seen at least one dirty block.

kib marked 2 inline comments as done.Oct 17 2017, 8:53 AM
kib added inline comments.
sys/vm/vnode_pager.c
1423

I agree with your statement. It was real for Nth iterations of the patch before it current shape.

I replaced the comparision with an assert, but I think that this mean a full retest of the patch.

kib marked an inline comment as done.

Change a condition into the assert. Add terminating dots into the comments.

I can't see any problems with this; just pointed out a couple more nits.

sys/vm/vnode_pager.c
1403

Can't we just set auio.uio_offset = prev_offset immediately before the VOP_WRITE call?

1425

"Getting here" would make more sense.

This revision is now accepted and ready to land.Oct 17 2017, 6:48 PM

Stop incrementing uio_offset.
Fix grammar.

This revision now requires review to proceed.Oct 17 2017, 6:56 PM

It appears to me that we are acquiring a write lock on the object (as opposed to a read lock) just to handle an edge case where we have to call vm_page_clear_dirty().

sys/vm/vnode_pager.c
1215

Alphabetically, poffset should come before prev_offset.

This revision is now accepted and ready to land.Oct 19 2017, 5:07 AM
This revision was automatically updated to reflect the committed changes.