Report CG checksum mismatches.
Closed, Public

Authored by imp on Jan 12 2018, 3:42 PM.

Details

Summary

Previously, we'd silently fix CG checksum mismatches. However, we see checksum
mismatch errors at runtime. In investigating them for our use case, we
couldn't find any checksum errors at rest (e.g., on reboot). Enable the
reporting code for everybody to see whether it's just us or whether others
have the same issue.

Sponsored by: Netflix

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sbin/fsck_ffs/pass5.c, line 188 (On Diff #37868)

Not sure whether this should be fatal or just a warning; please advise. pfatal isn't always fatal, so I'm not 100% clear on when it should be used.

FWIW I have disabled SUJ by default in the installer, in rS327890, until the underlying issue with SUJ + CG checksums is addressed.

Using pfatal means that fsck -p (preen mode) will fail, causing a reboot to drop into single-user mode with a request that you run fsck manually. If you use pwarn, a message will be generated but the system will continue to multi-user. I believe that pwarn is more appropriate in this case, since you are not in danger of corrupting your filesystem when the checksum is wrong.
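
For readers following along, the distinction can be sketched roughly as below. This is a simplified model of the behavior described in the comment above, not the actual fsck_ffs source: `model_pwarn`, `model_pfatal`, `enum report_outcome`, and the message text are all hypothetical names made up for illustration, and the real functions take printf-style arguments and consult global state rather than a `preen` parameter.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome type; not part of fsck_ffs. */
enum report_outcome {
	REPORT_CONTINUE,	/* message printed, the check keeps going */
	REPORT_ABORT		/* message printed, the run stops for manual fsck */
};

/*
 * Simplified stand-in for pwarn(): the problem is reported, but the
 * run continues whether or not we are in preen mode.
 */
static enum report_outcome
model_pwarn(bool preen, const char *msg)
{
	(void)preen;		/* outcome does not depend on preen mode */
	fprintf(stderr, "%s\n", msg);
	return (REPORT_CONTINUE);
}

/*
 * Simplified stand-in for pfatal(): only "fatal" in preen mode, where
 * the run is abandoned and the operator is asked to run fsck by hand.
 * Outside preen mode the message is printed and the caller carries on.
 */
static enum report_outcome
model_pfatal(bool preen, const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	if (preen) {
		fprintf(stderr, "run fsck manually\n");
		return (REPORT_ABORT);
	}
	return (REPORT_CONTINUE);
}
```

Under this model, a CG checksum mismatch reported through `model_pfatal` during `fsck -p` would stop the boot-time check, while the same mismatch reported through `model_pwarn` would only leave a message behind, which matches the argument for pwarn here.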

Given the issues that have been raised by adding the checkhash, I am thinking that it may be better not to enable it by default. If it is (for now) only enabled optionally, then folks that want to test it can do so and the rest of the users do not have to deal with the testing shake-out. I want to add other checkhashes for things like superblocks, inodes and possibly indirect blocks. Once these are working as a whole, then we can expand them to the larger population. In particular, I want to add the ability to detect when a filesystem is run on an earlier kernel so that we will expect the checkhashes to be wrong. I have just one bit to work with here, so I want to defer adding this until I can use it to cover all the checkhashes. But I think that this ability to detect running on earlier kernels is important to have for the wider population of users.

This revision was not accepted when it landed; it landed in state Needs Review. Jan 14 2018, 4:55 PM
This revision was automatically updated to reflect the committed changes.