Page MenuHomeFreeBSD

Bail out of corrupt directory entries during boot
AcceptedPublic

Authored by darius-dons.net.au on Jun 16 2015, 11:09 PM.
Tags
None
Referenced Files
F82067781: D2844.diff
Thu, Apr 25, 5:16 AM
Unknown Object (File)
Mar 25 2024, 11:49 PM
Unknown Object (File)
Mar 25 2024, 11:11 PM
Unknown Object (File)
Jan 11 2024, 10:33 PM
Unknown Object (File)
Nov 26 2023, 2:13 PM
Unknown Object (File)
Nov 13 2023, 11:03 AM
Unknown Object (File)
Nov 8 2023, 4:44 AM
Unknown Object (File)
Nov 2 2023, 8:58 AM

Details

Reviewers
mckusick
imp
Summary

At work we see problems with corrupt directory entries some times. Without this change the boot would hang when these were encountered.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

darius-dons.net.au retitled this revision from to Bail out of corrupt directory entries during boot.
darius-dons.net.au updated this object.
darius-dons.net.au edited the test plan for this revision. (Show Details)
darius-dons.net.au set the repository for this revision to rS FreeBSD src repository - subversion.
eadler requested changes to this revision.Jun 16 2015, 11:17 PM
eadler added a reviewer: eadler.
eadler added a subscriber: eadler.

This change seems dangerous to me (based on the comment you added). If it is true that reclen must never equals 0 I'd prefer that this result in a panic. Booting into a hosed filesystem will only get you so far and will cause silent corruption later.

This revision now requires changes to proceed.Jun 16 2015, 11:17 PM

Is this something fsck detects and fixes?

In our case we would fail to find a file on this drive and then fail over to another one, fsck would fix the directory after boot.

eadler edited reviewers, added: mckusick; removed: eadler.

I'm going to withdraw my objection based on some IRC discussion. However, I'd like to see more analysis to determine if reclen == 0 implies certain corruption. It seems that UFS code in kernel will ignore it claiming corruption and fsck will remove the directory, so maybe this isn't terrible if gets skipped during boot.

  • Update for head
  • Apply the same fix to ufs.c

At the micro-level, this is good defensive programming. fsck fixes this stuff, but when you are booting you want to avoid getting wrapped around the axel. Short of moving fsck into the loader, I'm unsure what else we can do about this, honestly.
I do think that kirk's review is mandatory on this one, though.

This revision is now accepted and ready to land.Jul 18 2019, 3:44 PM

I concur that taking evasive action is preferable to looping forever. However, we can do better than just bailing out. The kernel code checks for a bad d_reclen and if found skips up to the beginning of the next block within the directory using this code:

if (dp->d_reclen == 0 || dp->d_reclen >  DIRBLKSIZ - (entryoffsetinblock & (DIRBLKSIZ - 1)))
                entryoffsetinblock += DIRBLKSIZ - (entryoffsetinblock & (DIRBLKSIZ - 1));

I suggest that you adapt this code for your two patches. As a practical matter, many directories contain only one 512-byte block, so the effect will be the same. But the /boot directory is three blocks long and the /boot/kernel directory is 38 blocks long so you have a good chance that the name you seek is not in the corrupted block and this code will still allow you to find it.

Any progress on getting this done?

Any progress on getting this done?

Not yet, although I did find another place where the original fix should be applied (search_directory in stand/libsa/ufsread.c)