At work we see problems with corrupt directory entries some times. Without this change the boot would hang when these were encountered.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
This change seems dangerous to me (based on the comment you added). If it is true that reclen must never equals 0 I'd prefer that this result in a panic. Booting into a hosed filesystem will only get you so far and will cause silent corruption later.
In our case we would fail to find a file on this drive and then fail over to another one, fsck would fix the directory after boot.
I'm going to withdraw my objection based on some IRC discussion. However, I'd like to see more analysis to determine if reclen == 0 implies certain corruption. It seems that UFS code in kernel will ignore it claiming corruption and fsck will remove the directory, so maybe this isn't terrible if gets skipped during boot.
At the micro-level, this is good defensive programming. fsck fixes this stuff, but when you are booting you want to avoid getting wrapped around the axel. Short of moving fsck into the loader, I'm unsure what else we can do about this, honestly.
I do think that kirk's review is mandatory on this one, though.
I concur that taking evasive action is preferable to looping forever. However, we can do better than just bailing out. The kernel code checks for a bad d_reclen and if found skips up to the beginning of the next block within the directory using this code:
if (dp->d_reclen == 0 || dp->d_reclen > DIRBLKSIZ - (entryoffsetinblock & (DIRBLKSIZ - 1))) entryoffsetinblock += DIRBLKSIZ - (entryoffsetinblock & (DIRBLKSIZ - 1));
I suggest that you adapt this code for your two patches. As a practical matter, many directories contain only one 512-byte block, so the effect will be the same. But the /boot directory is three blocks long and the /boot/kernel directory is 38 blocks long so you have a good chance that the name you seek is not in the corrupted block and this code will still allow you to find it.
Not yet, although I did find another place where the original fix should be applied (search_directory in stand/libsa/ufsread.c)