Page MenuHomeFreeBSD

Fix pseudo-inode calculation in FAT12/FAT16
ClosedPublic

Authored by se on Feb 19 2024, 5:01 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, May 11, 4:09 AM
Unknown Object (File)
Sat, May 11, 12:21 AM
Unknown Object (File)
Sat, May 11, 12:20 AM
Unknown Object (File)
Fri, May 10, 2:08 PM
Unknown Object (File)
Fri, Apr 26, 3:33 AM
Unknown Object (File)
Apr 20 2024, 12:06 AM
Unknown Object (File)
Apr 8 2024, 10:38 PM
Unknown Object (File)
Mar 25 2024, 10:42 PM
Subscribers

Details

Summary

! This might be a candidate for FreeBSD-13.3, but I consider it as less urgent than the previous FAT file system fix. !

FAT file systems do not use inodes, instead all file meta-information is stored in directory entries.
FAT12 and FAT16 use a fixed size area for root directories, with typically 512 entries of 32 bytes each (for a total of 16 KB) on hard disk formats.
The file system data is stored in clusters of typically 512 to 4096 bytes. depending on the size of the file system.
The current code uses the offset of a DOS 8.3 style directory entry as a pseudo-inode, which leads to inode values of 0 to 16368 for typical root directory entries.
Sub-directories use 2 cluster length plus the byte offset of the directory entry in the data area for the pseudo-inode, which may be as low as 1024 in case of 512 byte clusters.
This issue is demonstrated by the following test script:

#!/bin/sh

FS=/mnt/MSDOSFS-TEST
MDUNIT=0

cleanup () {
	cd /
	umount /dev/md$MDUNIT
	rm -rf $FS
	fsck_msdosfs -n /dev/md$MDUNIT
	mdconfig -u $MDUNIT -d
}

mdconfig -u $MDUNIT -t malloc -s 4m
newfs_msdos -F 16 -c 1 /dev/md$MDUNIT
mkdir -p $FS
mount -t msdos /dev/md$MDUNIT $FS

trap "cleanup" EXIT

cd $FS
mkdir TESTDIR
touch TESTDIR/TEST

for i in $(jot 33)
do
	touch TEST.$i
done

This script reports an error when writing the 32nd entry and the file system will have been remounted r/o when the 33rd file is to be written.

The 32th file gets a pseudo-inode value of 1024, the same value already assigned to TESTDIR, leading to a directory and a file with colliding inode numbers.
The patch changes the calculation of pseudo-inodes to account for the actual number of directory entries in the root file system and avoids the collision for all supported file-system parameters.

Test Plan

Perform test and observe the following error message:

touch: TEST.32: Bad file descriptor
touch: TEST.33: Read-only file system

Apply the patch and verify that the script succeeds for up to 511 files in the root directory (one directory used for TESTDIR for a total of 512 available root directory entries).
Copying the -CURRENT source tree to a FAT32 file system resulted in 5% less system time, probably due to better a distribution of inodes in the vfs cache (the old code used only inode numbers that are a multiple of 32 as "hash" value).

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

se requested review of this revision.Feb 19 2024, 5:01 PM
se created this revision.

If I understand your change, you are coming up with a different synthetic inode value to ensure that it will be unique. Presumably this value is externally visible as the "st_ino" in the stat structure and the "d_fileno" in the dirent structure. If so, this value will be different after this change so any program that has saved it somewhere could get confused. Not sure if this is really an issue though.

This does fix the problem as described in the summary.

This revision is now accepted and ready to land.Feb 20 2024, 6:01 AM
sys/fs/msdosfs/denode.h
220

Can we simplify this? For instance, we can always use the absolute offset in bytes of the start of the direntry, from the volume start.

If I understand your change, you are coming up with a different synthetic inode value to ensure that it will be unique. Presumably this value is externally visible as the "st_ino" in the stat structure and the "d_fileno" in the dirent structure. If so, this value will be different after this change so any program that has saved it somewhere could get confused. Not sure if this is really an issue though.

I do not expect that FAT16 file systems are exported in such a way that inode stability is required (perhaps by some FUSE file-system), and there is no way to fix the potential inode collision without changing at least part of the pseudo-inode values visible via st_ino or d_fileno.
And since such an inode collision can already occur for 32 DOS 8.3 file names or 11 14-character Win95 long file names, such a situation can easily occur in case of a sub-directory in cluster 2.

What's preventing such collisions from being observed more often in practice is the fact that most file systems will first receive a few regular files in the root directory and only then the first directory entry, and a sub-directory in cluster 32 or higher will be safe to use for any number of sectors per cluster (e.g. 2 files of 8 KB in the root directory would move the directory inode of the first sub-directory beyond the 16 KB margin set by a 512 entry root directory).

This does fix the problem as described in the summary.

Thank you for accepting this review. I'll commit the patch with small white space adjustments.

sys/fs/msdosfs/denode.h
220

Yes, this would be possible, but would not simplify the calculation, IMHO.
The reason is that we then need to calculate the block position within the partition (which is also required to read or write the on-disk directory entry, but not in any function that uses DETOI).

The detobn macro defined in msdosfsmount.h could be used in DETOI, but it would lead to a more complex calculation than proposed in this review, it could somewhat simplify the source code, but not the resulting binary.

The only potential for a simplification (that I see) is to use a constant of 65536 instead of pmp->pm_RootDirEnts, but this variable is likely to be in the 1st-level cache anyway, since it is part of the msdosfsmount structure (and addressed using a short offset relative to the value of pmp, which is already used nearby).