
Align FAT clusters to the cluster size
Closed, Public

Authored by se on Jun 1 2024, 1:01 PM.

Details

Summary

FAT clusters that are not aligned to VM pages cause data before and after the requested range to be fetched into the buffer cache. For unaligned 64 KB clusters, this requires 68 KB of buffer space, which exceeds the default maxbcachebuf value of 65536 bytes. See PR 277414 for the corresponding bug report.
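
The 68 KB figure follows from rounding the cluster's byte range outward to page boundaries. A minimal sketch of that arithmetic, assuming 4 KB pages and a cluster that starts 512 bytes past a page boundary (the name `buffer_span` is illustrative and not taken from msdosfs):

```python
PAGE_SIZE = 4096  # assumed VM page size


def buffer_span(offset, length, page_size=PAGE_SIZE):
    """Bytes of page-aligned buffer needed to cover [offset, offset + length)."""
    start = offset - offset % page_size                      # round start down
    end = (offset + length + page_size - 1) // page_size * page_size  # round end up
    return end - start


aligned = buffer_span(0, 64 * 1024)
unaligned = buffer_span(512, 64 * 1024)
print(aligned // 1024, unaligned // 1024)  # prints "64 68"
```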

Typical FAT12 or FAT16 file systems have data clusters starting at odd sector numbers, since there is 1 reserved sector, followed by an even number of FAT sectors and an even number of root directory sectors (assuming a default sector size of 512 bytes and 32 or 512 root directory entries). The odd number of sectors in front of the first data cluster leads to a mis-alignment with respect to buffer cache page boundaries.
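
The odd starting sector can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 2 FAT copies (so the total number of FAT sectors is even) and 32-byte directory entries, and the FAT size of 64 sectors in the first example is an arbitrary illustrative value.

```python
def first_data_sector(reserved, nfats, sectors_per_fat, root_entries,
                      bytes_per_sector=512):
    """Sector number of the first data cluster on a FAT12/16 volume."""
    # Each root directory entry occupies 32 bytes; round up to whole sectors.
    root_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    return reserved + nfats * sectors_per_fat + root_sectors


# 1 reserved sector, 2 FATs of 64 sectors (illustrative), 512 root entries.
sector = first_data_sector(1, 2, 64, 512)
print(sector, sector * 512 % 4096)  # prints "161 512"
```

With these inputs the data area starts 512 bytes past a 4 KB boundary, so every cluster straddles one more page than necessary.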

This review adds 2 alignments to fix the issue:

  1. The number of directory entries is adjusted to fill full logical sectors (as required according to the FAT specification).
  2. The number of reserved sectors is adjusted to make the start of the first data cluster aligned with the size of a VM page (unless overridden by passing the -A or -r option). The effect of the -A option has been changed to align the data region (not the root directory) with the cluster size.
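
The two adjustments can be sketched as follows. This is not the code from the patch: the function names are hypothetical, and the sketch assumes the FAT size stays fixed while reserved sectors are added, whereas a real formatter has to recompute the FAT size from the resulting cluster count.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return (value + multiple - 1) // multiple * multiple


def aligned_layout(reserved, nfats, sectors_per_fat, root_entries,
                   align_bytes, bytes_per_sector=512):
    """Return (reserved, root_entries, data_start_sector) after alignment."""
    # Step 1: make the root directory fill whole logical sectors.
    entries_per_sector = bytes_per_sector // 32
    root_entries = round_up(root_entries, entries_per_sector)
    root_sectors = root_entries // entries_per_sector

    # Step 2: pad the reserved area so the data region starts on a boundary.
    align_sectors = max(1, align_bytes // bytes_per_sector)
    data_start = reserved + nfats * sectors_per_fat + root_sectors
    reserved += round_up(data_start, align_sectors) - data_start
    return reserved, root_entries, round_up(data_start, align_sectors)


# 500 requested root entries, alignment to 4096 bytes (illustrative values).
print(aligned_layout(1, 2, 64, 500, 4096))  # prints "(8, 512, 168)"
```

Passing the cluster size instead of the page size as `align_bytes` gives the cluster-aligned variant.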

The advantages of this alignment are:

  1. Better utilization of the buffer cache, since each cluster is always loaded into the minimum number of buffer pages (instead of needing 1 extra page due to mis-alignment).
  2. Less write amplification on SSDs, SD cards, or USB memory sticks. (The alignment of the end of the root directory is at least as good as the alignment of the start of the root directory, with respect to write amplification effects.)

This review improves the creation and handling of FAT file systems (e.g. UEFI boot partitions) on FreeBSD, but does not deal with mis-alignment of already created partitions or of FAT file systems created on other systems. A different approach (a GEOM module) is supposed to deal with them.

Test Plan

Apply the patch and create file systems (FAT12/16/32) without passing -A or -r.
Write some data to a file on that file system.
Verify that the first data cluster starts at an offset that is a multiple of 0x1000 (for the typical case of a 4 KB VM page).

Repeat with -A and verify that the first data cluster starts at an offset that is a multiple of the cluster size.
In that case, the root directory (if any) will end just before the start of the data area.
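
One way to carry out the verification step is to read the BIOS parameter block from the first sector of the new file system and compute the data area offset. The sketch below uses the standard BPB field offsets; the function names are hypothetical, and the image path passed to `check_image` is whatever device or file the file system was created on.

```python
import struct


def first_data_offset(boot_sector):
    """Return (data_offset_bytes, cluster_bytes) from a FAT boot sector."""
    # BPB fields at offset 11: bytes/sector, sectors/cluster,
    # reserved sectors, number of FATs, root directory entries.
    bps, spc, reserved, nfats, root_entries = struct.unpack_from(
        "<HBHBH", boot_sector, 11)
    fat_size = struct.unpack_from("<H", boot_sector, 22)[0]
    if fat_size == 0:
        # FAT32 keeps the FAT size in a 32-bit field at offset 36.
        fat_size = struct.unpack_from("<I", boot_sector, 36)[0]
    root_sectors = (root_entries * 32 + bps - 1) // bps
    first_sector = reserved + nfats * fat_size + root_sectors
    return first_sector * bps, spc * bps


def check_image(path, page_size=0x1000):
    """Print the data offset and its remainder modulo page and cluster size."""
    with open(path, "rb") as image:
        offset, cluster = first_data_offset(image.read(512))
    print(hex(offset), offset % page_size, offset % cluster)
```

Both remainders printed by `check_image` should be 0 for a file system created with the patched defaults.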

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

se requested review of this revision. Jun 1 2024, 1:01 PM

I still think that changing msdosfs to use directory blocks owned by dir vnodes (instead of devvp) is the best fix for kernel.

This revision is now accepted and ready to land. Jun 2 2024, 8:10 AM
se retitled this revision from Align FAT clusters to VM pages to Align FAT clusters to the cluster size.
se added reviewers: jrtc27, lattera-gmail.com, cy, imp.

Modify the patch to align the data area to a cluster boundary by default.

This removes any dependency on the VM page size (of both the program and generated file systems).

The alignment will use more reserved sectors for cluster sizes > 4 KB, but those are only useful on large volumes.
The -A option will be accepted, but has no effect. It could be documented as obsolete, but IMHO the proposed change of the man-page is clear enough.

On FAT12/16 the new default behavior differs from the alignment previously provided by "-A", since it adjusts the start of the data area instead of the start of the root directory. Adjusting the root directory could lead to mis-aligned data clusters, depending on the number of root directory entries.
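
A small sketch of why aligning the root directory is not sufficient, assuming 512-byte sectors, 4 KB clusters, and a root directory that starts on a cluster boundary at sector 8 (all values and the function name are illustrative):

```python
def data_misalignment(root_start_sector, root_entries, cluster_bytes,
                      bytes_per_sector=512):
    """Remainder of the data area start modulo the cluster size, in bytes."""
    root_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_start = (root_start_sector + root_sectors) * bytes_per_sector
    return data_start % cluster_bytes


for entries in (512, 224, 32):
    print(entries, data_misalignment(8, entries, 4096))
# prints:
# 512 0
# 224 3072
# 32 1024
```

Only a root directory whose size is a multiple of the cluster size keeps the data area aligned, which is why the data area start is the more useful quantity to align.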

In D45436#1036630, @kib wrote:

I still think that changing msdosfs to use directory blocks owned by dir vnodes (instead of devvp) is the best fix for kernel.

Has there been some prior work in that direction or is there other information that would help develop such a change?

I do not think that changing the handling of directory blocks fixes the issue of PR 277414 (mis-alignment between FAT clusters and buffer cache pages), but I guess I'm missing some required context.

In D45436#1038074, @se wrote:
In D45436#1036630, @kib wrote:

I still think that changing msdosfs to use directory blocks owned by dir vnodes (instead of devvp) is the best fix for kernel.

Has there been some prior work in that direction or is there other information that would help develop such a change?

This is the natural way for filesystems to handle directories, instead of what msdosfs does. At least, UFS and nfsclient do this.

I must admit that for nfsclient this is somewhat strange, because the NFS server does not return data blocks for directories, but the decision is still good enough even for NFS, which re-parses the fake dir blocks.

I do not think that changing the handling of directory blocks fixes the issue of PR 277414 (mis-alignment between FAT clusters and buffer cache pages), but I guess I'm missing some required context.

Why?

msdosfs_bmap() would return the right block number for root directory, and that is it.

This revision is now accepted and ready to land. Wed, Jun 5, 11:14 PM