
msdosfs(5): WIP/PoC exFAT support
Needs Review, Public

Authored by cem on Nov 25 2020, 9:32 PM.

Details

Reviewers
delphij
emaste
kib
Summary

Mount read-only works more or less correctly. Files and directories can be
traversed and read.

Contiguous allocations (one notable difference from FAT32) should work
correctly, although it is difficult to generate a test filesystem with any
contiguous files >1 block (fuse-exfat does not do a great job of creating these
files).

Another notable difference from classic FAT is that directories do not have
concrete . and .. entries. This means the canonical directory denode "inode
number" or dirclust+offset must be the parent directory's entry in exFAT, not
the self-referential '.' entry that our msdosfs(5) uses for classic FAT.
(exFAT directory entries are also in a different format and opposite order from
VFAT -- the exFAT file entry precedes its name entry (or entries).)

Things that don't work or are known gaps:

  • Any sort of mutation or writing (known gaps in inmem -> ondisk, free space finding in directories, etc).
  • Decoding exFAT datetimes. This isn't very hard (some logic is trivially shared with the existing msdosfs time support) but wasn't essential for a proof-of-concept, so I haven't done it yet.
  • '..' lookup -- I don't have any great ideas for implementing this without adding some per-mount cache data structure with ugly locking. Or having child vnodes hold a ref on their parent and drop it during inactive or something (only works for lookup, not getfh).
  • Input sanitization: forbidden characters in filenames.
  • Some ugly copy-paste (readdir, lookup) where most of the logic is shared with classic FAT. This could be refactored into some shared routine (iterating dirents from some offset, for example).
  • Per exFAT spec, implementations SHALL verify dirent-set checksum; we don't yet.
  • Utilizing on-disk up-case table correctly.
  • geom_label support for exFAT labels. We have the code for this in fstyp(8) already, just needs adaptation for geom. Similarly, for exFAT GUIDs (although we don't have this code in fstyp(8)).
  • Utilize namehash in component lookup (performance-only; not a functional gap).
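For the dirent-set checksum gap listed above, a minimal userland sketch of the check, following the EntrySetChecksum procedure as I read it in the exFAT specification (rotate the 16-bit sum right by one, add the next byte, skip the two bytes that store the checksum itself). The function names and the flat byte-buffer interface are hypothetical and are not taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXFAT_DIRENT_SIZE 32	/* every exFAT directory entry is 32 bytes */

/*
 * Checksum over a whole directory entry set (File entry, Stream Extension,
 * File Name entries) laid out back to back in 'set'.  Bytes 2 and 3 of the
 * first entry hold the stored checksum and are excluded from the sum.
 */
static uint16_t
exfat_entry_set_checksum(const uint8_t *set, size_t nbytes)
{
	uint16_t sum = 0;
	size_t i;

	assert(nbytes >= EXFAT_DIRENT_SIZE && nbytes % EXFAT_DIRENT_SIZE == 0);
	for (i = 0; i < nbytes; i++) {
		if (i == 2 || i == 3)
			continue;
		/* Rotate right by one bit, then add the current byte. */
		sum = (uint16_t)(((sum & 1) ? 0x8000 : 0) + (sum >> 1) + set[i]);
	}
	return (sum);
}

/* Returns nonzero if the stored little-endian checksum matches. */
static int
exfat_entry_set_valid(const uint8_t *set, size_t nbytes)
{
	uint16_t stored = (uint16_t)(set[2] | (set[3] << 8));

	return (stored == exfat_entry_set_checksum(set, nbytes));
}
```

In a lookup or readdir path this would presumably run once per entry set, after all secondary entries have been read and before any of them are trusted.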

Copyright: I wrote this implementation from the published Microsoft
specification, and did not copy or reference any other implementation (e.g.,
the GPL-licensed fuse-exfat or GPL-licensed linux kernel implementation).

Patents: My understanding is that the three relevant exFAT patents (contiguous
files, namehash lookup, and the vendor-extension dirent design) are either
already expired or expire soon in 2021 or 2022. I'm not advocating for
enabling this in GENERIC at this time, but mostly because the implementation
still has so many gaps.

Test Plan

Shared in case there is interest.

High-level compare/contrast with VFAT/FAT32 off the top of my head:

Similarities to VFAT/FAT32:

  • Most of the BPB/VBR metadata exists, although in a different header for exFAT. Same concept of clusters-of-sectors, represented the same way (power of 2 for each).
  • Same layout: reserved sectors/headers, FAT region, data region; like FAT32, all directories are in the data region.
  • Vaguely similar directory contents: 32-byte record size; individual files span several entries.
  • FAT chain is basically the same as FAT32, for files allocated with the FAT chain.

Differences:

  • There is an allocation bitmap on-disk and it is referenced from a special directory entry in the root directory.
  • There is also a defined up-case table on-disk.
  • No . and .. directory entries
  • The dirent struct is completely different.
  • VFAT lays out long file names as last-to-first, then inode-like FAT32 8.3 slot. exFAT instead allocates 3 or more slots per file: inode-like File slot, inode-like Stream Extension slot, and then 1 or more File Name slots. Both FreeBSD implementations use the offset of the inode-like File as the canonical inode number for files and directories.
  • exFAT files and directories can be allocated contiguously on disk. If the file contents occupy a single contiguous extent, the File entry is marked NoFatChain, and logical-to-physical mapping for that file is greatly simplified.
  • exFAT files can be preallocated; file size (used) and file size (allocated) are tracked separately.
  • exFAT files can be >4GB (64-bit size).
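To illustrate why NoFatChain simplifies logical-to-physical mapping, here is a rough sketch of a cluster lookup for both allocation styles. The struct, the function name, and the idea of holding the whole FAT in an in-memory array are assumptions made for the example; the real driver goes through the buffer cache.

```c
#include <stddef.h>
#include <stdint.h>

#define EXFAT_EOC 0xFFFFFFFFu	/* end-of-chain marker in the FAT */

/* Hypothetical in-memory view of one file's allocation. */
struct exfat_file {
	uint32_t first_cluster;	/* first cluster of the file */
	int      no_fat_chain;	/* nonzero if marked NoFatChain */
};

/*
 * Map file-relative cluster index 'n' to a volume cluster number.
 * 'fat' is indexed by cluster number and has 'nfat' entries.
 * Returns EXFAT_EOC if the FAT chain ends (or is invalid) before index n.
 * For NoFatChain files the caller is expected to bound 'n' by the
 * allocated size; no FAT access is needed at all.
 */
static uint32_t
exfat_bmap(const struct exfat_file *f, const uint32_t *fat, size_t nfat,
    uint32_t n)
{
	uint32_t cn = f->first_cluster;

	if (f->no_fat_chain)
		return (cn + n);	/* single extent: plain arithmetic */

	while (n > 0) {
		if (cn < 2 || cn >= nfat)
			return (EXFAT_EOC);
		cn = fat[cn];		/* follow one link of the chain */
		n--;
	}
	return ((cn < 2 || cn >= nfat) ? EXFAT_EOC : cn);
}
```

The contiguous case is O(1), while the chained case costs one FAT read per cluster skipped, which is the same trade-off FAT32 has always had.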

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 35042
Build 32022: arc lint + arc unit

Event Timeline

cem requested review of this revision.Nov 25 2020, 9:32 PM

Two meta questions:

  • Is there some readable diff between the FAT and exFAT specs somewhere?
  • Could you arrange for two modules: one msdosfs.ko as it is now (MSDOS_EXFAT disabled) and another module, e.g. msdosexfs.ko, with MSDOS_EXFAT enabled?
sys/conf/options
286

It is too rude to put this into opt_global.h, also see my further response.

Unfortunately, no readable diff between specifications as far as I know. (I don't know of any great VFAT specification off-hand.) exFAT is relatively well defined at https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification .

Sure, someone could make a separate kld for exFAT, but I'm not sure it makes much sense. There is a good amount of shared logic. The MSDOS_EXFAT option does not break classic FAT support, FWIW. If enabled, msdosfs(5) supports both kinds of filesystem; they are easy to distinguish at mount.

In D27376#611538, @cem wrote:

Sure, someone could make a separate kld for exFAT, but I'm not sure it makes much sense. There is a good amount of shared logic. The MSDOS_EXFAT option does not break classic FAT support, FWIW. If enabled, msdosfs(5) supports both kinds of filesystem; they are easy to distinguish at mount.

I mean, either MSDOS_EXFAT option should go out, or for testing period I propose to build two modules from the same code base. I do not propose to copy sources. Second module would have -DMSDOS_EXFAT added to CFLAGS, and perhaps should rename VFS_SET() so that resulting vfsconf is named differently and can be used simultaneously with msdosfs.ko.

In D27376#611605, @kib wrote:

I mean, either MSDOS_EXFAT option should go out, or for testing period I propose to build two modules from the same code base. I do not propose to copy sources. Second module would have -DMSDOS_EXFAT added to CFLAGS, and perhaps should rename VFS_SET() so that resulting vfsconf is named differently and can be used simultaneously with msdosfs.ko.

So, in this proposal we would build most sources twice. In the existing design, msdos and exfat filesystems can both be used simultaneously from the same module/kernel. I don't object to the change, but I don't understand what the goal is. I think there is a lot of lower-hanging fruit that needs to be fixed before this is committed anyway.

In D27376#611538, @cem wrote:

So, in this proposal we would build most sources twice. In the existing design, msdos and exfat filesystems can both be used simultaneously from the same module/kernel. I don't object to the change, but I don't understand what the goal is. I think there is a lot of lower-hanging fruit that needs to be fixed before this is committed anyway.

The goal is to show it to users earlier, without degrading existing classic FAT support while development continues. E.g. the issue with the (virtually) contiguous bitmap alloc makes it infeasible to allow any form of automatic mounting of exFAT with the current patch IMO. But on the other hand I do not see why you would not want to get reports of compat issues with real-world filesystems.

At the moment where you feel that the code is mature enough, I expect that the exFAT module will be removed together with MSDOS_EXFAT option.

sys/fs/msdosfs/msdosfs_fat.c
1063

What is the max size for the exFAT bitmap? From the Wikipedia article, max vol size is at least 512TB, i.e. TB of bits. We cannot do it the same way as for normal FAT, because it is too large.

For normal FAT the bitmap is also up to 4M. So maybe do an independent change to switch to a chunked bitmap, or to read bitmap data directly from the buffer cache.

sys/fs/msdosfs/msdosfs_vnops.c
1823

If you put something like if (bp != NULL) brelse(bp); there and at the start of the loop above out:, then all brelse() inside the loop can be eliminated.

This function should be refactored. For instance, faking the dot and dotdot entries can be moved to helper. I suspect that code to handle specific types of the dirents also can be split into functions, but this would require more planning.
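A rough userland sketch of the single-release pattern suggested above: release any held buffer at the top of the loop and once more at out:, so no exit path inside the loop needs its own brelse(). The mock_bread()/mock_brelse() helpers and struct buf here are stand-ins invented so the example compiles outside the kernel; the mock leaves *bpp NULL on error, which is an assumption about how the real read path behaves.

```c
#include <stddef.h>

/* Stand-ins for the kernel buffer API; not the real definitions. */
struct buf { int held; };
static int nheld;		/* buffers currently held, for checking leaks */

static int
mock_bread(struct buf *pool, int blk, struct buf **bpp)
{
	*bpp = NULL;
	if (blk < 0)
		return (5);	/* pretend I/O error (EIO) */
	pool[blk].held = 1;
	nheld++;
	*bpp = &pool[blk];
	return (0);
}

static void
mock_brelse(struct buf *bp)
{
	bp->held = 0;
	nheld--;
}

/* Walk a list of blocks; stop early on error or when block 3 is seen. */
static int
scan_blocks(struct buf *pool, const int *blks, int n)
{
	struct buf *bp = NULL;
	int error = 0, i;

	for (i = 0; i < n; i++) {
		if (bp != NULL) {	/* drop the previous iteration's buffer */
			mock_brelse(bp);
			bp = NULL;
		}
		error = mock_bread(pool, blks[i], &bp);
		if (error != 0)
			goto out;
		if (blks[i] == 3)	/* "found it": exit without brelse here */
			goto out;
	}
out:
	if (bp != NULL)			/* the only other release point */
		mock_brelse(bp);
	return (error);
}
```

Every early exit then reduces to a bare goto out, which should also make it easier to split the per-dirent-type handling into helpers later.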

In D27376#614415, @kib wrote:

The goal is to show it to users earlier, without degrading existing classic FAT support while development continues. E.g. the issue with the (virtually) contiguous bitmap alloc makes it infeasible to allow any form of automatic mounting of exFAT with the current patch IMO. But on the other hand I do not see why you would not want to get reports of compat issues with real-world filesystems.

At the moment where you feel that the code is mature enough, I expect that the exFAT module will be removed together with MSDOS_EXFAT option.

Ok, I'll try to separate it into two modules for now. Thanks.

sys/fs/msdosfs/msdosfs_fat.c
1063

While the volume size can represent up to 2^64, the bitmap size reflects the number of clusters, which is limited to a maximum of roughly 2^32. So, the maximum bitmap size is 4 Gb, or 512 MB. This is a factor of 2^4 bigger than the maximal FAT32 in-use bitmap (2^28 clusters), which was added in a 1994 import from NetBSD.

Because of the cluster size constraints, the smallest volume that could require a 512MB bitmap is a ~2TB volume with 512B sectors and clusters, and a 16 GB FAT. With 4Kn, the smallest volume requiring a 512MB bitmap is ~16TB.

As most exFAT volumes are a lot smaller than 2TB (and I believe it is common practice to use larger cluster sizes on large volumes), and RAM sizes have grown by at least 16x since 1994, I think it might be reasonable to keep the entire exFAT bitmap in main memory, at least as a first draft. As a future improvement (for both exFAT and FAT32), we could consider doing some form of run-length encoding to save space without hurting lookup or allocation performance.

(The minimum exFAT cluster size is 1 sector; sector size is 512 bytes minimum, or 4 kB on 4Kn. The maximum exFAT cluster size is 32 MB. (link))
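The arithmetic behind those figures, as a small sketch. The helper names are made up for illustration, and the numbers ignore the reserved region and the two reserved FAT entries, so they are only approximate lower bounds.

```c
#include <stdint.h>

/* Bytes of allocation bitmap needed at one bit per cluster. */
static uint64_t
exfat_bitmap_bytes(uint64_t nclusters)
{
	return ((nclusters + 7) / 8);
}

/* Bytes of FAT needed at one 32-bit entry per cluster (approximate). */
static uint64_t
exfat_fat_bytes(uint64_t nclusters)
{
	return (nclusters * 4);
}

/* Size of the data region for a given cluster count and cluster size. */
static uint64_t
exfat_data_bytes(uint64_t nclusters, uint64_t cluster_size)
{
	return (nclusters * cluster_size);
}
```

With 2^32 clusters this gives a 2^29-byte (512 MB) bitmap and a 2^34-byte (16 GB) FAT; the data region is 2^41 bytes (~2 TB) at 512-byte clusters and 2^44 bytes (~16 TB) at 4 kB clusters. For 2^28 clusters the bitmap is 2^25 bytes, a factor of 16 smaller.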

sys/fs/msdosfs/msdosfs_vnops.c
1823

The ugly style (bp / brelse included) was just copied verbatim from msdosfs_readdir. I agree it should be possible to refactor quite a lot of the shared logic between these two routines out to shared subroutine(s), so I'd prefer to tackle that before making stylistic cleanups to this copy.

Do you have any thoughts on how to fake dotdot entries?

sys/fs/msdosfs/msdosfs_vnops.c
1823

What do you mean by faking dotdot? You mean, how to get the parent directory FAT/inode number since exFAT does not store dotdot?

I do not see a good way. Labor-intensive approach would be to store traces of directory inodes on lookups, and maintain them on rename. Unfortunately this is not enough since fhopen{.,at}() avoids lookup.

sys/fs/msdosfs/msdosfs_vnops.c
1823

Yes, exactly that problem. The structure could probably be relatively compact:

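struct exfat_dotdot_entry {
  SLIST_ENTRY(exfat_dotdot_entry) sle;  /* per-mount list, used to free on unmount */
  uint64_t    inum;                     /* this directory's inode number */
  struct exfat_dotdot_entry *dotdot;    /* parent directory's entry */
};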

Dotdot lookup can be achieved with dotdot->inum and then inum based resolution, if more is needed. Locking would be some global SX or RW lock on the mountpoint. The lifetimes would not correspond to vnode lifetimes, so the list must be used to free memory on unmount. We probably also want to remove deleted directories. We don't need parent->child resolution because we can get that from the on-disk structure.
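A userland sketch of how such a table might be used, with every name here hypothetical. It uses a hand-rolled singly linked list instead of SLIST so that it builds anywhere, a linear scan instead of a hash, and no locking at all; the per-mount SX or RW lock described above would wrap each call. The root is recorded with no parent in this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

struct dd_entry {
	struct dd_entry *next;		/* all entries, so unmount can free them */
	uint64_t         inum;		/* this directory's inode number */
	struct dd_entry *dotdot;	/* parent's entry; NULL for the root here */
};

struct dd_table {
	struct dd_entry *head;
};

static struct dd_entry *
dd_find(struct dd_table *t, uint64_t inum)
{
	struct dd_entry *e;

	for (e = t->head; e != NULL; e = e->next)
		if (e->inum == inum)
			return (e);
	return (NULL);
}

/* Record (or update, e.g. after rename) a directory and its parent. */
static struct dd_entry *
dd_record(struct dd_table *t, uint64_t inum, struct dd_entry *parent)
{
	struct dd_entry *e = dd_find(t, inum);

	if (e == NULL) {
		e = calloc(1, sizeof(*e));
		if (e == NULL)
			return (NULL);
		e->inum = inum;
		e->next = t->head;
		t->head = e;
	}
	e->dotdot = parent;
	return (e);
}

/* '..' lookup: returns 0 and sets *parent, or -1 if the parent is unknown. */
static int
dd_parent(struct dd_table *t, uint64_t inum, uint64_t *parent)
{
	struct dd_entry *e = dd_find(t, inum);

	if (e == NULL || e->dotdot == NULL)
		return (-1);
	*parent = e->dotdot->inum;
	return (0);
}

/* Unmount: entries do not follow vnode lifetimes, so free via the list. */
static void
dd_free_all(struct dd_table *t)
{
	struct dd_entry *e;

	while ((e = t->head) != NULL) {
		t->head = e->next;
		free(e);
	}
}
```

A directory reached only through a file handle would have no entry, so dd_parent() fails for it, which is the VFS_FHTOVP limitation discussed in this thread.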

We can probably live with EOPNOTSUPP for VFS_FHTOVP. It doesn't seem to be widely used by applications, especially in the typical end-user USB stick or SD card workloads that exFAT is most commonly used for.

sys/fs/msdosfs/msdosfs_vnops.c
1823

The main consumer of fhtovp() is the NFS server.

BTW another approach could be to only provide dotdot entries on a best-effort basis. Then you can utilize namecache instead of what we discussed above.