Page MenuHomeFreeBSD

pseudofs: defer initialization until first mount
ClosedPublic

Authored by kevans on Aug 25 2025, 9:11 PM.
Tags
None
Referenced Files
F131893658: D52156.diff
Sun, Oct 12, 1:00 AM
F131890945: D52156.id.diff
Sun, Oct 12, 12:31 AM
F131840202: D52156.id.diff
Sat, Oct 11, 3:00 PM
F131840184: D52156.id160962.diff
Sat, Oct 11, 2:59 PM
Unknown Object (File)
Sat, Oct 11, 6:35 AM
Unknown Object (File)
Fri, Oct 3, 2:31 AM
Unknown Object (File)
Thu, Oct 2, 4:49 AM
Unknown Object (File)
Wed, Oct 1, 11:52 PM
Subscribers

Details

Summary

Currently, pseudofs all get fully constructed when the module is loaded
and vfs registered, but this is pretty unnecessary. Just loading the
fs doesn't mean that it will be used so we're adding overhead and
risk[0] by fully initializing these at the start, along with committing
resources that may not be used.

Deferring pfs_init() allows us to reduce the risk of simply loading the
module causing problems that are harder to avoid, and existing pseudo
filesystems don't really care: configuration that is context-sensitive
is generally deferred to access-time with PFS_PROCDEP.

[0] Example of such being recent bugs in linsysfs, which caused a panic
as soon as the module was loaded because we're eager to set it up.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.Aug 25 2025, 9:28 PM
sys/fs/pseudofs/pseudofs.c
395

Is there something that serializes parallel mounts of pseudofs?

sys/fs/pseudofs/pseudofs.c
395

... d'oh, nope, sorry. I guess I probably want to supply a vfs_init that at least initializes the pi_mutex (which doesn't *have* to belong to pfs_fileno) to coordinate initialization

Backtrace a bit to provide proper synchronization in the face of parallel mount

Existing pi_init() implementations want to sleep, so add a blockcount dance to
maintain existing semantics. blockcount_t doesn't add much overhead to our
pfs_info and already provides the wait semantics that we want.

Continue providing a vfs_init(), but don't proxy it through to the consumer. We
will just use it to initialize the bits we need to successfully orchestrate
deferred initialization.

This revision now requires review to proceed.Aug 26 2025, 12:57 AM

Is there a reason why we cannot just slap an sx in exclusive mode around the pfs_init() part of the VFS_MOUNT()? IMO it is much simpler.

In D52156#1191542, @kib wrote:

Is there a reason why we cannot just slap an sx in exclusive mode around the pfs_init() part of the VFS_MOUNT()? IMO it is much simpler.

Sure- I was mostly looking to avoid too much overhead to pfs_info, but I guess a single sx really isn't that bad.

De-crazify the diff with an sx

sys/fs/pseudofs/pseudofs.c
546

Would it make sense to wrap uninit with pi_mountlock?

sys/fs/pseudofs/pseudofs.c
546

I didn't think so, but I can see now that I'm wrong. My expectation was that vfs_unregister() would have largely prevented problems there, but it seems like it has its own problems.

vfs_unregister() will return EBUSY if vfc_refcount is non-zero, but we add/subtract from the refcount outside of the vfsconf_lock. The check seems a little pointless... vfs_byname() won't return with a reference, so in practice vfs_unregister only defends against an active mount and leaves a large window between vfs_byname{,_kld} and VFS_MOUNT() where the filesystem could be somewhere in the middle of vfs_uninit(). That seems problematic for a filesystem to defend against if it wanted to tear down its locking in vfs_uninit().

sys/fs/pseudofs/pseudofs.c
546

Am I right that pfs_uninit() is called from vfs_unregister()? May be, it is better to call it from the last unmount. The reason, besides the symmetry with pfs_init(), is that it allows to clear the state.

sys/fs/pseudofs/pseudofs.c
546

Right, that would be an obvious move. heh. If I do some refcounting based on VFS_MOUNT()/VFS_UNMOUNT() calls and release when it drops back down to 0, am I missing some less-obvious quirk where there may not be 1:1 VFS_UNMOUNT() calls? (We already reject MNT_UPDATE) I'm mostly asking as I recall getting hit by devfs d_close not being reliable for last-close.

I'll spin the vfs_unregister() problem out into a separate change/thread, it might make sense to see if pho@ would torture the area a bit. A relatively simple reproducer is [0], which races one thread doing load/unload against another thread doing mount/unmount of fdescfs, resulting in:

panic: mtx_lock() of destroyed mutex 0xffffffff83717540 @ /usr/src/sys/fs/fdescfs/fdesc_vnops.c:151
cpuid = 2                                                                                                      
time = 1756264344                                                                                                                                                                                                              
KDB: stack backtrace:                                                                                          
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006837f840                                 
vpanic() at vpanic+0x136/frame 0xfffffe006837f970                                                                                                                                                                              
panic() at panic+0x43/frame 0xfffffe006837f9d0                                                                 
__mtx_lock_flags() at __mtx_lock_flags+0x12f/frame 0xfffffe006837fa20                                          
fdesc_allocvp() at fdesc_allocvp+0x50/frame 0xfffffe006837fa80                                                 
fdesc_mount() at fdesc_mount+0xc1/frame 0xfffffe006837fac0                                                     
vfs_domount_first() at vfs_domount_first+0x254/frame 0xfffffe006837fc00                                        
vfs_domount() at vfs_domount+0x345/frame 0xfffffe006837fd40                                                                                                                                                                    
vfs_donmount() at vfs_donmount+0x904/frame 0xfffffe006837fdd0                                                                                                                                                                  
sys_nmount() at sys_nmount+0x60/frame 0xfffffe006837fe00                                                                                                                                                                       
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe006837ff30                                                                                                                                                                
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe006837ff30

[0] https://people.freebsd.org/~kevans/mountrace.c

Take a stab at converting pfs_uninit to last-unmount

This adds a layer of refcounting distinct from vfc_refcount to avoid giving it
per-fs meaning, though maybe this is too paranoid.

sys/fs/pseudofs/pseudofs.c
549

Is this call of pi_uninit() still needed?

sys/fs/pseudofs/pseudofs.h
191–192
193
194
kevans added inline comments.
sys/fs/pseudofs/pseudofs.c
549

Yes- this is the only one left, though maybe I should rename the pfs_init() and pfs_uninit() instead of what I've done here. pfs_init() and pfs_uninit() are called on first mount and last unmount now, and vfs_init/vfs_uninit are implemented by pfs_vfsinit and pfs_vfsuninit. I'm realizing now that's not very intuitive, and these could probably be named something better (pfs_setup and pfs_teardown, maybe?)

sys/fs/pseudofs/pseudofs.c
549

Perhaps. You did not uploaded the new patch.

Incorporate review feedback

This revision is now accepted and ready to land.Aug 29 2025, 3:49 AM