Page MenuHomeFreeBSD

growfs: avoid deadlock between stdout/stderr and ufssuspend
Needs ReviewPublic

Authored by cgull_glup.org on Dec 30 2022, 8:52 AM.
Tags
None
Referenced Files
Unknown Object (File)
Jan 12 2024, 2:52 PM
Unknown Object (File)
Dec 23 2023, 2:00 AM
Unknown Object (File)
Jun 1 2023, 12:46 PM
Unknown Object (File)
Jun 1 2023, 12:43 PM
Subscribers

Details

Reviewers
mckusick
kib
jkim
Summary

Code to mostly prevent this deadlock in kernel has already been
committed as 701b3696; this userland code provides earlier, better
error messages for this problem.

Do not keep UFS suspended while writing new cylinder groups.

Since the new cylinder groups are outside the existing filesystem, UFS
does not need to be suspended while writing them. We can write the
new cylinder groups first, before editing the existing filesystem.

THis has two benefits: Most of the writes (and potential I/O errors)
are done before editing the existing filesystem, and it limits the
maximum suspended time to one cylinder group's worth of I/O. This is
quite significant for filesystems that have a lot of new cylinder
groups to write.

Unfortunately, to get write access to the filesystem's volume, we
still need to suspend UFS via the /dev/ufssuspend interface, even for
blocks beyond the existing filesystem. Do it repeatedly for each
cylinder group. This means the accumulated suspended time remains
the same, or maybe even grows, if filesystem reloads are costly.
Kernel changes in ufssuspend will be needed to remove this lmitations.

Inmprove growfs/ufssuspend to write new blocks without suspending UFS

Like the old code, this prevents unmounts while growfs is running;
this is now technically only required at the very end when growfs
rewrites what used to be the last cg and the superblock, but it
provides a better UX if unmounts are attempted during growfs.

This work is forward and backward compatible: a new growfs will work
with an unchanged kernel, and vice versa.

Test Plan

Actual growfs functionality has been tested by stubbing out
non-deterministic functions and comparing filesystems created by old
vs new growfs, iterating over a range of filesystem sizes that spans
several cylinder groups.

New growfs + unchanged kernel, new kernel + unchanged growfs

Not much testing of error handling or concurrency problems yet

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 48911
Build 45800: arc lint + arc unit

Event Timeline

This code supposedly further avoids growfs deadlocks, and may occasionally be able to grow a UFS filesystem mostly without blocking writes to the affected filesystem.

This code is lightly tested, and I've not yet fully run through style(9) changes. It's also my first time using Phabricator.

Comments welcome!

sbin/growfs/growfs.c
88–90

Oops-- forgotten devel code, will be deleted

177–198

This is moved to run after writing new cylinder groups

1595–1602

Check for deadlock on stdout/stderr by checking filesystem ids

sys/ufs/ffs/ffs_suspend.c
254

ffs_susp_resume() is a refactoring to allow an ioctl to downgrade from a suspended state to a writable state without closing /dev/ufssuspend completely

287

The VFS lock now follows the presence/absence of the cdev privdata, rather than its previous implicit association with the filesystem being suspended.

sys/ufs/ffs/fs.h
904

Naming is Hard. This started as UFSAPPEND, UFSEXTEND is better, but better names will be grafefully adopted

sys/ufs/ffs/ffs_suspend.c
326

Misuse of an unrelated variable-- fortunately I noticed it before having to debug it

What if the filesystem is already suspended at the moment when the UFSEXTEND call is done?

I suggest changing the API. Add some kind of UFSOPEN (or might be UFSSUSPBIND ?) which associates mp with the ufs/suspend file. It would always allocate cdevpriv data for file, and busy the mp. Successful bind allows writes to the unused part of the volume, through the file. Then you can do UFSSUSPEND/UFSRESUME, which are tracked in the cdevpriv, so you do not rely on the ffs_susp_suspended() alone.

Might be add UFSSUSPUNBIND for completeness, which resumes if needed, and unbusies the mp.

sbin/growfs/growfs.c
1816

Why this check is needed?

1826

Same question.

sys/ufs/ffs/ffs_suspend.c
118

Why do you need this local var?

326

mp is NULL there, no?

383

This is the only place in the switch control flow that started do return. As result, it complicates (unneeded) lock scope analysis for ffs_susp_lock, and liveness for mp.