shutdown: audit shutdown_post_sync event callbacks
ClosedPublic
Actions

Authored by mhorne on Oct 23 2023, 8:32 PM.

Details

Reviewers

markj
jhb

Commits

rG121656c2b12f: pst: improve shutdown_post_sync handler
rG951d60ee3ada: shutdown: audit shutdown_post_sync event callbacks
rGc4dacfa7f4b8: pst: improve shutdown_post_sync handler
rG428ebb7cd9f5: pst: improve shutdown_post_sync handler
rG4eb861d362d6: shutdown: audit shutdown_post_sync event callbacks

Summary

Ensure they are all panic/debugger safe.

Most handlers for this event are for disk drivers/geom modules. There
are a mix of checks being used here (or not), so let's standardize on
checking the presence of the RB_NOSYNC flag.

This flag is set whenever:

The kernel has panicked and kern.sync_on_panic=0*
We reboot from within the kernel debugger (the "reset" command)
Userspace requested it, e.g. by 'reboot -n'

Name the functions consistently.

*This sysctl is tuned to zero by default, but its existence means that
these handlers can be executed after a panic, at the user's discretion.
IMO this use-case is implicitly understood to be risky, and we'd be
better off eliminating it altogether.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

mhorne created this revision.Oct 23 2023, 8:32 PM

Herald added a subscriber: imp. · View Herald TranscriptOct 23 2023, 8:32 PM

mhorne requested review of this revision.Oct 23 2023, 8:32 PM

Harbormaster completed remote builds in B54138: Diff 129259.Oct 23 2023, 8:32 PM

mhorne added a parent revision: D42336: pst-raid.c: sort #includes.Oct 23 2023, 8:32 PM

mhorne added a child revision: D42338: aac, aacraid: use DEVICE_SHUTDOWN() method.

The changes that follow this are basically the result of a similar audit for the shutdown_pre_sync and shutdown_final handlers. Since they are numerous, I split them into individual changes.

I'm not too sure about the geom ones either, but didn't flag the ones where new RB_NOSYNC was added.
Might want to add phk to the list of reviewers.
But I think we'd still want to write the updated metadata for these raid systems so we don't come up with, for example, broken mirrors after a panic all the time.

sys/dev/pst/pst-raid.c
187 ↗	(On Diff #129259)	I'm pretty sure this is wrong. Shutting down the controller is different than doing a sync(2). It's needed to prevent data loss I believe as well as put the hardware in a known good state for reboot (including things that might keep it from rebooting, though I don't know if that's true for this).

In D42337#966249, @imp wrote:

I'm not too sure about the geom ones either, but didn't flag the ones where new RB_NOSYNC was added.

We had to make similar changes for at least g_mirror, i.e., we need to avoid doing anything after a panic. In general we can't sync metadata to disk after a panic, since that usually requires a working scheduler.

Might want to add phk to the list of reviewers.
But I think we'd still want to write the updated metadata for these raid systems so we don't come up with, for example, broken mirrors after a panic all the time.

What exactly is the problematic scenario? It would still arise after, e.g., a triple fault.

sys/dev/pst/pst-raid.c
187 ↗	(On Diff #129259)	iop_queue_wait_msg() will sleep in this case, so I'm not sure how this can work.

Echoing what Mark notes, in looking at each of these geom cases I observe that their code-paths rely on a functioning scheduler. Calls to wakeup() when the scheduler is stopped will panic recursively, and aside from this there are loops which would never terminate as they wait for the worker threads to exit.

See for example g_journal_destroy() or g_raid3_destroy().

This change at least makes the various geom classes behave consistently w.r.t. these shutdown hooks and RB_NOSYNC.

The geom ones are fine.
I'd be tempted to skip the wait for the threads code be skipped when the scheduler is stopped explicitly rather than the rb_nosync flag being set in boot howtoo. The latter seems a big magic... but that's likely my experience with nvme and the paths i took there... But I'm struggling to articulate which tests to use if we were to document this more explicitly somewhere.

What exactly is the problematic scenario? It would still arise after, e.g., a triple fault.

Yes... but I don't want to turn a simple panic when we have writes in flight for some but not all of the mirrors... though that likely requires more careful thought... and maybe adding a BIO_POLLED flag to tell the lower layers to complete the IO in an envelope without the scheduler... as a generalization of dump... but at $WORK we are moving away from gmirror so I'm not super fussy about this assuming we don't get an automatic broken mirror on panic...

Improve pst(4) handling. If the scheduler is stopped, mask all interrupts,
which will instruct iop_queue_wait_msg() to use a DELAY loop instead of a
sleep. This way we always perform device shutdown.

While here use the named constant for the relevant interrupt bit.

Harbormaster completed remote builds in B54546: Diff 130396.Nov 21 2023, 6:44 PM

mhorne added inline comments.Nov 21 2023, 6:46 PM