Page MenuHomeFreeBSD

ddb: have 'reset' command use normal reboot path
ClosedPublic

Authored by mhorne on Jan 8 2023, 6:18 PM.
Tags
None
Referenced Files
F135770099: D37981.id114846.diff
Wed, Nov 12, 5:04 PM
Unknown Object (File)
Sun, Nov 9, 4:40 PM
Unknown Object (File)
Sun, Nov 9, 4:40 PM
Unknown Object (File)
Sun, Nov 9, 4:39 PM
Unknown Object (File)
Sun, Nov 9, 4:38 PM
Unknown Object (File)
Sun, Nov 9, 4:32 PM
Unknown Object (File)
Sat, Nov 8, 3:59 PM
Unknown Object (File)
Sat, Nov 1, 10:13 PM
Subscribers

Details

Summary

This conditionally gives all registered shutdown handlers a chance to
perform the reboot, with cpu_reset() being the fallback. The
debug.ddb.full_reboot sysctl/tunable restores the previous behaviour.

The motivation is that some platforms may not be able do anything
meaningful via cpu_reset(), due to a lack of standardized reset
mechanism and/or firmware shortcomings. However, they may have a
separate device driver attached that normally performs the reboot. Such
is the case for some versions of the Raspberry Pi.

Test Plan

We get some new console output with this change:

debug.kdb.enter: 0KDB: enter: sysctl debug.kdb.enter
[ thread pid 812 tid 100118 ]
Stopped at      kdb_trap+0x448: sd      zero,0(s1)
db> reset
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Uptime: 1m51s

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 48998
Build 45887: arc lint + arc unit

Event Timeline

mhorne requested review of this revision.Jan 8 2023, 6:18 PM

Note that kern_reboot() is called by vpanic(), so it is safe for this context. I'm providing the tunable out of caution, in case there is some unforeseen side effect. If the change seems agreeable I'll document the tunable in ddb(4) as well.

Looks good to me. RB_NOSYNC is critical here...

This revision is now accepted and ready to land.Jan 8 2023, 6:46 PM

Can we grab a mtx, call eventhandlers, ... etc. when called from ddb (e.g. after a panic)? What about early boot panics - is this all setup then? Given kern_reboot will never return, we'll also never have the chance to try cpu_reset(). Can we, instead of making this a sysctl have it as a /f argument to db_reset? That way it'll be available if cpu_reset fails and one can just call reset /f and give it a try... but in either way, I'd rather not change the default until I am convinced that it won't make it worse in a lot of panic situations...

In D37981#862886, @bz wrote:

Can we grab a mtx, call eventhandlers, ... etc. when called from ddb (e.g. after a panic)? What about early boot panics - is this all setup then? Given kern_reboot will never return, we'll also never have the chance to try cpu_reset(). Can we, instead of making this a sysctl have it as a /f argument to db_reset? That way it'll be available if cpu_reset fails and one can just call reset /f and give it a try... but in either way, I'd rather not change the default until I am convinced that it won't make it worse in a lot of panic situations...

Short answer: yes, this function is totally safe for this context. It is already in use today at the very end of vpanic(). Lock acquisitions and attempts to sleep will check if we are in a panic/debugger context and return early if that's the case.

cpu_reset() will still be executed by the kern_reboot() path. It is called by the very last shutdown handler, shutdown_reset().

For early-boot, we do have a small problem. Before any shutdown handlers are registered during SI_SUB_INTRINSIC, calling kern_reboot() will just end up spinning in the infinite loop. This can be observed today if you type continue instead of reset at the prompt after an early panic. To handle this edge-case I think it would be best for kern_reboot() to call shutdown_reset() directly rather than relying on the event handler machinery. That way, the most basic reset is always attempted.

I agree that a command modifier is much more natural than the tunable here, but I still propose the (unifying) change to the default behaviour of reset. I will update the review shortly.

Switch from a tunable to a ddb command modifier 's'.

Ensure we always call shutdown_reset(), even before event handlers are registered. This bit can be committed separately but I'm including it in this review.

This revision now requires review to proceed.Jan 9 2023, 4:27 PM
markj added inline comments.
sys/ddb/db_command.c
775

I'd elaborate a bit more on why this might be important, either here or in the man page.

This revision is now accepted and ready to land.Jan 10 2023, 9:08 AM

Totally belated, but would it make (have made) sense to split default behavior of reboot and reset commands?
I think that it would be more natural if reboot tried to do a reboot and reset just went for a reset (where implemented)?

In D37981#1210335, @avg wrote:

Totally belated, but would it make (have made) sense to split default behavior of reboot and reset commands?
I think that it would be more natural if reboot tried to do a reboot and reset just went for a reset (where implemented)?

I think that would make sense. In retrospect it's a bit silly that reboot and reset are aliases of each other, and both have the /s modifier.

Another thought that occurred to me is maybe we should have a separate, dedicated event for various reset / poweroff handlers.
The reason I started looking into this change is that recently I saw a number of crashes on systems where crash information is collected via ddb scripts (instead of just dumping a vmcore).
Those scripts happened to have 'reset' command at the end... Not sure maybe that was wrong / redundant as I think that the system should automatically go for a reset after finishing the script.
In any case, the reset command resulted in secondary panics which messed up things.

In other words, I think that currently db_reset runs too many things and maybe it would be better if it invoked on the reset / poweroff handlers via the proposed dedicated event.

Another thing I noticed is that db_reset, unlike vpanic, does not set TDF_INPANIC.
I wonder if that could have any weird consequences when running all those pre-/post-sync even handlers.

In D37981#1210335, @avg wrote:

Totally belated, but would it make (have made) sense to split default behavior of reboot and reset commands?
I think that it would be more natural if reboot tried to do a reboot and reset just went for a reset (where implemented)?

I think that would make sense. In retrospect it's a bit silly that reboot and reset are aliases of each other, and both have the /s modifier.

I agree. At the time when this change enabling shutdown handlers was made, there was no 'reboot' alias. Now, we could make them distinct.

In D37981#1213754, @avg wrote:

Another thought that occurred to me is maybe we should have a separate, dedicated event for various reset / poweroff handlers.
The reason I started looking into this change is that recently I saw a number of crashes on systems where crash information is collected via ddb scripts (instead of just dumping a vmcore).
Those scripts happened to have 'reset' command at the end... Not sure maybe that was wrong / redundant as I think that the system should automatically go for a reset after finishing the script.
In any case, the reset command resulted in secondary panics which messed up things.

In other words, I think that currently db_reset runs too many things and maybe it would be better if it invoked on the reset / poweroff handlers via the proposed dedicated event.

Yes. Enabling this has caused a few headaches; several for @bz who just reported another one to -CURRENT recently (see D53582). Running these paths in the debugger/panic context has exposed issues that we have fixed one-by-one, but at this point I am not sure that it has been worthwhile when it hinders development.

So I would take your suggestions, and propose we revert the ddb reset command to its previous simple behaviour, but we can keep reboot as a full reboot, that runs shutdown handlers.

Optionally we can remove the /s modifier from both. The ddb command interface does not need to be carved in stone.

What do you think?

Another thing I noticed is that db_reset, unlike vpanic, does not set TDF_INPANIC.
I wonder if that could have any weird consequences when running all those pre-/post-sync even handlers.

From my look at the limited use of this flag, there is only one consumer, within the scheduler. But I wouldn't say it is impossible.