
kern_reboot: Only call the shutdown handler once
Needs ReviewPublic

Authored by imp on Sep 24 2021, 12:11 AM.

Details

Summary

If we get a panic in a shutdown handler, we'll recursively call
kern_reboot. Rather than call the same routine that caused problems
again, skip them all and go right to doadump (if dumping).

Sponsored by: Netflix
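As a hedged sketch (all names here are hypothetical stand-ins, not the actual kern_shutdown.c code), the guard described in the summary amounts to:

```c
#include <assert.h>

/*
 * Sketch of the change described above: 'once' is latched the first
 * time the shutdown handlers run, so a recursive kern_reboot() from a
 * panicking handler skips straight past them to the dump.
 */
static int once = 0;
static int handler_runs, dump_runs;

static void
sketch_reboot(int dumping)
{
	if (!once) {
		once = 1;
		handler_runs++;	/* stands in for the shutdown EVENTHANDLERs */
	}
	if (dumping)
		dump_runs++;	/* stands in for doadump() */
}
```

A recursive call re-enters sketch_reboot() with 'once' already set, so only the dump path runs again.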

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

imp requested review of this revision.Sep 24 2021, 12:11 AM
imp created this revision.

I'm told that these days using "bool" and true/false is preferred over using "int" and 1/0, but otherwise this is good.

This revision is now accepted and ready to land.Sep 24 2021, 12:18 AM
In D32089#724181, @chs wrote:

I'm told that these days using "bool" and true/false is preferred over using "int" and 1/0, but otherwise this is good.

Yeah, me too... I didn't want to change 'once' to a bool, though, so I did it in the 'once' style of the function.

sys/kern/kern_shutdown.c
473–476

Why don't pre_sync handlers need the same treatment?

I agree with Mark, but would probably just extend "once" to cover everything between lines 442 and 488.

sys/kern/kern_shutdown.c
430–437

I think if we are going to have multiple of these, then they need distinct names. That is, I'd rename once to sync_once or the like. However, if you adopt mav@'s suggestion and handle all of pre_sync, bufshutdown, and post_sync under a single once then you can avoid having to rename. Doing that though adds a behavior change for RB_NOSYNC.

I think there are some questions, though, that might help inform how you do this. Almost what you really want instead is a kind of modified EVENTHANDLER_INVOKE that uses setjmp before each handler and does a longjmp on a panic to just keep going. Perhaps that is too hard. Doing it via recursion instead, I think the question is: if you get a panic in a pre_sync handler, do you want to still try to run bufshutdown and then post_sync, or just skip to doadump? If the former, that probably argues for three separate variables (pre_sync_once, sync_once, post_sync_once). If you always want to skip, then maybe you just move all three things under a single 'once' (but with some weirdness around RB_NOSYNC; OTOH, a recursive panic probably always sets RB_NOSYNC, so it's not really that weird).

Chart a middle course that doesn't require setjmp/longjmp :)
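For what it's worth, the setjmp-based invoker mentioned above could look very roughly like this (all names hypothetical; real kernel panic handling is far more involved, and this is only a userland illustration of the control flow):

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf handler_env;

/* stand-in for panic(): jump back into the invoker */
static void
sketch_panic(void)
{
	longjmp(handler_env, 1);
}

static int ran[3];

static void h0(void) { ran[0]++; }
static void h1(void) { ran[1]++; sketch_panic(); /* not reached below */ }
static void h2(void) { ran[2]++; }

static void (*handlers[])(void) = { h0, h1, h2 };

/*
 * Hypothetical EVENTHANDLER_INVOKE variant: a panicking handler is
 * abandoned via longjmp and the remaining handlers still run.
 */
static void
invoke_all(void)
{
	for (volatile size_t i = 0; i < 3; i++) {
		if (setjmp(handler_env) == 0)
			handlers[i]();
	}
}
```

Even though h1 "panics" mid-handler, h2 still gets invoked, which is the behavior jhb describes.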

This revision now requires review to proceed.Sep 24 2021, 9:33 PM
This revision is now accepted and ready to land.Sep 24 2021, 9:37 PM

If you code BASIC, please do it in BASIC style, i.e. 10/20/30 ..., allowing you to insert something in the middle without renumbering all the items.

Hah, I had almost suggested an enum with named states that you bumped along as you went and this is functionally equivalent to that. I think this is fine.

If the restarts here happen only in case of panic, should we really insist on further shutdown process, or better dump quickly and reboot?

Re BASIC: I don't see us adding a lot of (or any) new sections; that's what the event handlers are for. Given the low number, it's OK, especially since ++ is easy, while +=10 becomes +=5 on an insertion and may be more error-prone. Elsewhere this would be a good suggestion, but I'm leaning strongly against it here.

Re re-entry: I think that would make this more complicated than a simple guardrail without actually making things easier.

Re enum: I thought of that, but rejected it as well: what if you forget to change the name?

We use a similar technique to this in all the busdma routines to allow easy error exits.

The enum would only be useful I think to address kib's worry about future insertion as at least you don't have to bump all the names, that is if you did something like:

static enum {
      PRE_SYNC,
      SYNC,
      UPTIME,
      ...
} progress = PRE_SYNC;
...

      if (progress == PRE_SYNC) {
          progress++;
          EVENTHANDLER_INVOKE(shutdown_pre_sync, howto);
      }
...

Then inserting a new event would just mean inserting a new enum value and the new block of code, without having to worry about adjusting the conditions in all the subsequent blocks.

However, I agree that we aren't likely to introduce new blocks of code in this routine in the future since we generally depend on EVENTHANDLERs for extensibility here instead.
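Spelled out as a compilable sketch (step names hypothetical, handler invocations replaced by counters), the enum pattern above is:

```c
#include <assert.h>

/*
 * In C an enum variable has integer type, so progress++ legally
 * advances to the next step.  Each block runs at most once even if
 * the function is re-entered (e.g. via a recursive panic).
 */
static enum { PRE_SYNC, SYNC, POST_SYNC } progress = PRE_SYNC;
static int pre_sync_runs, sync_runs;

static void
sketch_shutdown(void)
{
	if (progress == PRE_SYNC) {
		progress++;
		pre_sync_runs++;	/* EVENTHANDLER_INVOKE(shutdown_pre_sync, ...) */
	}
	if (progress == SYNC) {
		progress++;
		sync_runs++;		/* bufshutdown(), roughly */
	}
}
```

Calling sketch_shutdown() again after completion is a no-op, which is exactly the re-entry behavior the guard is after.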

now with 6 part harmony and dancing enums :)

This revision now requires review to proceed.Sep 30 2021, 1:32 AM

Is there anything that prevents two threads from issuing reboot(2) in parallel? I do not see it.

With the progress-tracking var, the mess there is arguably worse.

sys/kern/kern_shutdown.c
513–522

Style: spaces around '|' bin op, since you are changing the line anyway.

In D32089#727337, @kib wrote:

Is there anything that prevents two threads from issuing reboot(2) in parallel? I do not see it.

With the progress-tracking var, the mess there is arguably worse.

Before the changes, life would be bad if that happened. After the change, there are fewer cases that would be bad. But arguing over degrees of badness seems... unwise... At best we can say that panicking while in reboot() is likely more common than racing there, and for that common case this is substantially better.
But since we're panicking, the last thing we want to do is to not reboot due to deadlock...
But at some point in this process, we stop scheduling new jobs, so wouldn't that mitigate things substantially? How would you even do sane locking here? We could set the panic CPU and hard loop others that wind up in kern_reboot() I guess. Ideas?

In D32089#727482, @imp wrote:
In D32089#727337, @kib wrote:

Is there anything that prevents two threads from issuing reboot(2) in parallel? I do not see it.

With the progress-tracking var, the mess there is arguably worse.

Before the changes, life would be bad if that happened. After the change, there are fewer cases that would be bad. But arguing over degrees of badness seems... unwise... At best we can say that panicking while in reboot() is likely more common than racing there, and for that common case this is substantially better.
But since we're panicking, the last thing we want to do is to not reboot due to deadlock...
But at some point in this process, we stop scheduling new jobs, so wouldn't that mitigate things substantially? How would you even do sane locking here? We could set the panic CPU and hard loop others that wind up in kern_reboot() I guess. Ideas?

We do perform context switches despite shutting down, otherwise we cannot even do things like complete io requests.

Before the changes, some things could be executed twice. I argue that after the changes, some things could be executed twice or not executed at all, depending on what corruption of progress occurs from parallel, unsynchronized updates.

I think you should use atomic_fetchadd_int() for progress updates and reading its current value.

update with kib's suggestions

sys/kern/kern_shutdown.c
474

This still gives a race where other thread reads progress right after your read and also starts executing shutdown_pre_sync. More, in this case both threads bump the progress indicator, so the next step (bufshutdown) is skipped.

You should do single atomic_fetchadd at entry and then switch{} based on the returned value - 1.
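In userland C11 terms (the kernel would use atomic_fetchadd_int(); all names here are hypothetical), the single-fetchadd-plus-switch pattern just described looks like:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int progress;	/* stand-in for the kernel's progress var */
static int pre_sync_runs, sync_runs;

/*
 * Each caller atomically claims a step number, so two racing callers
 * can never execute the same step twice, and no step is skipped by a
 * double increment.
 */
static void
sketch_next_step(void)
{
	switch (atomic_fetch_add(&progress, 1)) {
	case 0:
		pre_sync_runs++;	/* shutdown_pre_sync handlers */
		break;
	case 1:
		sync_runs++;		/* bufshutdown(), roughly */
		break;
	default:
		break;			/* later steps elided */
	}
}
```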

But I realized that there is one more issue with this approach: one thread might start a step while the previous step still has not finished in another thread.

So the question is: why not add a single global guard at the start of kern_reboot() which would reject a second attempt to call the function? We even already have the 'rebooting' variable, which might be reused if its setting is moved earlier.
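The single-guard idea could be sketched like this (modeled on, but not using, the kernel's 'rebooting' flag; in C11 terms, with all names hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int rebooting;	/* modeled on the kernel's 'rebooting' flag */
static int bodies_run;

/*
 * Hypothetical guard at the top of kern_reboot(): only the first
 * caller wins the flag and proceeds; any later caller is turned away
 * (in the kernel it might instead spin or halt the CPU).
 */
static int
sketch_kern_reboot(void)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&rebooting, &expected, 1))
		return (-1);	/* second attempt rejected */
	bodies_run++;		/* the real shutdown sequence would run here */
	return (0);
}
```

This sidesteps both races kib describes: no step runs twice, and no thread starts a step while another thread is mid-sequence.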