Page MenuHomeFreeBSD

Always stop the scheduler when entering kdb
ClosedPublic

Authored by vangyzen on Oct 24 2018, 4:57 PM.
Tags
None
Referenced Files
F103275566: D17687.diff
Fri, Nov 22, 11:04 PM
Unknown Object (File)
Sun, Nov 10, 3:07 PM
Unknown Object (File)
Oct 13 2024, 2:53 PM
Unknown Object (File)
Sep 17 2024, 4:48 PM
Unknown Object (File)
Sep 4 2024, 10:01 PM
Unknown Object (File)
Sep 3 2024, 12:39 PM
Unknown Object (File)
Aug 28 2024, 1:17 PM
Unknown Object (File)
Aug 17 2024, 3:31 PM

Details

Summary

Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.

Test Plan

It compiled, with and without SMP. I haven't tested it yet.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 20397
Build 19837: arc lint + arc unit

Event Timeline

  • make unset symmetric; academic, but good principle

Well, when you are trying to acquire an already locked mutex, you typically accessing some resource which state is inconsistent. You in fact trade the panic into undefined behavior and potentially corrupting the state, which means that the return from ddb to running system is unsafe.

I understand the motivation of your change. Typical answer is that ddb should not call any function that requires locks.

I very rarely return from ddb into the live system, so typically your change would be an improvement for my uses.

This revision is now accepted and ready to land.Oct 24 2018, 6:50 PM
In D17687#377924, @kib wrote:

Well, when you are trying to acquire an already locked mutex, you typically accessing some resource which state is inconsistent. You in fact trade the panic into undefined behavior and potentially corrupting the state, which means that the return from ddb to running system is unsafe.

I understand the motivation of your change. Typical answer is that ddb should not call any function that requires locks.

I very rarely return from ddb into the live system, so typically your change would be an improvement for my uses.

I totally agree. I raised many of these points in an internal discussion at Isilon. But as you say, it's probably a net-win. Thanks for the review.

Did a SMP and UP build. Ran a few kdb commands followed by "continue". Tested "dump" with both builds. No problems seen.

Well, it wouldn't panic in ddb itself as ddb catches any panics and reenters. However, it does mean that while you may fail to lock get mutex, I think you can end up unlocking the mutex (e.g. if you panicked while holding a mutex and called a DDB function that tried to recurse and the mutex was not marked as recursive). However, non-panic entries into DDB probably don't hold any locks (at least existing ones).

In D17687#378257, @jhb wrote:

Well, it wouldn't panic in ddb itself as ddb catches any panics and reenters. However, it does mean that while you may fail to lock get mutex, I think you can end up unlocking the mutex (e.g. if you panicked while holding a mutex and called a DDB function that tried to recurse and the mutex was not marked as recursive). However, non-panic entries into DDB probably don't hold any locks (at least existing ones).

sysctl entry locks Giant. I added ddb functions lockgiant and unlockgiant. They worked, but when I left ddb, it panicked because sysctl tried to unlock Giant, but sysctl didn't own the lock.

I still think this is a net-win, but I'm open to argument.

Another option is to add a stopsched ddb command that sets curthread->td_stopsched = 1. The user could set it, as needed, when they are willing to corrupt lock state because they do not plan to exit ddb. The problem is, the user would have to know that it exists.

IMO once you enter ddb at runtime and do arbitrary debugging commands, any expectation that the system must behave as if you hadn’t entered ddb go out the window. So in that sense, I don’t think this introduces a regression.

That said, I am supportive of any reasonable hardening against accidental foot shooting.

I plan to commit this soon, unless there are any objections. (I don't see any above.)

In D17687#378266, @cem wrote:

IMO once you enter ddb at runtime and do arbitrary debugging commands, any expectation that the system must behave as if you hadn’t entered ddb go out the window. So in that sense, I don’t think this introduces a regression.

I don't quite agree with that, btw. I made the ddb 'kill' command use a try-lock on purpose so you have a way on the console to kill a process gracefully if you are so hosed you can't fork new processes. (And I've used this on multiple occasions).

To be clear though, I'm not suggesting to not commit the change, but just breaking into ddb to examine state shouldn't break anything (and I think this doesn't change that, per se).

I'm surprised you got a panic in DDB just trying to run a command though for a non-panic entry. It seems like it is a poorly written command in that case.

It's an Isilon command. It uses non-DDB code that locks a mutex. In that sense, yes, it's poorly written, but I'd rather relax that notion and allow mutexes (and therefore more "normal", non-DDB) code to be used. Ideally, there would be DDB code for everything we need; in practice, of course, there isn't.

Thanks for the feedback, @jhb.

This revision was automatically updated to reflect the committed changes.