A dump may be produced without stopping the scheduler via "reboot -d".
In such a case, it's neither safe nor necessary to call sim_poll.
Additionally, a ccb must not be freed/reused until fully through the CAM
pipeline. When the scheduler is stopped, this is guaranteed by the fact
that only one thread operates on the ccb. To handle the case when other
threads may be running, add a flag to indicate that it is safe to
free/reuse the ccb memory.
Details
Test dumps with "reboot -d" as well as panic induced dumps. Ensure a good dump
is produced and that no crashes occur after many iterations. Try a couple of different SIMs.
Diff Detail
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 5847 Build 6135: arc lint + arc unit
Event Timeline
Giving this review a gentle prod in case anyone willing to review might have forgotten about it. I'd also appreciate any suggestions of reviewers if folks in the current set don't have time to review or feel it's outside their area. Thanks.
While the proposed patch may work, I can't say I like it for multiple reasons:
- from my experience kernel dumping without scheduler stopped is a way to obtain inconsistent dump;
- if scheduler is not stopped, there is no need to use not only xpt_polled_action(), but the polling at all -- just put calling thread to sleep and wake it when completion callback fired. There is cam_periph_runccb() present for many years for CCB waiting in sleepable context. If context is not sleepable, then then tight spinning for CCB completion may not work by definition on UP systems, since kernel threads (including interrupt threads) may not preempt each other, and so CCB completion may never be processed without polling.
Thanks mav. Originally, I'd thought stopping the scheduler when using 'reboot -d' was the correct way. kib and I discussed a bit though, and I was convinced to look for another option ( https://lists.freebsd.org/pipermail/freebsd-hackers/2016-November/050108.html ). Your first bullet would seem to suggest that stopping the scheduler is necessary.
Perhaps stopping the scheduler on 'reboot -d' should be made a tunable? If a case arises where a dump can't be obtained, that tunable can be twiddled to return to today's behavior.
Optionally stop scheduler when using "reboot -d". I considered a couple of
other options:
- Changing dumpers (e.g. adadump()) to have a mode that uses
cam_periph_runccb() and friends when the scheduler is not stopped. But there
are many places to make such a change (many of which I know little about and
can't test). It might be that dumpers that support dumping while running could
use this new mode, while others would use the existing dump code (perhaps
stopping the scheduler before doing do), but this all seems overly complicated.
- Writing a generic "dump core while running" function that flows through
geom, but this would mean dumping earlier in reboot (before shutting off geom
services), which seems more likely to produce an inconsistent dump.
Stopping the scheduler gives roughly the same behavior as a panic, while the
tunable allows someone who has an issue here to skip this step during a dump.