Fix a logic typo (?) in IPMI watchdog initialization
Needs ReviewPublic

Authored by lytboris_gmail.com on Fri, Apr 10, 8:27 AM.
Tags
None
Referenced Files
F152549597: D56340.diff
Wed, Apr 15, 3:12 PM
Details

Reviewers
imp
Summary

There are a couple of sysctl knobs controlling the IPMI interface itself and a watchdog device that may be part of the BMC:

  • hw.ipmi.on - (seems to) enable/disable the IPMI interface completely. If it is disabled, we (probably) should not interact with the BMC at all.
  • hw.ipmi.wd_init_enable - (should) control watchdog initialization.

Both are enabled by default.

Upon IPMI module load, wd_startup_countdown is checked and the watchdog is configured if needed. If the watchdog is not configured to run at startup, we do not reset the watchdog timer at all.

This creates a problem when a user sets hw.ipmi.wd_shutdown_countdown and reboots, leaving all other knobs untouched. The watchdog is armed by ipmi_shutdown_event() and starts running. It keeps running across the machine reset and while the kernel is being loaded, and eventually triggers a machine reboot when it runs out.
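The scenario above can be traced with a small user-space model. This is only a sketch of the behaviour as described in this summary, not the driver code: `struct wd_model`, `model_shutdown()` and `model_load_current()` are hypothetical names, and the model ignores elapsed time and the hw.ipmi.on knob.

```c
#include <stdbool.h>

/* Hypothetical model of the BMC watchdog timer; not the driver's real state. */
struct wd_model {
	bool running;      /* timer is counting down */
	int  seconds_left; /* time left before the BMC resets the machine */
};

/* Shutdown path in this model: a non-zero shutdown countdown arms the timer. */
static void
model_shutdown(struct wd_model *wd, int wd_shutdown_countdown)
{
	if (wd_shutdown_countdown > 0) {
		wd->running = true;
		wd->seconds_left = wd_shutdown_countdown;
	}
}

/*
 * Load-time handling as the summary describes it: the timer is only
 * touched when a startup countdown is configured.
 */
static void
model_load_current(struct wd_model *wd, int wd_startup_countdown)
{
	if (wd_startup_countdown > 0) {
		wd->running = true;
		wd->seconds_left = wd_startup_countdown;
	}
	/* Otherwise a timer armed before the reset is left exactly as it was. */
}
```

Calling `model_shutdown(&wd, 300)` followed by `model_load_current(&wd, 0)` leaves `wd.running` set, which corresponds to the unexpected reboot on platforms that keep the timer across a reset.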

The current patch is not meant to be merged as is; it is a starting point for discussing the conditions under which the IPMI watchdog should be reset.

Should we use wd_init_enable instead, or reset the WD unconditionally here?
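To make the two options concrete, here is a minimal sketch of each as a predicate. The enum and the function `should_disarm_at_load()` are hypothetical names made up for this discussion, not driver code. The sketch also assumes that a configured startup countdown re-arms the timer rather than disarming it, and it leaves hw.ipmi.on out.

```c
#include <stdbool.h>

/* Hypothetical names for the two options under discussion. */
enum wd_reset_policy {
	WD_RESET_IF_INIT_ENABLED, /* gate the reset on hw.ipmi.wd_init_enable */
	WD_RESET_ALWAYS           /* reset unconditionally */
};

/*
 * Returns true when a leftover timer should be disarmed at module load.
 * Assumption: with a startup countdown configured, the timer is re-armed
 * with that value, so there is nothing to disarm.
 */
static bool
should_disarm_at_load(enum wd_reset_policy policy, bool wd_init_enable,
    int wd_startup_countdown)
{
	if (wd_startup_countdown > 0)
		return (false);
	switch (policy) {
	case WD_RESET_IF_INIT_ENABLED:
		return (wd_init_enable);
	case WD_RESET_ALWAYS:
		return (true);
	}
	return (false);
}
```

The only case where the two policies differ is wd_init_enable set to 0 with no startup countdown: the gated variant leaves a running timer alone, while the unconditional variant disarms it.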

Test Plan
  1. Set hw.ipmi.wd_shutdown_countdown to a non-zero value (e.g., 300).
  2. Reboot the machine.
  3. Observe the IPMI watchdog running using ipmitool ... mc watchdog get.
  4. Observe the server reboot once the watchdog timer runs out.

With the patch, the watchdog timer is disabled once the IPMI module is loaded.
The tricky part is that some hardware platforms (for example, Gigabyte) disarm the WD upon machine reset, while others (for example, Supermicro) keep the timer running across machine resets.
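The expected outcome of the test plan for each platform type can be written down as a tiny truth function. This is a sketch under stated assumptions: `expect_watchdog_reboot()` is a hypothetical helper, the timer is assumed to have been armed at shutdown, and the IPMI module is assumed to load before the countdown expires.

```c
#include <stdbool.h>

/*
 * Returns true if the test plan is expected to end in a watchdog-triggered
 * reboot.
 *   platform_keeps_timer: the timer survives a machine reset (the
 *                         Supermicro case in the description; Gigabyte
 *                         boards are described as disarming it)
 *   patched:              module load disables the timer, as with the patch
 */
static bool
expect_watchdog_reboot(bool platform_keeps_timer, bool patched)
{
	bool running = true;	/* armed at shutdown via wd_shutdown_countdown */

	if (!platform_keeps_timer)
		running = false;	/* platform disarms the timer on reset */
	if (patched)
		running = false;	/* module load disables the timer */
	return (running);
}
```

Only the combination of a platform that keeps the timer and an unpatched kernel is expected to reproduce the reboot, so a pass on hardware that disarms the timer on reset says nothing about the patch.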

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

lytboris_gmail.com created this revision.