Otherwise, under bootverbose, the lapic_enable_cmc() banner 'lapicX: CMCI unmasked' is printed by several CPUs in parallel, causing garbled output for the LAPIC dumps.
Details
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Humm, that's not just a cosmetic error. cmci_monitor() assumes it is not being run concurrently on multiple CPUs so that if a given MC# bank is shared, only one CPU monitors it. It seems this is a very old bug of mine from the original CMCI commit. :( Arguably mca_init() should perhaps lock the mca_lock, but that is a bit trickier perhaps. The lock isn't initialized when mca_init() is called for the BSP. While I would prefer to do locking in mca.c for clarity reasons it would be a bit messier. This is ok, but perhaps add a comment to mca_init() saying we rely on the ap_boot_mtx to serialize it (just the boot-time mca_init(), not the resume mca_resume()).
Thanks for looking into this!
I'm on vacaticions until the 28th, so I've got limited laptop/Internet access and no way to test changes, but maybe you could just pick the lock in _mca_init for the APs (PCPU_GET(cpuid) != 0), and leave the BSP path as-is?
IMO the issue with the cmci_monitor() is unrelated to this one. ap_boot_mtx provided the serialization for the banners, and mca_lock is for mca/mce. I updated the patch to fix both issues, but will split it for the commits, after the review.
sys/x86/x86/mca.c | ||
---|---|---|
1106 | It would be good to be consistent with testing for the BSP by either fixing other tests here to use IS_BSP. That said, there's a chance CPUID_MCE might be set but not MCA (e.g. Pentium) and in this case the mutex would not be initialized where it is locked now. I would perhaps move it inside the MCA block just around the for loop iterating over the banks. In that case you could also remove the BSP check to simplify the code and just use 'if (boot)' (which avoids the IS_BSP inconsistency thing). |
sys/x86/x86/mca.c | ||
---|---|---|
1106 | Can you explain why BSP check is not needed inside the loop ? |
sys/x86/x86/mca.c | ||
---|---|---|
1129 | It doesn't hurt to have the check, but by this point the lock will always be initialized so it's safe to acquire it anyway on the BSP even if it's kind of pointless merely to make the code simpler. Always acquiring the lock also means you probably can use a simpler comment of just 'Configuring CMCI monitoring must be single-threaded.' or some such. |