Page MenuHomeFreeBSD

MCA: Expand AMD Thresholding support to cover all banks
ClosedPublic

Authored by cem on Sep 11 2017, 4:26 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 19, 6:56 AM
Unknown Object (File)
Mon, Nov 18, 8:39 AM
Unknown Object (File)
Fri, Nov 15, 10:17 PM
Unknown Object (File)
Sun, Nov 10, 10:40 AM
Unknown Object (File)
Sun, Nov 10, 12:44 AM
Unknown Object (File)
Sep 26 2024, 2:01 AM
Unknown Object (File)
Sep 23 2024, 12:16 AM
Unknown Object (File)
Sep 21 2024, 8:28 PM
Subscribers
None

Details

Summary

When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason. However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks. This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 11499
Build 11854: arc lint + arc unit

Event Timeline

Looks ok, but this code is pretty foreign to me. Could you point me to the doc(s) you consult for this stuff?

sys/x86/x86/mca.c
848

The indentation on the continuing line looks wrong.

998

Why "i" and not "bank"?

This revision is now accepted and ready to land.Sep 11 2017, 6:07 PM

Looks ok, but this code is pretty foreign to me. Could you point me to the doc(s) you consult for this stuff?

Isilon's got the doc under NDA.

sys/x86/x86/mca.c
998

To match cmci_monitor(). I agree it's not a great name.

cem marked an inline comment as done.

Fix indentation on continuing line.

This revision now requires review to proceed.Sep 11 2017, 6:16 PM

The reason I originally implemented the feature only for Bank 4 is that in the public BKDG for family 10h MC0_MISC to MC3_MISC are specified as read-only all zero-s. And MC5_MISC is documented to have an unspecified value. So, I thought why bother...
For example:

MSR0000_0403 DC Machine Check Miscellaneous Register (MC0_MISC)
This register is read-only, reset: 0000 0000 0000 0000h.

MSR0000_0417 FR Machine Check Miscellaneous Register (MC5_MISC)
Reset: xxxx xxxx xxxx xxxxh. This register records unspecified, implementation-specific status bits when an
FR machine check error is logged.

Another thing that I was contemplating is using different extended LVTs for different banks (on hardware where multiple banks support the feature), so I left everything "multi-bank-ish" for the future.

As a side node, it seems that mca.c is ripe for being split into mca_intel.c, mca_amd.c and mca.c proper. Not suggesting that it needs to be done as a a part of this change, just a thought.

Correct a typo: 15 should be 15h

In D12321#255658, @avg wrote:

The reason I originally implemented the feature only for Bank 4 is that in the public BKDG for family 10h MC0_MISC to MC3_MISC are specified as read-only all zero-s. And MC5_MISC is documented to have an unspecified value. So, I thought why bother...

Sure. Does this expansion to all banks, still gated on MISC register, seem reasonable to you? My family 17h CPU has something like 23 banks, many of which have non-zero MISC bits. The zero bits in MISC0-3 on 10h should still cause only MISC4 to be monitored on that platform.

And yes, I also noted undefined MISC5 on 10h and have a special check for that.

I'm looking for either a "this needs to be fixed" or "approved" from someone familiar with our MCA architecture :-). Thanks!

This revision was automatically updated to reflect the committed changes.