Page MenuHomeFreeBSD

MCA: Expand AMD Thresholding support to cover all banks
ClosedPublic

Authored by cem on Sep 11 2017, 4:26 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 20 2023, 3:25 AM
Unknown Object (File)
Nov 10 2023, 7:16 PM
Unknown Object (File)
Nov 10 2023, 3:09 PM
Unknown Object (File)
Nov 9 2023, 9:17 AM
Unknown Object (File)
Nov 9 2023, 6:58 AM
Unknown Object (File)
Nov 3 2023, 1:01 AM
Unknown Object (File)
Oct 31 2023, 10:36 PM
Unknown Object (File)
Oct 23 2023, 7:50 AM
Subscribers
None

Details

Summary

When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason. However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks. This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Looks ok, but this code is pretty foreign to me. Could you point me to the doc(s) you consult for this stuff?

sys/x86/x86/mca.c
848 ↗(On Diff #32922)

The indentation on the continuing line looks wrong.

998 ↗(On Diff #32922)

Why "i" and not "bank"?

This revision is now accepted and ready to land.Sep 11 2017, 6:07 PM

Looks ok, but this code is pretty foreign to me. Could you point me to the doc(s) you consult for this stuff?

Isilon's got the doc under NDA.

sys/x86/x86/mca.c
998 ↗(On Diff #32922)

To match cmci_monitor(). I agree it's not a great name.

cem marked an inline comment as done.

Fix indentation on continuing line.

This revision now requires review to proceed.Sep 11 2017, 6:16 PM

The reason I originally implemented the feature only for Bank 4 is that in the public BKDG for family 10h MC0_MISC to MC3_MISC are specified as read-only all zero-s. And MC5_MISC is documented to have an unspecified value. So, I thought why bother...
For example:

MSR0000_0403 DC Machine Check Miscellaneous Register (MC0_MISC)
This register is read-only, reset: 0000 0000 0000 0000h.

MSR0000_0417 FR Machine Check Miscellaneous Register (MC5_MISC)
Reset: xxxx xxxx xxxx xxxxh. This register records unspecified, implementation-specific status bits when an
FR machine check error is logged.

Another thing that I was contemplating is using different extended LVTs for different banks (on hardware where multiple banks support the feature), so I left everything "multi-bank-ish" for the future.

As a side node, it seems that mca.c is ripe for being split into mca_intel.c, mca_amd.c and mca.c proper. Not suggesting that it needs to be done as a a part of this change, just a thought.

Correct a typo: 15 should be 15h

In D12321#255658, @avg wrote:

The reason I originally implemented the feature only for Bank 4 is that in the public BKDG for family 10h MC0_MISC to MC3_MISC are specified as read-only all zero-s. And MC5_MISC is documented to have an unspecified value. So, I thought why bother...

Sure. Does this expansion to all banks, still gated on MISC register, seem reasonable to you? My family 17h CPU has something like 23 banks, many of which have non-zero MISC bits. The zero bits in MISC0-3 on 10h should still cause only MISC4 to be monitored on that platform.

And yes, I also noted undefined MISC5 on 10h and have a special check for that.

I'm looking for either a "this needs to be fixed" or "approved" from someone familiar with our MCA architecture :-). Thanks!

This revision was automatically updated to reflect the committed changes.