Page MenuHomeFreeBSD

x86: Keep cumulative MCA statistics in the kernel
ClosedPublic

Authored by jtl on Wed, Sep 24, 2:47 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Oct 13, 8:43 AM
Unknown Object (File)
Mon, Oct 13, 8:43 AM
Unknown Object (File)
Sun, Oct 12, 8:21 PM
Unknown Object (File)
Sun, Oct 12, 8:21 PM
Unknown Object (File)
Sun, Oct 12, 8:21 PM
Unknown Object (File)
Sun, Oct 12, 8:21 PM
Unknown Object (File)
Sun, Oct 12, 8:45 AM
Unknown Object (File)
Fri, Oct 10, 4:46 PM
Subscribers
None

Details

Summary

Keeping cumulative MCA statistics in the kernel provides a way for users to get an accurate count of various kinds of errors reported by the CPU.

After ca8929d2a3e9b9df31d2e487377f99d7c39aa01d, it is possible that the kernel will drop the record of some MCA interrupts. Moreover, this provides a cheaper interface to obtain statistics if that is the only reason a user is processing MCA logs.

Test Plan

This is a lightly updated version of a patch which has been in use within Netflix for a while.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jtl requested review of this revision.Wed, Sep 24, 2:47 PM
jtl created this revision.

If anyone is planning to review this (or if I should ask more people), let me know. Otherwise, I will probably just commit this soon. AFAICT, this is fairly innocuous (new feature with low risk of breaking existing things). But, there is a reason we get these things reviewed...

It'd be nice to document this somewhere, but sadly we don't appear to have an mca.4.

sys/x86/x86/mca.c
137

We've generally been avoiding adding new uses of volatile to denote variables accessed using atomic(9). I'd instead add a short comment explaining the synchronization protocol for this array.

652

Was it intentional to include both an assertion and runtime check? If so, something like this would be a bit cleaner IMO:

if (event_type < 0 || event_type >= MCA_T_COUNT) {
     KASSERT(0, ("%s invalid event type (%d)", ...));
     event_type = MCA_T_UNKNOWN;
}
This revision is now accepted and ready to land.Fri, Oct 3, 1:16 PM

Thanks for the review! I agree on the documentation comment.

sys/x86/x86/mca.c
137

OK, thanks. I didn't know that. We may want to remove volatile from the prototypes in atomic(9). I'll change it and add a comment.

652

Yes, it was intentional, in a belt-and-suspenders sort of way. If anyone adds a new event type and tests their code with a DEBUG kernel, they should hit the assertion as a reminder to update the stats array. If that doesn't happen, the runtime check avoids an out-of-bounds write.

I like your update; I will adopt it.

incorporating review feedback

This revision now requires review to proceed.Fri, Oct 3, 2:20 PM
jtl marked 2 inline comments as done.Fri, Oct 3, 2:21 PM
This revision is now accepted and ready to land.Fri, Oct 3, 2:45 PM