Page MenuHomeFreeBSD

x86: Allow MCA messages to be rate-limited
ClosedPublic

Authored by jtl on Mon, Oct 6, 9:30 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Unknown Object (File)
Sat, Oct 11, 2:00 PM
Subscribers

Details

Summary

Always print the first 50 messages of each type. After that, optionally rate-limit the messages. This provides a way to limit the overhead of processing excessive messages without suppressing the first few of any type.

As part of this change, we are switching from direct printf() calls to collecting data in an sbuf(9). In POLLED mode (run from a task queue), we dynamically allocate the buffer. In the other modes (which are likely called from a hardware interrupt), we use a buffer allocated from the BSS segment and guarded by a lock. In normal operation, most calls to mca_log() should come from the POLLED mode, so there should be no contention for the new lock. If there is an interrupt storm which exceeds the capacity of the free list, there will be new contention for this lock; however, overall lock contention should still be lower than it was prior to e770e32aa3a0, when the mca_lock was held for the entirety of the mca_log() call.

This commit is partly based on a patch proposed by Loic Prylli <lprylli@netflix.com>.

Test Plan

Create fake events using the infrastructure described in D52942.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jtl requested review of this revision.Mon, Oct 6, 9:30 PM
This revision is now accepted and ready to land.Tue, Oct 7, 5:42 AM

Fix logic in case (mode != polled && (mca_startup_done || (rec->mr_status & MC_STATUS_UC)))

This revision now requires review to proceed.Tue, Oct 7, 12:56 PM

Remove unneeded check for mca_startup_done

Not sure whether it needs to be addressed or not, but for completeness there is the possibility of some sequence of events occasionally emptying the mca_freelist, and then my understanding is that mca_record_entry() will work in "degraded mode" directly call mca_log() in an interrupt context (next to "MCA: Unable to allocate space for an event.\n"), and as long as that situation persists, the corresponding MCA printf() won't be ratechecked any more.

It could be a very unlikely theoretical case with the mca_postscan() changes, but with previous version of code, a couple of machines were able to trigger the "MCA: Unable to allocate space for an event" situation in netflix fleet.

Not sure whether it needs to be addressed or not, but for completeness there is the possibility of some sequence of events occasionally emptying the mca_freelist, and then my understanding is that mca_record_entry() will work in "degraded mode" directly call mca_log() in an interrupt context (next to "MCA: Unable to allocate space for an event.\n"), and as long as that situation persists, the corresponding MCA printf() won't be ratechecked any more.

It could be a very unlikely theoretical case with the mca_postscan() changes, but with previous version of code, a couple of machines were able to trigger the "MCA: Unable to allocate space for an event" situation in netflix fleet.

I agree that we could theoretically hit that. (And, based on what you said, may even be likely to hit that in some small percentage of cases.) I think the question is whether it is worth closing that gap? My view is that the rate-limiting is a best-effort functionality which should "fail open" rather than "fail closed". Even if the limiting isn't perfect, it would still help some.

To close this robustly, we either need to accept message truncation, find a way to reliably buffer the whole message in a way that avoids allocations, or we would likely need to separate event-classification from event-printing (as your original patch did, but I wanted to avoid running duplicate logic).

@glebius : I don't love it, but... what do you think about a heap allocation of 512 bytes which we would guard through a spin lock and only use when we're in an interrupt context? Or, accepting message truncation?

jtl edited the summary of this revision. (Show Details)

Reworked patch to use a large-ish buffer allocated from BSS and protected by a lock if called from an interrupt context. As noted in the commit message, this will result in increased lock contention during an interrupt storm which exceeds the capacity of the free list; however, overall lock contention should still be lower than it was when mca_log() was called with the mca_lock held.

sys/x86/x86/mca.c
132
133

Why not define the sysctl together with the handler?

171

If val is a count of microseconds, why are we setting tv_sec here?

494

Not directly part of your change, but, what does the condition !tes_supported(rec->mr_mcg_cap) || ((rec->mr_status & MC_STATUS_TES_STATUS) >> 53) != 0x2 mean? That's cryptic enough to deserve a comment IMO.

717

I think some brief comment explaining the intent here would be helpful.

sys/x86/x86/mca.c
132

This should say seconds, not microseconds. Thanks for catching it. (This also answers your question below.) A man page would also help here. :-)

133

It was a misguided attempt to keep like things together, and avoid defining a function in the middle of variables. However, looking again at the code, I think it will still be quite readable if I move this down to where the handler exists.

171

As above, I meant this to be seconds.

494

Git spelunking (ac64943ca8514382289e6054e8127726a7c40ea3) says:
"Setting hw.mca.log_corrected to 0 will mute corrected errors logging except ones marked as reaching Yellow threshold by hardware."
(And, indeed, these magic numbers match the magic numbers added in 3bdba24c74604b1bb27623cd8304476bbbed69d1.)

Do you think I should add a comment here? Do a subsequent commit to change the magic numbers into defines? Or, ignore this for now?

717

Happy to add one. Are you okay with 50 being statically defined here? Or, do you think it should be either compile-time or run-time changeable? (Or, be a different static value?)

sys/x86/x86/mca.c
132

A man page would be great. :)

494

I'm fine with leaving this alone for now. It'd be nice to make this clearer in a subsequent commit if you plan to keep working in this area.

717

I think having 50 be hardcoded is fine for now.

jtl marked 3 inline comments as done and an inline comment as not done.Wed, Oct 8, 3:28 PM
jtl marked 4 inline comments as done.
jtl marked an inline comment as done.
jtl added inline comments.
sys/x86/x86/mca.c
132

Let me work on one separately from this review. I agree one is needed by now. :-)

jtl marked an inline comment as done.

updates to incorporate review feedback

Seems ok to me.

sys/x86/x86/mca.c
176

Perhaps use the logging lock to protect this update? And set CTLFLAG_MPSAFE in the sysctl flags below, otherwise this handler will be serialized by Giant.

731

This line looks like it might have incorrect whitespace.

This revision is now accepted and ready to land.Wed, Oct 8, 3:40 PM
sys/x86/x86/mca.c
176

Thanks! I added MPSAFE in my local environment. (I thought we'd already switched to MPSAFE by default, but I see we haven't. :-) )

I purposely didn't lock the update. I didn't think it provided a lot of value since there isn't really a synchronization issue that I could see. Am I missing a synchronization corner case?

sys/x86/x86/mca.c
176

I think it's ok without synchronization.

This revision was automatically updated to reflect the committed changes.