
Free MCA entries after logging
ClosedPublic

Authored by jtl on May 31 2019, 5:22 PM.

Details

Summary

Currently, MCA entries remain on an ever-growing linked list. This means that it becomes increasingly expensive to process a steady stream of correctable errors.

After logging an entry, free it to the free list. Expand the free list maintenance routine to free or allocate entries to keep the free list in an acceptable range.

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

jtl created this revision.May 31 2019, 5:22 PM
jhb added a comment.May 31 2019, 5:38 PM

So I have had tools in the past that parsed the list from the kernel. See https://github.com/freebsd/freebsd/compare/master...bsdjhb:libsmbios_ecc. At the very least I think there should perhaps be a tunable/sysctl to control the behavior.

jtl added a comment.May 31 2019, 5:48 PM
In D20482#442176, @jhb wrote:

So I have had tools in the past that parsed the list from the kernel. See https://github.com/freebsd/freebsd/compare/master...bsdjhb:libsmbios_ecc. At the very least I think there should perhaps be a tunable/sysctl to control the behavior.

Yeah, I wondered if there was a reason like this for the current behavior.

Would you be open to the following sort of (perhaps over-engineered) solution?

  • Split mca_records into two lists: those waiting to be logged and those which have already been logged. (This avoids the CPU spinning through an ever-growing list every time there is a new error.)
  • Create an upper bound on the number of records on the "has been logged" list, which can default to the current behavior of "keep everything".
jhb added a comment.May 31 2019, 5:53 PM

I think having two lists with a cap on already-logged is fine, sure.

jtl updated this revision to Diff 58126.May 31 2019, 8:38 PM

Try 2:
Maintain the last N records in the mca_records list. N is user-configurable and defaults to -1 (unlimited; the current behavior).

When the mca_records list grows beyond N records, free the oldest ones.

imp accepted this revision.May 31 2019, 8:55 PM
This revision is now accepted and ready to land.May 31 2019, 8:55 PM
jhb added inline comments.May 31 2019, 9:38 PM
sys/x86/x86/mca.c
125 ↗(On Diff #58126)

Extra blank line?

562 ↗(On Diff #58126)

This should perhaps be renamed to 'mca_resize_freelist' since it can now shrink as well as grow it.

565 ↗(On Diff #58126)

I have a slight preference for 'desired_max, desired_min' instead of 'low' and 'high'.

569 ↗(On Diff #58126)

Maybe shorten a bit by saying something like:

* Ensure we have at least one record for each bank and
* one record per CPU, but no more than twice that amount.
* Free records first in case a concurrent thread uses records
* while this thread is freeing them.

One thing you might consider is to do a batched free to avoid banging on the lock as much. Something like this:

STAILQ_HEAD(, mca_internal) tofree;
struct mca_internal *rec, *next;
int desired_min, desired_max;

desired_min = ...
desired_max = ...
mtx_lock_spin(&mca_lock);
if (mca_freecount > desired_max) {
     STAILQ_INIT(&tofree);
     while (mca_freecount > desired_max) {
          rec = STAILQ_FIRST(&mca_freelist);
          STAILQ_REMOVE_HEAD(&mca_freelist, link);
          STAILQ_INSERT_TAIL(&tofree, rec, link);
          mca_freecount--;
     }
     mtx_unlock_spin(&mca_lock);
     STAILQ_FOREACH_SAFE(rec, &tofree, link, next) {
          free(rec, M_MCA);
     }
     mtx_lock_spin(&mca_lock);
}
while (mca_freecount < desired_min) {
     ...
}
jtl updated this revision to Diff 58137.May 31 2019, 11:25 PM

Incorporate feedback from @jhb.

This revision now requires review to proceed.May 31 2019, 11:25 PM
jtl marked 4 inline comments as done.May 31 2019, 11:27 PM
jtl added inline comments.
sys/x86/x86/mca.c
125 ↗(On Diff #58126)

Removed.

562 ↗(On Diff #58126)

Done. I also changed other "refill" things to "resize".

565 ↗(On Diff #58126)

Changed. (Doh! count is now not in alphabetical order. I'll try to remember to fix that before committing.)

569 ↗(On Diff #58126)

I mostly did this, but implemented it slightly differently.

jhb accepted this revision.Jun 6 2019, 12:14 AM
jhb added inline comments.
sys/x86/x86/mca.c
591 ↗(On Diff #58137)

For this I think you can use STAILQ_CONCAT to join the lists as a single O(1) operation.

    mtx_lock_spin(&mca_lock);
    STAILQ_CONCAT(&mca_freelist, &tmplist);
}
633 ↗(On Diff #58137)

I feel like we can maybe drop this now since you are doing this in the interrupt handler that calls this as well. In particular, this might queue the task even though the interrupt handler is going to free the record after logging it.

This revision is now accepted and ready to land.Jun 6 2019, 12:14 AM
jtl marked 4 inline comments as done.Jun 6 2019, 12:43 AM
jtl added inline comments.
sys/x86/x86/mca.c
591 ↗(On Diff #58137)

Good point! I'll make the change.

633 ↗(On Diff #58137)

The interrupt handler will only queue the task if it actually does free a record. By default (hw.mca.maxcount == -1), it won't free any records, so it won't queue the task. If a user changes hw.mca.maxcount to a non-negative number, we will probably queue the task more than necessary. AFAICT, the only reliable way around that is to only queue the task in the interrupt handler after logging. However, that may delay refilling the free list longer than may be desirable. Or, is that not an issue and we should just unconditionally queue the task at the end of the interrupt handler?

jhb added inline comments.Jun 6 2019, 6:05 PM
sys/x86/x86/mca.c
633 ↗(On Diff #58137)

It probably is fine to just wait for the end of the interrupt handler and do it if the scan found anything. (So just 'if (!cold)' instead of needing the 'doresize' check.) It will be simpler all around I think.

jtl updated this revision to Diff 58354.Jun 7 2019, 9:49 AM

Changes based on review:

  • Switched to using STAILQ_CONCAT to refill the free list.
  • Dropped redundant calls to resize the free list.
  • Centralized the logging logic to reduce code duplication.
This revision now requires review to proceed.Jun 7 2019, 9:49 AM
jtl marked 3 inline comments as done.Jun 7 2019, 10:15 AM

In my local tree, I added functions to generate fake machine check records and exercise the logic now found in mca_process_records(). It appears to work correctly. I *think* this is now ready to land.

imp accepted this revision.Jun 7 2019, 6:36 PM
This revision is now accepted and ready to land.Jun 7 2019, 6:36 PM
jhb accepted this revision.Jun 7 2019, 7:03 PM
This revision was automatically updated to reflect the committed changes.