Page MenuHomeFreeBSD

mca: Allow for passing ECC error record to memory controller driver
AbandonedPublic

Authored by stevek on Apr 18 2023, 5:29 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Dec 10, 5:57 PM
Unknown Object (File)
Sep 11 2024, 11:30 AM
Unknown Object (File)
Sep 6 2024, 12:36 AM
Unknown Object (File)
Sep 5 2024, 11:01 AM
Unknown Object (File)
Aug 7 2024, 10:53 AM
Unknown Object (File)
Jul 28 2024, 2:33 AM
Unknown Object (File)
Jul 25 2024, 7:58 AM
Unknown Object (File)
Jul 20 2024, 6:52 AM
Subscribers

Details

Reviewers
mav
markj
kib
Summary

Added (de)registration functions for memory controller driver to be
notified when ECC errors occur. This allows for decoding of the
specific error by the driver.

Sponsored by: Juniper Networks, Inc.
Obtained from: Semihalf

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 51007
Build 47898: arc lint + arc unit

Event Timeline

I have no objections in general, if you find it useful, just a few on the implementation.

sys/x86/x86/mca.c
441

Cosmetics, but why "aux", not "arg"?

446

Is there any reason to not just assert it here and at the next function? It is not some user API where anything may happen.

453

Why not just "else"?

455

As I see, mca_log() in some cases may be called in MC context without the lock held. It would be good it argument was there first. And mca_decode_record() could explicitly use local variable when checking for NULL and calling. Or may be all this logic could be changed to atomics for both register and deregister.

sys/x86/x86/mca.c
441

It's what the original authors (Semihalf) had used. I have no problem changing it.
All of this code is from Semihalf.

Note the person in Juniper that submitted it for review did not provide any Semihalf userid to use for the Author.

446

I'm fine with just using KASSERT here and the next function.

453

I agree else would be better

455

mca_log() should probably be asserting that the spin lock is held and the places that don't currently take the lock should do so.

In D39661#902586, @mav wrote:

I have no objections in general, if you find it useful, just a few on the implementation.

See D39663 for a user of this functionality.

sys/x86/x86/mca.c
455

It is not safe to take the same lock in and out of MC context, since it may cause deadlocks, so no it should not.

sys/x86/x86/mca.c
665

This place is good if you need to just decode additional information. But I guess it may be not good if you need additional processing, since execution may not get here, or get later.

sys/x86/x86/mca.c
455

Ah right, because you could holding the lock already when there is a MC triggered.
An atomic should work fine then to register/deregister.

665

The Pondicherry2 driver (D39663) is just decoding additional information to display the channel, DIMM, rank, bank, column and row if/when the there is an ECC error.

sys/x86/include/mca.h
51

Replaced by review D45072, will address the review comments from this review in the newer one.