Page MenuHomeFreeBSD

aacraid: log DMA failures
Needs ReviewPublic

Authored by luporl on Jun 17 2021, 12:50 PM.

Details

Summary

Log an error message when bus_dmamap_load() fails to load the
buffer to DMA area and invokes the callback function with an
error.

This makes it much easier to figure out what is going wrong and why a read/write command failed.

It is hard to abort the I/O operation at aacraid DMA callback and propagate the error to upper layers, but at least we can log the error and help whoever is trying to debug an issue.

Test Plan

Tested on a Talos II machine with an aacraid controller.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 41866
Build 38754: arc lint + arc unit

Event Timeline

sys/dev/aacraid/aacraid.c
1319

Does this need a rate limiter?

Address review's comments

luporl added inline comments.
sys/dev/aacraid/aacraid.c
1319

Yes, it could generate a lot of messages.
Limiting it to 1 message in each 10 seconds (mostly because I prefer it to be displayed more often than not and this error end up compromising the system in a short time).

@imp I've added the rate limiter, do you think this revision is ok now?

scottl requested changes to this revision.Jul 23 2021, 2:42 PM

I've put rate limiters into other drivers and have been unhappy with the results. Whether it's once a second, 10 seconds, 100 seconds, or whatever, an unattended system will wind up with thousands of messages in its logs that obfuscate more important messages. I think a better tactic is to print a message once, and maintain a sysctl with a counter.

This revision now requires changes to proceed.Jul 23 2021, 2:42 PM

Address reviewer's comments:

  • log DMA load buffer error message only once for each device
  • add a per device error counter, exported via sysctl (inspired by mpr code)
sys/dev/aacraid/aacraid.c
227–229

it looks like this was never used before

253

does it make sense to put this under dev.aacraid instead? I think the sysctl setup is simplified by doing so

sys/dev/aacraid/aacraid.c
253

Correct, it should be dev.aacraid.

luporl added inline comments.
sys/dev/aacraid/aacraid.c
253

Ok, it's now always added to dev.aacraid and setup is skipped if for some reason sysctl_ctx or sysctl_tree is NULL.

luporl added inline comments.
sys/dev/aacraid/aacraid.c
227–229

It's really not used. Do you think it's better to remove it, or just ignore it?

sys/dev/aacraid/aacraid.c
227–229

I would delete it in a separate commit, either before or after this change.

Is this change good to go now?

sys/dev/aacraid/aacraid.c
227–229

Ok, I'll delete it after this change.