Page MenuHomeFreeBSD

mps: Use 64-bit chain structures
ClosedPublic

Authored by imp on Jan 24 2022, 8:23 PM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 10 2024, 10:51 AM
Unknown Object (File)
Feb 3 2024, 5:12 PM
Unknown Object (File)
Dec 25 2023, 11:35 AM
Unknown Object (File)
Dec 24 2023, 1:20 AM
Unknown Object (File)
Dec 23 2023, 12:52 AM
Unknown Object (File)
Oct 25 2023, 5:57 AM
Unknown Object (File)
Oct 4 2023, 9:42 AM
Unknown Object (File)
Aug 27 2023, 2:36 PM
Subscribers

Details

Summary

According to Broadcom, mixing 64-bit SGEs with 32-bit chain entries can
lead to IOC Fault code 0x40000d04. This fault code has been observed to
suddenly increase on certain machines when the OCA firmware images are
deployed. It is unclear why this mixing can lead to trouble, but the
vendor recommends against mixing. Also, both the Linux and NetBSD
drivers don't mix.

Rework the chaining code to use MPI2_SGE_CHAIN64 instead of
MPI2_SGE_CHAIN32. Adjust MPS_SGC_SIZE from 8 to 12 to match the size of
the new structure. Flag the structure as being 64-bits now. Since
MPS_SGE64_SIZE and MPS_SGC_SIZE are the same now, mps_push_sge could be
simplified (after the same fashion of mpr). The different number of
cases collapse to whether or not there's room for the segments and if
not we need a chain, however these changes haven't been made yet as the
current code handles those cases properly with the new defines.

Made chain_busaddr 64-bits, even though we ask for all allocations to be
below 4GB for this tag. Use it to set both parts of the CHAIN64 address
rather than baking the 4GB assumption. Add asserts around the allocation
to detect and BUSDMA bugs in allocation.

Remove asserts and associated comment in mpi_pre_fw_download and
mpi_pre_fw_upload. The code does not, it seems, depend on this
invariant. The mpr driver has similar code, no asserts and also doesn't
depend on this.

Adjust comments to reflect the updated size.

Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

imp requested review of this revision.Jan 24 2022, 8:23 PM

mpr does not need this change because mpr already uses exclusively 64-bit chaining entries.

Some time ago I've tried to investigate reported mps(4) firmware crashes after we increased MAXPHYS from 128KB to 1MB. It was odd that on some systems hw.mps.max_io_pages=32 workarounded it, but on some others on opposite provoked the problem. I still don't know what was it, but there are obviously some dragons in the chain handling. I saw that recommendations to not mix 32 and 64 bit elements, but decided not to risk touching it, since we moved from those HBAs to mpr(4) in our products. I have no objections if you can test it. I have vague memories that use of 32-bit chain element there allowed to pack some more I/O into the same SGL there, but I may mix it with some other HBAs.

This revision is now accepted and ready to land.Jan 24 2022, 9:24 PM
In D34016#769094, @mav wrote:

Some time ago I've tried to investigate reported mps(4) firmware crashes after we increased MAXPHYS from 128KB to 1MB. It was odd that on some systems hw.mps.max_io_pages=32 workarounded it, but on some others on opposite provoked the problem. I still don't know what was it, but there are obviously some dragons in the chain handling. I saw that recommendations to not mix 32 and 64 bit elements, but decided not to risk touching it, since we moved from those HBAs to mpr(4) in our products. I have no objections if you can test it. I have vague memories that use of 32-bit chain element there allowed to pack some more I/O into the same SGL there, but I may mix it with some other HBAs.

We have ~10k controllers across a few thousand machines that this will be running on shortly (it's currently running on the dozen worst offenders w/o issue). I'm happy to let that get fully deployed before committing this change if people are nervous (understandable, to be honest, given this went all pear shaped when I forgot to bump MPS_SGC_SIZE).

This revision was automatically updated to reflect the committed changes.