Page MenuHomeFreeBSD

mpr(4) and mps(4) diagnostic path fixes
ClosedPublic

Authored by ken on Dec 11 2017, 7:33 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 13 2024, 10:57 PM
Unknown Object (File)
Feb 9 2024, 9:23 AM
Unknown Object (File)
Dec 29 2023, 9:55 AM
Unknown Object (File)
Dec 20 2023, 11:20 AM
Unknown Object (File)
Dec 20 2023, 2:24 AM
Unknown Object (File)
Dec 3 2023, 12:35 AM
Unknown Object (File)
Nov 6 2023, 9:42 AM
Unknown Object (File)
Sep 5 2023, 1:33 AM
Subscribers

Details

Summary

This fixes several bugs in the diagnostic buffer code in the mpr(4) and mps(4) drivers.

There are times when the driver reinitializes the firmware, and that could cause panics. We also fail to check the status of bus_dmamap_load() and we return incorrect status if we fail to create a busdma tag.

The commit messages for the mpr(4) driver in Spectra's repository are below. The mps(4) driver is pretty much the same.

Change 1341263 on 2017/11/21 by kenm@ken.redline14

Fix several bugs in the diagnostic path in mpr_user.c.

o In mpr_post_fw_diag_buffer() and mpr_release_fw_diag_buffer(),
  check the reply to see whether it is NULL.  It can be NULL (and
  the command non-NULL) if the controller gets reinitialized
  while we're waiting for the command to complete but the
  driver structures aren't reallocated.  The driver structures
  generally won't be reallocated unless there is a firmware upgrade
  that changes one of the IOCFacts.

o When freeing diagnostic buffers in mpr_diag_register() and
  mpr_diag_unregister(), zero/NULL out the buffer after freeing it.
  This will prevent a duplicate free in some situations.

Change 1343313 on 2017/12/06 by kenm@ken.redline14

Fix a couple of busdma issues in the mpr(4) driver.

In mpr_diag_register(), which is used to register diagnostic
buffers with the mpr(4) firmware, we allocate DMAable memory.

There were several issues here:
 o No checking of the bus_dmamap_load() return value.  If the load
   failed or got deferred, mpr_diag_register() continued on as if
   nothing had happened.  We now check the return value and bail
   out if it fails.

 o No waiting for a deferred load callback.  bus_dmamap_load()
   calls a supplied callback when the mapping is done.  This is
   generally done immediately, but it can be deferred.
   mpr_diag_register() did not check to see whether the callback
   was already done before proceeding on.  We now sleep until the
   callback is done if it is deferred.

 o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
   memory is allocated and loaded.  This is necessary on some
   platforms to synchronize host memory that is going to be updated
   by a device.

mprvar.h:
        Add a new structure, struct mpr_busdma_context, that is
        used for deferred busdma load callbacks.

        Add a prototype for mpr_memaddr_wait_cb().
mpr.c:
        Add a new busdma callback function, mpr_memaddr_wait_cb().
        This provides synchronization for callers that want to
        wait on a deferred bus_dmamap_load() callback.

mpr_user.c:
        In bus_dmamap_register(), add a call to bus_dmamap_sync()
        with the BUS_DMASYNC_PREREAD flag set after an allocation
        is loaded.

        Also, check the return value of bus_dmamap_load().  If it
        fails, bail out.  If it is EINPROGRESS, wait for the
        callback to happen.  We use an interruptible sleep (msleep
        with PCATCH) and let the callback clean things up if we get
        interrupted.

Change 1343965 on 2017/12/11 by kenm@ken.redline14

Fix a return value in mpr_diag_register().

In the case of a failure to create a busdma tag, we were returning
ENOMEM instead of MPR_DIAG_FAILURE.  The caller expects an MPR_
status returned, not an errno.
Test Plan

Write a script that toggles an expander PHY on and off to cause a drive to go away and come back repeatedly.

Setup the controller to record a diagnostic buffer.

Watch the /var/log/messages for mprsas_prepare_remove and pull a diagnostic buffer each time a drive is removed.

Eventually the controller will report an IOC fault (PCIe error) or a command sent to the controller will time out after 30 seconds. In either case, the driver will reinit the controller. Without these changes, the driver will panic.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 13847

Event Timeline

Generally looks good to me. I suppose it may need some more bus_dmamap_sync() calls for completeness.

Also looking through the code I see plenty of other places where mpr_memaddr_cb() is used both without status check or even BUS_DMA_NOWAIT flag.

sys/dev/mps/mps_user.c
1509

What's about calling it with BUS_DMASYNC_POSTREAD at some point?

This revision is now accepted and ready to land.Dec 11 2017, 10:19 PM

Added bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying diagnostic buffers out to userland.

This revision now requires review to proceed.Dec 26 2017, 9:23 PM
In D13453#281205, @mav wrote:

Generally looks good to me. I suppose it may need some more bus_dmamap_sync() calls for completeness.

Good point. I added calls before we copy diag buffers out to userland.

Also looking through the code I see plenty of other places where mpr_memaddr_cb() is used both without status check or even BUS_DMA_NOWAIT flag.

I agree. There are other places, especially on setup, where the drivers don't check any status. And we need to audit and see if there are other places where we should sync.

This revision is now accepted and ready to land.Jan 23 2018, 4:54 PM