Page MenuHomeFreeBSD

Fix use after free panic and state transitions in mps(4) and mpr(4)
AcceptedPublic

Authored by ken on Fri, Jun 26, 8:53 PM.

Details

Reviewers
imp
scottl
slm
Summary
Fix mpr(4) and mps(4) state transitions and a use-after-free panic.
 
 When the mpr(4) and mps(4) drivers probe a SATA device, they issue
 an ATA Identify command (via mp{s,r}sas_get_sata_identify()) before
 the target is fully setup in the driver.  The drivers wait for
 completion of the identify command, and have a 5 second timeout.
 If the timeout fires, the command is marked with the
 SATA_ID_TIMEOUT flag so it can be freed later.
 
 That is where the use-after-free problem comes in.  Once the
 ATA Identify times out, the driver sends a target reset, and then
 frees any identify commands that have timed out.  But, once the
 target reset completes, commands that were queued to the drive are
 returned to the driver by the controller.
 
 At that point, the driver (in mp{s,r}_intr_locked()) looks up the
 command descriptor for that particular SMID, marks it CM_STATE_BUSY
 and sends it on for completion handling.
 
 The problem at this stage is that the command has already been
 freed, and put on the free queue, so its state is CM_STATE_FREE.
 If INVARIANTS are turned on, we get a panic as soon as this command
 is allocated, because its state is no longer CM_STATE_FREE, but
 rather CM_STATE_BUSY.
 
 So, the solution is to not free ATA Identify commands that get
 stuck until they actually return from the controller.  Hopefully
 this works correctly on older firmware versions.  If not, it could
 result in commands hanging around indefinitely.  But, the
 alternative is a use-after-free panic or assertion (in the
 INVARIANTS case).
 
 This also tightens up the state transitions between
 CM_STATE_FREE, CM_STATE_BUSY and CM_STATE_INQUEUE, so that the
 state transitions happen once, and we have assertions to make sure
 that commands are in the correct state before transitioning to the
 next state.
 Also, for each state assertion, we print out the current state of
 the command if it is incorrect.
 
 mp{s,r}.c:      Add a new sysctl variable, dump_reqs_alltypes,
                 that controls the behavior of the dump_reqs sysctl.
                 If dump_reqs_alltypes is non-zero, it will dump
                 all commands, not just the commands that are in the
                 CM_STATE_INQUEUE state.  (You can see the commands
                 that are in the queue by using mp{s,r}util debug
                 dumpreqs.)
 
                 Make sure that the INQUEUE -> BUSY state transition
                 happens in one place, the mp{s,r}_complete_command
                 routine.        
 
 mp{s,r}_sas.c:  Make sure we print the current command type in
                 command state assertions.
 
 mp{s,r}_sas_lsi.c:
                 Add a new completion handler,
                 mp{s,r}sas_ata_id_complete.  This completion
                 handler will free data allocated for an ATA
                 Identify command and free the command structure.
 
                 In mp{s,r}_ata_id_timeout, do not set the command
                 state to CM_STATE_BUSY.  The command is still in
                 queue in the controller.  Since we were blocking
                 waiting for this command to complete, there was
                 no completion handler previously.  Set the
                 completion handler, so that whenever the command
                 does come back, it will get freed properly.
 
                 Do not free ATA Identify commands that have timed
                 out in mp{s,r}sas_add_device().  Wait for them
                 to actually come back from the controller.
 
 mp{s,r}var.h:   Add a dump_reqs_alltypes variable for the new
                 dump_reqs_alltypes sysctl.
 
                 Make sure we print the current state for state
                 transition asserts.
Test Plan

To best test this, you need a machine with a SAS expander, and SATA drives attached to the expander. The more drives the better.

Build a kernel with INVARIANTS, and then:

You will be cycling the PHYs for the drives in question on the expander. You can script it, for example:

for i in 10 11 12 13 14 15 16 17 18 19; do while true; do camcontrol smppc ses0 -p $i -o disable; sleep 10; camcontrol smpc ses0 -p $i -o linkreset; sleep 10; done & done

If you run for 24 hours without an assert, you can probably call this fixed.

Once you're done with the test, kill all of the jobs, and then reset the PHYs so that the drives come back. For example:

for i in 10 11 12 13 14 15 16 17 18 19; do camcontrol smppc ses0 -p $i -o linkreset; done

Once all of the drives have come back successfully (make sure that they all come back!), make sure there are not any stuck commands.

To do that:

sysctl dev.mps.0.dump_reqs_alltypes=1
(Use "mpr" for 12Gb controllers, and do that for each controller instance number, 0, 1, 2, etc.)

Then, using either mpsutil or mprutil (6Gb or 12Gb), do this:

mpsutil debug dumpreqs |grep SMID |awk '{if ($4 != 0){print $0}}'

That will show commands that are not free. If you have I/O active, you may see commands that are busy (state 1) and in queue (state 2), but they should be transient. If they're stuck, then we may have a bug.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 31992

Event Timeline

ken created this revision.Fri, Jun 26, 8:53 PM
ken requested review of this revision.Fri, Jun 26, 8:53 PM
imp accepted this revision.Tue, Jun 30, 4:33 PM

I only looked at mpr, but this looks good to me. It's good you have a way to recreate it. I have random machines failing with this (about ~0.001%/day), but can't find one to recreate it. I like this approach, and is similar to the one I took with other commands and target reset. thanks for fixing.

This revision is now accepted and ready to land.Tue, Jun 30, 4:33 PM