Page MenuHomeFreeBSD

expand scope of da refcount to protect I/Os in flight
ClosedPublic

Authored by chs on May 14 2018, 8:22 PM.

Details

Summary

When a disk disappears and the periph is invalidated, any I/Os that are pending with the controller can cause a crash when they complete. This diff holds the softc reference count taken in dastart() until the I/O is complete rather than only until xpt_action() returns. (This approach was suggested by Ken Merry.)

Note that the ada and nda drivers likely need similar changes, this addresses only the da driver since that's all I can test so far.

Test Plan

This patch has been tested with script that repeated uses lsiutil to repeated cause a drive to disappear while a bunch of I/O in progress. With the patch the crashes in the CAM code are gone (though panics in other subsystems are also triggered, I'll submit patches for those as well).

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

chs created this revision.May 14 2018, 8:22 PM
chs edited the summary of this revision. (Show Details)May 14 2018, 8:30 PM
imp accepted this revision.May 14 2018, 8:34 PM

Looks good to me. One tiny question that occurred to me after I looked at it this morning. I think we're fine delaying the close for up to a tick so we don't have to try to send a bunch of wake ups during normal operation. Also did the obligatory lock check, and we're good there as well: refcount is only changed while holding the periph lock, and cam_periph_sleep uses the same lock to sleep on.

FreeBSD/sys/cam/scsi/scsi_da.c
4409 ↗(On Diff #42548)

in daclose we have:

while (softc->refcount != 0)
        cam_periph_sleep(periph, &softc->refcount, PRIBIO, "daclose", 1);

do we need to wake that up, or just rely on the timeout?

This revision is now accepted and ready to land.May 14 2018, 8:34 PM
scottl accepted this revision.May 14 2018, 9:39 PM
ken accepted this revision.May 14 2018, 10:52 PM

Looks good. I'm glad this seems to have fixed the problem in the da(4) driver at least. I agree, ada(4) and nda(4) should probably have the same change since they'll have the same issue.

FreeBSD/sys/cam/scsi/scsi_da.c
4409 ↗(On Diff #42548)

I guess you could do either. With a 1/hz timeout, this turns into something of a busy wait.

This revision was automatically updated to reflect the committed changes.