Page MenuHomeFreeBSD

expand scope of da refcount to protect I/Os in flight
ClosedPublic

Authored by chs on May 14 2018, 8:22 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Apr 14, 5:02 AM
Unknown Object (File)
Dec 20 2023, 5:25 AM
Unknown Object (File)
Nov 15 2023, 1:38 PM
Unknown Object (File)
Oct 13 2023, 4:03 PM
Unknown Object (File)
Oct 13 2023, 3:55 PM
Unknown Object (File)
Jun 20 2023, 5:14 PM
Unknown Object (File)
May 6 2023, 11:08 AM
Unknown Object (File)
Mar 3 2023, 4:34 PM
Subscribers
None

Details

Summary

When a disk disappears and the periph is invalidated, any I/Os that are pending with the controller can cause a crash when they complete. This diff holds the softc reference count taken in dastart() until the I/O is complete rather than only until xpt_action() returns. (This approach was suggested by Ken Merry.)

Note that the ada and nda drivers likely need similar changes, this addresses only the da driver since that's all I can test so far.

Test Plan

This patch has been tested with script that repeated uses lsiutil to repeated cause a drive to disappear while a bunch of I/O in progress. With the patch the crashes in the CAM code are gone (though panics in other subsystems are also triggered, I'll submit patches for those as well).

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Looks good to me. One tiny question that occurred to me after I looked at it this morning. I think we're fine delaying the close for up to a tick so we don't have to try to send a bunch of wake ups during normal operation. Also did the obligatory lock check, and we're good there as well: refcount is only changed while holding the periph lock, and cam_periph_sleep uses the same lock to sleep on.

FreeBSD/sys/cam/scsi/scsi_da.c
4409

in daclose we have:

while (softc->refcount != 0)
        cam_periph_sleep(periph, &softc->refcount, PRIBIO, "daclose", 1);

do we need to wake that up, or just rely on the timeout?

This revision is now accepted and ready to land.May 14 2018, 8:34 PM

Looks good. I'm glad this seems to have fixed the problem in the da(4) driver at least. I agree, ada(4) and nda(4) should probably have the same change since they'll have the same issue.

FreeBSD/sys/cam/scsi/scsi_da.c
4409

I guess you could do either. With a 1/hz timeout, this turns into something of a busy wait.

This revision was automatically updated to reflect the committed changes.