Page MenuHomeFreeBSD

cam: Drop periph lock when completing I/O with ENOMEM status
ClosedPublic

Authored by imp on Wed, May 22, 11:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jun 6, 11:17 PM
Unknown Object (File)
Mon, May 27, 1:04 AM
Unknown Object (File)
Fri, May 24, 8:12 PM
Unknown Object (File)
Fri, May 24, 4:18 PM
Unknown Object (File)
Thu, May 23, 11:10 AM
Unknown Object (File)
Thu, May 23, 11:05 AM
Unknown Object (File)
Thu, May 23, 8:48 AM
Subscribers

Details

Summary

When biofinish calls g_io_deliver with an error of ENOMEM, that kicks
off the slowdown protocol, forcing I/O to go through g_down rather than
be directly dispatch. One of the side effects is that the I/O is
resubmitted, so the start routines get called recursively, leading to a
recursive lock panic. Rather than make the periph lock recursive, drop
and reacquire the lock around such calls to biofinish.

For nda, this happens only when we can't allocate space to construct a
TRIM. For ada and da, this is only for certain ZONE operations.

Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

imp requested review of this revision.Wed, May 22, 11:08 PM
gallatin added inline comments.
sys/cam/nvme/nvme_da.c
1087

I don't know the code well, but dropping a lock can introduce races.. I've spent a too long chasing my tail on that in the network stack where I suspect a race around dropping lock in a bug i'm working on.

I assumed you considered making the periph lock a recursive lock (eg, MTX_RECURSE) ?

sys/cam/nvme/nvme_da.c
1087

We already drop the lock around the call to the SIM action routine, which is the normal exit path. And the periph lock is used all over CAM... so I'm hesitant to contemplate that...

gallatin added inline comments.
sys/cam/nvme/nvme_da.c
1087

OK, no objections then. I don't really feel qualified to review more than this hunk..

This revision is now accepted and ready to land.Fri, May 24, 5:46 AM