Page MenuHomeFreeBSD

mpi3mr: Only set callout_owned when we create a timeout
ClosedPublic

Authored by imp on Nov 10 2023, 12:19 AM.
Tags
None
Referenced Files
F93357526: D42541.id.diff
Mon, Sep 9, 3:18 AM
Unknown Object (File)
Wed, Sep 4, 9:37 AM
Unknown Object (File)
Wed, Sep 4, 7:08 AM
Unknown Object (File)
Wed, Aug 21, 12:39 AM
Unknown Object (File)
Mon, Aug 12, 1:10 AM
Unknown Object (File)
Jul 30 2024, 10:43 PM
Unknown Object (File)
Jul 29 2024, 8:48 PM
Unknown Object (File)
Jul 20 2024, 7:04 AM
Subscribers
None

Details

Summary

Since we assume there's a timeout to cancel when this is true, only set
it true when we set the timeout. Otherwise we may try to cancel a timeout
when there's been an error in submission.

Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

imp requested review of this revision.Nov 10 2023, 12:19 AM
imp created this revision.

Makes sense, but while there I wonder what is the SBT_1S * 90 timeout? Why not the value in CCB received from CAM? Is it something else?

This revision is now accepted and ready to land.Nov 14 2023, 11:58 PM

update per mav: set the timeout to be what's in the ccb.

This revision now requires review to proceed.Nov 15 2023, 3:04 AM

This patch looks OK to me, but I generally have difficulties to understand why we need this callout_owner flag. callout_stop() should not be a reason. I am just not sure about semantics of mpi3mr_flush_io().

This revision is now accepted and ready to land.Nov 15 2023, 2:51 PM
In D42541#972181, @mav wrote:

This patch looks OK to me, but I generally have difficulties to understand why we need this callout_owner flag. callout_stop() should not be a reason. I am just not sure about semantics of mpi3mr_flush_io().

It looks like it's to respond to pending I/O when a soft reset happens from the watchdog thread.

I'm not sure that mpi3mrflush_io() is correct, though it might be. If the goal is to cancel all CAM I/O to the card (but not SIM internal commands that lack a ccb), then it is almost correct. I think the cmd->ccb != NULL test is redundant with the callout_owner test. Either the cmd has a ccb and the ccb is pending *OR* ccb == NULL. At least at least after all mtx_unlock(mpi3mr_mtx) calls. We can't have ccb == NULL and callout_owner == true. However, it looks like mpi3mr_flush_io is called w/o the mpimr_mtx lock held, so we could be racing the ccb != NULL test and the callout_owner=true tests... but that seems somewhat unsafe. I'm not sure the implications of this, though. Or if the redundancy provides any additional safety... My couple of minute look suggests that we should take the lock for the life of flush_io, but that also means we need a mpi3mr_cmd_done that doesn't take the lock.... and there might be implications for the mpi3mr_release_command's different locks and lock ordering if we hold the mpi3mr_mtx lock over calls to it (but it may already be called in that context because we already call it from mpi3mr_map_request which is called from the sim action routine and mpi3mr_mtx is the SIM Lock.

So I think we can just eliminate callout_owner, but that's far from a certainty (though I think any changes in behavior would be to eliminate races). I'll leave it for another review and for when I can test with a device/card that's flakey and needs a soft reset. I don't want to stack changes too deep.