Page MenuHomeFreeBSD

OCS_FC: Wait for a specific period of time prior to telling the OS about lost device.
ClosedPublic

Authored by ram on Jul 9 2018, 12:27 PM.

Details

Summary

Issue: SCSI Device names change while doing port-toggle along with IO.

Cause: We use node instance number as the target id. So, when the link toggle happens we create new node that leads to differnt target numbers and names in the OS.

Fix: Added LDT(Device Lost Timer)- we wait a specific period of time prior to telling the OS about lost device.

Added targets database which assoiates with the remote nodes.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Overall it looks ok; see the issues I highlighted inline. I assume you've done some testing on it? I compiled it on amd64 but otherwise I haven't tested it. Also, please send me and Alexander the proposed commit message.

sys/dev/ocs_fc/ocs.h
69 ↗(On Diff #45071)

Looks like you have a space/tab inconsistency in this structure. I would suggest changing it all to tabs.

sys/dev/ocs_fc/ocs_cam.c
1082 ↗(On Diff #45071)

This comment makes it sound like the timer is active all the time. I had to look at the code to tell that it is only activated when a device goes away, and it only stays active if there is still work to do.

Please change the comment to explain that it's not polling once a second all the time.

1510 ↗(On Diff #45071)

Looks like you've got an extra space in there.

2251 ↗(On Diff #45071)

The declarations usually go in one contiguous section, so please take out this blank line.

Fixed as per the review comments.

We have tested the changes.
Tests:

  1. Port toggle on target side with less than 30sec interval and more than 30 seconds interval.
  2. Port toggle on initiator side with less than 30sec interval and more than 30 seconds interval.

Approved. Looks good, thank you!

This revision is now accepted and ready to land.Jul 17 2018, 2:10 PM

The general motivation is good for me. Very alike approach is used in isp(4) driver, may be just for some additional reasons. But unless I missed something, I think this patch of yours is incomplete -- it implements the translation table for initiator role, but not for target. If the HBA is capable of playing both roles same time, there will be a mess in CAM and CTL. isp(4) driver originally also had the same issue, but I fixed it there few years ago, and now target and initiator roles may successfully coexist.

sys/dev/ocs_fc/ocs_cam.c
1793 ↗(On Diff #45150)

Here and below, when whole target is gone, proper error code would be CAM_SEL_TIMEOUT, since CAM_DEV_NOT_THERE will destroy only one LUN.

ram marked 4 inline comments as done.Jul 17 2018, 5:19 PM
In D16196#345939, @mav wrote:

The general motivation is good for me. Very alike approach is used in isp(4) driver, may be just for some additional reasons. But unless I missed something, I think this patch of yours is incomplete -- it implements the translation table for initiator role, but not for target. If the HBA is capable of playing both roles same time, there will be a mess in CAM and CTL. isp(4) driver originally also had the same issue, but I fixed it there few years ago, and now target and initiator roles may successfully coexist.

Yes, you are right. I implemented this for initiator role. After checking-in these changes I will test and fix the issues related to I+T role.