When the link_active_on_if_down flag is disabled and the link is brought
down with ifconfig, FW reports a false positive link event
about an unqualified transceiver. The condition used in the driver to
filter out those false positive events was incorrect, so
information about an unqualified module was also not reported
when the event was valid. Change the condition to rely
on the IFF_UP flag instead of link_active_on_if_down and bump
the driver version to 2.3.1-k.
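The condition described above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: the function name and the `SK_`-prefixed constants are hypothetical stand-ins, and the bit values are taken from the log output quoted later in this review (MA: 64, LUP: 1, QUAL: 128).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; values mirror the MA/LUP/QUAL numbers in the logs. */
#define SK_LINK_UP          0x01 /* link_info bit: link is up */
#define SK_MEDIA_AVAILABLE  0x40 /* link_info bit: a module is plugged in */
#define SK_QUALIFIED_MODULE 0x80 /* an_info bit: module is qualified */
#define SK_IFF_UP           0x01 /* stand-in for the IFF_UP interface flag */

/*
 * Decide whether an "unqualified module" message should be logged.
 * Events that arrive while the interface is administratively down
 * (IFF_UP clear) are ignored, because FW clears the qualified bit
 * whenever the driver disables the link.
 */
static bool
sk_should_report_unqualified(unsigned int if_flags, uint8_t link_info,
    uint8_t an_info)
{
	if ((if_flags & SK_IFF_UP) == 0)
		return (false);

	return ((link_info & SK_MEDIA_AVAILABLE) != 0 &&
	    (link_info & SK_LINK_UP) == 0 &&
	    (an_info & SK_QUALIFIED_MODULE) == 0);
}
```

With the interface down, the function returns false regardless of what FW reports; with the interface up, media present, no link and the qualified bit clear, it returns true.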
Details
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Passed
- Unit: No Test Coverage
- Build Status: Buildable 40643, Build 37532: arc lint + arc unit
Event Timeline
Also, in the original change, https://reviews.freebsd.org/D28028, I noticed that after executing ixl_set_link(pf, false), the PHY capabilities query for an_info has I40E_AQ_QUALIFIED_MODULE unset. So the same supported/qualified module becomes unqualified.
I think the crux of the problem is with ixl_set_link() unsetting I40E_AQ_QUALIFIED_MODULE.
sys/dev/ixl/ixl_pf_iflib.c | ||
---|---|---|
419–420 | This change appears to suppress the "unqualified module" message if the link goes down for any reason. |
There is no way other than setting phy_type for the driver to reliably disable and re-enable a link. The side effect is that when the link is disabled, FW unsets the I40E_AQ_QUALIFIED_MODULE flag. To avoid logging a false positive message about an unqualified module, the driver needs to filter out events received from FW after the interface is brought down with ifconfig.
sys/dev/ixl/ixl_pf_iflib.c | ||
---|---|---|
419–420 | The IFF_UP flag is controlled by ifconfig and does not depend on the link state reported by FW. When the interface is brought up by a user and FW reports link down due to an unqualified module, the message will be reported. |
Krzysztof,
So, thinking about this, my guess is that when you reboot the machine, we would report "unqualified" for a qualified cable because FW sees this as link down. Also, my guess is that the cable will show up as unqualified when you shut down the link on the link partner.
I have applied the patch and rebooted my machine, and I see the "unqualified" message for a good cable.
The good thing is that with this patch I do not see the "unqualified" message for an admin link-down.
I'm testing both scenarios with this transceiver:
plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
vendor: Intel Corp PN: AFBR-703SDZ-IN2 SN: AD1432A0AY9 DATE: 2014-08-11
and I don't see the unqualified message in the dmesg
Shutting down the link on the link partner does not affect reporting by FW if the module is qualified.
Could you, please, provide your configuration (rc.conf, loader.conf) and exact steps for reproduction?
Yes, Krzysztof, for an immediate partner-link-down event, I do not see the issue either. But during overnight link-toggle tests I do see the "unqualified" message.
Could you, please, provide your configuration (rc.conf, loader.conf) and exact steps for reproduction?
I have connected the cable back-to-back between two servers and rebooted them both at the same time. During boot, on both nodes, immediately after the driver is attached, it receives a link event and sees MEDIA_AVAILABLE + IFF_UP + UNQUALIFIED + NO_LINK_UP.
My cable is
plugged: QSFP+ 40GBASE-CR4 (No separable connector)
vendor: Molex Inc. PN: 112-00322 SN: 524720492
And nodes are NetApp platforms. My code base is not HoL. I just patched this change into my code-base to give it a try.
Again Krzysztof, I presume your tests have enabled IXL_PF_STATE_LINK_ACTIVE_ON_DOWN.
I'm testing using a 4-port adapter with the following config:
/boot/loader.conf:
dev.ixl.0.link_active_on_if_down=1
dev.ixl.3.link_active_on_if_down=0
/etc/rc.conf:
ifconfig_ixl0=190.2.20.1/16
ifconfig_ixl3=190.3.20.1/16
To ensure that the link state is correct, the driver sets it during attach according to the link_active_on_if_down tunable. This triggers a link event, but during attach the IFF_UP flag is not set:
ixl3: ixl_set_link enable: 0
ixl3: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
ixl3: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
Then the interface is brought up with an ioctl call:
ixl3: ixl_set_link enable: 1
ixl3: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
ixl3: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl3: link state changed to UP
With link_active_on_if_down=1, FW correctly reports that the module is qualified in every link event:
ixl0: ixl_set_link enable: 1
ixl0: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
ixl0: ixl_link_event IFF_UP: 0 MA: 64, LUP: 1 QUAL: 128
ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl0: link state changed to UP
ixl0: ixl_set_link enable: 1
ixl0: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 128
ixl0: link state changed to DOWN
ixl0: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl0: link state changed to UP
Krzysztof,
My run almost matches your test result but has the following differences.
On node reboot (where the link partner is some Cisco switch):
- During attach, when ixl_set_link(pf, 0) gets invoked, as you said, I get these ixl_link_event() logs:
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0 <<<< repeats some 50+ times
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
...
...
- The NetApp networking stack brings up the link (this is simply ifhwioctl() being invoked to bring up the link)
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0 <<<<<
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
So I have a spurious link event where both I40E_AQ_LINK_UP and I40E_AQ_QUALIFIED_MODULE are unset. At this stage, I get the "unqualified" message on a working/qualified transceiver.
I'm not really sure why I get a spurious link event and you do not. This has something to do with link auto-negotiation, and the output depends on the link partner. I think it's pretty much OK for the link to bounce while negotiating.
Maybe we need to wait for negotiation to complete before checking and printing the "unqualified" message?
In the admin-link-down case (i.e. ifconfig e2a down), I see:
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
On admin-link-up (i.e. ifconfig e2a up), I see:
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
In this case, there is no spurious event.
I'm still not able to reproduce this issue with any of my switches. Could you, please, modify the printf in the ixl_link_event function to dump the hex values of status->link_info, status->an_info, hw->phy.link_info.link_info and hw->phy.link_info.an_info, and send me the log?
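A debug line of the kind requested here could be formatted along these lines. This is only a sketch: the helper name and the field labels are hypothetical, and a real driver would print through its own device logging routine rather than into a buffer.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format the four status bytes as hex into buf.
 * status_* come from the link event itself, hw_* from the cached
 * link status kept in the hw structure.
 * Returns the number of characters snprintf() would have written.
 */
static int
sk_format_link_event(char *buf, size_t len, uint8_t status_ln, uint8_t hw_ln,
    uint8_t status_an, uint8_t hw_an)
{
	return (snprintf(buf, len,
	    "ixl_link_event Status_LN: 0x%x, HW_LN: 0x%x, "
	    "Status_AN: 0x%x, HW_AN: 0x%x",
	    status_ln, hw_ln, status_an, hw_an));
}
```

Printing the event values next to the cached values makes it easy to spot cases where the two disagree.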
Krzysztof,
- During attach, ixl_set_link(pf, 0) gets invoked and I have
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 128
..
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x0 <<<< repeats some 50+ times
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x0
e2a: ixl_link_event IFF_UP: 0 MA: 64, LUP: 0 QUAL: 0
..
..
- Link bring-up ioctl
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x80 <<<<< spurious event
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0 <<< spurious event
e2a: ixl_link_event Status_LN: 0xe1, HW_LN: 0xe1, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
A similar run on another machine has the same effect. All logs are the same except for the "spurious" event, where hw->phy.link_info.link_info is now 0xD2.
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xD2, Status_AN: 0x0, HW_AN: 0x80 <<<
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0
e2a: ixl_link_event Status_LN: 0xe1, HW_LN: 0xe1, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
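To relate the hex dumps to the MA/LUP/QUAL columns, the bytes can be decoded as in the sketch below. The struct and function names are hypothetical, and the masks are assumed from the values printed in the logs (64, 1 and 128). In the spurious event, QUAL: 0 matches Status_AN (0x0) rather than HW_AN (0x80), which suggests the columns are derived from the event data.

```c
#include <stdint.h>

/* Hypothetical container for the three columns printed in the logs. */
struct sk_link_fields {
	unsigned int ma;   /* media available: 64 when set */
	unsigned int lup;  /* link up: 1 when set */
	unsigned int qual; /* qualified module: 128 when set */
};

/* Mask the raw link_info and an_info bytes the way the log lines show them. */
static struct sk_link_fields
sk_decode_link_fields(uint8_t link_info, uint8_t an_info)
{
	struct sk_link_fields f;

	f.ma = link_info & 0x40;
	f.lup = link_info & 0x01;
	f.qual = an_info & 0x80;
	return (f);
}
```

For example, link_info 0xca with an_info 0x0 decodes to MA: 64, LUP: 0, QUAL: 0, and link_info 0xe1 with an_info 0x80 decodes to MA: 64, LUP: 1, QUAL: 128.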
- Do not rely on information from link event
After delivering a link event, FW disables such events until
they are re-enabled with an AQC call. It is possible that
the link state changes before events are re-enabled and the driver
may miss that. To avoid such a situation, do not rely on information
from the event. Instead, use the most recent status info retrieved
with a Get Link Status call.
e2a: ixl_link_event Status_LN: 0xca, HW_LN: 0xca, Status_AN: 0x0, HW_AN: 0x80 <<<<< spurious event
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 0 QUAL: 0 <<< spurious event
e2a: ixl_link_event Status_LN: 0xe1, HW_LN: 0xe1, Status_AN: 0x80, HW_AN: 0x80
e2a: ixl_link_event IFF_UP: 1 MA: 64, LUP: 1 QUAL: 128
I don't think we can do much about those spurious events in the driver, but the AN status retrieved with the Get Link Status AQC has the correct information. Using it instead of the information from the event should help.
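The idea of trusting the refreshed status over the event payload can be sketched as below. The struct and function names are hypothetical and the bit masks are assumed from the logs; the sketch also assumes that the cached values (HW_LN/HW_AN in the dumps) are what a Get Link Status call returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of status bytes, as dumped in the logs. */
struct sk_link_status {
	uint8_t link_info;
	uint8_t an_info;
};

/*
 * Decide whether the module looks unqualified.
 * The event payload is deliberately ignored; only the status refreshed
 * by a Get Link Status call (here: cached) is consulted.
 */
static bool
sk_unqualified_from_cached(const struct sk_link_status *event,
    const struct sk_link_status *cached)
{
	(void)event; /* may be stale or spurious */

	return ((cached->link_info & 0x40) != 0 && /* media available */
	    (cached->link_info & 0x01) == 0 &&     /* link not up */
	    (cached->an_info & 0x80) == 0);        /* not qualified */
}
```

For the spurious event above (Status_AN: 0x0, HW_AN: 0x80), this returns false, so no "unqualified" message would be printed for the qualified cable.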