Page MenuHomeFreeBSD

Fix ixgbe SRIOV bugs
ClosedPublic

Authored by pkelsey on Jun 27 2015, 5:53 AM.
Tags
None
Referenced Files
F105769374: D2922.diff
Fri, Dec 20, 10:50 AM
Unknown Object (File)
Thu, Nov 28, 9:03 AM
Unknown Object (File)
Thu, Nov 28, 9:01 AM
Unknown Object (File)
Nov 4 2024, 11:08 AM
Unknown Object (File)
Sep 12 2024, 11:20 PM
Unknown Object (File)
Sep 8 2024, 2:35 PM
Unknown Object (File)
Sep 8 2024, 9:19 AM
Unknown Object (File)
Sep 7 2024, 6:09 AM
Subscribers

Details

Summary

The bugs fixed are as follows:

  • In ixgbe_vf_api_negotiate(), the API version is in msg[1], not msg[0]. This bug results in the API version not being correctly set in the VF context. I believe the only current consequence is an error status being printed to the log.
  • In ixgbe_reset_hw_vf(), IXGBE_VT_MSGTYPE_CTS must be masked off in msgbuf[0] before performing the message type + ack/nack checks or those checks will always fail. The main side effect of this failure is that the VF MAC address cannot be set at creation time via iovctl.
  • In ixv_initialize_receive_units(), set the receive tail pointer after enabling the queues instead of before as enabling the queues clears the tail pointer. The driver currently muddles through this mistake during the first interrupt as the receive ring mbufs wind up being refreshed, which updates the tail pointer as a side effect.
Test Plan

Tested on an 82599 with one or more VFs - verified the API

version negotiation error status cleared, VF MAC addresses can be
set via iovctl, and receive interrupt processing no longer relies
on friendly side effects of the mbuf refresh routine operating
with incorrect state.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

pkelsey retitled this revision from to Fix ixgbe SRIOV bugs.
pkelsey updated this object.
pkelsey edited the test plan for this revision. (Show Details)
pkelsey added reviewers: adrian, erj, rstone.
gnn edited edge metadata.
This revision is now accepted and ready to land.Jun 28 2015, 6:45 PM
erj requested changes to this revision.Jun 29 2015, 5:34 PM
erj edited edge metadata.

Everything but the change in ixgbe_vf.c looks good to me.

It looks like the PF driver needs to not set CTS when it acks the reset request, instead of the VF driver ignoring the flag.

So ixgbe_vf_reset_msg() in if_ix.c needs to change instead.

This revision now requires changes to proceed.Jun 29 2015, 5:34 PM
In D2922#57477, @erj wrote:

Everything but the change in ixgbe_vf.c looks good to me.

It looks like the PF driver needs to not set CTS when it acks the reset request, instead of the VF driver ignoring the flag.

So ixgbe_vf_reset_msg() in if_ix.c needs to change instead.

It certainly wasn't clear to me what the implementation intent was with CTS flag, so I made this change in what appeared to be the most consistent way given the existing code. In this case, ixgbe_set_rar_vf(), ixgbe_set_uc_addr_vf(), and ixgbev_get_queues() served as the reference for what the VF side of things was doing before checking flags in the response header. Considering your requested change to the patch, would those also be changed under the same reasoning? Also, in if_ix.c, ixgbe_vf_get_queues() is setting IXGBE_VT_MSGTYPE_CTS in its response - should that also be eliminated as the receiver of that response is just masking it off?

In D2922#57477, @erj wrote:

Everything but the change in ixgbe_vf.c looks good to me.

It looks like the PF driver needs to not set CTS when it acks the reset request, instead of the VF driver ignoring the flag.

So ixgbe_vf_reset_msg() in if_ix.c needs to change instead.

It certainly wasn't clear to me what the implementation intent was with CTS flag, so I made this change in what appeared to be the most consistent way given the existing code. In this case, ixgbe_set_rar_vf(), ixgbe_set_uc_addr_vf(), and ixgbev_get_queues() served as the reference for what the VF side of things was doing before checking flags in the response header. Considering your requested change to the patch, would those also be changed under the same reasoning? Also, in if_ix.c, ixgbe_vf_get_queues() is setting IXGBE_VT_MSGTYPE_CTS in its response - should that also be eliminated as the receiver of that response is just masking it off?

My reasoning was "do it because it'll match the Linux PF's behavior" and not "I don't like the way you did it." AFAICT, the Linux PF doesn't send CTS on a VF reset request ACK because they don't think it's necessary. The VF is marked as being clear to send in their PF after their reset flow, so they might assume the VF would know it's clear to send if it receives a response.

In D2922#57500, @erj wrote:
In D2922#57477, @erj wrote:

Everything but the change in ixgbe_vf.c looks good to me.

It looks like the PF driver needs to not set CTS when it acks the reset request, instead of the VF driver ignoring the flag.

So ixgbe_vf_reset_msg() in if_ix.c needs to change instead.

It certainly wasn't clear to me what the implementation intent was with CTS flag, so I made this change in what appeared to be the most consistent way given the existing code. In this case, ixgbe_set_rar_vf(), ixgbe_set_uc_addr_vf(), and ixgbev_get_queues() served as the reference for what the VF side of things was doing before checking flags in the response header. Considering your requested change to the patch, would those also be changed under the same reasoning? Also, in if_ix.c, ixgbe_vf_get_queues() is setting IXGBE_VT_MSGTYPE_CTS in its response - should that also be eliminated as the receiver of that response is just masking it off?

My reasoning was "do it because it'll match the Linux PF's behavior" and not "I don't like the way you did it." AFAICT, the Linux PF doesn't send CTS on a VF reset request ACK because they don't think it's necessary. The VF is marked as being clear to send in their PF after their reset flow, so they might assume the VF would know it's clear to send if it receives a response.

My position is, let's not make a change that perhaps very locally makes sense in isolation but that is inconsistent with the rest of the driver implementation, because indoing so we are poorly serving all who might read or work on this driver in the future. We should not look at just one function in the Linux version and imitate it without considering what is going on in the rest of the sources we are dealing with. It appears to me that the One True Function of this CTS flag is to indicate to the link status checker whether a PF reset has occurred. Other than that, it doesn't seem to me to be consistently being set or consumed on all of the other mailbox transactions.

I think the reasonable course of action would be to accept this patch as functionally correct and expressing good parallel structure with the driver as is, then address cleaning up the use of this CTS flag in the entire implementation as a separate issue. What do you think?

It's been two weeks with no further input. I just pinged rstone directly with a last-call. If there are no further objections in the next 8 hours or so, I plan to commit this as-is.

erj edited edge metadata.

I have no objections now, but this could end up getting changed in the future.

This revision is now accepted and ready to land.Jul 13 2015, 8:36 PM
This revision was automatically updated to reflect the committed changes.