Page MenuHomeFreeBSD

10Gigabit Ethernet driver for AMD SoC
Needs ReviewPublic

Authored by rajesh1.kumar_amd.com on Jul 24 2020, 6:09 AM.

Details

Reviewers
manu
andrew
Group Reviewers
Core Team
network
Summary

This patch has the driver for 10Gigabit Ethernet controller in AMD
SoC. This driver is written compatible to the Iflib framework. The
existing driver is for the old version of hardware. The submitted
driver here is for the recent versions of the hardware where the Ethernet
controller is PCI-E based.

Test Plan

The Submitted driver is verified with the recent versions of the AMD SoC.

If there are existing users for the old version of this driver, kindly
let us know and we would like to work with in validating the same.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 33914
Build 31114: arc lint + arc unit

Event Timeline

rajesh1.kumar_amd.com requested review of this revision.Jul 24 2020, 6:09 AM
rajesh1.kumar_amd.com created this revision.
emaste added a subscriber: emaste.Jul 24 2020, 2:24 PM

Can we add an appropriate SPDX tag to the source files as well - e.g. /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ or similar?

https://spdx.github.io/spdx-spec/appendix-IV-SPDX-license-expressions/
https://www.kernel.org/doc/html/v4.18/process/license-rules.html

The older version of this driver supports the Ethernet on the SoftIron Arm systems, correct?

emaste added inline comments.Jul 24 2020, 2:38 PM
sys/dev/axgbe/if_axgbe.c
154–155

What's the story here?

sys/dev/axgbe/if_axgbe_pci.c
2

Can we add a SPDX-License-Identifier: BSD-2-Clause-FreeBSD to this file

Can we add an appropriate SPDX tag to the source files as well - e.g. /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ or similar?

Need a clarification here please. The license text in the source files has additional text (part of it give below) apart from the normal BSD-3-Clause text. If this is not an issue, we can use /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */. Please clarify.

  • This file incorporates work covered by the following copyright and
    • permission notice:
    • The Synopsys DWC ETHER XGMAC Software Driver and documentation
    • (hereinafter "Software") is an unsupported proprietary work of Synopsys,
    • Inc. unless otherwise expressly agreed to in writing between Synopsys
    • and you.

The older version of this driver supports the Ethernet on the SoftIron Arm systems, correct?

Yes. From the existing Copyright/License text, the older version seems to be written for SoftIron systems by @andrew. But, the exact target hardware details are unknown. The major difference between the old and new versions are

  1. New version is PCI-E based, where as the old version is not.
  2. New version uses a different PHY than the old one. So, the mdio code is made as wrapper to diverge to old or new PHY code.

We posted another patch (D21728) for the PHY changes alone. But there is no comments on that yet. So, we are not sure whether the old driver is still in use by someone. If @andrew or someone related to SoftIron can confirm, it would be helpful.

sys/dev/axgbe/if_axgbe.c
154–155

The newer version of the driver is based on PCI-E. Also, the new version is written compatible to the Iflib framework. We made changes related to that in the existing code moving few function between files.

We haven't made the Iflib compatible changes to the old version since we don't have the target hardware to verify the same. So, If the old version of the driver is still in use by someone, we would be happy to work with to make necessary changes and verify the same.

Please let me know if more details needed.

sys/dev/axgbe/if_axgbe_pci.c
2

Yes, This file can be added with /* SPDX-License-Identifier: BSD-2-Clause-FreeBSD */. We can also make the xgbe_osdep.c and xgbe_osdep.h tagged with the same SPDX License Identifier.

Once we get the clarity about the SPDX Licence Identifier for other files (using BSD-3-Clause with additional text), we shall make all the changes and submit another version for review.

Can we add an appropriate SPDX tag to the source files as well - e.g. /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ or similar?

Need a clarification here please. The license text in the source files has additional text (part of it give below) apart from the normal BSD-3-Clause text. If this is not an issue, we can use /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */. Please clarify.

Due to the added text I would not call this a BSD-3-Clause, too bad they added that text.

  • This file incorporates work covered by the following copyright and
    • permission notice:
    • The Synopsys DWC ETHER XGMAC Software Driver and documentation
    • (hereinafter "Software") is an unsupported proprietary work of Synopsys,
    • Inc. unless otherwise expressly agreed to in writing between Synopsys

This text had me worried until I went and read the rest of it, which does expressly in writing grant certain rights.

sys/dev/axgbe/if_axgbe.c
539

C++ style comments should not be used, please convert to C style, or why not just delete the line?
Here and in many other places. I am not going to mark them all.

sys/dev/axgbe/if_axgbe_pci.c
4

Why are All Rights Reserved still be asserted in 2020?

sys/dev/axgbe/xgbe-common.h
58–59

Technically a copyright and license are 2 separate and different things, the "Copyright" should be factored out of the license section and placed before it.

sys/dev/axgbe/xgbe-desc.c
58–59

Again the all rights reserved thing, yet it is not asserted in the GPLv2 license, copy paste repeat mistakes?

rajesh1.kumar_amd.com updated this revision to Diff 75199.EditedJul 31 2020, 6:43 AM

Change details :

  1. Added SPDX-License Identifier for possible files with BSD-2-Clause-FreeBSD
  2. Moved copyright tags out of the License text as per the comment
  3. Changed to C style comments.
  4. Removed commented code and added TODO tags.
rajesh1.kumar_amd.com marked 2 inline comments as done.Jul 31 2020, 7:06 AM
rajesh1.kumar_amd.com added inline comments.
sys/dev/axgbe/if_axgbe_pci.c
4

To understand about "All rights reserved", this text is only related to the copyright and unrelated to the License right? And "year" means the year the particular file came into existence for the first time in general and if multiple years specified, it means the years that file is actively touched, right?

If my understanding is right, does changing the year to 2020 is the problem here?

sys/dev/axgbe/xgbe-common.h
58–59

Moved copyright above the License text and just have one instance.

sys/dev/axgbe/xgbe-desc.c
58–59

Just moved the copyright above and retained "All rights reserved" text. Once I get the clarity about the text, will make appropriate changes.

rgrimes added inline comments.Jul 31 2020, 2:51 PM
sys/dev/axgbe/if_axgbe_pci.c
4

Correct, the text "All rights Reserved" is associated with the copyright and not the License. But this text, "All rights reserved" is no longer needed, and we are trying to remove it when we can and not use it on any new code if possible. All the templates in FreeBSD have been updated to note have this text.

Your statement about the "year" is the interpretation I have always used, some disagree on that and say it is valid to just list the first and last years as a range 2014-2020 for example.

There is no problem with the year use of 2020 in this file, it is a new file.

sys/dev/axgbe/xgbe-common.h
4

This should probably have dates updated to 2014-2016, 2020.

rajesh1.kumar_amd.com marked an inline comment as done.

Changes :

  1. Removed "All rights reserved" and corrected years in the copyright
  2. Added a minor change in i2c and phy-v2 files for link status update for multi-port
rajesh1.kumar_amd.com marked 2 inline comments as done.Aug 1 2020, 8:32 AM
rajesh1.kumar_amd.com added inline comments.
sys/dev/axgbe/xgbe-common.h
4

Other places also modified as needed.

rajesh1.kumar_amd.com marked an inline comment as done.Aug 5 2020, 6:19 AM

Any more comments on this patch?

manu added a comment.Aug 25 2020, 9:56 AM

Any more comments on this patch?

Hi Rajesh,
Sorry I haven't had time to even compile test this.
I think we're ok on the already addressed comment.
I'll commit this during this weekend if no one have an objection until then.
Thanks.

Changes :

  1. Removed unwanted header files from unwanted places
  2. Changed Rx path to use 2 Freelist instead of 1. Rx descriptors holds header and data buffers. Earlier use entries in one freelist were used to populate one descriptor. Now with 2 freelists, one entry from each freelist is used to populated on descriptor. This way the indexes will be in sync between driver and iflib

Hi,

Does anyone have any more comments on this patch?

manu added a comment.Thu, Sep 10, 2:59 PM

Hello,

I've just tested on my machine, an IEI Puzzle. (Sorry it took so long).
I needed this to have the driver compiled and loading : https://reviews.freebsd.org/P422

I don't know if something changed in iflib that causes the nrxqs change.
Also I don't know if my sfp+ modules aren't compatible or if it's due to the driver but I have :
ax0: xgbe_phy_sfp_detect: mod absent
in a loop when I up the interface.

Log when I load the module :
ax0: <AMD 10 Gigabit Ethernet Driver> mem 0xef7e0000-0xef7fffff,0xef7c0000-0xef7dffff,0xef80e000-0xef80ffff irq 40 at device 0.4 on pc
i6
ax0: Using 512 TX descriptors and 512 RX descriptors
ax0: Using 4 RX queues 4 TX queues
ax0: Using MSI-X interrupts with 8 vectors
ax0: xgbe_phy_reset: no phydev
ax1: <AMD 10 Gigabit Ethernet Driver> mem 0xef7a0000-0xef7bffff,0xef780000-0xef79ffff,0xef80c000-0xef80dfff irq 41 at device 0.5 on pci6
ax1: Using 512 TX descriptors and 512 RX descriptors
ax1: Using 4 RX queues 4 TX queues
ax1: Using MSI-X interrupts with 8 vectors
ax1: xgbe_phy_reset: no phydev
ax2: <AMD 10 Gigabit Ethernet Driver> mem 0xef760000-0xef77ffff,0xef740000-0xef75ffff,0xef80a000-0xef80bfff irq 42 at device 0.6 on pci6
ax2: Using 512 TX descriptors and 512 RX descriptors
ax2: Using 2 RX queues 2 TX queues
ax2: Using MSI-X interrupts with 6 vectors
ax2: xgbe_phy_reset: no phydev
ax3: <AMD 10 Gigabit Ethernet Driver> mem 0xef720000-0xef73ffff,0xef700000-0xef71ffff,0xef808000-0xef809fff irq 43 at device 0.7 on pci6
ax3: Using 512 TX descriptors and 512 RX descriptors
ax3: Using 2 RX queues 2 TX queues
ax3: Using MSI-X interrupts with 6 vectors
ax3: xgbe_phy_reset: no phydev
ax0: Enabling TSO in channel 0
ax0: Enabling TSO in channel 1
ax0: Enabling TSO in channel 2
ax0: Enabling TSO in channel 3
ax0: RSS Enabled
ax0: Receive checksum offload Enabled
ax0: VLAN filtering Enabled
ax0: VLAN Stripping Enabled

I'm wondering about the 'ax1: xgbe_phy_reset: no phydev' line.

Thanks,

Hi @manu, Thanks for trying the patch and making other necessary changes. Sorry, I missed to make that change MPASS (nrxqs == 2) when I submitted the v4 patch.

Regarding the log

'ax0: xgbe_phy_reset: no phydev'

This is normal before the interface is brought up. Not an issue. This log may be confusing. I will change that to verbose level log in the future patches.

Problem here is,

ax0: xgbe_phy_sfp_detect: mod absent

This means the SFP module is not recognized. We have tested with multiple SFP+ modules. Which SFP+ module are you using? Can you try a different one and see for verification?

You can enable the debug logs by setting appropriate value to below sysctl (supported values 1-3) for more verbose logs.

sysctl dev.ax.0.axgbe_debug_level=3

Currently enabling verbose logs for one device instance will set the same level for all instances. This will be corrected in future patches.

manu added a comment.Thu, Sep 10, 4:34 PM

Hi @manu, Thanks for trying the patch and making other necessary changes. Sorry, I missed to make that change MPASS (nrxqs == 2) when I submitted the v4 patch.

Regarding the log

'ax0: xgbe_phy_reset: no phydev'

This is normal before the interface is brought up. Not an issue. This log may be confusing. I will change that to verbose level log in the future patches.

Thanks.

Problem here is,

ax0: xgbe_phy_sfp_detect: mod absent

This means the SFP module is not recognized. We have tested with multiple SFP+ modules. Which SFP+ module are you using? Can you try a different one and see for verification?

I only have finisar ftlx8574d3bcl for now, do you have a known model that is supposed to work ?
Also can you make the printf not in a loop ?

You can enable the debug logs by setting appropriate value to below sysctl (supported values 1-3) for more verbose logs.

sysctl dev.ax.0.axgbe_debug_level=3

Currently enabling verbose logs for one device instance will set the same level for all instances. This will be corrected in future patches.

Ok, will try that on monday or tuesday.
Thanks.

Fine @manu .

I only have finisar ftlx8574d3bcl for now, do you have a known model that is supposed to work ?
Also can you make the printf not in a loop ?

We have tested with Finisar 10G SFP-to-Optical modules (not RJ45). Other than that, we have tested with 10G FS, Prolabs and 10GTek (all SFP-to-RJ45). I will make the change to print that log only once and other changes I mentioned before and submit a new patch tomorrow.

In the patch tomorrow, shall I include the changes you made in https://reviews.freebsd.org/P422 as well?

manu added a comment.Thu, Sep 10, 5:07 PM

Fine @manu .

I only have finisar ftlx8574d3bcl for now, do you have a known model that is supposed to work ?
Also can you make the printf not in a loop ?

We have tested with Finisar 10G SFP-to-Optical modules (not RJ45). Other than that, we have tested with 10G FS, Prolabs and 10GTek (all SFP-to-RJ45). I will make the change to print that log only once and other changes I mentioned before and submit a new patch tomorrow.

Ok, weird that my modules doesn't work then no ?

In the patch tomorrow, shall I include the changes you made in https://reviews.freebsd.org/P422 as well?

Sure, thanks.

Ok, weird that my modules doesn't work then no ?

Yeah. I don't think optical (or) RJ45 should make any difference. Trying with other modules will make things clear. One other thing to try is to just have one port enabled and see whether that makes any difference.

manu added a comment.Fri, Sep 11, 7:30 AM

Log with debug_level=3 : https://reviews.freebsd.org/P423

Also if you update the code could you add the PNPINFO so the module will autoload via devmatch(8) please ?
Will do more test after buying others sfp+ modules.
Thanks.

Changes :

  1. Fixed the debug sysctl to be device instance specific
  2. Minor Logging changes

@manu, all the discussed changes (avoid repetitive logs, fix the debug sysctl, and your changes in P422) are done and updated the patch here.

Also if you update the code could you add the PNPINFO so the module will autoload via devmatch(8) please

Regarding the above comment, PNPINFO is already added in the code (IFLIB_PNP_INFO) and devmatch -d lists the details appropriately. With the recent patch, necessary changes to load the driver automatically is also done.

Regarding the logs shared in P433, the config looks correct from the logs. So, problem could with the Port values being read from the SFP. Additional logs are added to indicate that now. Please try with other SFPs as well.

Let me know if you have any more comments.

manu added a comment.Wed, Sep 16, 1:41 PM

So I feel a bit dumb but I now know why I it didn't worked.
Naming of interfaces is of course not the same on the panel but what made me confuse is if I remove/insert the module in the slot that correspond to ax0 I have :
ax3: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7220
ax3: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x2220
Which lead me to think that I was using the right interface.
So TLDR inserting/removing a module in any of the 4 slot always prints info for ax3.
Do you have any idea what's going on ?

It's a logging mistake @manu. I will correct it accordingly. Each nibble in that value denotes a port status. When you say you are touching only ax0, I see the correct nibble reflecting the change (bits[15:12]). From the values you showed here, looks like your SFP module is detected, but Link is still not. Is the link connected?

What is the problem with the SFP module earlier? Is it faulty?

Apart from this logging issue, Is there any other changes needed? So, that I can include them in the next patch.

manu added a comment.Wed, Sep 16, 5:07 PM

It's a logging mistake @manu. I will correct it accordingly. Each nibble in that value denotes a port status. When you say you are touching only ax0, I see the correct nibble reflecting the change (bits[15:12]). From the values you showed here, looks like your SFP module is detected, but Link is still not. Is the link connected?

The paste I've added was done with no cable plugged, but I have link with a copper adapter, haven't yet setup my 10G switch but I guess it will be ok.

What is the problem with the SFP module earlier? Is it faulty?

Yes it was, I felt a bit stupid when really testing this afternoon :)

Apart from this logging issue, Is there any other changes needed? So, that I can include them in the next patch.

I think we're good.

Fine. Good to hear you could progress @manu. I will make that logging change and update the patch.

Hi Rajesh,

I'm working with Deciso in order to test the AMD SoC XGBE driver on a board using the EPYC 3201 (4 cores). Note that there are only two SFP+ ports on this board, which might be different to your situation.

While experimenting with hotplugging the SFP+ modules, I noticed some inconsistent behaviour.
Frequently, the link does not seem to be established (or at least, not right away). Below you can find the log output.

Both interfaces have IP-addresses assigned to them.

--- pull the module out of ax1

Sep 24 09:41:10 OPNsense kernel: ax0: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7702
Sep 24 09:41:10 OPNsense kernel: ax1: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7702
Sep 24 09:41:10 OPNsense kernel: ax1: xgbe_phy_sfp_detect: mod absent

-- insert module into ax1
Sep 24 09:41:17 OPNsense kernel: ax0: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7202
Sep 24 09:41:17 OPNsense kernel: ax1: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7202
Sep 24 09:41:19 OPNsense kernel: ax0: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7002
Sep 24 09:41:19 OPNsense kernel: ax1: xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7002 <-- still down, expected up

--- after some time... (this does not happen every time, this is wildly inconsistent)

Sep 24 09:41:45 OPNsense kernel: ax1: Link is UP - 10Gbps/Full - flow control rx/tx
Sep 24 09:41:45 OPNsense kernel: ax1: link state changed to UP

Please note that I tested this with 3 different (but equivalent) SFP+ modules, to rule out any defects.

Let me know if you need more information and I will get back to you as soon as possible.

sys/dev/axgbe/xgbe-i2c.c
425

According to the logs with axgbe_debug_level=3, it seems that reading the EEPROM times out after +/- 60 bytes. Since the I2C controller operates at a frequency of 100 kHz, it seems that it needs at least 11ms to complete the read operation of the EEPROM. I changed this locally to ticks + (10 * hz) and that seemed to fix the issue. On the safe side I would set this to 20 or 30.

I wonder why this didn't happen on your side?

Hi @stephan.dewt_yahoo.co.uk

Regarding the following statement, Does it mean the Link doesn't come up at all? or It takes some delay to come up?

Frequently, the link does not seem to be established (or at least, not right away).

Regarding the following log,

xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7702

Every nibble in this value corresponds to a port (port0/left to port3/right). Values of the disabled ports are invalid. Currently, even if port 1 is touched you may get the log on port0 instance, which might be confusing. I will correct that and submit a patch.

Having said that, the logs you showed above looks correct. When you plugged in the module in port 1, you can see the second nibble(from left) changing value to 0 (which means the module and the link are recognized). So I assume the issue being said here is Link UP (explicit log) happens after a delay rather than instantly? How frequent is this?

According to the logs with axgbe_debug_level=3, it seems that reading the EEPROM times out after +/- 60 bytes. Since the I2C controller operates at a frequency of 100 kHz, it seems that it needs at least 11ms to complete the read operation of the EEPROM. I changed this locally to ticks + (10 * hz) and that seemed to fix the issue. On the safe side I would set this to 20 or 30.

I wonder why this didn't happen on your side?

Good to hear you have a fix for the same. In my setup, I haven't faced a situation where reading the EEPROM timing out. We faced a bit of delay in Link UP, but that's intermittent. It's interesting to hear this timeout scenario. If there are timeouts, we have "error" message which prints without increasing the log-level. Have you seen any of those error messages? Have you tried adding any logs to compute the number of eeprom bytes read before timeout and actual time needed to complete the eeprom read? We had that 1ms timeout inline with Linux. I will check whether Linux also shows any delay.

FYI, I am working remotely and have limited hardware access till oct 5th. I will try to respond your questions asap anyway.

Regarding the following statement, Does it mean the Link doesn't come up at all? or It takes some delay to come up?

Frequently, the link does not seem to be established (or at least, not right away).

Having said that, the logs you showed above looks correct. When you plugged in the module in port 1, you can see the second nibble(from left) changing value to 0 (which means the module and the link are recognized). So I assume the issue being said here is Link UP (explicit log) happens after a delay rather than instantly? How frequent is this?

In most cases the link comes up after a delay (a small delay could be normal, but 30 or more seconds isn't right). However, I noticed that switching between ports 0 and 1 always causes the port of the newly inserted module to give the correct signals, like so:

xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7002

But not always give a link UP signal no matter how long I wait.
As you said, I also see the correct nibble changing upon insertion, be it port 0 or 1 (0x702 and 0x7002 respectively).

How frequent is this?

The problem here is that I tried to reproduce a consistent scenario, but couldn't. rebooting, unloading and loading the driver all seemed to lead to varying results. Sometimes it would immediately recognize a link, at other times it would have a delay, and sometimes the link doesn't come up at all, even though the module recognizes a link.

Regarding the following log,

xgbe_phy_sfp_signals: sfp_gpio_inputs: 0x7702

Every nibble in this value corresponds to a port (port0/left to port3/right). Values of the disabled ports are invalid. Currently, even if port 1 is touched you may get the log on port0 instance, which might be confusing. I will correct that and submit a patch.

Please ignore the last nibble in this value, as one of the I/O ports of the PCA9535 is connected to something else.

According to the logs with axgbe_debug_level=3, it seems that reading the EEPROM times out after +/- 60 bytes. Since the I2C controller operates at a frequency of 100 kHz, it seems that it needs at least 11ms to complete the read operation of the EEPROM. I changed this locally to ticks + (10 * hz) and that seemed to fix the issue. On the safe side I would set this to 20 or 30.

I wonder why this didn't happen on your side?

Good to hear you have a fix for the same. In my setup, I haven't faced a situation where reading the EEPROM timing out. We faced a bit of delay in Link UP, but that's intermittent. It's interesting to hear this timeout scenario. If there are timeouts, we have "error" message which prints without increasing the log-level. Have you seen any of those error messages? Have you tried adding any logs to compute the number of eeprom bytes read before timeout and actual time needed to complete the eeprom read? We had that 1ms timeout inline with Linux. I will check whether Linux also shows any delay.

I have indeed seen those error messages with and without the increased log level. Actually, the verbose logs already give an indication as to how many bytes are read before timing out: (if my interpretation of the logs is correct)

ax0: xgbe_i2c_read: rx_slots 0 rx_len 128
ax0: xgbe_i2c_write: tx_slots 15 tx_len 128
ax0: xgbe_i2c_write: cmd 256 tx_len 128
ax0: xgbe_i2c_write: cmd 256 tx_len 127
ax0: xgbe_i2c_write: cmd 256 tx_len 126
ax0: xgbe_i2c_write: cmd 256 tx_len 125
ax0: xgbe_i2c_write: cmd 256 tx_len 124
ax0: xgbe_i2c_write: cmd 256 tx_len 123
ax0: xgbe_i2c_write: cmd 256 tx_len 122
ax0: xgbe_i2c_write: cmd 256 tx_len 121
ax0: xgbe_i2c_write: cmd 256 tx_len 120
ax0: xgbe_i2c_write: cmd 256 tx_len 119
ax0: xgbe_i2c_write: cmd 256 tx_len 118
ax0: xgbe_i2c_write: cmd 256 tx_len 117
ax0: xgbe_i2c_write: cmd 256 tx_len 116
ax0: xgbe_i2c_write: cmd 256 tx_len 115
ax0: xgbe_i2c_write: cmd 256 tx_len 114
ax0: xgbe_i2c_isr: ret 0 stop 0
ax0: xgbe_i2c_isr: isr 0x2510
ax0: xgbe_i2c_isr: I2C interrupt status=0x00002510
ax0: xgbe_i2c_read: op cmd 0
ax0: xgbe_i2c_read: rx_slots 15 rx_len 128
ax0: xgbe_i2c_write: tx_slots 15 tx_len 113
ax0: xgbe_i2c_write: cmd 256 tx_len 113
ax0: xgbe_i2c_write: cmd 256 tx_len 112
ax0: xgbe_i2c_write: cmd 256 tx_len 111
ax0: xgbe_i2c_write: cmd 256 tx_len 110
ax0: xgbe_i2c_write: cmd 256 tx_len 109
ax0: xgbe_i2c_write: cmd 256 tx_len 108
ax0: xgbe_i2c_write: cmd 256 tx_len 107
ax0: xgbe_i2c_write: cmd 256 tx_len 106
ax0: xgbe_i2c_write: cmd 256 tx_len 105
ax0: xgbe_i2c_write: cmd 256 tx_len 104
ax0: xgbe_i2c_write: cmd 256 tx_len 103
ax0: xgbe_i2c_write: cmd 256 tx_len 102
ax0: xgbe_i2c_write: cmd 256 tx_len 101
ax0: xgbe_i2c_write: cmd 256 tx_len 100
ax0: xgbe_i2c_write: cmd 256 tx_len 99
ax0: xgbe_i2c_isr: ret 0 stop 0
ax0: xgbe_i2c_isr: isr 0x2510
ax0: xgbe_i2c_isr: I2C interrupt status=0x00002510
ax0: xgbe_i2c_read: op cmd 0
ax0: xgbe_i2c_read: rx_slots 15 rx_len 113
ax0: xgbe_i2c_write: tx_slots 15 tx_len 98
ax0: xgbe_i2c_write: cmd 256 tx_len 98
ax0: xgbe_i2c_write: cmd 256 tx_len 97
ax0: xgbe_i2c_write: cmd 256 tx_len 96
ax0: xgbe_i2c_write: cmd 256 tx_len 95
ax0: xgbe_i2c_write: cmd 256 tx_len 94
ax0: xgbe_i2c_write: cmd 256 tx_len 93
ax0: xgbe_i2c_write: cmd 256 tx_len 92
ax0: xgbe_i2c_write: cmd 256 tx_len 91
ax0: xgbe_i2c_write: cmd 256 tx_len 90
ax0: xgbe_i2c_write: cmd 256 tx_len 89
ax0: xgbe_i2c_write: cmd 256 tx_len 88
ax0: xgbe_i2c_write: cmd 256 tx_len 87
ax0: xgbe_i2c_write: cmd 256 tx_len 86
ax0: xgbe_i2c_write: cmd 256 tx_len 85
ax0: xgbe_i2c_write: cmd 256 tx_len 84
ax0: xgbe_i2c_isr: ret 0 stop 0
ax0: xgbe_i2c_xfer: operation timed out
ax0: xgbe_i2c_isr: isr 0x2510
ax0: xgbe_i2c_isr: I2C interrupt status=0x00002510
ax0: xgbe_i2c_read: op cmd 0
ax0: xgbe_i2c_read: rx_slots 0 rx_len 98
ax0: xgbe_i2c_write: tx_slots 15 tx_len 83
ax0: xgbe_i2c_write: cmd 256 tx_len 83
ax0: xgbe_i2c_write: cmd 256 tx_len 82
ax0: xgbe_i2c_write: cmd 256 tx_len 81
ax0: xgbe_i2c_write: cmd 256 tx_len 80
ax0: xgbe_i2c_write: cmd 256 tx_len 79
ax0: ax0: xgbe_i2c_write: cmd 256 tx_len 78
xgbe_i2c_disable: final i2c_disable 0
ax0: ax0: xgbe_phy_i2c_read: ret2 -60 retry 1
xgbe_i2c_write: cmd 256 tx_len 77
ax0: ax0: xgbe_i2c_write: cmd 256 tx_len 76
ax0: sfp_base[XGBE_SFP_BASE_ID]     : 0x0003
xgbe_i2c_write: cmd 256 tx_len 75
ax0: ax0: xgbe_i2c_write: cmd 256 tx_len 74
sfp_base[XGBE_SFP_BASE_EXT_ID] : 0x0004
ax0: ax0: xgbe_i2c_write: cmd 256 tx_len 73
sfp_base[XGBE_SFP_BASE_CABLE]  : 0x0000
ax0: xgbe_i2c_write: cmd 256 tx_len 72
ax0: ax0: I2C error reading SFP EEPROM
xgbe_i2c_write: cmd 256 tx_len 71
ax0: i2c xfer started ---->>>
ax0: ax0: xgbe_i2c_write: cmd 3 tx_len 70
xgbe_i2c_disable: final i2c_disable 0
ax0: xgbe_i2c_write: cmd 4 tx_len 69
ax0: xgbe_i2c_isr: ret 0 stop 0
ax0: xgbe_i2c_isr: isr 0x710
ax0: xgbe_i2c_isr: I2C interrupt status=0x00000710
ax0: xgbe_i2c_read: op cmd 1
ax0: xgbe_i2c_write: tx_slots 15 tx_len 0
ax0: xgbe_i2c_isr: ret 0 stop 1
ax0: xgbe_i2c_xfer: I2C OP complete
ax0: xgbe_i2c_xfer: i2c xfer ret 0 abrt_source 0x0
ax0: i2c xfer finished ---->>>
ax0: xgbe_i2c_disable: final i2c_disable 0
ax0: xgbe_phy_sfp_detect: eeprom read failed

Furthermore, I applied a logic analyzer to actually see the I2C signal and see how much time it takes to complete an EEPROM read, this always comes up at around 11.5ms. If you are interested I can send over a CSV file containing the data.

FYI, I am working remotely and have limited hardware access till oct 5th. I will try to respond your questions asap anyway.

Understandable given the current situation. I appreciate the effort!

Thanks @stephan.dewt_yahoo.co.uk for the detailed info and sharing the logs.

So, with the increased timeout I believe you are not seeing EERPOM read issue and link always coming up. But, are you still facing the delays in link up often? Is it blocking your progress by anyway?

As I mentioned, we have experienced this delay in link up intermittently. But this EEPROM read issue is something interesting. I will see what is different in my setup. If that timeout increment is the only change, I will set that to 20 ms and include the changes to correct those logs and test to see if there is any other impact and submit the next patch. Please let me know if there are any other changes?

Furthermore, I applied a logic analyzer to actually see the I2C signal and see how much time it takes to complete an EEPROM read, this always comes up at around 11.5ms. If you are interested I can send over a CSV file containing the data.

This is interesting. It would be helpful if you share the details.

So, with the increased timeout I believe you are not seeing EERPOM read issue and link always coming up. But, are you still facing the delays in link up often? Is it blocking your progress by anyway?

The increased timeout indeed eliminates the issue of reading the EEPROM. I don't think this has anything to do with the link issue, as there is (as far as I know) a service timer that executes every second to see if any signals have changed, it then proceeds to read the EEPROM again, regardless of whether the module has changed. We should not see a Link UP after 30 seconds or more (or never) if the service timer executes every second. Could this have something to do with the integration with iflib? My suspicion comes from the fact that the "Link UP" message comes from iflib.
By the way, I also applied the logic analyzer to the Linux driver and saw the exact same results, except no EEPROM read fail....

This is interesting. It would be helpful if you share the details.

Of course, I will upload the files.

As a potential optimization: the PCA9535 has an interrupt signal attached to a GPIO line in the EPYC. Could it be worthwile to investigate replacing the service timer with an interrupt routine?

The increased timeout indeed eliminates the issue of reading the EEPROM. I don't think this has anything to do with the link issue, as there is (as far as I know) a service timer that executes every second to see if any signals have changed, it then proceeds to read the EEPROM again, regardless of whether the module has changed. We should not see a Link UP after 30 seconds or more (or never) if the service timer executes every second.

Just to make sure on my understanding. By link issue, you just mean the delay of more than 30 secs for Link UP? Or delayed link up + link not coming up intermittently? Just wanted to make sure whether you see link not coming up issue after you fix the EEPROM read issue.

Could this have something to do with the integration with iflib? My suspicion comes from the fact that the "Link UP" message comes from iflib.

I don't think IFLIB has a part here. Because the driver also has a Link UP message (on default log level). If that's not shown, the the driver couldn't recognize the link up.

By the way, I also applied the logic analyzer to the Linux driver and saw the exact same results, except no EEPROM read fail....

Interested to know Linux also exhibit similar behavior. I thought Linux also might have the EEPROM read fail (as it same use same timeout value), but not the other. But looks inverse.

As a potential optimization: the PCA9535 has an interrupt signal attached to a GPIO line in the EPYC. Could it be worthwile to investigate replacing the service timer with an interrupt routine?

From my perspective, may it be interrupt routine (or) timer based routine, Link state change will be notified it the driver see the appropriate change in the hardware signals. Please correct me if I am wrong.

And is that connection of PCA9535 Interrupt signal to GPIO pins board specific? If so, how can the driver notified about the GPIO pin to use as interrupt line? That info is needed by the "gpio_alloc_intr_resource" right?

Few more questions,

  1. Is this issue seen only during hotplug/hotunplug? or even when you have the link fixed, but just do the driver load/unload or a soft link down/up?
  1. Does enabling verbose logs points to any errors/points when you hit the issue (like it's shown during the EEPROM read timeout)?
  1. Which BIOS is being used? Is it a CUSTOM one or a standard release?

I will do some more debugging to recreate the issue in my setup and add more logs to debug the link status check path. I will also check internally to check for any known related issues.

Thanks for sharing the I2C logic analyzer details?

Hi Rajesh, apologies for the late reply.

Just to make sure on my understanding. By link issue, you just mean the delay of more than 30 secs for Link UP? Or delayed link up + link not coming up intermittently? Just wanted to make sure whether you see link not coming up issue after you fix the EEPROM read issue.

I mean the delayed link up + link not coming up intermittently.

From my perspective, may it be interrupt routine (or) timer based routine, Link state change will be notified it the driver see the appropriate change in the hardware signals. Please correct me if I am wrong.

And is that connection of PCA9535 Interrupt signal to GPIO pins board specific? If so, how can the driver notified about the GPIO pin to use as interrupt line? That info is needed by the "gpio_alloc_intr_resource" right?

Scratch that idea, it is indeed board-specific. A timer based routine is fine.

Few more questions,

  1. Is this issue seen only during hotplug/hotunplug? or even when you have the link fixed, but just do the driver load/unload or a soft link down/up?

The issue is seen both during hotplug/hotunplug. As I said, I tried to create a reproducible scenario. During testing this morning, I tried:

  • Loading the driver while the module is plugged in caused the link to go UP immediately when assigning an IP address on interface 0 (different from last week, same driver code).
  • Assigning an IP address to interface 1, after which I pulled out the module of ax0 and inserted it into ax1 caused the link to go UP immediately.
  • Having both interfaces assigned, switching the module from ax1 to ax0 again caused the link to go UP, but only after about 10 seconds.
  • Doing this again, the exact same thing happens, except the link goes UP immediately on the third step.
  1. Does enabling verbose logs points to any errors/points when you hit the issue (like it's shown during the EEPROM read timeout)?

Not as far as I can see.

  1. Which BIOS is being used? Is it a CUSTOM one or a standard release?

It is a custom BIOS.

I will do some more debugging to recreate the issue in my setup and add more logs to debug the link status check path. I will also check internally to check for any known related issues.

I also tested the link issue with an optical module+cable. This time, I had both interfaces configured beforehand. After inserting the module into ax0, no link is established. Inserting it into ax1 also does nothing:

dmesg | grep Link
ax0: link_status returned Link: 0 an_restart: 0
ax1: ax0: Link Deactive
ax0: ax1: xgbe_phy_status: Link 0
phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
ax0: link_status returned Link: 0 an_restart: 0
Link Deactive
ax0: ax1: xgbe_phy_status: Link 0
ax0: phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
link_status returned Link: 0 an_restart: 0
Link Deactive
ax0: xgbe_phy_status: Link 0
ax0: phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
link_status returned Link: 0 an_restart: 0
ax0: ax1: Link Deactive
ax1: xgbe_phy_status: Link 0
ax1: phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
ax0: link_status returned Link: 0 an_restart: 0
Link Deactive
ax0: xgbe_phy_status: Link 0
ax0: phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
ax0: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
ax1: ax0: xgbe_phy_status: Link 0
phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive
ax1: link_status returned Link: 0 an_restart: 0
ax0: Link Deactive
ax0: ax1: xgbe_phy_status: Link 0
ax0: ax1: phy_link 0 Link 0 new_state 0
ax1: link_status returned Link: 0 an_restart: 0
ax1: Link Deactive

Interestingly, after unloading the driver and loading it again, it immediately establishes a link, after which the link goes DOWN after a few seconds (exclusively on the optical module).

Furthermore, when running

ifconfig -m ax0

I do not see any option for 1000Base-T. My knowledge on this subject is a little fuzzy, but I would like to connect to a 1Gb switch via RJ45. Autonegotiation should then establish a 1Gbit link. The actual media reported when inserting both the RJ45 SFP+ module and the optical module is 10GBase-KR. I don't think this is correct. Could you elaborate on this?

Hi @stephan.dewt_yahoo.co.uk

The issue is seen both during hotplug/hotunplug. As I said, I tried to create a reproducible scenario. During testing this morning, I tried:

In an attempt to recreate the issue, I tried testing the soft link up/down (scripted ifconfig ax0 down/ifconfig ax0 up) for around 500 iteration with a maximum timeout of 10 seconds. I couldn't face any delay or link not coming up. All went good. So, looks it mostly to do with only hotplug/hotunplug scenarios. Your comment about optical testing also adds to this point.

Unfortunately, I couldn't do the hotplug test until Oct 5th with my remote setup as mentioned before. Sorry about that. I will do my hotplug testing next week and see whether I can recreate the problem. Until then, I will see if I can recreate the problem somehow.

It is a custom BIOS.

I assume, there is no issues from BIOS side.

I also tested the link issue with an optical module+cable. This time, I had both interfaces configured beforehand. After inserting the module into ax0, no link is established. Inserting it into ax1 also does nothing:

We have tested with optical cable as well, but at minimal level compared to RJ45. I will try optical cable also in my testing next week for recreating the issue.

I do not see any option for 1000Base-T. My knowledge on this subject is a little fuzzy, but I would like to connect to a 1Gb switch via RJ45. Autonegotiation should then establish a 1Gbit link. The actual media reported when inserting both the RJ45 SFP+ module and the optical module is 10GBase-KR. I don't think this is correct. Could you elaborate on this?

We have a code issue here. Currently, we are hard coding (in media_status call back) the media type to 10GBase-KR if it's a 10G link irrespective of whatever module being plugged. Also, we haven't added 1000Base-T in the supported media(ifmedia_add). That is why you are not seeing 1000Base-T in ifconfig -m output. I will make the appropriate changes (along with the previous discussed changes) and submit a new patch by end of today.

We have tested 10G, 1G, back-to-back and switch configs. But I don't remember testing a 10G Link with a 1G switch and autoneg to 1G link. I will try to do this test as well next week. But from a code perspective, this should work.

Summarizing, I will give an updated patch today. And I will do some more testing next week. Meanwhile, if I have any clues/inputs I will update accordingly.

Hi Rajesh,

Unfortunately, I couldn't do the hotplug test until Oct 5th with my remote setup as mentioned before. Sorry about that. I will do my hotplug testing next week and see whether I can recreate the problem. Until then, I will see if I can recreate the problem somehow.

That's no problem at all. The link issue is indeed hotplug-related. I will also continue to see if I can recreate the problem. I suspect that it is a timing issue, so I'm logging everything to see if I can find the issue.

We have a code issue here. Currently, we are hard coding (in media_status call back) the media type to 10GBase-KR if it's a 10G link irrespective of whatever module being plugged. Also, we haven't added 1000Base-T in the supported media(ifmedia_add). That is why you are not seeing 1000Base-T in ifconfig -m output. I will make the appropriate changes (along with the previous discussed changes) and submit a new patch by end of today.

I suspected such a thing, thanks for confirming.

We have tested 10G, 1G, back-to-back and switch configs. But I don't remember testing a 10G Link with a 1G switch and autoneg to 1G link. I will try to do this test as well next week. But from a code perspective, this should work.

I have confirmed this to be working (the sfp+ module does the job of autonegotiation for us).

Thanks for all the hard work, I look forward to the patch.

Changes:

  1. Increased timeout from 1 to 20 for SFP EEPROM read
  2. Changes to add supports for appropriate media types
  3. Changes to list appropriate media type in ifconfig output
  4. Logging changes.
manu added a comment.Thu, Oct 1, 6:39 AM

Changes:

  1. Increased timeout from 1 to 20 for SFP EEPROM read
  2. Changes to add supports for appropriate media types
  3. Changes to list appropriate media type in ifconfig output
  4. Logging changes.

Hi,

Thanks for the update.
It seems that you've also include non related changes (likely a mis-merge like the IOMMU/DMAR rename in GENERIC).
After fixing this I'll commit this driver and we can take care of the latest bugs later in the tree, having it in tree will mean that it will be easier for people to
report bugs.
Thanks.