Page MenuHomeFreeBSD

ixgbe: add MDIO bus support
Needs ReviewPublic

Authored by adrian on May 3 2025, 2:06 AM.
Tags
None
Referenced Files
F119311107: D50128.id155233.diff
Sat, Jun 7, 12:44 PM
F119308432: D50128.diff
Sat, Jun 7, 12:00 PM
Unknown Object (File)
Sat, Jun 7, 3:15 AM
Unknown Object (File)
Wed, Jun 4, 3:28 PM
Unknown Object (File)
Tue, Jun 3, 10:05 PM
Unknown Object (File)
Mon, Jun 2, 8:59 PM
Unknown Object (File)
Mon, Jun 2, 9:42 AM
Unknown Object (File)
Sun, Jun 1, 5:06 PM

Details

Reviewers
kbowling
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Summary

This works enough to let me see the marvell switch on the MDIO bus.

It uses clause 22, which ixgbe's existing MDIO code doesn't currently
support, so it's implemented in a new source file.

Since mdio(4) is now required, add it where appropriate to GENERIC kernels.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 64047
Build 60931: arc lint + arc unit

Event Timeline

Owners added a reviewer: Restricted Owners Package.May 3 2025, 2:06 AM
adrian requested review of this revision.May 3 2025, 2:06 AM
adrian added inline comments.
sys/dev/ixgbe/if_ix.c
279

oops i need to delete this too

1212

and this is here because i'm not quite sure yet what the preconditions should be to actually /attach/ an mdio bus to an ix interface. Let's figure THAT out first. :-)

sys/dev/ixgbe/ixgbe_mdio.c
5

i likely need to bump this to 2024, since the code i grabbed came from dpdk

sys/dev/ixgbe/if_ix.c
262

I would like these pushed down the file quite a ways, maybe after ixgbe_if_i2c_req, and add function prototypes above as necessary.

In general this looks fine aside from the cleanups you noted..

if_ix.c can be consider the FreeBSD "driver" where we can host our own changes and that seems the way DPDK structured it too with the functions in the PMD so no particular problem. But for posterity, in general Intel likes to stuff generic hardware dependent stuff in ixgbe_{chip}.c and access it through ixgbe_api.c which would kind of make sense here but we might also consider those files "upstream" (i.e. we can replay DPDK commits or out of tree driver commits more freely to them) so you are following DPDK as "upstream" in this case.

sys/dev/ixgbe/if_ix.c
262

honestly, i'm thinking of also making an if_ix_mdio.c file and just putting these in there.

(Especially if i'm going to use this as an excuse to play with clause 45 MDIO bus stuff and add those routines as i can then speak to the on-chip PHY register space and use it as a dev platform.)

sys/dev/ixgbe/if_ix.c
1212

Another angle.. what happens if this is exposed to a device with no mdio bus?

sys/dev/ixgbe/if_ix.c
262

That would be fine too, if_bypass.c is similar in spirit as a local hack (Intel doesn't like bypass adapters..)

In general this looks fine aside from the cleanups you noted..

if_ix.c can be consider the FreeBSD "driver" where we can host our own changes and that seems the way DPDK structured it too with the functions in the PMD so no particular problem. But for posterity, in general Intel likes to stuff generic hardware dependent stuff in ixgbe_{chip}.c and access it through ixgbe_api.c which would kind of make sense here but we might also consider those files "upstream" (i.e. we can replay DPDK commits or out of tree driver commits more freely to them) so you are following DPDK as "upstream" in this case.

As far as i can tell, DPDK kinda "broke" the c22 routines later on in their development. The chipsets docs for the x550 state the c22 support isn't available, but the x553 backplane docs say it is. I had to go back over their git history to find the initial import. They eventually replaced em with c45 only versions in the PHY code and left the c22 versions in rte_pmd_ixgbe.c. They don't even have an API for doing generic MDIO through ixgbe_api :(

sys/dev/ixgbe/if_ix.c
1212

Yup, that's why I need to figure this out.

The linux upstream driver always attaches an mdio bus, but for the backplane chipsets it only attaches it to the lowest present backplane pci dev as it's shared between all MACs. (The firmware / hardware semaphore guards against overlapping use though.0

migrate mdio stuff to if_ix_mdio.[ch], add/cleanup my copyright.

erj added inline comments.
sys/conf/files
2295

Is the "ixv inet" entry here required; ixv shouldn't need this file, right? I'm still not familiar with what needs to be added where in these kernel compile config files.

sys/dev/ixgbe/if_ix.c
1212

What happens when there's supposed to be an mdio device but it doesn't get attached here? I'm assuming the mdio driver will complain about something first, and then nothing will happen here, silently? It'd be good to find a way to see if a device is expected to have an mdio bus from within the ixgbe driver, but there doesn't seem to be a universal-ish mechanism for this like in ixl with its device/function capabilities admin queue commands.

kgalazka added inline comments.
sys/dev/ixgbe/ixgbe_mdio.c
94

I need to double check but I think IXGBE_MSCA_OLD_PROTOCOL can be used only with X552/X553 devices, and if I understand correctly MDIO interface is exposed for all types of HW. I wonder if it won't cause any unexpected side effects if used eg. with E610 (https://reviews.freebsd.org/D50067) which has completely new FW.

152

I'm not 100% sure how this will behave if another thread already acquired this semaphore. Also it guards single read/write and some operations require accessing several registers. I think you need to grab a iflib ctx lock in ixgbe_mdio_(read|wrige)reg

sys/dev/ixgbe/if_ix.c
1212

it /looks/ like the linux driver always attaches the mdio device, no matter what?

I'm looking at drivers/net/ethernet/ixgbe/ixgbe_phy.c : ixgbe_mii_bus_init(), which creates it for any NIC that this driver probes/attaches, and ixgbe_mii_bus_init() is called in ixgbe_probe().

So, is linux's ixgbe driver in this particular probe/attach path only supporting a subset of what ours does? Or is there something else I'm missing?

sys/dev/ixgbe/ixgbe_mdio.c
94

Would you mind checking? I'm comparing this to what the linux driver is doing too to try and make sure we're not deviating from what they're doing with the hardware.

Thanks!

152

I'm not 100% sure how this will behave if another thread already acquired this semaphore. Also it guards single read/write and some operations require accessing several registers. I think you need to grab a iflib ctx lock in ixgbe_mdio_(read|wrige)reg

I was wondering about that. Ok, lemme go find the iflib ctx lock bits and add them in here. Thanks!

sys/dev/ixgbe/ixgbe_mdio.c
152

Hm, the only iflib locks are /in/ iflib. How am I supposed to interact with this?

Should the mdio bus be exposed /by/ iflib first?

sys/dev/ixgbe/ixgbe_mdio.c
152

I would assume if you hold the fw semaphore you'd have exclusive access to the adapter.. is the worry that callers might also have requested the semaphore before calling this?

In e1000 there are some gymnastics around ASSERT_CTX_LOCK_HELD and you can see where I prod it in a similar situation for accessing the NVM with struct sx *iflib_ctx_lock = iflib_ctx_lock_get(sc->ctx).. I forget the exact details why we had to do that, it might have been related to the watchdog (which used to trip a lot) triggering stalls, but it does serialize access to the hardware semaphore.

add iflib ctx lock, suggested by kbowling / erj

adrian marked 3 inline comments as done.

update copyright

adrian marked 2 inline comments as done and an inline comment as not done.Sat, May 10, 4:44 AM

ok, i've cleaned up a bit! whats left before it can be landed?

I would be somewhat curious why we need to serialize like this. Note that nothing else prodding the swfw in ixgbe is explicitly locked or asserted.

I would be somewhat curious why we need to serialize like this. Note that nothing else prodding the swfw in ixgbe is explicitly locked or asserted.

I was thinking about ixgbe_if_init and ixgbe_config_link. IIRC under the hood it does couple of MDIO reads and writes. The SWFW semaphore is hold for individual accesses, but not during whole operation. Without CTX lock writes from mdio bus may interfere with that and bringing link up may fail.

TBH now I wonder if we shouldn't hold CTX lock also while changing advertised speeds or flow control settings.

I would be somewhat curious why we need to serialize like this. Note that nothing else prodding the swfw in ixgbe is explicitly locked or asserted.

I was thinking about ixgbe_if_init and ixgbe_config_link. IIRC under the hood it does couple of MDIO reads and writes. The SWFW semaphore is hold for individual accesses, but not during whole operation. Without CTX lock writes from mdio bus may interfere with that and bringing link up may fail.

ixgbe_if_init and ixgbe_config_link are all be under the CTX lock.

TBH now I wonder if we shouldn't hold CTX lock also while changing advertised speeds or flow control settings.

I think actually changing the link happens in the admin task which is protected by CTX lock. So reading or writing with swfw should be enough to serialize the mdio access. A problem would exist if we create some deadlock but I don't see that offhand with what is going on here (but I haven't thought hard about it).

@adrian Would you mind posting device id and hw->phy.type from the system you're using for testing?

So this panics the kernel if e6000sw is built in or loaded at boot time.

From someone who has been testing it:

{{{
ix0: Ethernet address: 00:e0:ed:b6:dd:91
ix0: nvm 0.58.0 eTrack 0x80000847
mdio1: <MDIO> on ix0
panic: _sx_xlock_hard: recursed on non-recursive sx 0xfffff80001b009e0 @ /usr/src/sys/dev/ixgbe/if_ix_mdio.c:50

cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82fd9330
vpanic() at vpanic+0x13f/frame 0xffffffff82fd9460
panic() at panic+0x43/frame 0xffffffff82fd94c0
_sx_xlock_hard() at _sx_xlock_hard+0x95e/frame 0xffffffff82fd9570
_sx_xlock() at _sx_xlock+0xa9/frame 0xffffffff82fd95b0
ixgbe_mdio_readreg() at ixgbe_mdio_readreg+0x51/frame 0xffffffff82fd95f0
e6000sw_readreg() at e6000sw_readreg+0x94/frame 0xffffffff82fd9630
e6000sw_probe() at e6000sw_probe+0x193/frame 0xffffffff82fd9670
device_probe_child() at device_probe_child+0x156/frame 0xffffffff82fd96c0
device_probe() at device_probe+0x8b/frame 0xffffffff82fd96e0
bus_attach_children() at bus_attach_children+0x3e/frame 0xffffffff82fd9700
mdio_attach() at mdio_attach+0x1e/frame 0xffffffff82fd9720
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9770
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9790
ixgbe_if_attach_post() at ixgbe_if_attach_post+0x193e/frame 0xffffffff82fd9830
iflib_device_register() at iflib_device_register+0x2839/frame 0xffffffff82fd9960
iflib_device_attach() at iflib_device_attach+0xaa/frame 0xffffffff82fd9990
device_attach() at device_attach+0x45b/frame 0xffffffff82fd99e0
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9a00
pci_attach() at pci_attach+0xd7/frame 0xffffffff82fd9a40
acpi_pci_attach() at acpi_pci_attach+0x15/frame 0xffffffff82fd9a80
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9ad0
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9af0
pcib_attach_child() at pcib_attach_child+0x40/frame 0xffffffff82fd9b10
acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x95/frame 0xffffffff82fd9b40
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9b90
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9bb0
pci_attach() at pci_attach+0xd7/frame 0xffffffff82fd9bf0
acpi_pci_attach() at acpi_pci_attach+0x15/frame 0xffffffff82fd9c30
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9c80
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9ca0
acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x428/frame 0xffffffff82fd9d00
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9d50
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9d70
acpi_probe_children() at acpi_probe_children+0x6f/frame 0xffffffff82fd9dd0
acpi_attach() at acpi_attach+0x9dc/frame 0xffffffff82fd9e60
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9eb0
bus_attach_children() at bus_attach_children+0x4a/frame 0xffffffff82fd9ed0
nexus_acpi_attach() at nexus_acpi_attach+0x65/frame 0xffffffff82fd9ef0
device_attach() at device_attach+0x45b/frame 0xffffffff82fd9f40
bus_generic_new_pass() at bus_generic_new_pass+0x126/frame 0xffffffff82fd9f70
root_bus_configure() at root_bus_configure+0x26/frame 0xffffffff82fd9f90
configure() at configure+0x9/frame 0xffffffff82fd9fa0
mi_startup() at mi_startup+0x1e8/frame 0xffffffff82fd9ff0
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x33: movq $0,0x1230722(%rip)
}}}

... the iflib ctx lock is held during the calls to the driver attach/post_attach.

We /may/ need to move the child bus attach stuff to a new iflib routine that's documented to being called without the ctx lock held?

I would assume if you hold the fw semaphore you'd have exclusive access to the adapter..

I don't know how I missed that while reading the datasheets, but there are two registers used for SW/FW synchronization. IXGBE_SWSM is used to ensure that only one SW thread is accessing HW. I'm sorry for misleading you.

We /may/ need to move the child bus attach stuff to a new iflib routine that's documented to being called without the ctx lock held?

Because @kbowling is right about the semaphore I think we don't need to use a iflib CTX lock for MDIO read operation. For writes I think CTX lock is still required. Some operations on PHY use read/modify/write pattern (e.g. ixgbe_init_ext_t_x550em). Without CTX lock write from MDIO bus may get between that read and write.

Regarding E610 - it does have IXGBE_MSCA register, but it's strongly advised to use ACI commands for PHY access instead. It may be better idea to not add independent interface but set phy->ops.read_reg_mdi and phy->ops.write_reg_mdi for X553 backplane and rely on phy.ops.read_reg and phy.ops.write_reg to do the right thing for given HW.

I would assume if you hold the fw semaphore you'd have exclusive access to the adapter..

I don't know how I missed that while reading the datasheets, but there are two registers used for SW/FW synchronization. IXGBE_SWSM is used to ensure that only one SW thread is accessing HW. I'm sorry for misleading you.

We /may/ need to move the child bus attach stuff to a new iflib routine that's documented to being called without the ctx lock held?

Because @kbowling is right about the semaphore I think we don't need to use a iflib CTX lock for MDIO read operation. For writes I think CTX lock is still required. Some operations on PHY use read/modify/write pattern (e.g. ixgbe_init_ext_t_x550em). Without CTX lock write from MDIO bus may get between that read and write.

Regarding E610 - it does have IXGBE_MSCA register, but it's strongly advised to use ACI commands for PHY access instead. It may be better idea to not add independent interface but set phy->ops.read_reg_mdi and phy->ops.write_reg_mdi for X553 backplane and rely on phy.ops.read_reg and phy.ops.write_reg to do the right thing for given HW.

ooooo that's good to know.

We need a new API for two reasons:

  • because it doesn't support clause 22 transfers, and
  • because ixgbe_read_phy_reg_mdi() (and the OG clause 22 code in DPDK) assumes the phy address is attached to hw->phy.addr, and not passed in as a parameter.

If we can get that cleaned up in the Intel tree then I'm happy cleaning this up too. I'd /much/ prefer to use the PHY ops.

Let's get the E610/E810 support landed first, so we can then churn on this? I'm not in a rush, we at least now know how to make this work. :-)