Page MenuHomeFreeBSD

ure: fix spurious link flaps from MII
ClosedPublic

Authored by rkitover_gmail.com on Fri, Mar 6, 1:27 AM.
Referenced Files
F150516244: D55682.id173249.diff
Wed, Apr 1, 10:24 PM
Unknown Object (File)
Sun, Mar 29, 12:51 AM
Unknown Object (File)
Sat, Mar 28, 9:56 AM
Unknown Object (File)
Thu, Mar 26, 7:21 AM
Unknown Object (File)
Wed, Mar 25, 1:22 PM
Unknown Object (File)
Wed, Mar 25, 9:21 AM
Unknown Object (File)
Wed, Mar 25, 9:21 AM
Unknown Object (File)
Wed, Mar 25, 9:21 AM

Details

Summary

A race condition in the MII layer causes spurious link down events, see:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252165

. In statchg, on link down, check if the PHY reports the link as
actually down using the BMSR register, if not, force the status of the
link to back up and restart TX. Do the same in a MII linkchg handler.

On actual link up, restart TX in case it went idle and down.

I have tested this patch with a Realtek 8153 USB interface, and it fixes
the link flaps.

Signed-off-by: Rafael Kitover <rkitover@gmail.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Remove frozen TX fix, only needed for axge

pouria added inline comments.
sys/dev/usb/net/if_ure.c
536

style.9

Unfortunately, even after applying your patch, I still have the issue:

% mdo tail -n 500 /var/log/messages | grep -E 'ue0|8153'
Mar  7 00:57:55 cornelia kernel: rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 0 on miibus0
Mar  7 00:57:55 cornelia kernel: ue0: <USB Ethernet> on ure0
Mar  7 00:57:55 cornelia kernel: ue0: Ethernet address: 00:e0:4c:a4:4e:ef
Mar  7 00:57:55 cornelia kernel: ue0: link state changed to DOWN
Mar  7 00:58:16 cornelia kernel: ue0: link state changed to UP
Mar  7 01:01:44 cornelia kernel: ue0: link state changed to DOWN

usbconfig:

ugen0.14: <RTL8153 Gigabit Ethernet Adapter Realtek Semiconductor Corp.> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (28mA)

bLength = 0x0012
bDescriptorType = 0x0001
bcdUSB = 0x0320
bDeviceClass = 0x0000  <Probed by interface class>
bDeviceSubClass = 0x0000
bDeviceProtocol = 0x0000
bMaxPacketSize0 = 0x0009
idVendor = 0x0bda
idProduct = 0x8153
bcdDevice = 0x3100
iManufacturer = 0x0001  <retrieving string failed>
iProduct = 0x0002  <retrieving string failed>
iSerialNumber = 0x0007  <retrieving string failed>
bNumConfigurations = 0x0002

Also, shortly after plugging in the usb dock, ifconfig ue0 takes about ~2 seconds to respond with the media status:

% time ifconfig ue0
ue0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=68009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:e0:4c:a4:4e:ef
media: Ethernet autoselect (none <half-duplex>)
status: no carrier
nd6 options=823<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL,STABLEADDR>
ifconfig ue0  0.00s user 0.01s system 0% cpu 1.687 total

The initial link flaps on device initialization is not fixed by this patch, but, if you run with and without the patch for like an hour or two, you will see the difference.

I don't know about the ifconfig slowness, may be MII related, I will investigate.

Despite the Ethernet port being connected, after replugging in the USB dock it reports that ue0 is DOWN, even though the Ethernet lights are on and the other Ethernet endpoint reports connected and UP:

% mdo tail -f /var/log/messages
Mar  7 01:20:09 cornelia kernel: ugen0.11: <GenesysLogic USB3.1 Hub> at usbus0
Mar  7 01:20:09 cornelia kernel: uhub3 on uhub0
Mar  7 01:20:09 cornelia kernel: uhub3: <GenesysLogic USB3.1 Hub, class 9/0, rev 3.20/6.63, addr 18> on usbus0
Mar  7 01:20:09 cornelia kernel: uhub3: 4 ports with 4 removable, self powered
Mar  7 01:20:10 cornelia kernel: ugen0.12: <Realtek USB 10/100/1000 LAN> at usbus0
Mar  7 01:20:10 cornelia kernel: ure0 on uhub3
Mar  7 01:20:10 cornelia kernel: ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 3.20/31.00, addr 19> on usbus0
Mar  7 01:20:10 cornelia kernel: miibus0: <MII bus> on ure0
Mar  7 01:20:10 cornelia kernel: rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 0 on miibus0
Mar  7 01:20:10 cornelia kernel: rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
Mar  7 01:20:10 cornelia kernel: ue0: <USB Ethernet> on ure0
Mar  7 01:20:10 cornelia kernel: ue0: Ethernet address: 00:e0:4c:a4:4e:ef
Mar  7 01:20:10 cornelia kernel: ue0: link state changed to DOWN
Mar  7 01:20:10 cornelia kernel: ugen0.13: <GenesysLogic USB2.1 Hub> at usbus0
Mar  7 01:20:10 cornelia kernel: uhub4 on uhub0
Mar  7 01:20:10 cornelia kernel: uhub4: <GenesysLogic USB2.1 Hub, class 9/0, rev 2.10/6.63, addr 20> on usbus0
Mar  7 01:20:11 cornelia kernel: uhub4: 4 ports with 4 removable, self powered

Despite the Ethernet port being connected, after replugging in the USB dock it reports that ue0 is DOWN, even though the Ethernet lights are on and the other Ethernet endpoint reports connected and UP:

And now, after about eight minutes, ue0 suddenly reports that it's up.

Mar  7 01:28:17 cornelia kernel: ue0: link state changed to UP

The initial link flaps on device initialization is not fixed by this patch, but, if you run with and without the patch for like an hour or two, you will see the difference.

I don't know about the ifconfig slowness, may be MII related, I will investigate.

I'm not expert in usb driver. I just want to help by testing it.

Can you please verify which behaviors happen for you with and without the patch?

Can you please verify which behaviors happen for you with and without the patch?

Done.
It seems your patch decrease the flapping rate.

Mar  9 19:37:25 cornelia kernel: ue0: link state changed to UP
Mar  9 19:37:25 cornelia kernel: ue0: 2 link states coalesced
Mar  9 19:37:25 cornelia kernel: ue0: link state changed to UP
Mar  9 19:37:25 cornelia kernel: ue0: 2 link states coalesced
Mar  9 19:37:25 cornelia kernel: ue0: link state changed to UP

Without your patch, it stops at UP state.
With the patch, it can flap to DOWN, and I have to wait for the next random flap for it to become UP.

My results are different, are you applying to CURRENT?

In my case, after the first 4 link flaps there are no more. Your link should not be going down at all.

Is it possible your dongle is defective?

My results are different, are you applying to CURRENT?

Yes

In my case, after the first 4 link flaps there are no more. Your link should not be going down at all.

It rarely happens.

Is it possible your dongle is defective?

It's a dock for my steam deck and it works fine with linux and windows.

To be clear, my comments are not meant to block this revision. instead, this change is LGTM.
But it doesn't completely fix the issue. I wanted you to be aware of that.

sys/dev/usb/net/if_ure.c
447–449

use bool for new_link and old_link

505–507

Up to your judgement, but try to not break strings as much as possible.

545

style: Do not test without a comparison, or with unary ! (except for booleans).

This revision is now accepted and ready to land.Tue, Mar 10, 11:28 AM

Thanks!

So apparently the issue still manifests for you due to some kind of configuration difference between our systems.

I'd like to fix this as well.

Can you give me your log? Do a:

grep -E 'ure|ue0' /var/log/messages

, may be ue1 depending on configuration.

Thanks!

So apparently the issue still manifests for you due to some kind of configuration difference between our systems.

I'd like to fix this as well.

Can you give me your log? Do a:

grep -E 'ure|ue0' /var/log/messages

, may be ue1 depending on configuration.

I've to apply your patch again and wait to see the new logs.
Will tell you.

Sorry about the style violations, I will read the guide more carefully and adjust both DRs.

I'm working on one for axe now, that's the third remaining common USB Ethernet interface using MII, once we finish axge and this one (ure) that will be all of them.

Sorry about the style violations, I will read the guide more carefully and adjust both DRs.

don't worry about it. we all make those mistakes.
that's why we review each other's work.

This revision now requires review to proceed.Wed, Mar 11, 8:20 PM
This revision is now accepted and ready to land.Wed, Mar 11, 8:57 PM

Apologies if I'm being impatient, but when does the patch go into the tree?

Apologies if I'm being impatient, but when does the patch go into the tree?

In PR 252165, the original reporter Ali Abdallah said,

This is an issue in all miibus modules, rgephy.c, truephy.c, mlphy.c, etc...
Please note that also in mii_pollstat, mii->mii_media_status is initialized to 0 (not sure why), and that also races with miibus_linkchg.

So shall other drivers be checked as well, but not only if_ure(4) ?

I've to apply your patch again and wait to see the new logs.
Will tell you.

That problem no longer occurs for me.
This revision has been tested and verified to work.
For D55631, please wait for someone else to review and test the driver, as I don't have the required hardware.
I'll commit this one.

This revision was automatically updated to reflect the committed changes.

Thank you pouria!

zlei:

So, I'm trying to do the fix for axge too, it's the linked DR.

That's the two major dongle chipsets, I was also going to do axe.

The pi 3 and some other devices have a different driver chipset, but I will just wait for Adrian to actually fix MII as far as those go.

I have tested the ure driver across several environments. Here are the results:

Test 1: Raspberry Pi 4(arm64)
OS: 16-CURRENT
Method: route -n monitor (x5)
Result: OK

Test 2: Pine64(arm64)
OS: 15.0-RELEASE
Method: route -n monitor (x5)
Result: OK

Test 3: Raspberry Pi Zero 2W(arm64)
OS: 15.0-RELEASE
Method: route -n monitor (x4) + CARP configured on the ure interface.
Result: NG (Failed)

dmesg output for Test 3:

net0: link state changed to DOWN
carp: 1@net0: BACKUP -> INIT (hardware interface up)
carp: 1@net0: INIT -> BACKUP (initialization complete)
net0: 3 link states coalesced
net0: link state changed to UP
carp: 1@net0: BACKUP -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)

Next Steps:
I will perform CARP testing on the Raspberry Pi 4 (16-CURRENT).
I plan to apply the "Lock MII bus operations" changes to the axe and axge drivers on the Pi Zero 2W (15.0-RELEASE) to see if it resolves the issue.

Regarding the axe(include with CARP) and axge drivers, preliminary testing is ongoing, and no issues have been observed so far.

FYI
I have created and am currently testing a patch that includes "Lock-mii-bus-operation" for 15.0-RELEASE. (Link: patch)
I ported some parts from the axge driver (specifically the changes around ure_attach at line 515).

So far, it has passed an 11-minute test without any link errors, just 15 spurious link downs[OK].