Page MenuHomeFreeBSD

convert ixgbe to iflib
AbandonedPublic

Authored by cramerj_intel.com on Feb 6 2016, 2:25 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 21, 4:14 PM
Unknown Object (File)
Thu, Dec 19, 7:12 AM
Unknown Object (File)
Tue, Dec 17, 4:48 PM
Unknown Object (File)
Mon, Dec 16, 10:10 AM
Unknown Object (File)
Sat, Dec 14, 7:59 PM
Unknown Object (File)
Fri, Dec 13, 1:36 AM
Unknown Object (File)
Wed, Dec 11, 11:51 AM
Unknown Object (File)
Sat, Dec 7, 4:56 AM

Details

Summary

This is a preliminary conversion of iflib.

  • IOV support is untested
  • netmap support is untested

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 7062
Build 7241: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gallatin edited edge metadata.
jeffrey.e.pieper_intel.com edited edge metadata.

Stop the bus, I want to get off...

I'm seeing panics during init/ifup on certain parts. More specifically:

none71@pci0:130:0:0: class=0x020000 card=0x004c1170 chip=0x10fb8086 rev=0x01 hdr=0x00

vendor     = 'Intel Corporation'
device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class      = network
subclass   = ethernet
cap 01[40] = powerspec 3  supports D0 D3  current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 64 messages
             Table in map 0x20[0x0], PBA in map 0x20[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
             link x8(x8) speed 5.0(5.0) ASPM disabled(L0s)
cap 03[e0] = VPD
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
ecap 0003[140] = Serial 1 008cfaffff18a128
ecap 000e[150] = ARI 1
ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                 0 VFs configured out of 64 supported
                 First VF RID Offset 0x0180, VF RID Stride 0x0002
                 VF Device ID 0x10ed
                 Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304

none72@pci0:130:0:1: class=0x020000 card=0x004c1170 chip=0x10fb8086 rev=0x01 hdr=0x00

vendor     = 'Intel Corporation'
device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class      = network
subclass   = ethernet
cap 01[40] = powerspec 3  supports D0 D3  current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 64 messages
             Table in map 0x20[0x0], PBA in map 0x20[0x2000]
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
             link x8(x8) speed 5.0(5.0) ASPM disabled(L0s)
cap 03[e0] = VPD
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
ecap 0003[140] = Serial 1 008cfaffff18a128
ecap 000e[150] = ARI 1
ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                 0 VFs configured out of 64 supported
                 First VF RID Offset 0x0180, VF RID Stride 0x0002
                 VF Device ID 0x10ed
                 Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304

These are Dell X520 mezz cards.

if_ix_iflib_init1.jpg (400×720 px, 86 KB)

if_ix_iflib_init2.jpg (400×720 px, 73 KB)

if_ix_iflib_init3.jpg (400×720 px, 96 KB)

if_ix_iflib_init4.jpg (400×720 px, 65 KB)

Another small nit: Print the driver version at attach as well as in sysctl.

This revision now requires changes to proceed.Oct 19 2016, 4:42 PM

Is this with the driver in kernel or as a module? And with INVARIANTS or
without?

Jeffrey:

Is this the same or different as reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211771 with the unmodified driver? The wording caught my eye as something we had already discussed a bit.

In D5213#172435, @kmacy wrote:

What svn rev are you using?

9e6f38428d632aa7be0c61c681627f0085399d3e convert ixgbe to iflib
d7d4326b414d043a46827d9d0ea449b119ed358c Increase timeouts so tests have more chances to succeed on MIPS64EB in QEMU.
ca21fba3dea2485ec44a5d32e3a82e068b78a87c Fix ipfw table lookup handler to return entry value, but not its index.
23dcf48f60666d39ff92b68e8c578ff41fbda16e Add FFS pager, which uses buffer cache read operation to validate pages. See the comments for more detailed description of the algorithm.
5579c4169c49e58a6a1e122d3c5b56637942d452 Add big-endian uzip file system and choose right file system to proceed tests with.
b4afcf142fd2513f22ed1c98088864b268d47693 hyperv/vmbus: Expose channel management taskqueue for driver to use.
38255c19df588d1ff0548be101755a7a756fb3a1 RPI3 is retired in preference to GENERIC-UP. Use that instead.
fcbea8e537b499aaf197be36cdfb34986232eec6 Use MODULES_EXTRA rather than MODULES_OVERRIDE for dtb.
8c4ba2c67a6e233659bf79b2fb35ae3c2f60d578 [net80211] Initial full-offload scan support.
c430547a157a39f3559c61424a7cae2d97360478 Fix typo in comment.

In D5213#172438, @kmacy wrote:

Is this with the driver in kernel or as a module? And with INVARIANTS or
without?

Loaded as a module without INVARIANTS.

Jeffrey:

Is this the same or different as reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211771 with the unmodified driver? The wording caught my eye as something we had already discussed a bit.

FreeBSD u0807 12.0-CURRENT FreeBSD 12.0-CURRENT #0 9e6f384(arcpatch-D5213): Wed Oct 19 13:03:43 UTC 2016 root@s254:/usr/obj/diskless/os/FreeBSD/12-CURRENT_2016-08-29-D5213/usr/src/sys/VALIDATION amd64
[root@u0807 /]# kldload if_ix
[root@u0807 /]# ifconfig ix0 190.2.8.7 up
[root@u0807 /]# dmesg
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe020-0xe03f mem 0xec000000-0xec03ffff,0xec080000-0xec083fff irq 58 at device 0.0 numa-domain 1 on pci11
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: 00:8c:fa:18:a1:28
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe000-0xe01f mem 0xec040000-0xec07ffff,0xec084000-0xec087fff irq 61 at device 0.1 numa-domain 1 on pci11
ix1: Using MSIX interrupts with 9 vectors
ix1: Ethernet address: 00:8c:fa:18:a1:29
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: netmap queues/slots: TX 8/2048, RX 8/2048
ix0: link state changed to UP

Another data point:

Neither an X520-DA2 adapter (same device id) nor X552 show this behavior.

In D5213#172435, @kmacy wrote:

What svn rev are you using?

9e6f38428d632aa7be0c61c681627f0085399d3e convert ixgbe to iflib
d7d4326b414d043a46827d9d0ea449b119ed358c Increase timeouts so tests have more chances to succeed on MIPS64EB in QEMU.
ca21fba3dea2485ec44a5d32e3a82e068b78a87c Fix ipfw table lookup handler to return entry value, but not its index.
23dcf48f60666d39ff92b68e8c578ff41fbda16e Add FFS pager, which uses buffer cache read operation to validate pages. See the comments for more detailed description of the algorithm.
5579c4169c49e58a6a1e122d3c5b56637942d452 Add big-endian uzip file system and choose right file system to proceed tests with.
b4afcf142fd2513f22ed1c98088864b268d47693 hyperv/vmbus: Expose channel management taskqueue for driver to use.
38255c19df588d1ff0548be101755a7a756fb3a1 RPI3 is retired in preference to GENERIC-UP. Use that instead.
fcbea8e537b499aaf197be36cdfb34986232eec6 Use MODULES_EXTRA rather than MODULES_OVERRIDE for dtb.
8c4ba2c67a6e233659bf79b2fb35ae3c2f60d578 [net80211] Initial full-offload scan support.
c430547a157a39f3559c61424a7cae2d97360478 Fix typo in comment.

I assume this means you did a git pull (asked for svn rev, but this is git log information) to update to the "latest" head prior to these tests?

Please try running worth INVARIANTS on the offending hardware.

@jeffrey.e.pieper_intel.com Please run with INVARIANTS. No one else has the offending hardware.

@jeffrey.e.pieper_intel.com and please post dmesg for the system it crashes on.

In D5213#172500, @kmacy wrote:

@jeffrey.e.pieper_intel.com and please post dmesg for the system it crashes on.

I read it the first time, thank you.

@jeffrey.e.pieper_intel.com The panic you've hit means that the admin task was not assigned a taskqueue. In all likelihood we're hitting an initialization edge case on the system itself, not that particular adapter.

In D5213#172502, @kmacy wrote:

@jeffrey.e.pieper_intel.com The panic you've hit means that the admin task was not assigned a taskqueue. In all likelihood we're hitting an initialization edge case on the system itself, not that particular adapter.

That makes sense, as the device id is the same as a standard X520-DA2 adapter.

ix_iflib_init_debug1.jpg (400×720 px, 80 KB)

ix_iflib_init_debug2.jpg (400×720 px, 91 KB)

@jeffrey.e.pieper_intel.com are all your test systems multi-socket or just this one?

In D5213#172512, @kmacy wrote:

@jeffrey.e.pieper_intel.com are all your test systems multi-socket or just this one?

For the most part. Everything I've been using for testing iflib is MP, with the exception of The BDX-DE (X552). The systems in question are actually mainly used as clients (I have many). They are Dell C6220, all with X520 mezz cards. The only UP systems that I have (besides desktop/mobile systems) are Supermicro Microcloud servers w/ 8 nodes per chassis, which are also used as clients.

That makes sense, as the device id is the same as a standard X520-DA2 adapter.

ix_iflib_init_debug1.jpg (400×720 px, 80 KB)

ix_iflib_init_debug2.jpg (400×720 px, 91 KB)

In this case irq_alloc isn't getting called for the admin task. I'll take a closer look at ixgbe and fix the logic in iflib to handle this case.

sbruno edited edge metadata.

Preinitialize admin task and add some INVARIANT debugging in case this
doesn't capture why Intel lab testing fails so miserably.

Jeffrey:

Give this a spin and lets see if we can make forward progress with it.

Preinitialize admin task and add some INVARIANT debugging in case this
doesn't capture why Intel lab testing fails so miserably.

Jeffrey:

Give this a spin and lets see if we can make forward progress with it.

That seemed to do the trick. Ran some basic functionality on X520 (both adapter and mezz card) as well a couple BDX-DE skus and X540. I'm currently running netperf stress tests. I will do some more comprehensive testing (driver load/unload stress, mtu tests etc) after they've ran 24 hours.

Because this is a structural change of some magnitude, does anyone have a handle on how the man page should be updated?

I am confused by the question about the manual page. That can be put under a separate review if you like.

In D5213#173227, @gnn wrote:

I am confused by the question about the manual page. That can be put under a separate review if you like.

The changes to the driver are going to need an update to the man page for ix(4). That is my only point. I haven't done it, but I am noting that it needs to be done.

jeffrey.e.pieper_intel.com edited edge metadata.

There is an issue where a reset can cause the interface to be effectively dead and is only recoverable by reloading the driver. This was found during an mtu test, where the mtu is changed and traffic is ran between each iteration. It can be reproduced with just iterations of ifconfig up/down with ping and netperf between each iteration. See the attached log and repro script.


This revision now requires changes to proceed.Oct 31 2016, 5:20 PM

Also, I mentioned this earlier, but PLEASE print the driver version in dmesg at attach as well as in sysctl. It isn't very optimal to have no clue as to what driver version is being used.

hrm ... I had the same behavior on an em(4) system when executing an "ifconfig em0 down up" here. Are you sure this is an "iflib" problem?

hrm ... I had the same behavior on an em(4) system when executing an "ifconfig em0 down up" here. Are you sure this is an "iflib" problem?

Considering this doesn't reproduce on ix legacy, I'm pretty sure. The mtu test completes with no errors anyways.

hrm ... I had the same behavior on an em(4) system when executing an "ifconfig em0 down up" here. Are you sure this is an "iflib" problem?

Considering this doesn't reproduce on ix legacy, I'm pretty sure. The mtu test completes with no errors anyways.

ix_iflib usually fails within a few minutes. I've been running the script on ix legacy for 10 minutes with no failures.

ix_iflib usually fails within a few minutes. I've been running the script on ix legacy for 10 minutes with no failures.

I was able to make this happen this morning it I do *exactly* what Jeffrey is doing. There's some kind of race here that I think is related to handling of reinit being to "graceful".

Jeffrey, can you modify your iflib install with this? I'll update the review in a bit after I'm done with my testing, but so far, this seems to be "doing the same" if not "right" thing:

Index: sys/net/iflib.c
===================================================================
--- sys/net/iflib.c     (revision 308582)
+++ sys/net/iflib.c     (working copy)
@@ -3228,7 +3228,7 @@
                                        err = IFDI_PROMISC_SET(ctx, if_getflags(ifp));
                                }
                        } else
-                               reinit = 1;
+                               iflib_if_init(ctx);
                } else if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) {
                        iflib_stop(ctx);
                }
kmacy edited edge metadata.
sbruno added a reviewer: sbruno.

We're awaiting testing of this at Intel for final verification.

sbruno edited edge metadata.

Drop iflib.c changes as they were committed in support of other ethernet drivers.

sbruno edited edge metadata.

Catch up to head churn in iflib

Remove shims for legacy vs iflib in head. iflib will be the default
for 12-current and forward.

Try to cleanup and prepare for merge.

Make a buildable version of iflib'd ixgbe.

Remove duped function found while compiling for MIPS64 OCTEON1

Sean,

The ixv fails to build because there is no ifdi_if.h in SRCS. Could you update the Makefile?

Thanks,
Chris

This revision now requires changes to proceed.Feb 2 2017, 8:04 AM
sbruno edited edge metadata.

Add ifdi_if.h to ixv(4) module Makefile

sys/dev/ixgbe/ixgbe_sysctl.c
557

Oops, I believe there is a problem from here: Duplicate code.

sbruno edited edge metadata.

De-dupe added file.

On a router/firewall use case: About -13% performance drop (4 cores Intel Xeon L5630 with Intel 82599EB NIC)

x r313730, paquets-per-second
+ r313730-D5213, paquets-per-second
+--------------------------------------------------------------------------+
|      +                                                        x          |
|     ++               x                   ++                 x xx         |
|                                    |__________________A_______M_________||
||_____M_____________A___________________|                                 |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       2439877     2923715.5     2907858.5     2816269.4     210659.57
+   5     2251528.5       2679501       2259267     2424025.7     229988.26
Difference at 95.0% confidence
        -392244 +/- 321639
        -13.9278% +/- 10.7244%
        (Student's t, pooled s = 220536)

graph.png (1×1 px, 35 KB)

Flamgraph before applying this patch:

Flamegraph after applying this patch:

cramerj_intel.com added inline comments.
sys/dev/ixgbe/if_ix.c
3942

In iflib.c, the function parameter passed to legacy ISRs is the shared context pointer. In this case, adapter. Changing the above two lines to...
struct adapter *adapter = arg;
struct ix_rx_queue *que = adapter->rx_queues;
...fixes a panic we encounter with legacy interrupts.

The function ixgbe_intr used in if_txrx struct filled in ix_txrx.c is defined in if_ix.c but ix_txrx.c is used in both ix and ixv drivers so ixv build as KLD cannot be loaded.

This revision now requires changes to proceed.Mar 22 2017, 2:31 PM
sys/modules/ix/Makefile
1

Hrm ... This part of the patch needs to be regenerated.

sys/dev/ixgbe/if_ix.c
5200

Why is this being commented out and not removed?

sys/dev/ixgbe/ix_txrx.c
41

I'm unclear why this part of the patch won't apply, but I'll figure it out to start testing.

Shouldn't D10293 supersede this anyways?

Oh yeah, since I "own" this review. I'll abandon it in favor of 10293