Page MenuHomeFreeBSD

import iflib
ClosedPublic

Authored by kmacy on Feb 6 2016, 12:37 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Apr 14, 3:46 AM
Unknown Object (File)
Fri, Apr 12, 1:02 PM
Unknown Object (File)
Tue, Apr 9, 3:33 AM
Unknown Object (File)
Mon, Apr 8, 9:28 PM
Unknown Object (File)
Mon, Apr 8, 2:55 PM
Unknown Object (File)
Mon, Apr 8, 12:53 PM
Unknown Object (File)
Thu, Apr 4, 7:32 PM
Unknown Object (File)
Mar 11 2024, 2:00 AM

Details

Summary

iflib is a library to eliminate the need for frequently duplicated device independent logic propagated (poorly) across many network drivers.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I don't see a change to sys/conf/files to add iflib.c

Similarly, I don't see changes in the ixgbe / ixl diffs to add a depend on iflib in sys/conf/files.*

Wasn't in the diff itself - see update.

update for bus_get_cpus size interface
update for ATR support in ixl

use correct reference point for diff

sys/net/iflib.c
2963

Looks like this requires nextbsd 9230bff and a5090db as another dependency (the changes to add a register method to device_if.m).

Are you compile testing this on -current?

When linking, the mp_ring here conflicts with the one in the cxgbe driver if both iflib and cxgbe are included in the kernel.

sys/net/iflib.c
425

This (and the similar SYSINIT in the TASKQGROUP_DEFINE()) prevent building drivers using iflib into the kernel.

The sysinits run well after the pci bus is walked, and you wind up tripping over uninitialized stuff everywhere.

When linking, the mp_ring here conflicts with the one in the cxgbe driver if both iflib and cxgbe are included in the kernel.

Will rename.

sys/net/iflib.c
425

Hrrm. Can one safely run them before the pci tree is walked?

sys/net/iflib.c
2680

So, who is supposed to clear this? I don't see any code to clear this flag.

I confess that I noticed this not because of brilliant code review skills, but because I have a box accepting only 1/N ssh sessions, and complaining constantly about 'syslogd: sendto: No buffer space available'. When poking around in kgdb, I see this flag set on several rings.

sys/net/iflib.c
277

What is this in_use field for? Just debugging?

Update to m_init and kernel config to compile ixl/ixlv/ix/ixv in kernel with GENERIC

  • does not address Gallatin's most recent issues
sys/net/iflib.c
278

It's how many descriptors in the ring that are in use. It's incremented in transmit and decremented in tx_reclaim. It's just for reporting / debugging.

2681

Sorry, that was meant interoperate with other backpressure code that is only in my development branch.

Address issues raised by Gallatin:

  • don't mark a queue as closed without corresponding driver backpressure support
  • rename mp_ring structures / types / functions to avoid namespace collisions when linking cxgbe in to the kernel

Build comments

sys/conf/files
1896 ↗(On Diff #13451)

Should the changes for this part be in D5213 instead?

sys/conf/files.amd64
239 ↗(On Diff #13451)

These should be moved to D5214?

sys/conf/files
1896 ↗(On Diff #13459)

Probably. But it's more difficult managing it as part of that patchset.

sys/conf/files.amd64
239 ↗(On Diff #13459)

Again. Probably, but since iflib is kind of package deal anyway if iflib goes in without one or both of those drivers this part of the change will just be dropped.

sys/net/iflib.c
426

I'm trying to fix this but it's hard to diagnose with DDB being unusable at this stage in boot thanks to USB "modernization." If you know what values I should pass that would be great. It may well require jhb's SMP startup changes.

move initialization earlier in boot for the benefit of statically linked drivers

sys/conf/files.amd64
239 ↗(On Diff #13478)

Actually, the initial version that goes in probably won't look like this. There will be the "legacy" version as the default and the alternative iflib version.

sys/net/iflib.c
427

I think this is fixed by my latest iflib and taskq changes.

sys/net/iflib.c
1663

Settiung of the hwassist bits seems to have been removed from the drivers (at least ixl), and does not seem to appear anywhere else in iflib. Should it also appear in iflib_init_locked()?

To follow up on my last comment, fixing this (patch appended) seems to result in the kernel actually sending down packets with csum offload, which then causes the ixl NIC to stop passing traffic, and to die occasionally with: "ixl0: Malicious Driver Detection event 0x02 on TX queue 5 pf number 0x00"

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -1681,6 +1681,7 @@ static void
 iflib_init_locked(if_ctx_t ctx)
 {
        if_softc_ctx_t sctx = &ctx->ifc_softc_ctx;
+       if_t ifp = ctx->ifc_ifp;
        iflib_fl_t fl;
        iflib_txq_t txq;
        iflib_rxq_t rxq;
@@ -1697,6 +1698,15 @@ iflib_init_locked(if_ctx_t ctx)
                iflib_netmap_rxq_init(ctx, rxq);
        }
 
+
+       /* Set hardware offload abilities */
+       if_clearhwassist(ifp);
+       if (if_getcapenable(ifp) & IFCAP_TXCSUM)
+               if_sethwassistbits(ifp, CSUM_TCP | CSUM_UDP, 0);
+       if (if_getcapenable(ifp) & IFCAP_TSO4)
+               if_sethwassistbits(ifp, CSUM_TSO, 0);
+       if (if_getcapenable(ifp) & IFCAP_TXCSUM_IPV6)
+               if_sethwassistbits(ifp,  (CSUM_TCP_IPV6 | CSUM_UDP_IPV6), 0);
        IFDI_INIT(ctx);
        for (i = 0, rxq = ctx->ifc_rxqs; i < sctx->isc_nqsets; i++, rxq++) {
                for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) {

We still seem to be missing the device_register api required by iflib. Eg:

diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m
index a5319c6..18d9fd7 100644
--- a/sys/kern/device_if.m
+++ b/sys/kern/device_if.m
@@ -62,6 +62,11 @@ CODE {
        {
            return 0;
        }
+
+       static void *null_register(device_t dev)
+       {
+           return NULL;
+       }
 };
        
 /**
@@ -316,3 +321,7 @@ METHOD int resume {
 METHOD int quiesce {
        device_t dev;
 } DEFAULT null_quiesce;
+
+METHOD void * register {
+       device_t dev;
+} DEFAULT null_register;
sys/net/iflib.c
427

I did a fresh git clone of current and arc patch today (after walking the tree back prior to the last commit to sys/bus.h so that it would apply), and I'm still seeing panic on boot when ixl is built into the kernel.

I noticed that the taskq change is still using SI_ORDER_SMP.

1504

Actually, you DO indeed need INVARIANTS around this. W/o Invariants, "delta" triggers an unused warning, killing the build. Sorry.

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

ixl0: using 28 queues
ixl0: Using MSIX interrupts with 29 vectors
ixl0: allocated for 28 queues

Fatal trap 12: page fault while in kernel mode
cpuid = 18; apic id = 14
fault virtual address = 0x1818
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80ad711c
stack pointer = 0x28:0xfffffe3fca9f3fb0
frame pointer = 0x28:0xfffffe3fca9f4020
code segment = base 0x0, limit 0xfffff, type 0x1b

= DPL 0, pres 1, long 1, def32 0, gran 1

processor eflags = interrupt enabled, resume, IOPL = 0
current process = 123 (kldload)
[ thread pid 123 tid 100220 ]
Stopped at taskqgroup_attach+0x4c: lock cmpxchgq %rsi,0x1818(%r15)
db> bt
Tracing pid 123 tid 100220 td 0xfffff8013b123000
taskqgroup_attach() at taskqgroup_attach+0x4c/frame 0xfffffe3fca9f4020
iflib_irq_alloc_generic() at iflib_irq_alloc_generic+0x357/frame 0xfffffe3fca9f40b0
ixl_if_msix_intr_assign() at ixl_if_msix_intr_assign+0x66/frame 0xfffffe3fca9f4130iflib_device_register() at iflib_device_register+0x17a9/frame 0xfffffe3fca9f4470
iflib_device_attach() at iflib_device_attach+0xb7/frame 0xfffffe3fca9f44a0
device_attach() at device_attach+0x423/frame 0xfffffe3fca9f4500
pci_driver_added() at pci_driver_added+0xed/frame 0xfffffe3fca9f4540
devclass_driver_added() at devclass_driver_added+0x7d/frame 0xfffffe3fca9f4580
devclass_add_driver() at devclass_add_driver+0x11c/frame 0xfffffe3fca9f45c0
module_register_init() at module_register_init+0x177/frame 0xfffffe3fca9f45f0
linker_load_module() at linker_load_module+0xca8/frame 0xfffffe3fca9f4920
kern_kldload() at kern_kldload+0xea/frame 0xfffffe3fca9f4960
sys_kldload() at sys_kldload+0x5b/frame 0xfffffe3fca9f4990
amd64_syscall() at amd64_syscall+0x502/frame 0xfffffe3fca9f4ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3fca9f4ab0

  • syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x80086b83a, rsp = 0x7fffffffe7e8, rbp = 0x7fffffffed00 ---

db>

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

D5205 is supposed to fix this crash.

In D5211#114958, @erj wrote:

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

D5205 is supposed to fix this crash.

That pre-dates iflib; iflib (mostly) obviates that review

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index ab5f74c..37d970e 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
        NULL
 };
 
-DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
+DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);

We still seem to be missing the device_register api required by iflib. Eg:

Sorry, I'm sure it was in an earlier patch. Will put it back up.

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index ab5f74c..37d970e 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
        NULL
 };
 
-DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
+DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);

Ok. I needed that change to make my system boot with ixgbe linked in to the kernel.

To follow up on my last comment, fixing this (patch appended) seems to result in the kernel actually sending down packets with csum offload, which then causes the ixl NIC to stop passing traffic, and to die occasionally with: "ixl0: Malicious Driver Detection event 0x02 on TX queue 5 pf number 0x00"

The flag setting appears to have gotten lost in a mismerge jumping between branches. The first place I'd look to fix that error would be at my changes to ixl when I removed packet parsing.

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index ab5f74c..37d970e 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
        NULL
 };
 
-DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
+DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);

Uhm ... did you pull in the corresponding update to the task queue patch? That's probably the real problem.

In D5211#114963, @kmacy wrote:

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index ab5f74c..37d970e 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
        NULL
 };
 
-DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
+DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);

Uhm ... did you pull in the corresponding update to the task queue patch? That's probably the real problem.

Yes.

I think the problem is actually in that file. Specifically, the SYSINIT() in the TASKQGROUP_DEFINE() macro that uses SI_SUB_SMP pretty much makes certain that when the DRIVER_MODULE executes earlier, the taskq stuff will still be NULL. Then that stays cached, and the drivers trip over it.

In D5211#114963, @kmacy wrote:

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index ab5f74c..37d970e 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
        NULL
 };
 
-DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
+DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);

Uhm ... did you pull in the corresponding update to the task queue patch? That's probably the real problem.

Yes.

I think the problem is actually in that file. Specifically, the SYSINIT() in the TASKQGROUP_DEFINE() macro that uses SI_SUB_SMP pretty much makes certain that when the DRIVER_MODULE executes earlier, the taskq stuff will still be NULL. Then that stays cached, and the drivers trip over it.

No. The updated taskq patch doesn't use SI_SUB_SMP.

In D5211#114965, @kmacy wrote:

No. The updated taskq patch doesn't use SI_SUB_SMP.

There are 2 patches in that review. The second patch updates 2 SYSINITS to use the IF order, but leaves one at SMP.

In D5211#114965, @kmacy wrote:

No. The updated taskq patch doesn't use SI_SUB_SMP.

There are 2 patches in that review. The second patch updates 2 SYSINITS to use the IF order, but leaves one at SMP.

Right. Adjust updates the number of threads once SMP comes on line. Are you saying that that's still a problem?

It's odd that ixgbe works with the change. Although I admit I haven't tried ixgbe as a module with that change.

Right. Adjust updates the number of threads once SMP comes on line. Are you saying that that's still a problem?

Honestly, I haven't thought much about it. I'm just trying to get things to not crash so that I can try to get ixl working in a controlled environment before I test it for real. All I can say for sure is that the patch, as it stands today, goes "kaboom" with both ixl compiled as a module, and with it compiled into the kernel.

I'll try to diagnose this after I get ixl not to barf on the descriptors we're sending down to it when csum offload is enabled.

Add missed device_if.m changes

  • set hwassist flags in init
  • set tcp_hlen even without TSO
  • include opt_inet.h & opt_inet6.h
  • make enabled code compile
  • fix !INVARIANTS warning

Please note that the complete set of changes against HEAD is maintained in the HEAD_MERGE/iflib branch at https://github.com/NextBSD/NextBSD.git

fix tcp header assignment in header parsing

  • TSO fixes
  • serialize doorbell updates
  • extend tx pkt_info format for bxe TSO
  • heavily revise rx pkt info for bxe S/G across rings
  • don't break out of transmit loop on busdma map failure
  • report internal state and error statistics via sysctl
  • also check if there are more credits available when deciding to drain
  • re-merge against HEAD
  • Restore small packet rx optimization
  • simplify tx db locking
  • Mbuf free optimizations
  • bus_get_cpus change
  • fix memory corruptions caused by bxe and an probe routines
  • support for architectures without 32-bit atomics
  • taskqueue group fixes
  • support for drivers that don't know what their msix-bar is and so map everything
  • changes to private ioctl interface
  • sysctl extensions
  • lots of good stuff

Are the changes to if_an, if_bxe, and kern_mbuf.c intentional?

The change to kern_mbuf allows iflib to avoid gratuitous memory touches on free. The other driver changes are_not_ optional as they fix use after free memory corruptions in those drivers' probe routines that have hit iflib on driver reload (unload and then reload a driver kmod).

You can leave out the mbuf changes in the initial commit I guess. They also rerequire adding defines to mbuf.h which I missed.

I've made the changes to if_an and committed that part of the patch to freebsd-head

sys/dev/an/if_an.c
322 ↗(On Diff #16239)

Prototype needs to be removed from the headers.

sys/dev/an/if_an_pci.c
152 ↗(On Diff #16239)

This line can be moved into an_probe_pci() instead of duplicating the id list search

159 ↗(On Diff #16239)

It is gratuitous, but it's also a safety pattern that everyone follows,

add missed change to mbuf.h

Sorry. Thought you'd just remove references to that flag.

accidentally lost the driver probe fixes

remove unintended header changes

sys/dev/bxe/bxe.c
2495 ↗(On Diff #16263)

This should work, device_set_desc_copy() is intended for this usage pattern. Moving it out to bxe_attach() is the wrong thing, IMHO.

sys/dev/bxe/bxe.c
2495 ↗(On Diff #16263)

No. Unless you know with 100% certainty that the driver will subsequently attach (which you can't as you might have another driver that eats device nodes to facilitate passthrough) you should never ever touch the device node in probe. Probe is purely yay or nay.

That's not true. device_set_desc and friends are specifically designed to be operated on in device_probe.

Maybe at one point it worked. It causes use after frees now.

In any event. I don't see any point in setting the description until attach. It may be the freeing of descbuf isn't permitted, but in that case you'd have a leak if it didn't subsequently attach. This is not a robust interface.

Add conf changes to enable iflib to build

gallatin edited edge metadata.
This revision is now accepted and ready to land.May 15 2016, 1:07 AM
mizhka added inline comments.
sys/net/mp_ring.c
45

MIPS buildkernel failed with:
mp_ring.c:204: warning: implicit declaration of function 'atomic_cmpset_64'

Could you please add support for non-x86 platforms (arm, mips32)?

Thanks!

There is a directive for platforms without 64-bit atomics. Such arches
should put that in their param.h, sorry for the oversight. A lot of balls
in the air.

-M

Thank you for quick reply!

Another issue is:
src/sys/net/iflib.c: In function '_iflib_fl_refill':
src/sys/net/iflib.c:1538: warning: comparison is always true due to limited range of data type
src/sys/net/iflib.c: In function 'iflib_fl_bufs_free':
src/sys/net/iflib.c:1667: warning: comparison is always true due to limited range of data type

I suppose it's due to unsigned types. But build is failed due to warnings.

Thanks!

Sorry. Send Scott a patch. I can't do anything about it.

-M

Thanks! BTW, it was last compilation issue under MIPS o32.