Page MenuHomeFreeBSD

import iflib
ClosedPublic

Authored by kmacy on Feb 6 2016, 12:37 AM.

Details

Summary

iflib is a library to eliminate the need for frequently duplicated device independent logic propagated (poorly) across many network drivers.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kmacy added a comment.Feb 11 2016, 9:52 PM

I don't see a change to sys/conf/files to add iflib.c

Similarly, I don't see changes in the ixgbe / ixl diffs to add a depend on iflib in sys/conf/files.*

Wasn't in the diff itself - see update.

kmacy updated this revision to Diff 13244.Feb 12 2016, 1:19 AM

update for bus_get_cpus size interface
update for ATR support in ixl

kmacy updated this revision to Diff 13245.Feb 12 2016, 1:21 AM

use correct reference point for diff

gallatin added inline comments.Feb 15 2016, 7:59 PM
sys/net/iflib.c
2962 ↗(On Diff #13245)

Looks like this requires nextbsd 9230bff and a5090db as another dependency (the changes to add a register method to device_if.m).

Are you compile testing this on -current?

When linking, the mp_ring here conflicts with the one in the cxgbe driver if both iflib and cxgbe are included in the kernel.

gallatin added inline comments.Feb 16 2016, 10:17 PM
sys/net/iflib.c
424 ↗(On Diff #13245)

This (and the similar SYSINIT in the TASKQGROUP_DEFINE()) prevent building drivers using iflib into the kernel.

The sysinits run well after the pci bus is walked, and you wind up tripping over uninitialized stuff everywhere.

kmacy added a comment.Feb 17 2016, 9:19 PM

When linking, the mp_ring here conflicts with the one in the cxgbe driver if both iflib and cxgbe are included in the kernel.

Will rename.

sys/net/iflib.c
424 ↗(On Diff #13245)

Hrrm. Can one safely run them before the pci tree is walked?

gallatin added inline comments.Feb 18 2016, 3:43 PM
sys/net/iflib.c
2679 ↗(On Diff #13245)

So, who is supposed to clear this? I don't see any code to clear this flag.

I confess that I noticed this not because of brilliant code review skills, but because I have a box accepting only 1/N ssh sessions, and complaining constantly about 'syslogd: sendto: No buffer space available'. When poking around in kgdb, I see this flag set on several rings.

gallatin added inline comments.Feb 18 2016, 10:36 PM
sys/net/iflib.c
276 ↗(On Diff #13245)

What is this in_use field for? Just debugging?

kmacy updated this revision to Diff 13451.Feb 18 2016, 11:29 PM

Update to m_init and kernel config to compile ixl/ixlv/ix/ixv in kernel with GENERIC

  • does not address Gallatin's most recent issues
kmacy added inline comments.Feb 19 2016, 12:30 AM
sys/net/iflib.c
277 ↗(On Diff #13451)

It's how many descriptors in the ring that are in use. It's incremented in transmit and decremented in tx_reclaim. It's just for reporting / debugging.

2680 ↗(On Diff #13451)

Sorry, that was meant interoperate with other backpressure code that is only in my development branch.

kmacy updated this revision to Diff 13459.Feb 19 2016, 12:33 AM

Address issues raised by Gallatin:

  • don't mark a queue as closed without corresponding driver backpressure support
  • rename mp_ring structures / types / functions to avoid namespace collisions when linking cxgbe in to the kernel
erj added a subscriber: erj.Feb 19 2016, 12:47 AM

Build comments

sys/conf/files
1896 ↗(On Diff #13451)

Should the changes for this part be in D5213 instead?

sys/conf/files.amd64
239 ↗(On Diff #13451)

These should be moved to D5214?

kmacy added inline comments.Feb 19 2016, 12:55 AM
sys/conf/files
1896 ↗(On Diff #13459)

Probably. But it's more difficult managing it as part of that patchset.

sys/conf/files.amd64
239 ↗(On Diff #13459)

Again. Probably, but since iflib is kind of package deal anyway if iflib goes in without one or both of those drivers this part of the change will just be dropped.

kmacy added inline comments.Feb 19 2016, 1:18 AM
sys/net/iflib.c
425 ↗(On Diff #13459)

I'm trying to fix this but it's hard to diagnose with DDB being unusable at this stage in boot thanks to USB "modernization." If you know what values I should pass that would be great. It may well require jhb's SMP startup changes.

kmacy updated this revision to Diff 13478.Feb 19 2016, 4:44 AM

move initialization earlier in boot for the benefit of statically linked drivers

kmacy added inline comments.Feb 19 2016, 9:54 PM
sys/conf/files.amd64
239 ↗(On Diff #13478)

Actually, the initial version that goes in probably won't look like this. There will be the "legacy" version as the default and the alternative iflib version.

sys/net/iflib.c
426 ↗(On Diff #13478)

I think this is fixed by my latest iflib and taskq changes.

gallatin added inline comments.Feb 20 2016, 3:14 AM
sys/net/iflib.c
1662 ↗(On Diff #13478)

Settiung of the hwassist bits seems to have been removed from the drivers (at least ixl), and does not seem to appear anywhere else in iflib. Should it also appear in iflib_init_locked()?

To follow up on my last comment, fixing this (patch appended) seems to result in the kernel actually sending down packets with csum offload, which then causes the ixl NIC to stop passing traffic, and to die occasionally with: "ixl0: Malicious Driver Detection event 0x02 on TX queue 5 pf number 0x00"

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -1681,6 +1681,7 @@ static void
 iflib_init_locked(if_ctx_t ctx)
 {
        if_softc_ctx_t sctx = &ctx->ifc_softc_ctx;
+       if_t ifp = ctx->ifc_ifp;
        iflib_fl_t fl;
        iflib_txq_t txq;
        iflib_rxq_t rxq;
@@ -1697,6 +1698,15 @@ iflib_init_locked(if_ctx_t ctx)
                iflib_netmap_rxq_init(ctx, rxq);
        }
 
+
+       /* Set hardware offload abilities */
+       if_clearhwassist(ifp);
+       if (if_getcapenable(ifp) & IFCAP_TXCSUM)
+               if_sethwassistbits(ifp, CSUM_TCP | CSUM_UDP, 0);
+       if (if_getcapenable(ifp) & IFCAP_TSO4)
+               if_sethwassistbits(ifp, CSUM_TSO, 0);
+       if (if_getcapenable(ifp) & IFCAP_TXCSUM_IPV6)
+               if_sethwassistbits(ifp,  (CSUM_TCP_IPV6 | CSUM_UDP_IPV6), 0);
        IFDI_INIT(ctx);
        for (i = 0, rxq = ctx->ifc_rxqs; i < sctx->isc_nqsets; i++, rxq++) {
                for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) {

We still seem to be missing the device_register api required by iflib. Eg:

diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m
index a5319c6..18d9fd7 100644
--- a/sys/kern/device_if.m
+++ b/sys/kern/device_if.m
@@ -62,6 +62,11 @@ CODE {
        {
            return 0;
        }
+
+       static void *null_register(device_t dev)
+       {
+           return NULL;
+       }
 };
        
 /**
@@ -316,3 +321,7 @@ METHOD int resume {
 METHOD int quiesce {
        device_t dev;
 } DEFAULT null_quiesce;
+
+METHOD void * register {
+       device_t dev;
+} DEFAULT null_register;
gallatin added inline comments.Feb 22 2016, 7:06 PM
sys/net/iflib.c
426 ↗(On Diff #13478)

I did a fresh git clone of current and arc patch today (after walking the tree back prior to the last commit to sys/bus.h so that it would apply), and I'm still seeing panic on boot when ixl is built into the kernel.

I noticed that the taskq change is still using SI_ORDER_SMP.

1503 ↗(On Diff #13478)

Actually, you DO indeed need INVARIANTS around this. W/o Invariants, "delta" triggers an unused warning, killing the build. Sorry.

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

ixl0: using 28 queues
ixl0: Using MSIX interrupts with 29 vectors
ixl0: allocated for 28 queues

Fatal trap 12: page fault while in kernel mode
cpuid = 18; apic id = 14
fault virtual address = 0x1818
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80ad711c
stack pointer = 0x28:0xfffffe3fca9f3fb0
frame pointer = 0x28:0xfffffe3fca9f4020
code segment = base 0x0, limit 0xfffff, type 0x1b

= DPL 0, pres 1, long 1, def32 0, gran 1

processor eflags = interrupt enabled, resume, IOPL = 0
current process = 123 (kldload)
[ thread pid 123 tid 100220 ]
Stopped at taskqgroup_attach+0x4c: lock cmpxchgq %rsi,0x1818(%r15)
db> bt
Tracing pid 123 tid 100220 td 0xfffff8013b123000
taskqgroup_attach() at taskqgroup_attach+0x4c/frame 0xfffffe3fca9f4020
iflib_irq_alloc_generic() at iflib_irq_alloc_generic+0x357/frame 0xfffffe3fca9f40b0
ixl_if_msix_intr_assign() at ixl_if_msix_intr_assign+0x66/frame 0xfffffe3fca9f4130iflib_device_register() at iflib_device_register+0x17a9/frame 0xfffffe3fca9f4470
iflib_device_attach() at iflib_device_attach+0xb7/frame 0xfffffe3fca9f44a0
device_attach() at device_attach+0x423/frame 0xfffffe3fca9f4500
pci_driver_added() at pci_driver_added+0xed/frame 0xfffffe3fca9f4540
devclass_driver_added() at devclass_driver_added+0x7d/frame 0xfffffe3fca9f4580
devclass_add_driver() at devclass_add_driver+0x11c/frame 0xfffffe3fca9f45c0
module_register_init() at module_register_init+0x177/frame 0xfffffe3fca9f45f0
linker_load_module() at linker_load_module+0xca8/frame 0xfffffe3fca9f4920
kern_kldload() at kern_kldload+0xea/frame 0xfffffe3fca9f4960
sys_kldload() at sys_kldload+0x5b/frame 0xfffffe3fca9f4990
amd64_syscall() at amd64_syscall+0x502/frame 0xfffffe3fca9f4ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3fca9f4ab0

  • syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x80086b83a, rsp = 0x7fffffffe7e8, rbp = 0x7fffffffed00 ---

db>

erj added a comment.Feb 22 2016, 7:43 PM

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

D5205 is supposed to fix this crash.

In D5211#114958, @erj wrote:

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

D5205 is supposed to fix this crash.

That pre-dates iflib; iflib (mostly) obviates that review

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index ab5f74c..37d970e 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
        NULL
 };
 
-DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
+DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);
kmacy added a comment.Feb 22 2016, 8:13 PM

We still seem to be missing the device_register api required by iflib. Eg:

Sorry, I'm sure it was in an earlier patch. Will put it back up.

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
 index ab5f74c..37d970e 100644
 --- a/sys/net/iflib.c
 +++ b/sys/net/iflib.c
 @@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
         NULL
  };
  
 -DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
 +DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
  MODULE_VERSION(iflib, 1);
  
  MODULE_DEPEND(iflib, pci, 1, 1, 1);

Ok. I needed that change to make my system boot with ixgbe linked in to the kernel.

To follow up on my last comment, fixing this (patch appended) seems to result in the kernel actually sending down packets with csum offload, which then causes the ixl NIC to stop passing traffic, and to die occasionally with: "ixl0: Malicious Driver Detection event 0x02 on TX queue 5 pf number 0x00"

The flag setting appears to have gotten lost in a mismerge jumping between branches. The first place I'd look to fix that error would be at my changes to ixl when I removed packet parsing.

kmacy added a comment.Feb 22 2016, 8:15 PM

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
 index ab5f74c..37d970e 100644
 --- a/sys/net/iflib.c
 +++ b/sys/net/iflib.c
 @@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
         NULL
  };
  
 -DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
 +DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
  MODULE_VERSION(iflib, 1);
  
  MODULE_DEPEND(iflib, pci, 1, 1, 1);

Uhm ... did you pull in the corresponding update to the task queue patch? That's probably the real problem.

In D5211#114963, @kmacy wrote:

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
 index ab5f74c..37d970e 100644
 --- a/sys/net/iflib.c
 +++ b/sys/net/iflib.c
 @@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
         NULL
  };
  
 -DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
 +DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
  MODULE_VERSION(iflib, 1);
  
  MODULE_DEPEND(iflib, pci, 1, 1, 1);

Uhm ... did you pull in the corresponding update to the task queue patch? That's probably the real problem.

Yes.

I think the problem is actually in that file. Specifically, the SYSINIT() in the TASKQGROUP_DEFINE() macro that uses SI_SUB_SMP pretty much makes certain that when the DRIVER_MODULE executes earlier, the taskq stuff will still be NULL. Then that stays cached, and the drivers trip over it.

kmacy added a comment.Feb 22 2016, 8:21 PM
In D5211#114963, @kmacy wrote:

Hmm.. Even when loading the module at boot on -current, I'm still seeing a panic:

I can confirm that moving back to SI_SUB_SMP in the DECLARE_MODULE macro fixes it enough to at least work with an ixl module loaded post-boot.

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
 index ab5f74c..37d970e 100644
 --- a/sys/net/iflib.c
 +++ b/sys/net/iflib.c
 @@ -421,7 +421,7 @@ static moduledata_t iflib_moduledata = {
         NULL
  };
  
 -DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
 +DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_SMP, SI_ORDER_ANY);
  MODULE_VERSION(iflib, 1);
  
  MODULE_DEPEND(iflib, pci, 1, 1, 1);

Uhm ... did you pull in the corresponding update to the task queue patch? That's probably the real problem.

Yes.

I think the problem is actually in that file. Specifically, the SYSINIT() in the TASKQGROUP_DEFINE() macro that uses SI_SUB_SMP pretty much makes certain that when the DRIVER_MODULE executes earlier, the taskq stuff will still be NULL. Then that stays cached, and the drivers trip over it.

No. The updated taskq patch doesn't use SI_SUB_SMP.

In D5211#114965, @kmacy wrote:

No. The updated taskq patch doesn't use SI_SUB_SMP.

There are 2 patches in that review. The second patch updates 2 SYSINITS to use the IF order, but leaves one at SMP.

kmacy added a comment.Feb 22 2016, 8:28 PM
In D5211#114965, @kmacy wrote:

No. The updated taskq patch doesn't use SI_SUB_SMP.

There are 2 patches in that review. The second patch updates 2 SYSINITS to use the IF order, but leaves one at SMP.

Right. Adjust updates the number of threads once SMP comes on line. Are you saying that that's still a problem?

It's odd that ixgbe works with the change. Although I admit I haven't tried ixgbe as a module with that change.

Right. Adjust updates the number of threads once SMP comes on line. Are you saying that that's still a problem?

Honestly, I haven't thought much about it. I'm just trying to get things to not crash so that I can try to get ixl working in a controlled environment before I test it for real. All I can say for sure is that the patch, as it stands today, goes "kaboom" with both ixl compiled as a module, and with it compiled into the kernel.

I'll try to diagnose this after I get ixl not to barf on the descriptors we're sending down to it when csum offload is enabled.

kmacy updated this revision to Diff 13616.Feb 22 2016, 8:36 PM

Add missed device_if.m changes

kmacy updated this revision to Diff 13618.Feb 22 2016, 10:46 PM
  • set hwassist flags in init
  • set tcp_hlen even without TSO
kmacy updated this revision to Diff 13620.Feb 22 2016, 11:32 PM
  • include opt_inet.h & opt_inet6.h
  • make enabled code compile
  • fix !INVARIANTS warning

Please note that the complete set of changes against HEAD is maintained in the HEAD_MERGE/iflib branch at https://github.com/NextBSD/NextBSD.git

kmacy updated this revision to Diff 13625.Feb 23 2016, 1:42 AM

fix tcp header assignment in header parsing

kmacy updated this revision to Diff 13966.Mar 2 2016, 2:09 AM
  • TSO fixes
  • serialize doorbell updates
  • extend tx pkt_info format for bxe TSO
  • heavily revise rx pkt info for bxe S/G across rings
  • don't break out of transmit loop on busdma map failure
  • report internal state and error statistics via sysctl
  • also check if there are more credits available when deciding to drain
  • re-merge against HEAD
kmacy updated this revision to Diff 14002.Mar 3 2016, 12:25 AM
  • Restore small packet rx optimization
  • simplify tx db locking
kmacy updated this revision to Diff 16239.May 12 2016, 6:57 AM
  • Mbuf free optimizations
  • bus_get_cpus change
  • fix memory corruptions caused by bxe and an probe routines
  • support for architectures without 32-bit atomics
  • taskqueue group fixes
  • support for drivers that don't know what their msix-bar is and so map everything
  • changes to private ioctl interface
  • sysctl extensions
  • lots of good stuff

Are the changes to if_an, if_bxe, and kern_mbuf.c intentional?

The change to kern_mbuf allows iflib to avoid gratuitous memory touches on free. The other driver changes are_not_ optional as they fix use after free memory corruptions in those drivers' probe routines that have hit iflib on driver reload (unload and then reload a driver kmod).

You can leave out the mbuf changes in the initial commit I guess. They also rerequire adding defines to mbuf.h which I missed.

scottl removed a subscriber: scottl.May 12 2016, 4:40 PM
scottl added a subscriber: scottl.
scottl added a subscriber: davidcs.May 12 2016, 4:55 PM

I've made the changes to if_an and committed that part of the patch to freebsd-head

sys/dev/an/if_an.c
322 ↗(On Diff #16239)

Prototype needs to be removed from the headers.

sys/dev/an/if_an_pci.c
152 ↗(On Diff #16239)

This line can be moved into an_probe_pci() instead of duplicating the id list search

159 ↗(On Diff #16239)

It is gratuitous, but it's also a safety pattern that everyone follows,

I need the changes to mbuf.h

kmacy updated this revision to Diff 16259.May 12 2016, 6:31 PM

add missed change to mbuf.h

kmacy added a comment.May 12 2016, 6:31 PM

Sorry. Thought you'd just remove references to that flag.

kmacy updated this revision to Diff 16260.May 12 2016, 6:55 PM

missed changes to sys

kmacy updated this revision to Diff 16261.May 12 2016, 6:57 PM

accidentally lost the driver probe fixes

kmacy updated this revision to Diff 16263.May 12 2016, 7:02 PM

remove unintended header changes

scottl added inline comments.May 12 2016, 8:53 PM
sys/dev/bxe/bxe.c
2495 ↗(On Diff #16263)

This should work, device_set_desc_copy() is intended for this usage pattern. Moving it out to bxe_attach() is the wrong thing, IMHO.

kmacy added inline comments.May 12 2016, 9:46 PM
sys/dev/bxe/bxe.c
2495 ↗(On Diff #16263)

No. Unless you know with 100% certainty that the driver will subsequently attach (which you can't as you might have another driver that eats device nodes to facilitate passthrough) you should never ever touch the device node in probe. Probe is purely yay or nay.

That's not true. device_set_desc and friends are specifically designed to be operated on in device_probe.

kmacy added a comment.May 12 2016, 9:52 PM

Maybe at one point it worked. It causes use after frees now.

kmacy added a comment.May 12 2016, 9:55 PM

In any event. I don't see any point in setting the description until attach. It may be the freeing of descbuf isn't permitted, but in that case you'd have a leak if it didn't subsequently attach. This is not a robust interface.

kmacy updated this revision to Diff 16267.May 12 2016, 9:58 PM

Add conf changes to enable iflib to build

gallatin accepted this revision.May 15 2016, 1:07 AM
gallatin edited edge metadata.
This revision is now accepted and ready to land.May 15 2016, 1:07 AM
mizhka_gmail.com added inline comments.
sys/net/mp_ring.c
44 ↗(On Diff #16263)

MIPS buildkernel failed with:
mp_ring.c:204: warning: implicit declaration of function 'atomic_cmpset_64'

Could you please add support for non-x86 platforms (arm, mips32)?

Thanks!

kmacy added a comment.May 18 2016, 8:22 AM

There is a directive for platforms without 64-bit atomics. Such arches
should put that in their param.h, sorry for the oversight. A lot of balls
in the air.

-M

Thank you for quick reply!

Another issue is:
src/sys/net/iflib.c: In function '_iflib_fl_refill':
src/sys/net/iflib.c:1538: warning: comparison is always true due to limited range of data type
src/sys/net/iflib.c: In function 'iflib_fl_bufs_free':
src/sys/net/iflib.c:1667: warning: comparison is always true due to limited range of data type

I suppose it's due to unsigned types. But build is failed due to warnings.

Thanks!

kmacy added a comment.May 18 2016, 8:52 AM

Sorry. Send Scott a patch. I can't do anything about it.

-M

Thanks! BTW, it was last compilation issue under MIPS o32.