Page MenuHomeFreeBSD

Summary: widen net_epoch coverage up to all packet processing
ClosedPublic

Authored by glebius on Feb 7 2019, 10:40 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Oct 22, 3:44 AM
Unknown Object (File)
Mon, Oct 20, 12:34 AM
Unknown Object (File)
Sat, Oct 18, 9:44 AM
Unknown Object (File)
Mon, Oct 13, 8:48 AM
Unknown Object (File)
Mon, Oct 13, 8:37 AM
Unknown Object (File)
Mon, Oct 13, 8:37 AM
Unknown Object (File)
Mon, Oct 13, 8:37 AM
Unknown Object (File)
Mon, Oct 13, 8:37 AM

Details

Summary

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

On output path we already enter epoch quite early - in ip_output(),
ip6_output().

This patch does the same for the input path. All ISR processing,
any packet injection to the network stack shall be performed in
net_epoch. Any leaf function that walks network configuration now
asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so need to be changed.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 23074
Build 22141: arc lint + arc unit

Event Timeline

This panics my test vm:

panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet6/in6.c:1828
cpuid = 0
time = 1549622344
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x47/frame 0xfffffe00004ef240
vpanic() at vpanic+0x20c/frame 0xfffffe00004ef2a0
panic() at panic+0x43/frame 0xfffffe00004ef300
in6_ifawithifp() at in6_ifawithifp+0x347/frame 0xfffffe00004ef360
ip6_output() at ip6_output+0x2c42/frame 0xfffffe00004ef5f0
nd6_ns_output() at nd6_ns_output+0x46e/frame 0xfffffe00004ef6e0
nd6_dad_starttimer() at nd6_dad_starttimer+0xef/frame 0xfffffe00004ef720
nd6_dad_timer() at nd6_dad_timer+0x34b/frame 0xfffffe00004ef7e0
softclock_call_cc() at softclock_call_cc+0x1d2/frame 0xfffffe00004ef8b0
softclock() at softclock+0xa3/frame 0xfffffe00004ef8f0
ithread_loop() at ithread_loop+0x2f2/frame 0xfffffe00004ef960
fork_exit() at fork_exit+0xb0/frame 0xfffffe00004ef9b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004ef9b0

  • trap 0, rip = 0, rsp = 0, rbp = 0 ---

KDB: enter: panic
[ thread pid 12 tid 100037 ]
Stopped at kdb_enter+0x6a: movq $0,kdb_why
db>

I don't think I have anything unusual set. I do have 'net.fibs=3' in loader.conf, for the regression tests.

I'm pretty sure that ipv6_activate_all_interfaces="YES" in /etc/rc.conf is the trigger for the previous panic.
That also matches nicely with the backtrace. Presumably nd6_dad_timer() doesn't enter the NET_EPOCH.

The bad news is that without it I see a different panic:

login: panic: malloc(M_WAITOK) in epoch context
cpuid = 7
time = 1549638751
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x47/frame 0xfffffe009dbca2e0
vpanic() at vpanic+0x20c/frame 0xfffffe009dbca340
panic() at panic+0x43/frame 0xfffffe009dbca3a0
malloc_dbg() at malloc_dbg+0x193/frame 0xfffffe009dbca3d0
malloc() at malloc+0x2a/frame 0xfffffe009dbca440
inp_findmoptions() at inp_findmoptions+0x71/frame 0xfffffe009dbca490
inp_setmoptions() at inp_setmoptions+0x1ec3/frame 0xfffffe009dbca680
ip_ctloutput() at ip_ctloutput+0x826/frame 0xfffffe009dbca6e0
udp_ctloutput() at udp_ctloutput+0x169/frame 0xfffffe009dbca730
sosetopt() at sosetopt+0x115/frame 0xfffffe009dbca7d0
kern_setsockopt() at kern_setsockopt+0x158/frame 0xfffffe009dbca850
sys_setsockopt() at sys_setsockopt+0x33/frame 0xfffffe009dbca880
amd64_syscall() at amd64_syscall+0x436/frame 0xfffffe009dbca9b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe009dbca9b0

  • syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x8007e60fa, rsp = 0x7fffffffe078, rbp = 0x7fffffffe0f0 ---

KDB: enter: panic
[ thread pid 768 tid 100473 ]
Stopped at kdb_enter+0x6a: movq $0,kdb_why
db>

  • Enter epoch in DAD timer.
  • Revert entering epoch for setsockopt. This needs to be fine grained.

This one boots, but panics when I kldload pfsync:

login: panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netpfil/pf/pf_if.c:465
cpuid = 6
time = 1549918230
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009dcb0f10
vpanic() at vpanic+0x1b4/frame 0xfffffe009dcb0f70
panic() at panic+0x43/frame 0xfffffe009dcb0fd0
pfi_kif_update() at pfi_kif_update+0x29b/frame 0xfffffe009dcb1030
pfi_initialize_vnet() at pfi_initialize_vnet+0x211/frame 0xfffffe009dcb1060
vnet_pf_init() at vnet_pf_init+0x153/frame 0xfffffe009dcb1090
vnet_register_sysinit() at vnet_register_sysinit+0xfd/frame 0xfffffe009dcb10c0
linker_load_module() at linker_load_module+0xb78/frame 0xfffffe009dcb13e0
linker_load_dependencies() at linker_load_dependencies+0x2fd/frame 0xfffffe009dcb1430
link_elf_load_file() at link_elf_load_file+0x1107/frame 0xfffffe009dcb14f0
linker_load_module() at linker_load_module+0x982/frame 0xfffffe009dcb1810
kern_kldload() at kern_kldload+0xef/frame 0xfffffe009dcb1860
sys_kldload() at sys_kldload+0x5b/frame 0xfffffe009dcb1890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe009dcb19b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe009dcb19b0

  • syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x8002db96a, rsp = 0x7fffffffe5a8, rbp = 0x7fffffffeb20 ---

KDB: enter: panic
[ thread pid 911 tid 100460 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db>

I've got a quick patch for that. I'm not completely happy with it, but it does at least get us a bit further:

diff --git a/sys/netpfil/pf/pf_if.c b/sys/netpfil/pf/pf_if.c
index f187db2a224..12f3728489a 100644
--- a/sys/netpfil/pf/pf_if.c
+++ b/sys/netpfil/pf/pf_if.c
@@ -117,14 +117,18 @@ pfi_initialize_vnet(void)
        struct ifg_group *ifg;
        struct ifnet *ifp;
        struct pfi_kif *kif;
+       struct epoch_tracker et;

        V_pfi_buffer_max = 64;
        V_pfi_buffer = malloc(V_pfi_buffer_max * sizeof(*V_pfi_buffer),
            PFI_MTYPE, M_WAITOK);

        kif = malloc(sizeof(*kif), PFI_MTYPE, M_WAITOK);
+       NET_EPOCH_ENTER(et);
        PF_RULES_WLOCK();
+
        V_pfi_all = pfi_kif_attach(kif, IFG_ALL);
+
        PF_RULES_WUNLOCK();

        IFNET_RLOCK();
@@ -133,6 +137,7 @@ pfi_initialize_vnet(void)
        CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link)
                pfi_attach_ifnet(ifp);
        IFNET_RUNLOCK();
+       NET_EPOCH_EXIT(et);
 }
@@ -320,7 +320,9 @@ pfi_attach_ifnet(struct ifnet *ifp)
 {
        struct pfi_kif *kif;

-       kif = malloc(sizeof(*kif), PFI_MTYPE, M_WAITOK);
+       kif = malloc(sizeof(*kif), PFI_MTYPE, M_NOWAIT);
+       if (! kif)
+               return; /* XXX This is bad, but we can't M_WAIT in NET_EPOCH */

        PF_RULES_WLOCK();
        V_pfi_update++;
@@ -340,7 +342,9 @@ pfi_attach_ifgroup(struct ifg_group *ifg)
 {
        struct pfi_kif *kif;

-       kif = malloc(sizeof(*kif), PFI_MTYPE, M_WAITOK);
+       kif = malloc(sizeof(*kif), PFI_MTYPE, M_NOWAIT);
+       if (! kif)
+               return; /* XXX. Kinda bad, but we have to hold net epoch here and can't sleep. */

        PF_RULES_WLOCK();
        V_pfi_update++;

After that there's another panic:

panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netpfil/pf/pf_if.c:474
cpuid = 5
time = 1549920875
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009dc90230
vpanic() at vpanic+0x1b4/frame 0xfffffe009dc90290
panic() at panic+0x43/frame 0xfffffe009dc902f0
pfi_kif_update() at pfi_kif_update+0x29b/frame 0xfffffe009dc90350
pfi_attach_ifnet() at pfi_attach_ifnet+0x8d/frame 0xfffffe009dc90370
if_attach_internal() at if_attach_internal+0x4a6/frame 0xfffffe009dc903c0
pfsync_clone_create() at pfsync_clone_create+0x13b/frame 0xfffffe009dc90400
if_clone_createif() at if_clone_createif+0xb5/frame 0xfffffe009dc90450
if_clone_simple() at if_clone_simple+0x8e/frame 0xfffffe009dc904a0
vnet_pfsync_init() at vnet_pfsync_init+0x25/frame 0xfffffe009dc904c0
vnet_register_sysinit() at vnet_register_sysinit+0xfd/frame 0xfffffe009dc904f0
linker_load_module() at linker_load_module+0xb78/frame 0xfffffe009dc90810
kern_kldload() at kern_kldload+0xef/frame 0xfffffe009dc90860
sys_kldload() at sys_kldload+0x5b/frame 0xfffffe009dc90890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe009dc909b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe009dc909b0

  • syscall (304, FreeBSD ELF64, sys_kldload), rip = 0x8002db96a, rsp = 0x7fffffffe5a8, rbp = 0x7fffffffeb20 ---

That one seems like a more generic problem. Wouldn't we expect to hold NET_EPOCH in if_clone or if_attach?
A very similar panic happens if I only load pf (so not pfsync) and run the pf tests, when it creates an epair interface in the vnet jail.

The plan is to not be in epoch during full ifattach/ifclone event since these events have many places that allocate memory. Needs to be more fine grained. I'll ponder more over that.

  • In pfi_initialize_vnet() rely on IFNET_RLOCK() which is sx(9) for stability of ifnet and ifgroup lists. First allocate enough pfi_kif structs, then enter epoch and attach them.
  • pfi_get_ifaces() NET_EPOCH_ENTER since pfi_skip_if() traverses ifgroup list.

New panic, I suspect while an interface is being moved into or out of a vnet jail.
The pf tests seem to trigger a lot of edge cases, so you may want to see if you can get them to run too. They don't need much setup (basically, pkg install scapy kyua, then kldload pfsync ; cd /usr/tests/sys/netpfil/pf ; sudo kyua test).

Unread portion of the kernel message buffer:
panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/net/if.c:967
cpuid = 5
time = 1550749763
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091c8e3e0
vpanic() at vpanic+0x1b4/frame 0xfffffe0091c8e440
panic() at panic+0x43/frame 0xfffffe0091c8e4a0
if_purgeaddrs() at if_purgeaddrs+0x248/frame 0xfffffe0091c8e530
if_detach_internal() at if_detach_internal+0x7e6/frame 0xfffffe0091c8e5b0
if_vmove() at if_vmove+0x3c/frame 0xfffffe0091c8e600
ifhwioctl() at ifhwioctl+0xe18/frame 0xfffffe0091c8e680
ifioctl() at ifioctl+0x424/frame 0xfffffe0091c8e750
kern_ioctl() at kern_ioctl+0x29b/frame 0xfffffe0091c8e7c0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe0091c8e890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0091c8e9b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0091c8e9b0

  • syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80048633a, rsp = 0x7fffffffe1f8, rbp = 0x7fffffffe230 ---

KDB: enter: panic

curthread () at ./machine/pcpu.h:230
230
asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb)

Update to cover panics discovered by pf test suite.

panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet6/nd6_rtr.c:874
cpuid = 3
time = 1552657485
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091c86300
vpanic() at vpanic+0x19d/frame 0xfffffe0091c86350
panic() at panic+0x43/frame 0xfffffe0091c863b0
defrouter_select_fib() at defrouter_select_fib+0x52a/frame 0xfffffe0091c86460
defrouter_del() at defrouter_del+0x15f/frame 0xfffffe0091c864a0
nd6_purge() at nd6_purge+0x1af/frame 0xfffffe0091c864f0
if_detach_internal() at if_detach_internal+0x7f6/frame 0xfffffe0091c86570
if_detach() at if_detach+0x3d/frame 0xfffffe0091c86590
epair_clone_destroy() at epair_clone_destroy+0x98/frame 0xfffffe0091c865e0
if_clone_destroyif() at if_clone_destroyif+0x175/frame 0xfffffe0091c86630
if_clone_destroy() at if_clone_destroy+0x205/frame 0xfffffe0091c86680
ifioctl() at ifioctl+0x3de/frame 0xfffffe0091c86750
kern_ioctl() at kern_ioctl+0x28a/frame 0xfffffe0091c867c0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe0091c86890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0091c869b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0091c869b0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004852ca, rsp = 0x7fffffffe228, rbp = 0x7fffffffe240 ---
KDB: enter: panic
[ thread pid 93197 tid 100466 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

During the sys/netinet/fibs_test:slaac_on_nondefault_fib6 test, I believe.

  • WIP: compiles, boots in VM
  • Enter epoch in DAD timer.
  • Revert entering epoch for setsockopt. This needs to be fine grained.
  • - In pfi_initialize_vnet() rely on IFNET_RLOCK() which is sx(9) for
  • Revert if_purgeaddrs() change. This function is called in control path.
  • Enter epoch for duration of ip6_output() just as we do for IPv4.
  • Enter epoch in route_output().
  • Revert this branch changes to in6_joingroup_locked(). Function can
  • Another ND6 timer.
  • Enter epoch before calling nd6_prefix_onlink() and nd6_lookup().
  • pfi_kif_update() now requires epoch.
  • Resolve rebase glitch.
  • Revert back defrouter_select_fib().
  • Fix few lock order issues.
  • Revert changes to ifa_maintain_loopback_route().
  • Merge remote-tracking branch 'FreeBSD/master' into net_epoch
  • Merge branch 'epoch-trace' into net_epoch
  • Merge branch 'epoch-trace' into net_epoch
  • Merge branch 'master' into net_epoch
  • Catch up on c6f2373c081e1431319aefa29218d59c1d01ff30.
  • Fix mismerge.
  • Merge remote-tracking branch 'FreeBSD/master' into net_epoch
  • Restore whitespace to what we have in head.
  • Merge remote-tracking branch 'FreeBSD/master' into net_epoch

I'd like to go forward with this change. It will bring a number of new epoch recursions, that don't exist today, but I want to resolve them separately, one per subsystem, otherwise the changeset will get bigger and more messy. I need your review, guys. :)

gallatin added inline comments.
sys/net/if_ethersubr.c
807

Rather than here, can we enter the net epoch in intr_event_execute_handlers(), so that there is just one epoch acquisition per interrupt of type INTR_NET?

Drivers that do something weird with taskqueues rather than using ithreads can be fixed to handle this as well.

This revision is now accepted and ready to land.Sep 26 2019, 11:26 PM

I'd like to go forward with this change. It will bring a number of new epoch recursions, that don't exist today, but I want to resolve them separately, one per subsystem, otherwise the changeset will get bigger and more messy. I need your review, guys. :)

I should be able to look at this again a bit more in detail next week Wednesday/Thursday, if that is still okay with you? It's a rather large change and as you found the devil is in the detail, so I don't just want to scroll through.

I kept wondering if the problem in the end is very much matching VNETs, setting them on the edges and dealing with recursion...

sys/net/if_ethersubr.c
807

Funny you accepted the entire change yet had a change request here, @gallatin ...

@gallatin: Network drivers can also use worker threads instead of IRQ's to process packets. Especially USB network devices.

@gleb: Why are you not patching: uether_rxflush() ?

sys/net/if_ethersubr.c
807

Drew, yes, this is the plan. But that would be a separate change.

  • Enter the net_epoch in uether_rxflush(). While here, rewrite the whole
This revision now requires review to proceed.Sep 27 2019, 7:12 PM
glebius removed a subscriber: gleb.

@hselasky Good catch, thanks.

P.S. gleb@ and glebius@ are different people ;)

@bz That's fine, no hurry. I'm not sure that enter/exit points are 100% matching for VNETs. For example there is nothing wrong for a packet to travel between VNETs within one epoch.

@gallatin: Network drivers can also use worker threads instead of IRQ's to process packets. Especially USB network devices.

One solution would be for drivers to set a flag to indicate they have already taken an epoch, that would let weird stuff that uses taskqueues or threads still work but allow high performance things to take the epoch once per invocation.

sys/net/if_ethersubr.c
807

I'd much rather get this in and improve it inplace than iterate forever with it out of the tree... the perfect is the enemy of the good.

@gallatin yes, I already got a branch that does this. Created it for Olivier Cochard's benchmarking of ipfw. https://github.com/glebius/FreeBSD/tree/net_epoch+IFF_KNOWSEPOCH+pfil

Hi,

I see wrappers are missing for WLAN:

net80211/ieee80211_input.c: ifp->if_input(ifp, m);
net80211/ieee80211_hostap.c: ifp->if_input(ifp, m);

??

sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
735

Should we call ip6_output() within an epoch before we return? i.e.,
struct epoch_tracker et;
int error = 0;
...
NET_EPOCH_ENTER(et);
error = ip6_output(...);
NET_EPOCH_EXIT(et);
return(error);

i can test out epoch wrapping around the net80211 stuff but should this be done at the driver level too? The wifi drivers tend to process bursts of frames in a deferred context..

also glebius poke me on irc. I'll go see if I need to rejoin various channels when I get to work. I just updated my freebsd-head testing boxes so I can throw this on a few wifi nics and test..

sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
735

Never mind. I'll need to look at implementing basic functionality first.

Hi,

I see wrappers are missing for WLAN:

net80211/ieee80211_input.c: ifp->if_input(ifp, m);
net80211/ieee80211_hostap.c: ifp->if_input(ifp, m);

??

Nope. For WLAN the if_input == ether_input() which enters epoch (at this stage). Later will be pushed up the driver level.

The ipfilter patch looks good.

This revision is now accepted and ready to land.Oct 1 2019, 7:13 PM
This revision was automatically updated to reflect the committed changes.