Differential D26324

net: Mitigate vnet / epair cleanup races
ClosedPublic
Actions

Authored by kp on Sep 4 2020, 2:45 PM.

Details

Reviewers

donner

Group Reviewers

network

Commits

rS365457: net: mitigate vnet / epair cleanup races

Summary

There's a race where dying vnets move their interfaces back to their
original vnet, and if_epair cleanup (where deleting one interface also
deletes the other end of the epair). This is commonly triggered by the
pf tests, but also by cleanup of vnet jails.

As we've not yet been able to fix the root cause of the issue work
around the panic by not dereferencing a NULL softc in epair_qflush() and
by not re-attaching DYING interfaces.
This isn't a full fix, but makes a very common panic far less likely.

PR: 244703, 238870

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 33391
Build 30691: arc lint + arc unit

Event Timeline

kp created this revision.Sep 4 2020, 2:45 PM

Herald added subscribers: melifaro, ae. · View Herald TranscriptSep 4 2020, 2:45 PM

kp requested review of this revision.Sep 4 2020, 2:45 PM

Harbormaster completed remote builds in B33391: Diff 76651.Sep 4 2020, 2:45 PM

While this isn't a correct fix it turns destroying a vnet and epair interface at the same time from something that panics every 5-10 times into something that's relatively safe to to.

I'd like to get this in for 12.2.

kp added subscribers: kevans, bz.Sep 4 2020, 2:46 PM

donner accepted this revision as: donner.Sep 4 2020, 4:00 PM

donner added a subscriber: donner.

donner added inline comments.

sys/net/if.c
1308	Should this moved before the reassign test?

This revision is now accepted and ready to land.Sep 4 2020, 4:00 PM

kevans added inline comments.Sep 5 2020, 1:13 AM

sys/net/if.c
1308	Looking at existing if_reassign implementations, I think I agree. It looks relatively harmless as-is at a cursory glance, but repositioning it would save a little bit of work.

Move DYING test before if_reassign

This revision now requires review to proceed.Sep 5 2020, 9:49 AM

Harbormaster completed remote builds in B33410: Diff 76687.Sep 5 2020, 9:49 AM

You are correct, that would be better.

In the mean time my dev vm has spent the night running the pf tests in a loop without panic. That's a pretty great improvement in usability.

My jail/vnet burn test is still generating a panic:

First, creating 480 jails (a 4GB RAM VM with bird2 installed is enough), here is my script
Then destroy all of them, here is my script

[root@bhyve-VM]~# sh /data/destroy-480-jails.sh
Deleting jail1
Deleting jail2
Sep  7 17:49:37 router bird[62568]: KRT: Error sending route 192.0.0.0/20 to kernel: Address already in use
Sep  7 17:49:37 router bird[62568]: KRT: Error sending route 198.51.100.2/32 to kernel: Address already in use
Deleting jail37 router bird[62568]: Shutdown completed
Sep  7 17:49:37 router bird[83469]: KRT: Error sending route 192.0.0.0/20 to kernel: Address already in use
Sep  7 17:49:37 router bird[83469]: KRT: Error sending route 198.51.100.3/32 to kernel: Address already in use
Deleting jail47 router bird[83469]: Shutdown completed
Sep  7 17:49:37 router bird[8150]: KRT: Error sending route 192.0.0.0/20 to kernel: Address already in use
Sep  7 17:49:37 router bird[8150]: KRT: Error sending route 198.51.100.4/32 to kernel: Address already in use
Deleting jail57 router bird[8150]: Shutdown completed
Sep  7 17:49:37 router bird[26651]: KRT: Error sending route 192.0.0.0/20 to kernel: Address already in use
Sep  7 17:49:37 router bird[26651]: KRT: Error sending route 198.51.100.5/32 to kernel: Address already in use
Sep  7 17:49:37 router bird[26651]: Shutdown completed


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x448
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff809e1ad2
stack pointer           = 0x28:0xfffffe0098195330
frame pointer           = 0x28:0xfffffe00981953b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9441 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 0
time = 1599500977
KDB: stack backtrace:
#0 0xffffffff80a44ae5 at kdb_backtrace+0x65
#1 0xffffffff809fb8f2 at vpanic+0x182
#2 0xffffffff809fb763 at panic+0x43
#3 0xffffffff80d4d997 at trap_fatal+0x387
#4 0xffffffff80d4d9ef at trap_pfault+0x4f
#5 0xffffffff80d4d06d at trap+0x27d
#6 0xffffffff80d24378 at calltrap+0x8
#7 0xffffffff809f597f at _rm_rlock_hard+0x38f
#8 0xffffffff80b25cf2 at rtinit+0x2e2
#9 0xffffffff80b3727e at in_scrubprefix+0x29e
#10 0xffffffff80b53cad at rip_ctlinput+0x8d
#11 0xffffffff80a804fc at pfctlinput+0x5c
#12 0xffffffff80afbf3a at if_down+0x12a
#13 0xffffffff80af9c32 at if_detach_internal+0x2c2
#14 0xffffffff80af995e at if_detach+0x2e
#15 0xffffffff82725b91 at epair_clone_destroy+0x81
#16 0xffffffff80b00c88 at if_clone_destroyif+0xd8
#17 0xffffffff80b00b56 at if_clone_destroy+0x196

I'm not surprised there are still cases that panic. Given that this is a mitigation rather than a fix I don't consider that to be a blocker for committing this.

That said, it is useful to have more ways to reproduce this, especially after the mitigation. That'll help validate any real fixes we come up with.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 8 2020, 2:54 PM

Closed by commit rS365457: net: mitigate vnet / epair cleanup races (authored by kp). · Explain Why

This revision was automatically updated to reflect the committed changes.

kp added a commit: rS365457: net: mitigate vnet / epair cleanup races.

Herald added a subscriber: imp. · View Herald TranscriptSep 8 2020, 2:54 PM