em: fix a null de-reference in em_free_pci_resources
Closed, Public

Authored by mhorne on Nov 17 2020, 11:58 PM.

Details

Summary

Cope with the fact that a failure in iflib_device_register() can result
in em_free_pci_resources() being called after receive queues have
already been freed. In particular, a failure to allocate MSI-X IRQ
resources will goto fail_queues, where IFDI_QUEUES_FREE() will be called
via iflib_tx_structures_free(), preceding the call to IFDI_DETACH().

A similar check is present in ixgbe(4) and ixl(4).
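
The shape of such a check can be sketched in user-space C. This is a minimal model under stated assumptions, not the driver code: `struct model_softc`, `model_queues_free()`, `model_irq_free()` and `model_free_pci_resources()` are hypothetical stand-ins for the adapter softc, IFDI_QUEUES_FREE(), iflib_irq_free() and em_free_pci_resources(). It only illustrates that the IRQ-release loop must be skipped once the receive queue array has been freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures; not the real if_em.c types. */
struct model_irq {
	int allocated;
};

struct model_rx_queue {
	struct model_irq que_irq;
};

struct model_softc {
	struct model_rx_queue *rx_queues;	/* NULL once the queues are freed */
	int rx_num_queues;
	int irqs_released;			/* counter used only by this model */
};

/* Models the queue teardown that runs on the fail_queues path. */
static void
model_queues_free(struct model_softc *sc)
{
	free(sc->rx_queues);
	sc->rx_queues = NULL;
}

/* Models releasing one IRQ; it dereferences its irq argument. */
static void
model_irq_free(struct model_softc *sc, struct model_irq *irq)
{
	irq->allocated = 0;
	sc->irqs_released++;
}

/* Models the detach-time cleanup, guarded against already-freed queues. */
static void
model_free_pci_resources(struct model_softc *sc)
{
	struct model_rx_queue *que = sc->rx_queues;

	if (que != NULL) {
		for (int i = 0; i < sc->rx_num_queues; i++, que++)
			model_irq_free(sc, &que->que_irq);
	}
}
```

In this model, calling `model_free_pci_resources()` after `model_queues_free()` is a no-op, which mirrors the ordering described above: queue teardown first, then the detach path. Without the `que != NULL` test the loop would hand `model_irq_free()` a pointer computed from a NULL base.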

Test Plan

This was found and fixed by NetApp, with the following panic backtrace:

(kgdb-amd64-7.11-693) bt
#0  kdb_enter (why=0xffffffff80cd52fb "panic", msg=<optimized out>) at ../../../../src/sys/kern/subr_kdb.c:514
#1  0xffffffff806a6c6c in vpanic (fmt=0xffffffff80d5a38c "page fault (%s %s %s, %s) on VA %#lx cs:rip %#lx:%#lx rflags %#lx", ap=0xffffffff824f5770) at ../../../../src/sys/kern/kern_shutdown.c:1355
#2  0xffffffff806a5863 in panic (fmt=0xffffffff80fe2bd8 <gdb_consdev> "\250\065\314\200\377\377\377\377\001") at ../../../../src/sys/kern/kern_shutdown.c:1187
#3  0xffffffff80a8f8ee in trap_fatal (frame=0xffffffff824f5990, eva=144) at ../../../../src/sys/amd64/amd64/trap.c:1064
#4  0xffffffff80a8fd82 in trap_pfault (frame=0xffffffff80fe2bd8 <gdb_consdev>, usermode=-2140097504) at ./machine/cpufunc.h:430
#5  0xffffffff80a8f352 in trap (frame=0xffffffff824f5990) at ../../../../src/sys/amd64/amd64/trap.c:578
#6  0xffffffff80a68d97 in <signal handler called> () at ../../../../src/sys/amd64/amd64/exception.S:257
#7  iflib_irq_free (ctx=0xfffff80020cb1c00, irq=0x80) at ../../../../src/sys/net/iflib.c:5885
#8  0xffffffff8042ba1b in em_free_pci_resources (ctx=0xfffff80020cb1c00) at ../../../../src/sys/dev/e1000/if_em.c:2559
#9  0xffffffff80425fd3 in em_if_detach (ctx=0xfffff80020cb1c00) at ../../../../src/sys/dev/e1000/if_em.c:1204
#10 0xffffffff807e37a9 in IFDI_DETACH (_ctx=<optimized out>) at ./ifdi_if.h:55
#11 iflib_device_register (dev=0xfffff80020892e00, sc=<optimized out>, sctx=0xffffffff80fc6ac0 <igb_sctx_init>, ctxp=0xffffffff824f5c40) at ../../../../src/sys/net/iflib.c:4911
#12 0xffffffff807e5167 in iflib_device_attach (dev=0xfffff80020892e00) at ../../../../src/sys/net/iflib.c:4926
#13 0xffffffff806f4cee in DEVICE_ATTACH (dev=0xfffff80020892e00) at ./device_if.h:195
#14 device_attach (dev=0xfffff80020892e00) at ../../../../src/sys/kern/subr_bus.c:3029
#15 0xffffffff806f6291 in device_probe_and_attach (dev=0xfffff80020892e00) at ../../../../src/sys/kern/subr_bus.c:2987
#16 bus_generic_attach (dev=<optimized out>) at ../../../../src/sys/kern/subr_bus.c:3925
#17 0xffffffff804f3052 in pci_attach (dev=0xfffff80020893800) at ../../../../src/sys/dev/pci/pci.c:5443
#18 0xffffffff806f4cee in DEVICE_ATTACH (dev=0xfffff80020893800) at ./device_if.h:195
#19 device_attach (dev=0xfffff80020893800) at ../../../../src/sys/kern/subr_bus.c:3029
#20 0xffffffff806f48b7 in device_probe_and_attach (dev=0xfffff80020893800) at ../../../../src/sys/kern/subr_bus.c:2987
#21 0xffffffff822bad20 in xxxxxxxxxx_attach (dev=0xfffff80020893a00) at xxxxxxxxxx.c:0000
#22 0xffffffff806f4cee in DEVICE_ATTACH (dev=0xfffff80020893a00) at ./device_if.h:195
#23 device_attach (dev=0xfffff80020893a00) at ../../../../src/sys/kern/subr_bus.c:3029
#24 0xffffffff806f6291 in device_probe_and_attach (dev=0xfffff80020893a00) at ../../../../src/sys/kern/subr_bus.c:2987
#25 bus_generic_attach (dev=<optimized out>) at ../../../../src/sys/kern/subr_bus.c:3925
value has been optimized out
(More stack frames follow...)

I was able to reproduce this panic quite easily in a bhyve VM by forcing iflib_legacy_setup() to fail, and verified that this patch avoids the panic.
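
One way to read frames #3 and #7 of the backtrace: `irq=0x80` is a small non-zero pointer and the faulting address is `eva=144` (0x90), which is what a member address computed from a NULL queue pointer would look like. The sketch below reproduces that arithmetic with `offsetof`. The struct layouts are hypothetical and sized only to match the numbers in the backtrace; they are not taken from the real iflib or e1000 definitions, and the member names `other_fields`, `pad` and `touched` are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, chosen only to reproduce the backtrace numbers. */
struct model_irq {
	unsigned char pad[0x10];
	void *touched;			/* assumed first member accessed */
};

struct model_rx_queue {
	unsigned char other_fields[0x80];
	struct model_irq que_irq;
};

/* Address of que->que_irq for a queue located at que_base. */
static uintptr_t
model_irq_addr(uintptr_t que_base)
{
	return que_base + offsetof(struct model_rx_queue, que_irq);
}

/* Address that would fault when the assumed member is accessed. */
static uintptr_t
model_fault_addr(uintptr_t que_base)
{
	return model_irq_addr(que_base) + offsetof(struct model_irq, touched);
}
```

With a base of 0, `model_irq_addr()` gives 0x80 and `model_fault_addr()` gives 0x90 (144 decimal) under these assumed offsets, which matches the `irq` and `eva` values reported above.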

Diff Detail

Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 34868
Build 31891: arc lint + arc unit

Event Timeline

mhorne created this revision.

I was able to reproduce the issue for myself, so I will commit in a day or two if I don't hear otherwise. If someone can give this a quick second look it would be appreciated.

This revision was not accepted when it landed; it landed in state Needs Review. Dec 2 2020, 5:37 PM
This revision was automatically updated to reflect the committed changes.