Drop ixgbe rx lock during TCP LRO
ClosedPublic
Actions

Authored by kbowling on Jul 24 2017, 7:39 AM.

Details

Reviewers

sbruno
kmacy

Group Reviewers

Intel Networking

Commits

rS321476: Drop ixgbe RX lock during TCP_LRO processing. This eliminates a "storm"

Summary

There is LOR mayhem while holding the NIC's rx queue lock when the upper stack is called inside TCP LRO

Example LOR:

lock order reversal:
 1st 0xfffff80116d4cb90 ix0:rx(9) (ix0:rx(9)) @ /d0/kev/freebsd/sys/dev/ixgbe/ix_txrx.c:1660
 2nd 0xfffff801941dc408 if_lagg rmlock (if_lagg rmlock) @ /d0/kev/freebsd/sys/net/if_lagg.c:1679
stack backtrace:
#0 0xffffffff8097d1d0 at witness_debugger+0x70
#1 0xffffffff8097d0c3 at witness_checkorder+0xe23
#2 0xffffffff80912bb9 at _rm_rlock_debug+0x119
#3 0xffffffff82424ad2 at lagg_input+0x42
#4 0xffffffff80a19708 at ether_nh_input+0x188
#5 0xffffffff80a300f0 at netisr_dispatch_src+0x80
#6 0xffffffff80a18e12 at ether_input+0x62
#7 0xffffffff80a648e5 at tcp_lro_flush+0x1c5
#8 0xffffffff80a64a8b at tcp_lro_flush_all+0x13b
#9 0xffffffff805533a2 at ixgbe_rxeof+0x8a2
#10 0xffffffff805466a7 at ixgbe_msix_que+0x97
#11 0xffffffff808dc6b9 at intr_event_execute_handlers+0x99
#12 0xffffffff808dcd56 at ithread_loop+0xb6
#13 0xffffffff808d9af4 at fork_exit+0x84
#14 0xffffffff80d073ae at fork_trampoline+0xe

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

kbowling created this revision.Jul 24 2017, 7:39 AM

Herald added subscribers: ae, imp. · View Herald TranscriptJul 24 2017, 7:39 AM

swills added a subscriber: swills.Jul 24 2017, 2:42 PM

sbruno added inline comments.Jul 24 2017, 3:57 PM

sys/dev/ixgbe/ix_txrx.c
1901	Don't we need to reaquire the rxr lock at some point further down? I assume after tcp_lro_flush_all(llro).

kbowling added inline comments.Jul 24 2017, 7:55 PM

sys/dev/ixgbe/ix_txrx.c
1901	What are you basing this on? Does lro get frobbed from elsewhere?

sbruno added inline comments.Jul 24 2017, 8:01 PM

sys/dev/ixgbe/ix_txrx.c
1901	I overlooked the fact that we acquire the RX lock at the start of this function, not that it we inherit the lock from the calling function. I do not see tcp_lro_flush_all() being invoked from elsewhere in the driver.

sbruno accepted this revision.Jul 24 2017, 8:02 PM

This revision is now accepted and ready to land.Jul 24 2017, 8:02 PM

To document my understanding.. this is safe because interrupts are disabled while handling rx, including passing work to the 'que_task' if processing limits are present.

Playing lock ping pong is not ideal.. the entire rxeof function should not need to take an rx lock, but the reward of reworking this is low versus getting iflib-ixgbe merged which includes a much better locking model.

What is proposed here is a stopgap that can easily be MFCed.

Closed by commit rS321476: Drop ixgbe RX lock during TCP_LRO processing. This eliminates a "storm" (authored by sbruno). · Explain WhyJul 25 2017, 2:38 PM

This revision was automatically updated to reflect the committed changes.

cramerj_intel.com mentioned this in D11727: ixgbe(4): Update HEAD to use iflib.Jul 25 2017, 4:47 PM