Details

Reviewers

sbruno
shurd
jtl

Group Reviewers

Intel Networking

Commits

rS323008: Continuation of lock cleanup in e1000.

Summary

post-cold sleep instead of DELAY when waiting for firmware

The e1000_swfw_sync functions will do a busy wait for hundreds of milliseconds while waiting for firmware to relinquish a hardware resource. This change builds on the previous change converting the softc lock to an sx and changes all waits to sleeps once interrupts are enabled (and thus it's safe to sleep).

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 11382
Build 11743: arc lint + arc unit

Event Timeline

kmacy created this revision.Aug 22 2017, 2:46 AM

Herald added a reviewer: Intel Networking. · View Herald TranscriptAug 22 2017, 2:46 AM

Herald added subscribers: ae, imp. · View Herald Transcript

shurd added inline comments.Aug 22 2017, 4:39 AM

sys/dev/e1000/e1000_82571.c
1002	Should this be a panic()?

update to reflect potential for brokenness

shurd added inline comments.Aug 22 2017, 5:07 AM

sys/dev/e1000/e1000_82571.c
1002	Well, looks non-trivial to actually handle failure cases through em_if_init() -> em_reset(), so better to fail than to carry on after knowing you're wrong.

Can we also get a description of what this fixes, for future reference?

sys/dev/e1000/e1000_82571.c
505	This should be "82574" now.

In D12101#251141, @erj wrote:

Can we also get a description of what this fixes, for future reference?

Two problems:

As we've discussed at length elsewhere, Intel drivers will busy wait for arbitrary lengths of time with a default mutex held when waiting for the firmware to release a resource.
There is a lot of mostly duplicated code with tiny (dubious) variations.

In D12101#251145, @kmacy wrote:

In D12101#251141, @erj wrote:

Can we also get a description of what this fixes, for future reference?

Two problems:

As we've discussed at length elsewhere, Intel drivers will busy wait for arbitrary lengths of time with a default mutex held when waiting for the firmware to release a resource.

There is a lot of mostly duplicated code with tiny (dubious) variations.

Yeah, this has been discussed in emails; I just meant put that info in the summary so other people in the future can get a feel for the reasoning behind this change / look at some discussion without trying to find old emails.

johalun0_gmail.com added a subscriber: johalun0_gmail.com.Aug 23 2017, 7:08 PM

The busy waiting would cause 10% CPU usage on my system when no cable connected to em0 (traced to em_if_update_admin_status).
This patch also fixes this. I can no longer detect any unusual CPU usage.

kmacy added a subscriber: jtl.Aug 23 2017, 8:45 PM

kbowling added a subscriber: gallatin.Aug 23 2017, 8:46 PM

kbowling added a subscriber: kbowling.

Looks good.

This revision is now accepted and ready to land.Aug 23 2017, 8:48 PM

@erj I updated the summary - is there anything more I should add?

@jtl any comments?

In D12101#251279, @johalun0_gmail.com wrote:

The busy waiting would cause 10% CPU usage on my system when no cable connected to em0 (traced to em_if_update_admin_status).
This patch also fixes this. I can no longer detect any unusual CPU usage.

Getting a lot of witness output. Like a few per second or so.

Sleeping on "e1000_delay" with the following non-sleepable locks held:
exclusive sleep mutex &dev_spec->swflag_mutex (network driver) r = 0 (0xfffffe00010bd598) locked @ /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/e1000/e1000_ich8lan.c:1884
stack backtrace:
#0 0xffffffff80ad3593 at witness_debugger+0x73
#1 0xffffffff80ad49be at witness_warn+0x45e
#2 0xffffffff80a7ba1d at _sleep+0x6d
#3 0xffffffff80a7c217 at pause_sbt+0x117
#4 0xffffffff80562a01 at e1000_read_phy_reg_mdic+0xf1
#5 0xffffffff80566887 at __e1000_read_phy_reg_hv+0x147
#6 0xffffffff80564519 at e1000_phy_has_link_generic+0x159
#7 0xffffffff80553485 at e1000_check_for_copper_link_ich8lan+0x35
#8 0xffffffff80535bd1 at em_if_update_admin_status+0x1d1
#9 0xffffffff80b7cfcb at _task_fn_admin+0x1bb
#10 0xffffffff80ab26d9 at gtaskqueue_run_locked+0x129
#11 0xffffffff80ab2468 at gtaskqueue_thread_loop+0x88
#12 0xffffffff80a33a44 at fork_exit+0x84
#13 0xffffffff80edb9ce at fork_trampoline+0xe

In D12101#251426, @johalun0_gmail.com wrote:

In D12101#251279, @johalun0_gmail.com wrote:

The busy waiting would cause 10% CPU usage on my system when no cable connected to em0 (traced to em_if_update_admin_status).
This patch also fixes this. I can no longer detect any unusual CPU usage.

Getting a lot of witness output. Like a few per second or so.

I think this revision depends on D11969. It changes the core lock from a mutex to an sx lock, which would allow sleeping while its held.

In D12101#251488, @erj wrote:

In D12101#251426, @johalun0_gmail.com wrote:

In D12101#251279, @johalun0_gmail.com wrote:

The busy waiting would cause 10% CPU usage on my system when no cable connected to em0 (traced to em_if_update_admin_status).
This patch also fixes this. I can no longer detect any unusual CPU usage.

Getting a lot of witness output. Like a few per second or so.

I think this revision depends on D11969. It changes the core lock from a mutex to an sx lock, which would allow sleeping while its held.

Actually the problem is that I didn't completely eliminate the use of mutexes in the OS independent code. Update forthcoming.

Eliminate remaining mutex use in os independent e1000 code.
Fold in iflib updates.

This revision now requires review to proceed.Aug 24 2017, 10:06 PM

kmacy mentioned this in D11969: refactoring in support of *future* change to cope with slow configuration path on INTC and BRCM drivers.Aug 24 2017, 10:07 PM

In D12101#251537, @kmacy wrote:

In D12101#251488, @erj wrote:

In D12101#251426, @johalun0_gmail.com wrote:

In D12101#251279, @johalun0_gmail.com wrote:

The busy waiting would cause 10% CPU usage on my system when no cable connected to em0 (traced to em_if_update_admin_status).
This patch also fixes this. I can no longer detect any unusual CPU usage.

Getting a lot of witness output. Like a few per second or so.

I think this revision depends on D11969. It changes the core lock from a mutex to an sx lock, which would allow sleeping while its held.

Actually the problem is that I didn't completely eliminate the use of mutexes in the OS independent code. Update forthcoming.

Seem to be fixed now. Not related to this but I noticed I'm getting a lot of *_bus_dmamem_* related witness warnings from iwm before link state is changed to UP. Same thing, non-sleepable lock held.

Huh .... Testing this review by itself this morning. I see the following startup panic:

Sleeping on "e1000_delay" with the following non-sleepable locks held:
exclusive sleep mutex em2:tx(0):callo (em2:tx(0):callo) r = 0 (0xfffff80004175068) locked @ /home/sbruno/bsd/fbsd_head/sys/kern/kern_mutex.c:182
stack backtrace:
#0 0xffffffff80ac85d3 at witness_debugger+0x73
#1 0xffffffff80ac99cf at witness_warn+0x43f
#2 0xffffffff80a708fc at _sleep+0x6c
#3 0xffffffff80a71117 at pause_sbt+0x117
#4 0xffffffff805620e1 at e1000_read_phy_reg_mdic+0xf1
#5 0xffffffff80562b5b at e1000_read_phy_reg_igp+0x5b
#6 0xffffffff80535d6c at em_if_timer+0xcc
#7 0xffffffff80b76279 at iflib_timer+0x69
#8 0xffffffff80a7e375 at softclock_call_cc+0x155
#9 0xffffffff80a7e75c at softclock+0x7c
#10 0xffffffff80a2bed1 at intr_event_execute_handlers+0x91
#11 0xffffffff80a2c5d6 at ithread_loop+0xb6
#12 0xffffffff80a29304 at fork_exit+0x84
#13 0xffffffff80ecf7ce at fork_trampoline+0xe
panic: sleepq_add: td 0xfffff80003d59000 to sleep on wchan 0xffffffff81c7d260 with sleeping prohibited
cpuid = 0
time = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f55f4730
vpanic() at vpanic+0x19c/frame 0xfffffe00f55f47b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe00f55f4820
sleepq_add() at sleepq_add+0x34f/frame 0xfffffe00f55f4870
_sleep() at _sleep+0x26c/frame 0xfffffe00f55f4910
pause_sbt() at pause_sbt+0x117/frame 0xfffffe00f55f4960
e1000_read_phy_reg_mdic() at e1000_read_phy_reg_mdic+0xf1/frame 0xfffffe00f55f49b0
e1000_read_phy_reg_igp() at e1000_read_phy_reg_igp+0x5b/frame 0xfffffe00f55f49e0
em_if_timer() at em_if_timer+0xcc/frame 0xfffffe00f55f4a10
iflib_timer() at iflib_timer+0x69/frame 0xfffffe00f55f4a40
softclock_call_cc() at softclock_call_cc+0x155/frame 0xfffffe00f55f4af0
softclock() at softclock+0x7c/frame 0xfffffe00f55f4b20
intr_event_execute_handlers() at intr_event_execute_handlers+0x91/frame 0xfffffe00f55f4b60
ithread_loop() at ithread_loop+0xb6/frame 0xfffffe00f55f4bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe00f55f4bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00f55f4bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100023 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Yeah, legacy em(4) device on em2:

em2@pci0:5:4:0: class=0x020000 card=0x10768086 chip=0x10768086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541GI Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfebe0000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xfebc0000, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0xec00, size 64, enabled
    cap 01[dc] = powerspec 2  supports D0 D3  current D0
    cap 07[e4] = PCI-X supports 2048 burst read, 1 split transaction

move more blocking work to deferred context

In D12101#251708, @sbruno wrote:

Huh .... Testing this review by itself this morning. I see the following startup panic:

Sleeping on "e1000_delay" with the following non-sleepable locks held:
exclusive sleep mutex em2:tx(0):callo (em2:tx(0):callo) r = 0 (0xfffff80004175068) locked @ /home/sbruno/bsd/fbsd_head/sys/kern/kern_mutex.c:182
stack backtrace:
#0 0xffffffff80ac85d3 at witness_debugger+0x73
#1 0xffffffff80ac99cf at witness_warn+0x43f
#2 0xffffffff80a708fc at _sleep+0x6c
#3 0xffffffff80a71117 at pause_sbt+0x117
#4 0xffffffff805620e1 at e1000_read_phy_reg_mdic+0xf1
#5 0xffffffff80562b5b at e1000_read_phy_reg_igp+0x5b
#6 0xffffffff80535d6c at em_if_timer+0xcc
#7 0xffffffff80b76279 at iflib_timer+0x69
#8 0xffffffff80a7e375 at softclock_call_cc+0x155
#9 0xffffffff80a7e75c at softclock+0x7c
#10 0xffffffff80a2bed1 at intr_event_execute_handlers+0x91
#11 0xffffffff80a2c5d6 at ithread_loop+0xb6
#12 0xffffffff80a29304 at fork_exit+0x84
#13 0xffffffff80ecf7ce at fork_trampoline+0xe
panic: sleepq_add: td 0xfffff80003d59000 to sleep on wchan 0xffffffff81c7d260 with sleeping prohibited
cpuid = 0
time = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f55f4730
vpanic() at vpanic+0x19c/frame 0xfffffe00f55f47b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe00f55f4820
sleepq_add() at sleepq_add+0x34f/frame 0xfffffe00f55f4870
_sleep() at _sleep+0x26c/frame 0xfffffe00f55f4910
pause_sbt() at pause_sbt+0x117/frame 0xfffffe00f55f4960
e1000_read_phy_reg_mdic() at e1000_read_phy_reg_mdic+0xf1/frame 0xfffffe00f55f49b0
e1000_read_phy_reg_igp() at e1000_read_phy_reg_igp+0x5b/frame 0xfffffe00f55f49e0
em_if_timer() at em_if_timer+0xcc/frame 0xfffffe00f55f4a10
iflib_timer() at iflib_timer+0x69/frame 0xfffffe00f55f4a40
softclock_call_cc() at softclock_call_cc+0x155/frame 0xfffffe00f55f4af0
softclock() at softclock+0x7c/frame 0xfffffe00f55f4b20
intr_event_execute_handlers() at intr_event_execute_handlers+0x91/frame 0xfffffe00f55f4b60
ithread_loop() at ithread_loop+0xb6/frame 0xfffffe00f55f4bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe00f55f4bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00f55f4bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100023 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Please pull the latest from the master-bogofix branch and try again.

sbruno accepted this revision.Aug 29 2017, 10:30 PM

This revision is now accepted and ready to land.Aug 29 2017, 10:30 PM

Closed by commit rS323008: Continuation of lock cleanup in e1000. (authored by sbruno). · Explain WhyAug 30 2017, 12:20 AM

This revision was automatically updated to reflect the committed changes.

Commit was reverted

incorporate some missing changes from dev branch

Harbormaster completed remote builds in B11368: Diff 32652.Sep 4 2017, 10:45 PM

Update my tree before updating revision

Harbormaster completed remote builds in B11370: Diff 32654.Sep 4 2017, 10:57 PM

shurd mentioned this in D12235: iflib rollup patch..Sep 5 2017, 9:15 PM

Not sure where those head/* directories came from...

Remove extra directory adds

Harbormaster completed remote builds in B11382: Diff 32688.Sep 5 2017, 11:25 PM

So let me try to sum up in my own words what's going on here:

We want to be able to sleep when talking to the hardware, rather than poll
We want to change the driver lock from a mutex to an sx lock to allow sleeping when talking to h/w
We can't just change to an SX lock and be done because the multicast code holds a mutex (and maybe an RW lock) when calling ifioctl
We noticed that the multicast code does not check return values from driver ioctls, so it seems safe to defer it into another context
We decided to defer mutlicast (and promisc, since it can be coming from the multicast code) into another context at the iflib level

While, obviously the best solution would be to refactor the multicast code to avoid holding a lock, I think that I generally agree with the approach, and think it is very clever.

However, if we're going to defer the mutlicast code into another context, can we please consider *NOT* doing it at the iflib level? I'm thinking we could re-work this hack so this works for all drivers if we made an mcast kproc and submitted mutlicast add/del requests to a global mcast work queue. Then the deferred multicast kproc can call down into drivers with no locks held. Then we assert no locks in the ifioctl entry points.

In D12101#264936, @gallatin wrote:

So let me try to sum up in my own words what's going on here:

We want to be able to sleep when talking to the hardware, rather than poll

We want to change the driver lock from a mutex to an sx lock to allow sleeping when talking to h/w

We can't just change to an SX lock and be done because the multicast code holds a mutex (and maybe an RW lock) when calling ifioctl

We noticed that the multicast code does not check return values from driver ioctls, so it seems safe to defer it into another context

We decided to defer mutlicast (and promisc, since it can be coming from the multicast code) into another context at the iflib level

While, obviously the best solution would be to refactor the multicast code to avoid holding a lock, I think that I generally agree with the approach, and think it is very clever.

However, if we're going to defer the mutlicast code into another context, can we please consider *NOT* doing it at the iflib level? I'm thinking we could re-work this hack so this works for all drivers if we made an mcast kproc and submitted mutlicast add/del requests to a global mcast work queue. Then the deferred multicast kproc can call down into drivers with no locks held. Then we assert no locks in the ifioctl entry points.

I'm not sure that that's the way to go. However, we will definitely take it under consideration. I agree that the right thing to do is fix multicast and promiscuous callers. At a casual glance refactoring multicast was not easy, but I will take another look.

Thanks.

• hselasky added a subscriber: • hselasky.Oct 31 2017, 7:35 AM

• hselasky added inline comments.

sys/net/iflib.c
5209	Remember that using SX over MTX affects priority resolution when the mutex is congested ???

• hselasky added inline comments.Oct 31 2017, 7:36 AM

sys/dev/e1000/e1000_osdep.h
88	Why not use pause_mtx() ?

• hselasky added inline comments.Oct 31 2017, 7:38 AM

sys/dev/e1000/e1000_osdep.h
85	These cold checks are wrong. Please remove. Pause() already does this check properly.
94	These cold checks are wrong. Please remove. Pause() already does this check properly.

../dev/usb/usb_util.c:usb_pause_mtx(struct mtx *mtx, int timo)

FYI

I think you'll get a lot less pushback if you serialize the multicast stuff in the stack, rather than the driver framework. This will allow you to put warnings / asserts into all the ioctl entry points above the drivers, so as to lock in the "you can't hold a lock while calling into a driver" rule.

sys/dev/e1000/e1000_osdep.h
88	Because the driver has no idea what lock objects are held by callers to ifioctl. And just dropping a mutex is horrifically unsafe anyway.

When running head at r325049, my home firewall would often hit an LOR in iflib_timer() when it jumped to the hung label. This happened twice a day.

lock order reversal:
 1st 0xfffffe0000f36068 igb1:tx(0):call (igb1:tx(0):call) @ /usr/src/sys/kern/kern_mutex.c:182
 2nd 0xfffff80003b0c150 igb1 (iflib ctx lock) @ /usr/src/sys/net/iflib.c:2139

After applying this patch on top of r325049, the box has been up for 3 days. I'd call that a success.

rstone added a subscriber: rstone.Nov 17 2017, 2:37 PM

rstone added inline comments.

sys/dev/e1000/e1000_osdep.h
104	Is this correct? "msec_delay_irq()" implies to me a delay being used in interrupt context, where sleeping is not safe

Split in to many separate reviews.

sys/net/iflib.c
5209	Not relevant for infrequent long running configuration operations.

swfw_sync DELAY -> sleep conversion
AbandonedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 32688

sys/dev/e1000/e1000_80003es2lan.c

sys/dev/e1000/e1000_82571.c

sys/dev/e1000/e1000_82575.c

sys/dev/e1000/e1000_hw.h

sys/dev/e1000/e1000_i210.h

sys/dev/e1000/e1000_i210.c

sys/dev/e1000/e1000_ich8lan.c

sys/dev/e1000/e1000_mac.h

sys/dev/e1000/e1000_mac.c

sys/dev/e1000/e1000_osdep.h

sys/dev/e1000/if_em.c

sys/net/iflib.h

sys/net/iflib.c

swfw_sync DELAY -> sleep conversionAbandonedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 32688

sys/dev/e1000/e1000_80003es2lan.c

sys/dev/e1000/e1000_82571.c

sys/dev/e1000/e1000_82575.c

sys/dev/e1000/e1000_hw.h

sys/dev/e1000/e1000_i210.h

sys/dev/e1000/e1000_i210.c

sys/dev/e1000/e1000_ich8lan.c

sys/dev/e1000/e1000_mac.h

sys/dev/e1000/e1000_mac.c

sys/dev/e1000/e1000_osdep.h

sys/dev/e1000/if_em.c

sys/net/iflib.h

sys/net/iflib.c

swfw_sync DELAY -> sleep conversion
AbandonedPublic
Actions

Revision Contents
Changeset List