Details

Reviewers

glebius
gallatin
jhb
mav
jeff
kib

Commits

rS357744: Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt

Summary

The issue is that interrupt handlers are removed via intr_event_execute_handlers() when IH_DEAD is set. The thread removing the interrupt is woken up, and calls intr_event_update(). When this happens, the ie_hflags are cleared and re-built from all the remaining handlers sharing the event. When the last IH_NET handler is removed, the IH_NET flag will be cleared from ih_hflags (or ie_hflags may still be being rebuilt in a different context), and the ithread_execute_handlers() may return with ie_hflags missing IH_NET. So we can end up in a scenario where IH_NET was present before calling ithread_execute_handlers, and is not present at its return, meaning we must cache the need for epoch locally.

This can happen when loading and unloading network drivers.
Also make sure the ie_hflags is not cleared before being updated.

This is a regression issue after r357004.

Backtrace:
panic()
_epoch_enter_preempt() # trying to access epoch tracker on stack of dead thread
ifunit_ref()
ifioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()
syscallenter()
amd64_syscall()

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

• hselasky created this revision.Feb 3 2020, 3:56 PM

Herald added a subscriber: imp. · View Herald TranscriptFeb 3 2020, 3:56 PM

Harbormaster completed remote builds in B29132: Diff 67708.Feb 3 2020, 3:56 PM

To expand on this, the issue is that interrupt handlers are removed via intr_event_execute_handlers() when IH_DEAD is set. The thread removing the intr is woken up, and he calls intr_event_update(). When this happens, the ie_hflags are cleared and re-built from all the remaining handlers sharing the event. When the last IH_NET handler is removed, the IH_NET flag will be cleared from ih_hflags (or ie_hflags may still be being rebuilt in a different context), and the ithread_execute_handlers() may return with ie_hflags missing IH_NET. So we can end up in a scenario where IH_NET was present before calling ithread_execute_handlers, and is not present at its return, meaning we must cache the need for epoch locally.

This revision is now accepted and ready to land.Feb 3 2020, 4:03 PM

This implicitly depends on atomic_cmpset_acq_int to prevent the compiler from "misbehaving" and generating accesses later. Imo would be best to needs_epoch = atomic_load_int(&ie->ie_hflags) & IH_NET; instead.

@mjg: Thank you. I'll have a look at this again tomorrow.

Use atomic_load_int() where appropriate like suggested by mjg@ .

This revision now requires review to proceed.Feb 4 2020, 8:19 AM

Harbormaster completed remote builds in B29154: Diff 67765.Feb 4 2020, 8:19 AM

kib added a subscriber: kib.Feb 4 2020, 8:38 AM

kib added inline comments.

sys/kern/kern_intr.c
1275 ↗	(On Diff #67765)	Note this access.

• hselasky added inline comments.Feb 4 2020, 8:47 AM

sys/kern/kern_intr.c
1275 ↗	(On Diff #67765)	Can you explain?

jhb@ can you comment if it is safe to access the structure pointed to by "ie" like done in the patch above?

@mav:
Can you comment on this change?

• hselasky added inline comments.Feb 4 2020, 11:48 AM

sys/kern/kern_intr.c
1275 ↗	(On Diff #67765)	I think IT_WAIT flag is set in the free of "ie" case.

I am not sure what is the point of atomic there, except may be preventing compiler from optimizing it out, if that can happen somehow. Otherwise it looks fine to me. I don't remember the interrupt teardown semantics here to say whether use-after-free is possible, but I agree with @gallatin that flags may change in run time, causing epoch leak, so this code must be here.

Looking into this I see one more problem: intr_event_update() called on handler addition/removal first sets ie->ie_hflags = 0 and then traverses through the list of handlers, restoring it. I suppose it creates a time window when the flags are not set, and so handler can be called without expected epoch. It should instead recalculate the new value in local variable and then replace it in one move.

mav accepted this revision.Feb 4 2020, 3:23 PM

This revision is now accepted and ready to land.Feb 4 2020, 3:23 PM

Mark ie_hflags volatile to simplify use of atomic.
Make sure ie_hflags is not cleared before update.

This revision now requires review to proceed.Feb 4 2020, 3:36 PM

Harbormaster completed remote builds in B29160: Diff 67772.Feb 4 2020, 3:36 PM

mjg@, mav@, gallatin@: How does the latest patch version look?

Looks good to me.

This revision is now accepted and ready to land.Feb 4 2020, 3:43 PM

• hselasky retitled this revision from Fix for use after free in interrupt handler code to Fix for unbalanced EPOCH(9) usage in kernel interrupt handler.Feb 4 2020, 3:43 PM

• hselasky edited the summary of this revision. (Show Details)

• hselasky edited the summary of this revision. (Show Details)Feb 4 2020, 3:51 PM

kib added inline comments.Feb 4 2020, 4:31 PM

sys/kern/kern_intr.c
1247 ↗	(On Diff #67772)	Does the 'needs_epoch' calculation need to happen inside the loop ?

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

kib added inline comments.Feb 4 2020, 5:01 PM

sys/kern/kern_intr.c
1258 ↗	(On Diff #67772)	BTW, I am very suspicious to this back-to-back epochs exits and enters. If the thread runs, I do not think that the window where epoch is exited is large enough for the epoch_wait() sleeping thread to notice it.

In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

sys/kern/kern_intr.c
1247 ↗	(On Diff #67772)	Not neccessarily.

In D23483#515809, @hselasky wrote:

In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

intr_event_update() should not be used at all for MSI threaded handlers, right ? So I do not think it was the cause in the bug you are chasing.
And I think intr_event_update() should park the intr thread for the update to be reliable, but this only matters for shared interrupts ?

When interrupt handlers are unregistered, intr thread should be drained first.

In D23483#515820, @kib wrote:

In D23483#515809, @hselasky wrote:

In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

intr_event_update() should not be used at all for MSI threaded handlers, right ? So I do not think it was the cause in the bug you are chasing.
And I think intr_event_update() should park the intr thread for the update to be reliable, but this only matters for shared interrupts ?

intr_event_update() is used for all kinds of hardware interrupts.

When interrupt handlers are unregistered, intr thread should be drained first.

They have some magic flags to do that, see IT_WAIT. The only problem is this is all done inside the EPOCH(9) section, so if the thread waiting for the configuration to complete runs first, the race happens the IH_NET flag is cleared while inside the net EPOCH section.

imp added inline comments.Feb 4 2020, 8:05 PM

sys/kern/kern_intr.c
1247 ↗	(On Diff #67772)	yea. The IM_NET bit will never turn on after this point due to another thread. It may turn off, which is what this race is fixing. However, I think this needs an atomic_load of some flavor hear is safer than making ie_hflags volatile.
sys/sys/interrupt.h
122 ↗	(On Diff #67772)	why?

• hselasky added inline comments.Feb 4 2020, 8:53 PM

sys/sys/interrupt.h
122 ↗	(On Diff #67772)	I can use atomic op's aswell. Just thought volatile was simpler. Then I need both load and store to be atomic? Which variant of _rel or _acq should I use for the store/fetch?

• hselasky added a reviewer: jeff.Feb 5 2020, 1:48 PM

What we really care about is not re-ordering the load so that we ensure it happens prior to the loop. So I think you want an atomic with acquire semantics:

"the operation must have completed before any subsequent load or store (by program order) is performed"

To be honest, we may have gotten off in the weeds on this. There are other precedents of code which does similar things without atomics or volatiles. See the lro_enabled variable in iflib_rxeof() for example.

glebius added inline comments.Feb 5 2020, 5:08 PM

sys/kern/kern_intr.c
1247 ↗	(On Diff #67772)	Why won't it ever turn on? What if a new network interrupt is added to the interrupt source that already has existing non-network interrupt handler?

In D23483#515821, @hselasky wrote:

In D23483#515820, @kib wrote:

In D23483#515809, @hselasky wrote:

In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

intr_event_update() should not be used at all for MSI threaded handlers, right ? So I do not think it was the cause in the bug you are chasing.
And I think intr_event_update() should park the intr thread for the update to be reliable, but this only matters for shared interrupts ?

intr_event_update() is used for all kinds of hardware interrupts.

In which way it is used for MSI ? After registration, since this is the only case where we need to consider changed flags.

When interrupt handlers are unregistered, intr thread should be drained first.

They have some magic flags to do that, see IT_WAIT. The only problem is this is all done inside the EPOCH(9) section, so if the thread waiting for the configuration to complete runs first, the race happens the IH_NET flag is cleared while inside the net EPOCH section.

In D23483#516125, @gallatin wrote:
What we really care about is not re-ordering the load so that we ensure it happens prior to the loop. So I think you want an atomic with acquire semantics:
"the operation must have completed before any subsequent load or store (by program order) is performed"
To be honest, we may have gotten off in the weeds on this. There are other precedents of code which does similar things without atomics or volatiles. See the lro_enabled variable in iflib_rxeof() for example.

acq is not operational without the reciprocal rel, i.e. if load_acq is used thre, release fence must be issued for operation that updates the flags. But indeed I am not sure that we need the fences there at all.

In D23483#516214, @kib wrote:

intr_event_update() is used for all kinds of hardware interrupts.

In which way it is used for MSI ? After registration, since this is the only case where we need to consider changed flags.

Every MSIX interrupt vector has its own structure and intr_event_update() is called when setting up every each and one of those IRQ vectors ????

Or do you mean something else?

--HPS

In D23483#516232, @hselasky wrote:

In D23483#516214, @kib wrote:

intr_event_update() is used for all kinds of hardware interrupts.

In which way it is used for MSI ? After registration, since this is the only case where we need to consider changed flags.

Every MSIX interrupt vector has its own structure and intr_event_update() is called when setting up every each and one of those IRQ vectors ????

Or do you mean something else?

Exactly what you said, so that the hflags should be constant after MSI(-X) interrupt is set up. So again, I am not sure how did you see leaked epoch entry for mlx5en interrupt thread.

Exactly what you said, so that the hflags should be constant after MSI(-X) interrupt is set up. So again, I am not sure how did you see leaked epoch entry for mlx5en interrupt thread.

At shutdown of a MSI(-X) vector this is not the case! Do you see that one? Then hflags are updated.

sys/kern/kern_intr.c
1258 ↗	(On Diff #67772)	Epoch uses a 32-bit counter, and should notice this.

In D23483#516508, @hselasky wrote:

Exactly what you said, so that the hflags should be constant after MSI(-X) interrupt is set up. So again, I am not sure how did you see leaked epoch entry for mlx5en interrupt thread.

At shutdown of a MSI(-X) vector this is not the case! Do you see that one? Then hflags are updated.

Yes, thank you, this is what I wanted to see.

sys/sys/interrupt.h
122 ↗	(On Diff #67772)	I believe atomic_load() without any fence is both more correct and not depends on the compiler interpretation of the volatile semantic.

Use atomic_load_int() instead of "volatile" as suggested by kib@ .

This revision now requires review to proceed.Feb 6 2020, 3:48 PM

Harbormaster completed remote builds in B29210: Diff 67868.Feb 6 2020, 3:48 PM

Can the people subscribed to this patch which think the current version is OK accept it again?

kib accepted this revision.Feb 6 2020, 4:21 PM

This revision is now accepted and ready to land.Feb 6 2020, 4:21 PM

mav accepted this revision.Feb 6 2020, 4:23 PM

glebius accepted this revision.Feb 6 2020, 11:30 PM

@jeff: Can you accept this change?

We should go ahead and fix the bug but I would like to revisit the design.

sys/kern/kern_intr.c
1256 ↗	(On Diff #67868)	Not part of this review but intr_epoch_batch really looks too high to me although I am interested to know if it often increments at all. Most NICs now should be on dedicated msix interrupts where they are the only handler. This means that we only re-enter the handler if we re-enable the interrupt and it fires again while we're on the way out to de-schedule the ithread. So this is a small window. I think it is too large a value because the ithread should be polling and doing as much work as it can on each iteration, probably with some moderation, even if minimal. 1000 iterations could be hundreds of ms. This is really a long time to delay garbage collection.
1258 ↗	(On Diff #67772)	It is a versioned quiescence, not a lock.

Closed by commit rS357744: Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt (authored by • hselasky). · Explain WhyFeb 10 2020, 8:23 PM

This revision was automatically updated to reflect the committed changes.

• hselasky added a commit: rS357744: Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt.

Fix for unbalanced EPOCH(9) usage in kernel interrupt handler
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 68093

head/sys/kern/kern_intr.c

Fix for unbalanced EPOCH(9) usage in kernel interrupt handlerClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 68093

head/sys/kern/kern_intr.c

Fix for unbalanced EPOCH(9) usage in kernel interrupt handler
ClosedPublic
Actions

Revision Contents
Changeset List