Page MenuHomeFreeBSD

Fix for unbalanced EPOCH(9) usage in kernel interrupt handler
ClosedPublic

Authored by hselasky on Feb 3 2020, 3:56 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 23, 5:24 PM
Unknown Object (File)
Fri, Nov 22, 1:52 AM
Unknown Object (File)
Oct 2 2024, 6:11 PM
Unknown Object (File)
Oct 1 2024, 5:07 AM
Unknown Object (File)
Oct 1 2024, 5:03 AM
Unknown Object (File)
Sep 30 2024, 7:09 AM
Unknown Object (File)
Sep 29 2024, 10:17 PM
Unknown Object (File)
Sep 29 2024, 7:58 PM
Subscribers

Details

Summary

The issue is that interrupt handlers are removed via intr_event_execute_handlers() when IH_DEAD is set. The thread removing the interrupt is woken up, and calls intr_event_update(). When this happens, the ie_hflags are cleared and re-built from all the remaining handlers sharing the event. When the last IH_NET handler is removed, the IH_NET flag will be cleared from ih_hflags (or ie_hflags may still be being rebuilt in a different context), and the ithread_execute_handlers() may return with ie_hflags missing IH_NET. So we can end up in a scenario where IH_NET was present before calling ithread_execute_handlers, and is not present at its return, meaning we must cache the need for epoch locally.

This can happen when loading and unloading network drivers.
Also make sure the ie_hflags is not cleared before being updated.

This is a regression issue after r357004.

Backtrace:
panic()
_epoch_enter_preempt() # trying to access epoch tracker on stack of dead thread
ifunit_ref()
ifioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()
syscallenter()
amd64_syscall()

Sponsored by: Mellanox Technologies

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 29160

Event Timeline

To expand on this, the issue is that interrupt handlers are removed via intr_event_execute_handlers() when IH_DEAD is set. The thread removing the intr is woken up, and he calls intr_event_update(). When this happens, the ie_hflags are cleared and re-built from all the remaining handlers sharing the event. When the last IH_NET handler is removed, the IH_NET flag will be cleared from ih_hflags (or ie_hflags may still be being rebuilt in a different context), and the ithread_execute_handlers() may return with ie_hflags missing IH_NET. So we can end up in a scenario where IH_NET was present before calling ithread_execute_handlers, and is not present at its return, meaning we must cache the need for epoch locally.

This revision is now accepted and ready to land.Feb 3 2020, 4:03 PM

This implicitly depends on atomic_cmpset_acq_int to prevent the compiler from "misbehaving" and generating accesses later. Imo would be best to needs_epoch = atomic_load_int(&ie->ie_hflags) & IH_NET; instead.

@mjg: Thank you. I'll have a look at this again tomorrow.

Use atomic_load_int() where appropriate like suggested by mjg@ .

This revision now requires review to proceed.Feb 4 2020, 8:19 AM
kib added inline comments.
sys/kern/kern_intr.c
1275

Note this access.

sys/kern/kern_intr.c
1275

Can you explain?

jhb@ can you comment if it is safe to access the structure pointed to by "ie" like done in the patch above?

sys/kern/kern_intr.c
1275

I think IT_WAIT flag is set in the free of "ie" case.

I am not sure what is the point of atomic there, except may be preventing compiler from optimizing it out, if that can happen somehow. Otherwise it looks fine to me. I don't remember the interrupt teardown semantics here to say whether use-after-free is possible, but I agree with @gallatin that flags may change in run time, causing epoch leak, so this code must be here.

Looking into this I see one more problem: intr_event_update() called on handler addition/removal first sets ie->ie_hflags = 0 and then traverses through the list of handlers, restoring it. I suppose it creates a time window when the flags are not set, and so handler can be called without expected epoch. It should instead recalculate the new value in local variable and then replace it in one move.

This revision is now accepted and ready to land.Feb 4 2020, 3:23 PM

Mark ie_hflags volatile to simplify use of atomic.
Make sure ie_hflags is not cleared before update.

This revision now requires review to proceed.Feb 4 2020, 3:36 PM

mjg@, mav@, gallatin@: How does the latest patch version look?

This revision is now accepted and ready to land.Feb 4 2020, 3:43 PM
hselasky retitled this revision from Fix for use after free in interrupt handler code to Fix for unbalanced EPOCH(9) usage in kernel interrupt handler.Feb 4 2020, 3:43 PM
hselasky edited the summary of this revision. (Show Details)
sys/kern/kern_intr.c
1247

Does the 'needs_epoch' calculation need to happen inside the loop ?

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

sys/kern/kern_intr.c
1258

BTW, I am very suspicious to this back-to-back epochs exits and enters. If the thread runs, I do not think that the window where epoch is exited is large enough for the epoch_wait() sleeping thread to notice it.

In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

sys/kern/kern_intr.c
1247

Not neccessarily.

In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

intr_event_update() should not be used at all for MSI threaded handlers, right ? So I do not think it was the cause in the bug you are chasing.
And I think intr_event_update() should park the intr thread for the update to be reliable, but this only matters for shared interrupts ?

When interrupt handlers are unregistered, intr thread should be drained first.

In D23483#515820, @kib wrote:
In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

intr_event_update() should not be used at all for MSI threaded handlers, right ? So I do not think it was the cause in the bug you are chasing.
And I think intr_event_update() should park the intr thread for the update to be reliable, but this only matters for shared interrupts ?

intr_event_update() is used for all kinds of hardware interrupts.

When interrupt handlers are unregistered, intr thread should be drained first.

They have some magic flags to do that, see IT_WAIT. The only problem is this is all done inside the EPOCH(9) section, so if the thread waiting for the configuration to complete runs first, the race happens the IH_NET flag is cleared while inside the net EPOCH section.

sys/kern/kern_intr.c
1247

yea. The IM_NET bit will never turn on after this point due to another thread. It may turn off, which is what this race is fixing.

However, I think this needs an atomic_load of some flavor hear is safer than making ie_hflags volatile.

sys/sys/interrupt.h
122

why?

sys/sys/interrupt.h
122

I can use atomic op's aswell. Just thought volatile was simpler.

Then I need both load and store to be atomic? Which variant of _rel or _acq should I use for the store/fetch?

What we really care about is not re-ordering the load so that we ensure it happens prior to the loop. So I think you want an atomic with acquire semantics:

"the operation must have completed before any subsequent load or store (by program order) is performed"

To be honest, we may have gotten off in the weeds on this. There are other precedents of code which does similar things without atomics or volatiles. See the lro_enabled variable in iflib_rxeof() for example.

sys/kern/kern_intr.c
1247

Why won't it ever turn on? What if a new network interrupt is added to the interrupt source that already has existing non-network interrupt handler?

In D23483#515820, @kib wrote:
In D23483#515796, @kib wrote:

But in fact I still do not understand this patch, or rather, what is the case when it could happen that IH_NET is not seen after ithread_execute_handlers(). We never clear IH_NET, and the _intr_drain() code does not allow to free handler/thread until the thread acknowledges IT_WAIT.

See: intr_event_update()

IH_NET can go away when interrupt handlers are unregistered.

intr_event_update() should not be used at all for MSI threaded handlers, right ? So I do not think it was the cause in the bug you are chasing.
And I think intr_event_update() should park the intr thread for the update to be reliable, but this only matters for shared interrupts ?

intr_event_update() is used for all kinds of hardware interrupts.

In which way it is used for MSI ? After registration, since this is the only case where we need to consider changed flags.

When interrupt handlers are unregistered, intr thread should be drained first.

They have some magic flags to do that, see IT_WAIT. The only problem is this is all done inside the EPOCH(9) section, so if the thread waiting for the configuration to complete runs first, the race happens the IH_NET flag is cleared while inside the net EPOCH section.

What we really care about is not re-ordering the load so that we ensure it happens prior to the loop. So I think you want an atomic with acquire semantics:

"the operation must have completed before any subsequent load or store (by program order) is performed"

To be honest, we may have gotten off in the weeds on this. There are other precedents of code which does similar things without atomics or volatiles. See the lro_enabled variable in iflib_rxeof() for example.

acq is not operational without the reciprocal rel, i.e. if load_acq is used thre, release fence must be issued for operation that updates the flags. But indeed I am not sure that we need the fences there at all.

In D23483#516214, @kib wrote:

intr_event_update() is used for all kinds of hardware interrupts.

In which way it is used for MSI ? After registration, since this is the only case where we need to consider changed flags.

Every MSIX interrupt vector has its own structure and intr_event_update() is called when setting up every each and one of those IRQ vectors ????

Or do you mean something else?

--HPS

In D23483#516214, @kib wrote:

intr_event_update() is used for all kinds of hardware interrupts.

In which way it is used for MSI ? After registration, since this is the only case where we need to consider changed flags.

Every MSIX interrupt vector has its own structure and intr_event_update() is called when setting up every each and one of those IRQ vectors ????

Or do you mean something else?

Exactly what you said, so that the hflags should be constant after MSI(-X) interrupt is set up. So again, I am not sure how did you see leaked epoch entry for mlx5en interrupt thread.

Exactly what you said, so that the hflags should be constant after MSI(-X) interrupt is set up. So again, I am not sure how did you see leaked epoch entry for mlx5en interrupt thread.

At shutdown of a MSI(-X) vector this is not the case! Do you see that one? Then hflags are updated.

sys/kern/kern_intr.c
1258

Epoch uses a 32-bit counter, and should notice this.

Exactly what you said, so that the hflags should be constant after MSI(-X) interrupt is set up. So again, I am not sure how did you see leaked epoch entry for mlx5en interrupt thread.

At shutdown of a MSI(-X) vector this is not the case! Do you see that one? Then hflags are updated.

Yes, thank you, this is what I wanted to see.

sys/sys/interrupt.h
122

I believe atomic_load() without any fence is both more correct and not depends on the compiler interpretation of the volatile semantic.

Use atomic_load_int() instead of "volatile" as suggested by kib@ .

This revision now requires review to proceed.Feb 6 2020, 3:48 PM

Can the people subscribed to this patch which think the current version is OK accept it again?

This revision is now accepted and ready to land.Feb 6 2020, 4:21 PM

We should go ahead and fix the bug but I would like to revisit the design.

sys/kern/kern_intr.c
1255

Not part of this review but intr_epoch_batch really looks too high to me although I am interested to know if it often increments at all.

Most NICs now should be on dedicated msix interrupts where they are the only handler. This means that we only re-enter the handler if we re-enable the interrupt and it fires again while we're on the way out to de-schedule the ithread. So this is a small window.

I think it is too large a value because the ithread should be polling and doing as much work as it can on each iteration, probably with some moderation, even if minimal. 1000 iterations could be hundreds of ms. This is really a long time to delay garbage collection.

1258

It is a versioned quiescence, not a lock.