Details

Reviewers

imp
jhb

Commits

rS342170: add support for marking interrupt handlers as suspended

Summary

The goal of this change is to fix a problem with PCI shared interrupts during
suspend and resume.

I have observed a couple of variations of the following scenario. Devices A
and B are on the same PCI bus and share the same interrupt. Device A's driver
is suspended first and the device is powered down. Device B generates an
interrupt. Interrupt handlers of both drivers are called. Device A's interrupt
handler accesses registers of the powered down device and gets back bogus
values (I assume all 0xff). That data is interpreted as interrupt status bits,
etc. So, the interrupt handler gets confused and may produce some noise or
enter an infinite loop, etc.

This change affects only PCI devices. The pci(4) bus driver marks a child's
interrupt handler as suspended after the child's suspend method is called and
before the device is powered down. This is done only for traditional PCI
interrupts, because only they can be shared.

I used rman_get_virtual and rman_set_virtual to associate an interrupt handler
cookie with the corresponding interrupt resource, so that bus drivers could get
the cookie of a child. At present this is done in x86 nexus driver only.
Not sure if this is a good design.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

avg created this revision.Jun 11 2018, 9:36 AM

Harbormaster completed remote builds in B17165: Diff 43588.Jun 11 2018, 9:36 AM

avg mentioned this in D15688: power down devices on pci bus only after having suspended all attached drivers.Jun 11 2018, 9:37 AM

I will appreciate any comments and suggestions on the overall approach, the interrupt code changes and the bus code changes (their design, API extension).
I suspect that there can be a more elegant way to get the desired functionality.

fix inline documentation of suspend_intr and resume_intr bus methods

Harbormaster completed remote builds in B17166: Diff 43589.Jun 11 2018, 9:48 AM

avg added inline comments.Jun 11 2018, 2:12 PM

sys/kern/kern_intr.c
1279 ↗	(On Diff #43589)	I thought that this would be an improvement, but I see some fallout. It looks that the current code (without my change) considers an interrupt to be a stray only if there is no handler installed for it. Is that the correct definition? The change above changed the logic so that the interrupt is considered a stray if no handler acknowledged it. Now I see that during a resume from S3 there is a stray ACPI interrupt, so the interrupt gets masked in the IO-APIC and there are no ACPI interrupts after that (so, e.g. the power button stops working). I also see messages about a stray IRQ for if_alc. Fortunately, that does not seem to affect the device's operation (probably because it's an MSI interrupt).

Generally, I like this path, but I have a couple of questions about locks since I'm having trouble working out the interlock needed.

sys/kern/kern_intr.c
1218 ↗	(On Diff #43589)	What's the locking to prevent the above routines from setting / clearing this bit leading to inconsistent results?
sys/sys/interrupt.h
64 ↗	(On Diff #43589)	This raises the question what lock(s) cover what fields in here?

avg added inline comments.Jun 12 2018, 8:15 AM

sys/kern/kern_intr.c
1218 ↗	(On Diff #43589)	All modifications of `ih_flags` are protected with `ie_lock`, so at the very least, we are protected from modifications stepping on each other and corrupting the flags. Also, the unlocking provides a release barrier for memory modifications. So far so good, I think. The the harder part. The code running in the ISR context does not acquire the lock and does not have any explicit memory barrier. I think that this is not a problem for `IH_DEAD` as the requirement is that it is eventually noticed and acted upon. `IS_SUSP` is less obvious. Maybe `intr_event_suspend_handler` needs to grow logic similar to that in `intr_event_remove_handler`, so that it blocks until a running interrupt thread completes? That would still have the same race with purely fast handlers (aka filters), but I assume that they are considered to be fast enough. Maybe @jhb can chime in.

I suspect it might be useful to get an ack from the interrupt thread that it sees the IH_SUSP flag, so that device power down does not occur while the handler still run ?

In D15755#333364, @kib wrote:

I suspect it might be useful to get an ack from the interrupt thread that it sees the IH_SUSP flag, so that device power down does not occur while the handler still run ?

Yes, that's what I meant in my previous comment when I mentioned the logic similar to that in intr_event_remove_handler.

P.S.
I think intr_event_remove_handler has a problem of its own for handlers that have only a filter and no interrupt thread part.
For those, intr_event_remove_handler simply removes the handler from ie_handlers tailq.
And intr_event_handle iterates that tailq with TAILQ_FOREACH (w/o _SAFE).
So, I think that a race between intr_event_remove_handler and intr_event_handle is not impossible if the interrupt is shared.

rebase

Harbormaster completed remote builds in B18605: Diff 46336.Aug 6 2018, 1:02 PM

add a barrier function to ensure that the interrupt handling code sees that a handler is suspended after intr_event_suspend_handler() returns

add a new interrupt handler flag, IH_CHANGED, that's used to communicate changes to an interrupt handler and acknowledgement of those changes by the interrupt handling code (the interrupt thread)

resume ("un-suspend") an interrupt handler only after calling resume method of the corresponding driver, otherwise the hardware or driver can be in a state where handling an interrupt (shared) would be problematic

Harbormaster completed remote builds in B18898: Diff 46842.Aug 17 2018, 2:57 PM

Generally, this looks good.
If you still have a CardBus laptop, that's a good way to get sharable interrupts, though we do the insanely conservative thing of 'detach/attach' across a suspend to avoid issues like this. The early suspend/resume code for PCI devices was also... somewhat lacking...

So this might help us do additional things with CardBus, if we still cared (which I'm not sure we do enough to enhance cardbus).

sys/x86/x86/nexus.c
603 ↗	(On Diff #46842)	I don't like this. I'd prefer we have a separate interface. There's a fair amount of rman_set_virtual in the tree today. I suspect this is safe today, however, because I think that we only do this today for memory resources. Having a separate interface would let us add asserts to avoid abuse. I'd suggest rman_set_intr_cookie()

avg marked an inline comment as done.Dec 6 2018, 2:35 PM

avg added inline comments.

sys/x86/x86/nexus.c
603 ↗	(On Diff #46842)	I am fine with this suggestion. I guess my original idea was that r_virtual is currently being used only for memory resources, so it is safe to re-purpose it for the interrupt cookie for interrupt resources. What do you think would be better with respect to struct resource_i? just add a new field, e.g., r_irq_cookie make a union out of r_virtual and r_irq_cookie re-use r_virtual, but provide a new couple of accessors as per your suggestion Thanks!

add and make use of rman_{get,set}_irq_cookie() and a new field r_irq_cookie to back them
slide up head

Harbormaster completed remote builds in B21419: Diff 51702.Dec 7 2018, 11:02 AM

This revision was not accepted when it landed; it landed in state Needs Review.Dec 17 2018, 5:11 PM

Closed by commit rS342170: add support for marking interrupt handlers as suspended (authored by avg). · Explain Why

This revision was automatically updated to reflect the committed changes.

add support for marking interrupt handlers as suspended
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 52116

head/sys/dev/pci/pci.c

head/sys/kern/bus_if.m

head/sys/kern/kern_intr.c

head/sys/kern/subr_bus.c

head/sys/kern/subr_rman.c

head/sys/sys/bus.h

head/sys/sys/interrupt.h

head/sys/sys/rman.h

head/sys/x86/x86/nexus.c

add support for marking interrupt handlers as suspendedClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 52116

head/sys/dev/pci/pci.c

head/sys/kern/bus_if.m

head/sys/kern/kern_intr.c

head/sys/kern/subr_bus.c

head/sys/kern/subr_rman.c

head/sys/sys/bus.h

head/sys/sys/interrupt.h

head/sys/sys/rman.h

head/sys/x86/x86/nexus.c

add support for marking interrupt handlers as suspended
ClosedPublic
Actions

Revision Contents
Changeset List