
power down devices on pci bus only after having suspended all attached drivers
AbandonedPublic

Authored by avg on Jun 7 2018, 7:15 AM.

Details

Reviewers
imp
jhb
Summary

The goal of this change is to fix a problem with PCI shared interrupts
during suspend and resume.

I have observed a couple of variations of the following scenario.
Devices A and B are on the same PCI bus and share the same interrupt.
Device A's driver is suspended first and the device is powered down.
Device B generates an interrupt. Interrupt handlers of both drivers are
called. Device A's interrupt handler accesses registers of the powered
down device and gets back bogus values (I assume all 0xff). That data
is interpreted as interrupt status bits, etc. So, the interrupt handler
gets confused and may produce some noise or enter an infinite loop, etc.
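The failure mode above can be reproduced in a small user-space model. This is only a sketch: `struct fake_dev`, `fake_read_status`, `naive_intr` and `guarded_intr` are hypothetical names, not FreeBSD or hdac code, and the all-ones read value is the assumption stated in the summary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model, not a FreeBSD structure. */
struct fake_dev {
	bool     powered;      /* false once the device has been powered down */
	uint32_t intr_status;  /* pending interrupt bits while powered */
};

/* Assumption: a register read from a powered-down device returns all ones. */
static uint32_t
fake_read_status(const struct fake_dev *dev)
{
	return (dev->powered ? dev->intr_status : 0xffffffffu);
}

/* Naive handler: treats every set bit as a pending event. */
static int
naive_intr(const struct fake_dev *dev)
{
	uint32_t status = fake_read_status(dev);
	int events = 0;

	for (; status != 0; status &= status - 1)
		events++;
	return (events);
}

/* Guarded handler: treats an all-ones read as "device not responding". */
static int
guarded_intr(const struct fake_dev *dev)
{
	uint32_t status = fake_read_status(dev);
	int events = 0;

	if (status == 0xffffffffu)
		return (0);
	for (; status != 0; status &= status - 1)
		events++;
	return (events);
}
```

With a powered-down device and nothing pending, `naive_intr` reports 32 events while `guarded_intr` reports none. A per-driver guard like this is not what this change does; it only illustrates why a handler that runs after its device has been powered down can misbehave.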

The problem can be fixed in many ways. The approach in this change
seemed like the least intrusive.
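The ordering idea can be sketched as a two-pass loop over the children of a bus. This is a user-space model under stated assumptions, not the actual diff: `struct child`, `hazard`, `suspend_interleaved` and `suspend_two_pass` are hypothetical names, and "hazard" here means that some device is already powered down while another device's driver is still active and could raise the shared interrupt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a child device on a bus. */
struct child {
	bool driver_active;  /* driver not yet suspended, may still interrupt */
	bool powered;        /* device not yet powered down */
};

/* True if a powered-down device coexists with a still-active driver. */
static bool
hazard(const struct child *c, size_t n)
{
	bool any_off = false, any_active = false;

	for (size_t i = 0; i < n; i++) {
		any_off |= !c[i].powered;
		any_active |= c[i].driver_active;
	}
	return (any_off && any_active);
}

/* Old order: suspend and power down each child in turn. */
static int
suspend_interleaved(struct child *c, size_t n)
{
	int hazards = 0;

	for (size_t i = 0; i < n; i++) {
		c[i].driver_active = false;
		c[i].powered = false;
		if (hazard(c, n))
			hazards++;
	}
	return (hazards);
}

/* Proposed order: suspend every driver first, then power devices down. */
static int
suspend_two_pass(struct child *c, size_t n)
{
	int hazards = 0;

	for (size_t i = 0; i < n; i++) {
		c[i].driver_active = false;
		if (hazard(c, n))
			hazards++;
	}
	for (size_t i = 0; i < n; i++) {
		c[i].powered = false;
		if (hazard(c, n))
			hazards++;
	}
	return (hazards);
}
```

For two children, the interleaved order passes through one hazardous state (the first device is off while the second driver is still active), whereas the two-pass order passes through none.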

Test Plan

This change fixes a problem on one of my systems where an HDA
controller and a couple of USB controllers share IRQ 16. The audio
controller is suspended first and a USB controller can raise an
interrupt after that.
Before this change, I saw a lot of messages like these:

hdac0: Unexpected unsolicited response from address 0: 0020010b
hdac0: Unexpected unsolicited response from address 0: 0000000b
hdac0: Unexpected unsolicited response from address 0: 1b1a1918
...
hdac0: Unexpected unsolicited response from address 0: 00000000

And the suspend would frequently hang after these messages. It seems that
hdac_intr_handler (running in an ithread) would enter an infinite loop
and, since the interrupt is bound to the BSP, the suspend thread, also
bound to the BSP, would not get a chance to run.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 17064
Build 16925: arc lint + arc unit

Event Timeline

This breaks devctl suspend/resume which relies on suspend_child and resume_child being self-contained. Also, devctl suspend / resume can create this same issue at runtime, so I think we will instead need to "neuter" any registered INTx interrupt handler (MSI aren't shared).

Thank you for pointing out those problems.
I have an alternative WIP in which I added support for marking interrupts as suspended.
I am not sure I did that right. I'll post a review on Monday so that problems with it can be pointed out and better ideas suggested.
In that work, I also suspend interrupts only for PCI devices and only legacy interrupts.
Interrupts are suspended and resumed in pci_suspend_child and pci_resume_child.
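
The idea of marking an interrupt as suspended (or "neutering" a registered INTx handler) can be sketched as a per-handler flag that the shared-interrupt dispatcher checks. This is a guess at the general shape, not the WIP itself: `struct intr_entry`, `dispatch_shared` and `count_call` are hypothetical names and do not correspond to the FreeBSD interrupt code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry for one handler attached to a shared interrupt line. */
struct intr_entry {
	int  (*handler)(void *);
	void *arg;
	bool  suspended;  /* set while the owning device is suspended */
};

/* Example handler: counts how many times it has been invoked. */
static int
count_call(void *arg)
{
	(*(int *)arg)++;
	return (1);
}

/*
 * Invoke every handler on the shared line except those marked suspended.
 * Returns the number of handlers that were actually called.
 */
static int
dispatch_shared(struct intr_entry *e, size_t n)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].suspended)
			continue;
		e[i].handler(e[i].arg);
		called++;
	}
	return (called);
}
```

In this model, a suspend path would set `suspended` before powering the device down and a resume path would clear it after powering the device back up, so a sibling's interrupt never reaches the handler of a powered-down device. Because the flag is per handler, the same mechanism would also apply when a single device is suspended at runtime.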