Page MenuHomeFreeBSD

Wait longer for a previous IPI to be sent
ClosedPublic

Authored by vangyzen on Apr 22 2021, 10:46 PM.

Details

Summary

When sending an IPI, if a previous IPI is still pending delivery,
native_lapic_ipi_vectored() waits for the previous IPI to be sent.
We've seen a few inexplicable panics with the current timeout of 50 ms.
Increase the timeout to 1 second and make it tunable.

No hardware specification mentions a timeout in this case; I checked
the Intel SDM, Intel MP spec, and Intel x2APIC spec. Linux and illumos
wait forever. In Linux, see __default_send_IPI_shortcut() in
arch/x86/kernel/apic/ipi.c. In illumos, see apic_send_ipi() in
usr/src/uts/i86pc/io/pcplusmp/apic_common.c. However, misbehaving hardware
could hang the system if we wait forever.

MFC after: 1 week
Sponsored by: Dell EMC Isilon

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I don't have specific objections. I can't recall I ever saw this panic and not sure how the interrupt delivery can take more that 50ms. But it reminds me that we do have spinlock timeout, which is also not required. Do you see any benefits from this comparing to just increasing the timeout back to 1s as it was before f418f79ce299 to keep ability to debug misbehaving hardware?

On one hand I agree that hardware might need some time to recover from previous send. On the other hand, you already mentioned that the patch effectively changes the mode of reporting of faulty hardware from some visible panic to silent hang. Perhaps it would triple some 'spinlock held too long' panic later. Why do you think it would be useful for your situation?

That said, why don't you use x2APIC mode?

IMO you should change the hardcoded constant into the sysctl/tunable controlled variable.

In D29942#671819, @mav wrote:

I don't have specific objections. I can't recall I ever saw this panic and not sure how the interrupt delivery can take more that 50ms. But it reminds me that we do have spinlock timeout, which is also not required. Do you see any benefits from this comparing to just increasing the timeout back to 1s as it was before f418f79ce299 to keep ability to debug misbehaving hardware?

In D29942#671908, @kib wrote:

On one hand I agree that hardware might need some time to recover from previous send. On the other hand, you already mentioned that the patch effectively changes the mode of reporting of faulty hardware from some visible panic to silent hang. Perhaps it would triple some 'spinlock held too long' panic later. Why do you think it would be useful for your situation?

That said, why don't you use x2APIC mode?

IMO you should change the hardcoded constant into the sysctl/tunable controlled variable.

I'll add the sysctl/tunable and increase the default to 1s.

Since the most popular bare-metal Unix doesn't have a timeout, misbehaving hardware must not be very common. However, a long, tunable timeout is a perfectly reasonable hedge.

We do use x2APIC on recent platforms, but we still hit this rare panic on older generations.

  • add sysctl/tunable and wait 1s by default
vangyzen retitled this revision from Wait forever for a previous IPI to be sent to Wait longer for a previous IPI to be sent.Apr 26 2021, 3:12 PM
vangyzen edited the summary of this revision. (Show Details)
sys/x86/x86/local_apic.c
216

It is not an IPI timeout. There is some frivolous naming of the internal variable, but for user it is definitely confusing.

I think the name should be something like APIC ready timeout, and perhaps the description should mention that it is only for xAPIC.

This revision is now accepted and ready to land.Apr 26 2021, 3:37 PM
vangyzen added inline comments.
sys/x86/x86/local_apic.c
216

I used names from Intel SDM 10.6.1.

@mav How does this look?

It looks good to me. I am just thinking whether it worth to mark the variable as __read_mostly.

In D29942#673010, @mav wrote:

It looks good to me. I am just thinking whether it worth to mark the variable as __read_mostly.

It's very unlikely to hurt. I'll add it.

This revision now requires review to proceed.Apr 30 2021, 4:00 PM
This revision was not accepted when it landed; it landed in state Needs Review.Apr 30 2021, 6:34 PM
This revision was automatically updated to reflect the committed changes.