Two NVMe Bootup hang
Needs ReviewPublic

Authored by somalapuram_gmail.com on Nov 6 2018, 8:34 AM.
Tags
None

Details

Reviewers
mav
mmacy
kib
jhb
Summary

When two NVMe devices are installed, the system hangs in sched_bind() at boot time.
Hardware used:
CPU: AMD EPYC 7501 32-Core Processor
ACPI APIC Table: <AMD DIESEL >

sched_bind() is not required during boot, because IRQs are still being assigned at that point.

Test Plan

Connect two NVMe devices to an AMD EPYC 7501 system in NUMA domains 0 and 1; boot sometimes hangs.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 20654
Build 20069: arc lint + arc unit

Event Timeline

Why isn’t this hit elsewhere? Are you compiling without EARLY_AP ?

options EARLY_AP_STARTUP is enabled.
This is not easy to reproduce. The AMD EPYC 7501 system has 64 cores, and sched_bind() somehow causes a problem during pci_release_msi(dev).
I think sched_bind() is not required during IRQ assignment.
The actual root cause is unknown.

I have a feeling this is a workaround for some other problem. Shouldn't we have a fully operational scheduler at that point when EARLY_AP_STARTUP is enabled? This change may open up the races this code was intended to handle.

I agree that this seems like a workaround for some other problem. Without EARLY_AP_STARTUP the IRQs aren't actually moved until much later during boot (the late SI_SUB_SMP), so you can only see this earlier in boot if EARLY_AP_STARTUP is enabled, but then sched_bind should work fine. Useful strategies for debugging this would be to add KTR to your kernel config and break into DDB when it hangs and examine KTR traces. KTR_INTR and KTR_PROC are probably the useful masks to trace.
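As a rough sketch of the KTR suggestion above, the kernel config additions might look like the fragment below. The option names come from ktr(4); the KTR_ENTRIES value is an arbitrary assumption, and KDB/DDB may already be present in the config in use.

```
# Debugger support, so the hung boot can be interrupted (may already be set)
options 	KDB
options 	DDB

# Kernel trace buffer, limited to interrupt and process/scheduler events
options 	KTR
options 	KTR_ENTRIES=131072		# assumed size; must be a power of two
options 	KTR_COMPILE=(KTR_INTR|KTR_PROC)
options 	KTR_MASK=(KTR_INTR|KTR_PROC)
```

After breaking into DDB at the hang, `show ktr` dumps the recorded trace entries, newest first.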

Well, exactly the same symptom (a hanging sched_bind()) happened on Apollo Lake due to a hardware bug, see r333026. The problem was that a write did not wake up the core in MWAIT, and no interrupts were generated yet.

I do not know which idling method is selected for your machine; try switching to a different one. For instance, Ryzens have erratum 1057 (and possibly 1109), which is almost identical to the Apollo Lake issue.

In D17860#384464, @kib wrote:

Well, exactly the same symptom (a hanging sched_bind()) happened on Apollo Lake due to a hardware bug, see r333026. The problem was that a write did not wake up the core in MWAIT, and no interrupts were generated yet.

I do not know which idling method is selected for your machine; try switching to a different one. For instance, Ryzens have erratum 1057 (and possibly 1109), which is almost identical to the Apollo Lake issue.

With your inputs:
I have tested with Global C-state Control disabled in the BIOS and with the following set at the loader prompt:

set machdep.idle=spin
set machdep.idle_mwait=0

The issue still reproduces.