When 2 NVMe devices are used, the system hangs at sched_bind during boot.
Hardware used:
CPU: AMD EPYC 7501 32-Core Processor
ACPI APIC Table: <AMD DIESEL >
sched_bind is not required during boot-up, as IRQs are still being assigned.
Differential D17860
Two NVMe Bootup hang
Authored by somalapuram_gmail.com on Nov 6 2018, 8:34 AM.
Details
To reproduce: connect 2 NVMe devices on an AMD EPYC 7501 (numa-domain 0 and 1). Boot-up sometimes hangs.
Event Timeline

options EARLY_AP_STARTUP is enabled.

I have a feeling this is a workaround for some other problem. Shouldn't we have a fully operational scheduler at that point when EARLY_AP_STARTUP is enabled? This change may open the races this code was intended to handle.

I agree that this seems like a workaround for some other problem. Without EARLY_AP_STARTUP the IRQs aren't actually moved until much later during boot (the late SI_SUB_SMP), so you can only see this earlier in boot if EARLY_AP_STARTUP is enabled, but then sched_bind should work fine. A useful strategy for debugging this would be to add KTR to your kernel config, break into DDB when it hangs, and examine the KTR traces. KTR_INTR and KTR_PROC are probably the useful masks to trace.

Well, exactly the same symptom (hanging sched_bind()) happens on Apollo Lake due to a hardware bug, see r333026. The problem was that a write did not wake up the core in MWAIT, and no interrupts were generated yet. I do not know what idling method is selected for your machine; try switching to something different. For instance, Ryzens have erratum 1057 (and possibly 1109), which is almost identical to the Apollo Lake issue.

With your inputs: Set the boot option with:
The issue is still reproduced.
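
For reference, the KTR suggestion made earlier in this thread could be set up roughly as follows. This is a minimal sketch of a kernel config fragment, not something posted by the reviewers: the option names come from ktr(4) and ddb(4), while the KTR_ENTRIES value and the choice to enable both masks at boot are assumptions made for illustration.

```
# Debugger support, so the hung boot can be interrupted (assumed not already present)
options 	KDB
options 	DDB

# Kernel trace buffer, limited to the two masks suggested in the review
options 	KTR
options 	KTR_ENTRIES=8192                  # illustrative size; must be a power of two
options 	KTR_COMPILE=(KTR_INTR|KTR_PROC)   # trace classes compiled into the kernel
options 	KTR_MASK=(KTR_INTR|KTR_PROC)      # trace classes enabled from boot
```

After the hang, breaking into DDB and running `show ktr` should dump the recorded interrupt and scheduling events, which may show whether the thread calling sched_bind was ever scheduled on the target CPU.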