Two NVMe Bootup hang
Needs Review, Public

Authored by somalapuram_gmail.com on Nov 6 2018, 8:34 AM.

Details

Reviewers

mav
mmacy
kib
jhb

Summary

When two NVMe drives are installed, the system sometimes hangs in sched_bind() during boot.
Hardware used:
CPU: AMD EPYC 7501 32-Core Processor
ACPI APIC Table: <AMD DIESEL >

sched_bind() is not required during boot, because IRQs are still being assigned at that point.

Test Plan

Connect two NVMe drives to an AMD EPYC 7501 system, one in NUMA domain 0 and one in NUMA domain 1. Boot sometimes hangs.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 20654
Build 20069: arc lint + arc unit

Event Timeline

somalapuram_gmail.com created this revision. Nov 6 2018, 8:34 AM

Herald added subscribers: Contributor Reviews (src), imp. Nov 6 2018, 8:34 AM

Harbormaster completed remote builds in B20654: Diff 50057. Nov 6 2018, 8:34 AM

somalapuram_gmail.com added reviewers: mav, mmacy. Nov 6 2018, 8:40 AM

Why isn't this hit elsewhere? Are you compiling without EARLY_AP?

In D17860#384153, replying to @mmacy:

options EARLY_AP_STARTUP is enabled.
This is not easy to reproduce. The AMD EPYC 7501 system has 64 cores, and somehow sched_bind() causes a problem during pci_release_msi(dev).
I think sched_bind() is not required during IRQ assignment.
The actual root cause is unknown.

mav added reviewers: kib, jhb.Nov 14 2018, 3:56 PM

Seems reasonable to me.

I have a feeling this is a workaround for some other problem. Shouldn't we have a fully operational scheduler at that point when EARLY_AP_STARTUP is enabled? This change may reopen the races this code is intended to handle.

I agree that this seems like a workaround for some other problem. Without EARLY_AP_STARTUP the IRQs aren't actually moved until much later in boot (at SI_SUB_SMP), so you can only hit this path early in boot if EARLY_AP_STARTUP is enabled, and in that case sched_bind() should work fine. A useful debugging strategy would be to add KTR to your kernel config, break into DDB when the system hangs, and examine the KTR traces. KTR_INTR and KTR_PROC are probably the most useful masks to trace.
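A rough sketch of the debugging setup suggested above, as lines one might add to a custom kernel configuration. This is not a configuration taken from this review: the buffer size is an arbitrary assumption, and KDB/DDB may already be present in your base config (e.g. GENERIC). After the hang, break into DDB with your console's break sequence and run `show ktr` to dump the buffer.

```
# Assumed additions to a custom kernel config; not from the original review.
# KTR: in-kernel circular event trace buffer.
options 	KTR
# Trace buffer size (assumed value); must be a power of two.
options 	KTR_ENTRIES=262144
# Compile in the interrupt and process trace points suggested above.
options 	KTR_COMPILE=(KTR_INTR|KTR_PROC)
# Enable those classes from boot so the early hang is captured.
options 	KTR_MASK=(KTR_INTR|KTR_PROC)
# Kernel debugger, to inspect the hung system (omit if already configured).
options 	KDB
options 	DDB
```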

Well, exactly the same symptom (a hanging sched_bind()) occurred on Apollo Lake due to a hardware bug; see r333026. The problem was that a write did not wake up a core sitting in MWAIT, and no interrupts were being generated yet.

I do not know which idle method is selected on your machine; try switching to a different one. For instance, Ryzen has erratum 1057 (and possibly 1109), which is almost identical to the Apollo Lake issue.

In D17860#384464, replying to @kib:

Following your suggestions, I tested with Global C-state Control disabled in the BIOS and with these options set at the loader prompt:

set machdep.idle=spin
set machdep.idle_mwait=0

The issue still reproduces.

Revision Contents

Changeset List

Path                    Size
sys/x86/local_apic.c    4 lines

Diff 50057

