
Do not sleep in vm_wait() if pagedaemon is not yet started.
Closed, Public

Authored by kib on Nov 2 2016, 1:50 PM.

Details

Summary

Panic instead. Here is an example of the change in action, booting a damaged kernel binary file:

        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #18 r308105+56db418(lld-buildworld)-dirty: Wed Nov  2 10:07:27 EDT 2016
    emaste@feynman:/tank/emaste/obj/tank/emaste/src/freebsd-xlld/sys/GENERIC amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(efifb): resolution 800x600
CPU: QEMU Virtual CPU version 2.5+ (3430.58-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x663  Family=0x6  Model=0x6  Stepping=3
  Features=0x783fbfd<FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x80002001<SSE3,CX16,HV>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x5<LAHF,SVM>
  SVM: NAsids=16
real memory  = 268238848 (255 MB)
avail memory = 213946368 (204 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <OVMF   OVMFEDK2>
panic: vm_wait in early boot
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81e01a40
vpanic() at vpanic+0x182/frame 0xffffffff81e01ac0
panic() at panic+0x43/frame 0xffffffff81e01b20
vm_wait() at vm_wait+0xd6/frame 0xffffffff81e01b40
kmem_alloc_contig() at kmem_alloc_contig+0x1bd/frame 0xffffffff81e01c00
contigmalloc() at contigmalloc+0x33/frame 0xffffffff81e01c40
x86bios_modevent() at x86bios_modevent+0x21a/frame 0xffffffff81e01c60
module_register_init() at module_register_init+0xb0/frame 0xffffffff81e01c90
mi_startup() at mi_startup+0x118/frame 0xffffffff81e01cb0
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>
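
The behavior described in the summary can be modeled in userspace roughly as follows. This is only a sketch, not the committed diff: the `pageproc != NULL` test and the panic string come from this review, while `struct proc` as defined here, `vm_wait_can_sleep()` and `vm_wait_model()` are hypothetical stand-ins for the kernel code.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct proc. */
struct proc {
	int pid;
};

/* Models the kernel's pageproc pointer: NULL until the pagedaemon starts. */
static struct proc *pageproc = NULL;

/* Returns 1 if a sleep in vm_wait() could ever be woken up, 0 otherwise. */
static int
vm_wait_can_sleep(void)
{
	return (pageproc != NULL);
}

/* Userspace model of vm_wait(): fail loudly instead of sleeping forever. */
static void
vm_wait_model(void)
{
	if (!vm_wait_can_sleep()) {
		/* Stands in for panic("vm_wait in early boot"). */
		fprintf(stderr, "panic: vm_wait in early boot\n");
		abort();
	}
	/* The kernel would sleep here until the pagedaemon frees pages. */
	puts("sleeping until the pagedaemon frees pages");
}
```

Calling `vm_wait_model()` while `pageproc` is still NULL aborts with the same message that appears in the console log above; once `pageproc` points at a process, the call falls through to the sleep path.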

Diff Detail

Repository: rS FreeBSD src repository - subversion
Lint: Lint Not Applicable
Unit: Tests Not Applicable

Event Timeline

kib retitled this revision to "Do not sleep in vm_wait() if pagedaemon is not yet started."
kib updated this object.
kib edited the test plan for this revision.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
kib added a subscriber: emaste.

Use the FreeBSD cdefs.h name.

kib added reviewers: alc, markj.

FWIW this is what prompted the contigmalloc failure:

real memory  = 268238848 (255 MB)
Physical memory chunk(s):
0x0000000000001000 - 0x0000000000000fff, 0 bytes (0 pages)
0x0000000001e40000 - 0x000000000e776fff, 210989056 bytes (51511 pages)
0x000000000ee4e000 - 0x000000000fecdfff, 17301504 bytes (4224 pages)
0x000000000ff32000 - 0x000000000ffb7fff, 548864 bytes (134 pages)
avail memory = 213946368 (204 MB)
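
The byte and page counts in this listing can be checked with a small sketch. It assumes that each range's end address is inclusive, as printed, and that the page size is 4 KiB, which is consistent with the figures shown. The names `struct chunk`, `chunk_bytes()` and `chunk_pages()` are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, consistent with the byte/page counts in the log. */
#define PAGE_SIZE_ASSUMED 4096u

/* One physical memory range; 'end' is inclusive, as printed in the log. */
struct chunk {
	uint64_t start;
	uint64_t end;
};

/* Ranges copied from the boot log. */
static const struct chunk chunks[] = {
	{ 0x0000000000001000ULL, 0x0000000000000fffULL },
	{ 0x0000000001e40000ULL, 0x000000000e776fffULL },
	{ 0x000000000ee4e000ULL, 0x000000000fecdfffULL },
	{ 0x000000000ff32000ULL, 0x000000000ffb7fffULL },
};

/* Size in bytes; an end one byte below the start yields an empty range. */
static uint64_t
chunk_bytes(struct chunk c)
{
	return (c.end + 1 - c.start);
}

/* Number of whole pages in the range. */
static uint64_t
chunk_pages(struct chunk c)
{
	return (chunk_bytes(c) / PAGE_SIZE_ASSUMED);
}
```

Under these assumptions the first range, starting at 0x1000, comes out as zero bytes, and the lowest usable range starts at 0x1e40000.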

Shouldn't the policy be determined by the contigmalloc() caller? Not all contigmalloc failures during boot are fatal, see D7417 for instance.

In D8421#175264, @markj wrote:

Shouldn't the policy be determined by the contigmalloc() caller? Not all contigmalloc failures during boot are fatal, see D7417 for instance.

Yes, in the backtrace I posted, the reasonable action for the caller is to use M_NOWAIT while in the early boot stage. Note that the cold state != the pagedaemon-active state; there are fine differences. Even pageproc != NULL is not ideal, but it is IMO a better approximation than cold.
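
The caller-side policy described above can be sketched as a userspace model. The flag names follow malloc(9), but the numeric values are only illustrative, and `struct boot_state` and `contig_alloc_flags()` are hypothetical helpers that are not part of this review or of D7417.

```c
#include <stdbool.h>

/* Flag names follow malloc(9); the values here are illustrative only. */
#define M_NOWAIT 0x0001
#define M_WAITOK 0x0002

/* Hypothetical snapshot of the boot state a caller might consult. */
struct boot_state {
	bool cold;               /* models the kernel's "cold" flag */
	bool pagedaemon_started; /* models pageproc != NULL */
};

/*
 * Pick allocation flags: only allow sleeping once the pagedaemon exists.
 * Note that 'cold' is deliberately not consulted, since it is described
 * above as a worse approximation than pageproc != NULL.
 */
static int
contig_alloc_flags(const struct boot_state *bs)
{
	return (bs->pagedaemon_started ? M_WAITOK : M_NOWAIT);
}
```

In any state where `cold` is already false but the pagedaemon has not started, this helper still returns M_NOWAIT, so the caller sees an allocation failure it can handle and never reaches vm_wait().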

That said, D7417 is really complementary to this change. D7417 tries to auto-correct the callers in the way I formulated in the paragraph above, while my change just provides a better message for the failure mode where either the caller is not fixed or auto-correction was not applied. I.e., right now the system panics anyway if any allocation (vm_page_alloc_contig or vm_page_alloc) fails from a waitable caller, but with a cryptic message about the thread not being runnable, which requires somebody other than the driver author to interpret.

markj edited edge metadata.
In D8421#175283, @kib wrote:

D7417 tries to auto-correct the callers in the way I formulated in the paragraph above, while my change just provides a better message for the failure mode where either the caller is not fixed or auto-correction was not applied. I.e., right now the system panics anyway if any allocation (vm_page_alloc_contig or vm_page_alloc) fails from a waitable caller, but with a cryptic message about the thread not being runnable, which requires somebody other than the driver author to interpret.

Indeed, this change makes sense to me as a safety belt.

This revision is now accepted and ready to land. Nov 2 2016, 5:33 PM
This revision was automatically updated to reflect the committed changes.