Page MenuHomeFreeBSD

reduce vm_map consistency assertions
ClosedPublic

Authored by dougm on Oct 26 2019, 8:58 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 20 2023, 1:49 AM
Unknown Object (File)
Dec 8 2023, 10:45 PM
Unknown Object (File)
Dec 8 2023, 10:45 PM
Unknown Object (File)
Dec 8 2023, 10:45 PM
Unknown Object (File)
Dec 8 2023, 10:45 PM
Unknown Object (File)
Nov 30 2023, 6:31 PM
Unknown Object (File)
Oct 4 2023, 11:07 AM
Unknown Object (File)
Sep 6 2023, 7:03 AM
Subscribers

Details

Summary

Turning on full assertion-based consistency checking slows performance dramatically. This change reduces the number of assertions checked by considering only those associated with the most recent map operation.

Something like this has been requested by pho. The original inclusion of these checks was requested by kib.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

This is a KPTI test. This looks unrelated to me, but ...

20191027 15:16:52 all (469/677): kpti.sh
Fata
lpanic: Assertion p->tp_row < t->t_winsize.tp_row failed at ../../../teken/teken.c:105
cpuid = 10
time = 1572185813
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffff807d872fd60
vpanic() at vpanic+0x19d/frame 0xfffff807d872fdb0
panic() at panic+0x43/frame 0xfffff807d872fe10
teken_subr_do_putchar() at teken_subr_do_putchar+0x138/frame 0xfffff807d872fe60
teken_subr_regular_character() at teken_subr_regular_character+0x27a/frame 0xfffff807d872fea0
teken_input_char() at teken_input_char+0xa7/frame 0xfffff807d872fec0
teken_input() at teken_input+0x111/frame 0xfffff807d872fef0
termcn_cnputc() at termcn_cnputc+0x6e/frame 0xfffff807d872ff20
cnputc() at cnputc+0x7d/frame 0xfffff807d872ff50
cnputs() at cnputs+0x7a/frame 0xfffff807d872ff80
putchar() at putchar+0x150/frame 0xfffff807d8730000
kvprintf() at kvprintf+0x145/frame 0xfffff807d8730120
??() at 0xffffffff00013051/frame 0xfffff807d87301f0
printf() at printf+0x43/frame 0xfffff807d8730250
trap_fatal() at trap_fatal+0x9c/frame 0xfffff807d87302b0
trap_pfault() at trap_pfault+0x99/frame 0xfffff807d8730310
trap() at trap+0x46f/frame 0xfffff807d8730420
calltrap() at calltrap+0x8/frame 0xfffff807d8730420
--- trap 0x246, rip = 0x80020ac38, rsp = 0x7fffffffdc90, rbp = 0x7fffffffe760 ---
KDB: enter: panic
[ thread pid 256166277 tid 3224064 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/dougm061.txt

In D22163#484493, @pho wrote:

This is a KPTI test. This looks unrelated to me, but ...

20191027 15:16:52 all (469/677): kpti.sh
Fata
lpanic: Assertion p->tp_row < t->t_winsize.tp_row failed at ../../../teken/teken.c:105
cpuid = 10
time = 1572185813
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffff807d872fd60
vpanic() at vpanic+0x19d/frame 0xfffff807d872fdb0
panic() at panic+0x43/frame 0xfffff807d872fe10
teken_subr_do_putchar() at teken_subr_do_putchar+0x138/frame 0xfffff807d872fe60
teken_subr_regular_character() at teken_subr_regular_character+0x27a/frame 0xfffff807d872fea0
teken_input_char() at teken_input_char+0xa7/frame 0xfffff807d872fec0
teken_input() at teken_input+0x111/frame 0xfffff807d872fef0
termcn_cnputc() at termcn_cnputc+0x6e/frame 0xfffff807d872ff20
cnputc() at cnputc+0x7d/frame 0xfffff807d872ff50
cnputs() at cnputs+0x7a/frame 0xfffff807d872ff80
putchar() at putchar+0x150/frame 0xfffff807d8730000
kvprintf() at kvprintf+0x145/frame 0xfffff807d8730120
??() at 0xffffffff00013051/frame 0xfffff807d87301f0
printf() at printf+0x43/frame 0xfffff807d8730250
trap_fatal() at trap_fatal+0x9c/frame 0xfffff807d87302b0
trap_pfault() at trap_pfault+0x99/frame 0xfffff807d8730310
trap() at trap+0x46f/frame 0xfffff807d8730420
calltrap() at calltrap+0x8/frame 0xfffff807d8730420
--- trap 0x246, rip = 0x80020ac38, rsp = 0x7fffffffdc90, rbp = 0x7fffffffe760 ---
KDB: enter: panic
[ thread pid 256166277 tid 3224064 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/dougm061.txt

I am not sure. There are two panics. First one is the most intetersting, it occured while handling usermode page fault. Second one is something bad occuring in the terminal emulator, do you use video console ?

Please show the line number for trap_pfault+0x99.

Serial console on mercat1

(kgdb) l *trap_pfault+0x99
0xffffffff8104d129 is in trap_pfault (../../../amd64/amd64/trap.c:743).
738                      * lock, then it is most likely a fatal kernel page fault.
739                      * If WITNESS is enabled, then it's going to whine about
740                      * bogus LORs with various VM locks, so just skip to the
741                      * fatal trap handling directly.
742                      */
743                     if (td->td_critnest != 0 ||
744                         WITNESS_CHECK(WARN_SLEEPOK | WARN_GIANTOK, NULL,
745                         "Kernel page fault") != 0) {
746                             trap_fatal(frame, eva);
747                             return (-1);
(kgdb)

I spoke with Doug about this in person. I suggested to him that instead of completely eliminating the check over the entire map that we try to reduce the frequency with which it occurs. In particular, I suggested that the consistency check be deferred until we are releasing the write lock on the map, so that, for example, an mprotect that coalesced multiple map entries would only do the consistency check once. In addition, I suggested that we might perform the consistency check only 1 out of N times that we modify the vm map.

For cases when DIAGNOSTIC is #defined, add a counter to vm_map, to count the number of map updates since last write-unlock assertion check. Add a potential thread-unlock assertion check everywhere the map lock is freed or downgraded after a potential map update, where 'potential' means 'only if the operation count exceeds the number of map entries'. Make the default behavior, when DIAGNOSTIC is defined, to only to assertion checks in these unlock cases, but leave in the possibility of doing all the assertion checks we do now, for the sake of the person trying to debug a problem highlighted by the infrequent assertion checks.

sys/vm/vm_map.h
211

The existing "timestamp" has essentially the same meaning.

Initialize nupdates in init.

nupdates counts map modifications, not locks/unlocks, and one lock/unlock might encompass several map modifications, as in the mprotect case mentioned above. Also, 'timestamp' can't be arbitrarily reset to zero.

Get it to compile in GENERIC, with DIAGNOSTIC defined or not.

I ran all of the stress2 tests on Diff 63804 without seeing any problems.

I'll commit this over the upcoming weekend if there are no further comments.

This revision is now accepted and ready to land.Nov 7 2019, 6:07 PM