Page MenuHomeFreeBSD

reduce vm_map consistency assertions
ClosedPublic

Authored by dougm on Oct 26 2019, 8:58 PM.

Details

Summary

Turning on full assertion-based consistency checking slows performance dramatically. This change reduces the number of assertions checked by considering only those associated with the most recent map operation.

Something like this has been requested by pho. The original inclusion of these checks was requested by kib.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm created this revision.Oct 26 2019, 8:58 PM
pho added a comment.Oct 27 2019, 3:44 PM

This is a KPTI test. This looks unrelated to me, but ...

20191027 15:16:52 all (469/677): kpti.sh
Fata
lpanic: Assertion p->tp_row < t->t_winsize.tp_row failed at ../../../teken/teken.c:105
cpuid = 10
time = 1572185813
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffff807d872fd60
vpanic() at vpanic+0x19d/frame 0xfffff807d872fdb0
panic() at panic+0x43/frame 0xfffff807d872fe10
teken_subr_do_putchar() at teken_subr_do_putchar+0x138/frame 0xfffff807d872fe60
teken_subr_regular_character() at teken_subr_regular_character+0x27a/frame 0xfffff807d872fea0
teken_input_char() at teken_input_char+0xa7/frame 0xfffff807d872fec0
teken_input() at teken_input+0x111/frame 0xfffff807d872fef0
termcn_cnputc() at termcn_cnputc+0x6e/frame 0xfffff807d872ff20
cnputc() at cnputc+0x7d/frame 0xfffff807d872ff50
cnputs() at cnputs+0x7a/frame 0xfffff807d872ff80
putchar() at putchar+0x150/frame 0xfffff807d8730000
kvprintf() at kvprintf+0x145/frame 0xfffff807d8730120
??() at 0xffffffff00013051/frame 0xfffff807d87301f0
printf() at printf+0x43/frame 0xfffff807d8730250
trap_fatal() at trap_fatal+0x9c/frame 0xfffff807d87302b0
trap_pfault() at trap_pfault+0x99/frame 0xfffff807d8730310
trap() at trap+0x46f/frame 0xfffff807d8730420
calltrap() at calltrap+0x8/frame 0xfffff807d8730420
--- trap 0x246, rip = 0x80020ac38, rsp = 0x7fffffffdc90, rbp = 0x7fffffffe760 ---
KDB: enter: panic
[ thread pid 256166277 tid 3224064 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/dougm061.txt

kib added a comment.Oct 27 2019, 6:16 PM
In D22163#484493, @pho wrote:

This is a KPTI test. This looks unrelated to me, but ...

20191027 15:16:52 all (469/677): kpti.sh
Fata
lpanic: Assertion p->tp_row < t->t_winsize.tp_row failed at ../../../teken/teken.c:105
cpuid = 10
time = 1572185813
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffff807d872fd60
vpanic() at vpanic+0x19d/frame 0xfffff807d872fdb0
panic() at panic+0x43/frame 0xfffff807d872fe10
teken_subr_do_putchar() at teken_subr_do_putchar+0x138/frame 0xfffff807d872fe60
teken_subr_regular_character() at teken_subr_regular_character+0x27a/frame 0xfffff807d872fea0
teken_input_char() at teken_input_char+0xa7/frame 0xfffff807d872fec0
teken_input() at teken_input+0x111/frame 0xfffff807d872fef0
termcn_cnputc() at termcn_cnputc+0x6e/frame 0xfffff807d872ff20
cnputc() at cnputc+0x7d/frame 0xfffff807d872ff50
cnputs() at cnputs+0x7a/frame 0xfffff807d872ff80
putchar() at putchar+0x150/frame 0xfffff807d8730000
kvprintf() at kvprintf+0x145/frame 0xfffff807d8730120
??() at 0xffffffff00013051/frame 0xfffff807d87301f0
printf() at printf+0x43/frame 0xfffff807d8730250
trap_fatal() at trap_fatal+0x9c/frame 0xfffff807d87302b0
trap_pfault() at trap_pfault+0x99/frame 0xfffff807d8730310
trap() at trap+0x46f/frame 0xfffff807d8730420
calltrap() at calltrap+0x8/frame 0xfffff807d8730420
--- trap 0x246, rip = 0x80020ac38, rsp = 0x7fffffffdc90, rbp = 0x7fffffffe760 ---
KDB: enter: panic
[ thread pid 256166277 tid 3224064 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/dougm061.txt

I am not sure. There are two panics. First one is the most intetersting, it occured while handling usermode page fault. Second one is something bad occuring in the terminal emulator, do you use video console ?

Please show the line number for trap_pfault+0x99.

pho added a comment.EditedOct 27 2019, 6:21 PM

Serial console on mercat1

(kgdb) l *trap_pfault+0x99
0xffffffff8104d129 is in trap_pfault (../../../amd64/amd64/trap.c:743).
738                      * lock, then it is most likely a fatal kernel page fault.
739                      * If WITNESS is enabled, then it's going to whine about
740                      * bogus LORs with various VM locks, so just skip to the
741                      * fatal trap handling directly.
742                      */
743                     if (td->td_critnest != 0 ||
744                         WITNESS_CHECK(WARN_SLEEPOK | WARN_GIANTOK, NULL,
745                         "Kernel page fault") != 0) {
746                             trap_fatal(frame, eva);
747                             return (-1);
(kgdb)
alc added a comment.EditedOct 30 2019, 4:35 AM

I spoke with Doug about this in person. I suggested to him that instead of completely eliminating the check over the entire map that we try to reduce the frequency with which it occurs. In particular, I suggested that the consistency check be deferred until we are releasing the write lock on the map, so that, for example, an mprotect that coalesced multiple map entries would only do the consistency check once. In addition, I suggested that we might perform the consistency check only 1 out of N times that we modify the vm map.

dougm updated this revision to Diff 63794.Oct 30 2019, 2:59 PM

For cases when DIAGNOSTIC is #defined, add a counter to vm_map, to count the number of map updates since last write-unlock assertion check. Add a potential thread-unlock assertion check everywhere the map lock is freed or downgraded after a potential map update, where 'potential' means 'only if the operation count exceeds the number of map entries'. Make the default behavior, when DIAGNOSTIC is defined, to only to assertion checks in these unlock cases, but leave in the possibility of doing all the assertion checks we do now, for the sake of the person trying to debug a problem highlighted by the infrequent assertion checks.

dougm updated this revision to Diff 63797.Oct 30 2019, 3:27 PM

Add a missing "while(0)"

alc added inline comments.Oct 30 2019, 3:37 PM
sys/vm/vm_map.h
211 ↗(On Diff #63797)

The existing "timestamp" has essentially the same meaning.

dougm updated this revision to Diff 63800.Oct 30 2019, 3:49 PM

Initialize nupdates in init.

nupdates counts map modifications, not locks/unlocks, and one lock/unlock might encompass several map modifications, as in the mprotect case mentioned above. Also, 'timestamp' can't be arbitrarily reset to zero.

pho added a comment.Oct 30 2019, 4:18 PM

Still doesn't compile.

dougm updated this revision to Diff 63804.Oct 30 2019, 5:19 PM

Get it to compile in GENERIC, with DIAGNOSTIC defined or not.

pho added a comment.Nov 2 2019, 10:04 AM

I ran all of the stress2 tests on Diff 63804 without seeing any problems.

dougm added a comment.Nov 7 2019, 3:00 AM

I'll commit this over the upcoming weekend if there are no further comments.

kib accepted this revision.Nov 7 2019, 6:07 PM
This revision is now accepted and ready to land.Nov 7 2019, 6:07 PM
alc accepted this revision.Sat, Nov 9, 4:53 PM
This revision was automatically updated to reflect the committed changes.