riscv: Fix pmap_kextract racing with concurrent superpage promotion/demotion
Closed, Public

Authored by jrtc27 on Wed, Jul 21, 2:09 AM.

Details

Summary

This repeats amd64's cfcbf8c6fd3b (r180498) and i386's cf3508519c5e
(r202894) but for riscv; pmap_kextract must be lock-free and so it can
race with superpage promotion and demotion, thus the L2 entry must only
be loaded once to avoid using inconsistent state.
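
The "load once" rule can be sketched in userland C as follows. This is only an illustration under stated assumptions, not the committed change: the entry layout (`ENTRY_VALID`, `ENTRY_LEAF`, `ENTRY_ADDR`), `struct l3_table`, and `kextract_once` are hypothetical names, and the bit positions do not match the real RISC-V PTE encoding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified entry layout; NOT the real RISC-V PTE encoding. */
#define ENTRY_VALID   UINT64_C(0x1)
#define ENTRY_LEAF    UINT64_C(0x2)        /* set: entry maps a 2 MiB superpage */
#define ENTRY_ADDR(e) ((e) & ~UINT64_C(0xfff))
#define L2_OFFSET     UINT64_C(0x1fffff)   /* offset within a 2 MiB superpage */
#define PAGE_OFFSET   UINT64_C(0xfff)      /* offset within a 4 KiB page */
#define L3_ENTRIES    512

/* Stand-in for the L3 page table that a non-leaf L2 entry refers to. */
struct l3_table {
	uint64_t pa;                       /* pretend physical address of the table */
	_Atomic uint64_t pte[L3_ENTRIES];
};

/*
 * Translate va using exactly one load of the L2 entry. The leaf test and
 * the address extraction both use the same snapshot, so a concurrent
 * promotion or demotion cannot make the two steps disagree.
 */
static uint64_t
kextract_once(_Atomic uint64_t *l2, struct l3_table *l3, uint64_t va)
{
	uint64_t l2e = atomic_load(l2);    /* the only read of *l2 */

	assert((l2e & ENTRY_VALID) != 0);
	if ((l2e & ENTRY_LEAF) != 0)
		return (ENTRY_ADDR(l2e) | (va & L2_OFFSET));

	/* Non-leaf: the snapshot names the L3 table; index it by va. */
	assert(ENTRY_ADDR(l2e) == l3->pa);
	uint64_t l3e = atomic_load(&l3->pte[(va >> 12) % L3_ENTRIES]);
	return (ENTRY_ADDR(l3e) | (va & PAGE_OFFSET));
}
```

Whichever state the single load observes, superpage or L3 table, the result is computed from that state alone, so the lookup needs no lock.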

PR: 250866

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Since updating my Unmatched tree with this patch 11 days ago, I haven't heard of any more panics from zBeeble (dgilbert), who has continued to build various ports locally and possibly even a full buildworld+buildkernel. I've asked for confirmation that the panics have stopped (previously there was one every day or so), but I strongly believe this was indeed the issue. The amd64 and i386 commits explicitly mention the bug causing panics with ZFS, which is what was seen here, though despite trawling the web I couldn't find any record of *what* those panics were.

EDIT: Lack of panics with this patch applied has been confirmed.

What is the panic in this case?

This revision is now accepted and ready to land. Wed, Jul 21, 1:00 PM

For example:

panic: pmap_l2_to_l3: PA out of range, PA: 0x0
cpuid = 1
time = 1625512247
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x2a
pmap_remove_write() at pmap_remove_write+0x56a
vm_object_page_collect_flush() at vm_object_page_collect_flush+0xf8
vm_object_page_clean() at vm_object_page_clean+0x144
vinactivef() at vinactivef+0x90
vput_final() at vput_final+0x2ea
vput() at vput+0x32
vn_close1() at vn_close1+0x13c
vn_closefile() at vn_closefile+0x44
_fdrop() at _fdrop+0x18
closef() at closef+0x1b8
closefp_impl() at closefp_impl+0x78
closefp() at closefp+0x52
kern_close() at kern_close+0x134
sys_close() at sys_close+0xe
do_trap_user() at do_trap_user+0x208
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0x6
KDB: enter: panic

The change itself is easy enough to understand, but I don't see exactly how the issue correlates to the panic. Are you able to explain it?

From a quick look:
If pmap_kextract() races with demotion, then it's possible that the pa returned points to the l3 table, rather than the expected physical address corresponding to va. There aren't a ton of callers of pmap_kextract(), but one interesting one is pcpu_page_free(), which looks like it could inadvertently free the wrong vm page if the race happens as I described. Could this lead to the panics observed?
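
The interleaving described above can be replayed deterministically in userland C. This is a hedged sketch rather than the kernel code: `struct l2_history`, `kextract_two_loads`, `kextract_one_load`, and the simplified entry layout are all hypothetical, and the "race" is modelled by handing the function the two values the same L2 slot holds before and after a demotion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified entry layout; NOT the real RISC-V PTE encoding. */
#define ENTRY_LEAF    UINT64_C(0x2)        /* set: entry maps a 2 MiB superpage */
#define ENTRY_ADDR(e) ((e) & ~UINT64_C(0xfff))
#define L2_OFFSET     UINT64_C(0x1fffff)   /* offset within a 2 MiB superpage */

/* The two values one L2 slot holds before and after a demotion. */
struct l2_history {
	uint64_t before;   /* leaf entry mapping the superpage */
	uint64_t after;    /* non-leaf entry pointing at the new L3 table */
};

/*
 * Racy shape: the leaf test uses the first load, but the address is
 * extracted from a second load that already observes the demoted entry.
 */
static uint64_t
kextract_two_loads(const struct l2_history *h, uint64_t va)
{
	uint64_t first = h->before;        /* load 1: still a superpage */
	assert((first & ENTRY_LEAF) != 0); /* scenario precondition */
	uint64_t second = h->after;        /* load 2: demotion has happened */
	return (ENTRY_ADDR(second) | (va & L2_OFFSET));
}

/* Fixed shape: both steps use the one snapshot taken before the demotion. */
static uint64_t
kextract_one_load(const struct l2_history *h, uint64_t va)
{
	uint64_t l2e = h->before;          /* the only load */
	assert((l2e & ENTRY_LEAF) != 0);   /* scenario precondition */
	return (ENTRY_ADDR(l2e) | (va & L2_OFFSET));
}
```

With a superpage at 0x80200000 demoted to an L3 table placed at 0x80400000, the two-load variant returns an address derived from the table's location, while the one-load variant returns the address inside the original superpage. Since demotion keeps the same physical pages mapped, the answer computed from the older snapshot is still correct.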

Not really. It was a shot in the dark that seemed relevant to ZFS, so the justification for why it fixes the bug is rather empirical.

I didn't really chase it through. But yes, that kind of thing is what I was thinking might be happening: pmap_kextract gives you back the wrong address, you end up manipulating the "wrong" page and corrupting the pmap, and you only discover it later when another operation touches the now-corrupted part of the pmap.