Page MenuHomeFreeBSD

Enable superpage promotion within the kernel pmap on arm64
ClosedPublic

Authored by alc on Aug 3 2019, 5:32 PM.

Details

Summary

r350004 changed data_abort() such that it generally invokes pmap_fault() on virtual addresses within the kernel pmap. (Previously, pmap_fault() was only invoked on kernel virtual addresses within the direct map.)

During the window of time when pmap_update_entry() is performing the "break-before-make" sequence, the affected pmap is locked and the only data being accessed is either within a page table page or the underlying thread's stack. Neither the mapping to the page table page nor the mapping to the thread's stack could be undergoing concurrent promotion or demotion. In particular, the kernel pmap lock would prevent the concurrent demotion of the direct map entry that provides access to the page table page that is being modified by pmap_update_entry().

And, because the kernel pmap is locked during promotion, if another processor traps during the "break-before-make" sequence, it is delayed by the pmap lock in pmap_fault() until the "make" has completed.

Test Plan

I've been running with this change for a little while on Cortex-A72 processors and seen no problems.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

alc created this revision.Aug 3 2019, 5:32 PM
markj accepted this revision.Aug 3 2019, 11:47 PM
This revision is now accepted and ready to land.Aug 3 2019, 11:47 PM
This revision was automatically updated to reflect the committed changes.
andrew added a comment.Aug 5 2019, 8:59 AM

Hasn't this opened a raise between promotion/demotion and pmap_kextract? pmap_kextract walks the page table so may find a zero entry from the break-before-make sequence.

alc added a comment.Aug 5 2019, 6:19 PM

Hasn't this opened a raise between promotion/demotion and pmap_kextract? pmap_kextract walks the page table so may find a zero entry from the break-before-make sequence.

Yes, it has. Moreover, we use pmap_kextract() in places where the pmap lock can't be acquired, e.g., its_cmd_prepare(). Sigh.

In order to maintain a lock-less implementation of pmap_kextract(), here is what I suggest. While we are required to break the mapping during demotion/promotion, that does not mean that the L2 entry has to be zero, only that it has to be invalid before we perform the TLB invalidations. In other words, when we break the L2 entry, rather than zeroing the entire entry, I suggest that we simply clear bit 0. Then, pmap_kextract()'s page table walk can be modified to ignore bit 0, regarding any non-zero entry as "valid". Bit 1 will still indicate whether the entry is a table or block. Thoughts?

This causes panics in ZFS. Just doing an installkernel or cp -r /boot/kernel /boot/kernel.something results in:

Fatal data abort:
  x0:                0
  x1: fffffd00077bf560
  x2:               21
  x3:                3
  x4: fffffd0103e503c0
  x5:                8
  x6: fffffd0103e503c0
  x7: ffff0000716db4b4
  x8:                0
  x9:                1
 x10:               21
 x11:                0
 x12:                0
 x13:               c0
 x14:               54
 x15:         ffffffff
 x16: ffff000001548238
 x17: ffff0000003e0c78
 x18: ffff0000716db970
 x19: ffff0000015480c0
 x20: ffff00007f500000
 x21: ffff00007f500000
 x22: fffffd01005f3640
 x23: ffff000000a16e04
 x24:             1000
 x25: fffffd000e21c180
 x26: ffff000000a17000
 x27: fffffd0007640ba8
 x28: fffffd0007640830
 x29: ffff0000716db990
  sp: ffff0000716db970
  lr: ffff0000003e0ca4
 elr: ffff0000003e0ca4
spsr:         60000145
 far:                8
 esr:         96000005
panic: vm_fault failed: ffff0000003e0ca4
cpuid = 3
time = 1565096451
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffff0000007450e8  lr = 0xffff00000010302c
	 sp = 0xffff0000716db360  fp = 0xffff0000716db570

db_trace_self_wrapper() at vpanic+0x18c
	 pc = 0xffff00000010302c  lr = 0xffff000000406ff4
	 sp = 0xffff0000716db580  fp = 0xffff0000716db630

vpanic() at panic+0x44
	 pc = 0xffff000000406ff4  lr = 0xffff000000406e64
	 sp = 0xffff0000716db640  fp = 0xffff0000716db6c0

panic() at data_abort+0x1d8
	 pc = 0xffff000000406e64  lr = 0xffff000000760770
	 sp = 0xffff0000716db6d0  fp = 0xffff0000716db780

data_abort() at do_el1h_sync+0x128
	 pc = 0xffff000000760770  lr = 0xffff000000760494
	 sp = 0xffff0000716db790  fp = 0xffff0000716db7c0

do_el1h_sync() at handle_el1h_sync+0x74
	 pc = 0xffff000000760494  lr = 0xffff000000747874
	 sp = 0xffff0000716db7d0  fp = 0xffff0000716db8e0

handle_el1h_sync() at free+0x28
	 pc = 0xffff000000747874  lr = 0xffff0000003e0ca0
	 sp = 0xffff0000716db8f0  fp = 0xffff0000716db990

free() at abd_free+0x164
	 pc = 0xffff0000003e0ca0  lr = 0xffff0000016dea40
	 sp = 0xffff0000716db9a0  fp = 0xffff0000716db9d0

abd_free() at zio_done+0x53c
	 pc = 0xffff0000016dea40  lr = 0xffff000001793b70
	 sp = 0xffff0000716db9e0  fp = 0xffff0000716dba50

zio_done() at zio_execute+0x94
	 pc = 0xffff000001793b70  lr = 0xffff00000178f854
	 sp = 0xffff0000716dba60  fp = 0xffff0000716dbaa0

zio_execute() at taskqueue_run_locked+0x14c
	 pc = 0xffff00000178f854  lr = 0xffff000000464cec
	 sp = 0xffff0000716dbab0  fp = 0xffff0000716dbb20

taskqueue_run_locked() at taskqueue_thread_loop+0xa8
	 pc = 0xffff000000464cec  lr = 0xffff0000004661e8
	 sp = 0xffff0000716dbb30  fp = 0xffff0000716dbb50

taskqueue_thread_loop() at fork_exit+0x90
	 pc = 0xffff0000004661e8  lr = 0xffff0000003c6074
	 sp = 0xffff0000716dbb60  fp = 0xffff0000716dbb90

fork_exit() at fork_trampoline+0x10
	 pc = 0xffff0000003c6074  lr = 0xffff0000007601ec
	 sp = 0xffff0000716dbba0  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 100240 ]
Stopped at      free+0x2c:      ldr     x2, [x0, #8]

Reverting this change fixes that.

alc added a comment.Aug 6 2019, 3:20 PM

Greg,

Please try D21169 in combination with this change.

Seems fine so far, thanks