Page MenuHomeFreeBSD

powerpc64: Implement Radix MMU for POWER9 CPUs
Needs ReviewPublic

Authored by jhibbits on Mar 9 2019, 3:45 AM.



POWER9 supports two MMU formats: traditional hashed page tables, and Radix
page tables, similar to what's presesnt on most other architectures. The
PowerISA also specifies a process table -- a table of page table pointers--
which on the POWER9 is only available with the Radix MMU, so we can take
advantage of it with the Radix MMU driver.

Written by Matt Macy.

There might be more room for diff reduction, too. The goal of this is to get
input on overall design approach.

Test Plan

Boot with radix MMU enabled on POWER9, and buildworld as a basic
stress test. Boot with radix disabled on POWER9, and buildworld. Boot on
non-radix CPU and buildworld.

Diff Detail

Lint Skipped
Unit Tests Skipped

Event Timeline

jhibbits created this revision.Mar 9 2019, 3:45 AM
mmacy added a comment.Mar 9 2019, 4:47 AM

FWIW ZFS works fine on the POWER9 branch. However, there are a few things needed to make it play nice with HPT. Have you tracked them down?

Still looking into it, it's probably something in vmparam.h that needs to be switchable. The problem seen is init crashing with a bus error. I haven't debugged that further yet.

I'm really puzzled why ZFS isn't working, I'd expect some errors, but just get silence. No errors, no warnings, nothing. With this diff reverted and using HPT ZFS works just fine.

kib added inline comments.Mar 9 2019, 11:30 AM

x86 pmaps' page_is_mapped() return false for unmanaged pages.


These should go into cpufunc.h and provide functional definitions ?


Why radix needs DI ? Matt told me that ISAv3.0 has broadcast TLB invalidation instructions.


Style: space around binary op, there and below.


Indent looks wrong.


Why this invalidation ? For whatever reasons it done (probably to emulate IA32 behaviour of flushing TLB on fault),, would it be enough to flush only the faulting address ?


Don't you need to set referenced bit for write faults as well, if not set yet ?


If startpte == newpte, does it still make sense to store newpte into *pte, with the follow-up ptesync ?

Also, note that you do not flush TLB for this case.




Does TLB tolerate simultaneous l3 and l4 entries for the same address ?


Why ppc needs so large kernel stacks ? Does it cause KVA fragmentation issues ?


The comment should be moved to #else part of the ifdef below, I think.

793 ↗(On Diff #54866)

This chunk should be committed separately.

kib added a reviewer: markj.Mar 9 2019, 11:31 AM
jhibbits marked 4 inline comments as done.Mar 9 2019, 7:48 PM

Hi @kib, thanks for your review. There's still a bit of work to do, but this definitely helps.


This is literally just moved from a #define into a function here, no other change. If it needs fixed, that'll be separate.


We do already have powerpc_sync() and eieio(), but not tlbsync(). On top of that, these macros aren't even used anywhere in the code.


Yeah, I think it should be sufficient to just flush the faulting address.


No idea. I asked about this, and apparently recursion is much more likely with radix tables. I bumped it from 4 to 8 ~5 years ago because ZFS is a stack pig. Didn't think radix tables would also be a stack pig. @mmacy, any input on this?

793 ↗(On Diff #54866)

The code that makes this necessary should really be fixed instead.

jhibbits edited the summary of this revision. (Show Details)Mar 12 2019, 3:40 AM
jhibbits updated this revision to Diff 54957.Mar 12 2019, 3:43 AM
jhibbits marked an inline comment as done.

Fix HPT. ZFS has been confirmed working, when loaded as a module.

jhibbits added inline comments.Mar 12 2019, 3:46 AM
793 ↗(On Diff #54957)

This is obviously unnecessary, and will be reverted. A case of over-reverting the previous change.

bdragon added inline comments.Mar 12 2019, 10:57 PM

I think pmap_private.h is missing from the diff.

jhibbits updated this revision to Diff 55004.Mar 13 2019, 12:42 AM

Add pmap_private.h to the diff, fix more style issues in mmu_radix.c.