Page MenuHomeFreeBSD

powerpc64: Implement Radix MMU for POWER9 CPUs

Authored by jhibbits on Mar 9 2019, 3:45 AM.



POWER9 supports two MMU formats: traditional hashed page tables, and Radix
page tables, similar to what's presesnt on most other architectures. The
PowerISA also specifies a process table -- a table of page table pointers--
which on the POWER9 is only available with the Radix MMU, so we can take
advantage of it with the Radix MMU driver.

Written by Matt Macy.

There might be more room for diff reduction, too. The goal of this is to get
input on overall design approach.

Test Plan

Boot with radix MMU enabled on POWER9, and buildworld as a basic
stress test. Boot with radix disabled on POWER9, and buildworld. Boot on
non-radix CPU and buildworld.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

FWIW ZFS works fine on the POWER9 branch. However, there are a few things needed to make it play nice with HPT. Have you tracked them down?

Still looking into it, it's probably something in vmparam.h that needs to be switchable. The problem seen is init crashing with a bus error. I haven't debugged that further yet.

I'm really puzzled why ZFS isn't working, I'd expect some errors, but just get silence. No errors, no warnings, nothing. With this diff reverted and using HPT ZFS works just fine.

1109 ↗(On Diff #54866)

x86 pmaps' page_is_mapped() return false for unmanaged pages.

562 ↗(On Diff #54866)

These should go into cpufunc.h and provide functional definitions ?

593 ↗(On Diff #54866)

Why radix needs DI ? Matt told me that ISAv3.0 has broadcast TLB invalidation instructions.

646 ↗(On Diff #54866)

Style: space around binary op, there and below.

698 ↗(On Diff #54866)

Indent looks wrong.

719 ↗(On Diff #54866)

Why this invalidation ? For whatever reasons it done (probably to emulate IA32 behaviour of flushing TLB on fault),, would it be enough to flush only the faulting address ?

746 ↗(On Diff #54866)

Don't you need to set referenced bit for write faults as well, if not set yet ?

766 ↗(On Diff #54866)

If startpte == newpte, does it still make sense to store newpte into *pte, with the follow-up ptesync ?

Also, note that you do not flush TLB for this case.

2945 ↗(On Diff #54866)


4789 ↗(On Diff #54866)

Does TLB tolerate simultaneous l3 and l4 entries for the same address ?

127 ↗(On Diff #54866)

Why ppc needs so large kernel stacks ? Does it cause KVA fragmentation issues ?

203 ↗(On Diff #54866)

The comment should be moved to #else part of the ifdef below, I think.

793 ↗(On Diff #54866)

This chunk should be committed separately.

Hi @kib, thanks for your review. There's still a bit of work to do, but this definitely helps.

1109 ↗(On Diff #54866)

This is literally just moved from a #define into a function here, no other change. If it needs fixed, that'll be separate.

562 ↗(On Diff #54866)

We do already have powerpc_sync() and eieio(), but not tlbsync(). On top of that, these macros aren't even used anywhere in the code.

719 ↗(On Diff #54866)

Yeah, I think it should be sufficient to just flush the faulting address.

127 ↗(On Diff #54866)

No idea. I asked about this, and apparently recursion is much more likely with radix tables. I bumped it from 4 to 8 ~5 years ago because ZFS is a stack pig. Didn't think radix tables would also be a stack pig. @mmacy, any input on this?

793 ↗(On Diff #54866)

The code that makes this necessary should really be fixed instead.

jhibbits marked an inline comment as done.

Fix HPT. ZFS has been confirmed working, when loaded as a module.

793 ↗(On Diff #54957)

This is obviously unnecessary, and will be reverted. A case of over-reverting the previous change.

88 ↗(On Diff #54957)

I think pmap_private.h is missing from the diff.

Add pmap_private.h to the diff, fix more style issues in mmu_radix.c.

Update diff track latest head changes.

So why is this not yet committed ? What input do you wait for ?

593 ↗(On Diff #54866)

This is still unanswered.

In D19516#545217, @kib wrote:

So why is this not yet committed ? What input do you wait for ?

It's not committed yet because I felt it was too unstable. It still is too unstable for normal use, but as you can see, it kinda bitrotted over the last year, so it's taken a bit of work to update it. I think the current problem is TLB related, given the panics and errors we've seen (me, @bdragon, @pkubaj).

593 ↗(On Diff #54866)

I'm not sure why. I know ISAv3.0 has broadcast TLB invalidation, which even dates back to the beginning. However, it may impact execution on other processors. Radix mode does have support for invalidating ranges, which we may want to use here instead. I'll need to look at this bit more.

94 ↗(On Diff #71573)

Probably should have a leading 0, and should probably be vaugely sorted.

135 ↗(On Diff #71573)

SRR1_ISI_PP line is duplicated now that D23731 is in.

136 ↗(On Diff #71573)

These should probably have a leading 0 for uniformity with the other sprs.

This revision was not accepted when it landed; it landed in state Needs Review.May 11 2020, 2:33 AM
This revision was automatically updated to reflect the committed changes.