powerpc/pmap: NUMA-ize vm_page_array on powerpc
ClosedPublic

Authored by jhibbits on Aug 28 2019, 3:34 AM.

Details

Summary

This matches r351198 from amd64 and applies only to AIM64 and Book-E.
On AIM64 it short-circuits when there is only one domain, so that it
behaves like the existing code. Otherwise it allocates 16MB huge pages
to hold the page array, across all NUMA domains. In the first domain it
shifts the page array base up, to "upper-align" the page array in that
domain, which reduces the number of pages from the next domain appearing
in this domain. After the first domain, subsequent domains are allocated
in full 16MB pages, until the final domain, which can be short. This
means some inner domains may have pages accounted in earlier domains.
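
One reading of the "upper-align" step can be sketched as follows. This is a minimal illustration, not the code from the diff: `LARGE_PAGE_SIZE`, `upper_align_offset()` and `dom0_large_pages()` are hypothetical names chosen for the example, and only the 16MB large page size comes from the summary.

```c
#include <assert.h>
#include <stdint.h>

/* 16MB large page, as stated in the summary. */
#define LARGE_PAGE_SIZE	(16ULL * 1024 * 1024)

/*
 * Hypothetical helper: given the number of bytes of page-array entries
 * belonging to the first domain, return how far to shift the array base
 * up from a large-page boundary so that the first domain's entries END
 * on a large-page boundary.
 */
static uint64_t
upper_align_offset(uint64_t dom0_bytes)
{
	uint64_t rem;

	/* The mask arithmetic below requires a power-of-two page size. */
	assert((LARGE_PAGE_SIZE & (LARGE_PAGE_SIZE - 1)) == 0);

	rem = dom0_bytes & (LARGE_PAGE_SIZE - 1);
	return (rem == 0 ? 0 : LARGE_PAGE_SIZE - rem);
}

/* Number of large pages backing the first domain's part of the array. */
static uint64_t
dom0_large_pages(uint64_t dom0_bytes)
{
	return ((upper_align_offset(dom0_bytes) + dom0_bytes) /
	    LARGE_PAGE_SIZE);
}
```

With 40MB of first-domain entries, for example, the offset is 8MB and the entries occupy three large pages, the last of which ends exactly where the second domain's entries begin.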

On Book-E the page array is set up at MMU bootstrap time so that it is
always mapped in TLB1, on both 32-bit and 64-bit. This reduces the TLB0
overhead of touching the vm_page_array, saving up to one TLB miss per
array access.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/powerpc/booke/pmap.c
130 ↗(On Diff #61387)

Is it intentional to leave this in?

This change appears to give me a 15-20% improvement when building llvm on my Talos (144-thread POWER9). No change at all with buildworld.

sys/powerpc/booke/pmap.c
130 ↗(On Diff #61387)

Nope. Thought I cleaned that out. Guess not.

Please take a look at D21491, which moves the page array into KVA on amd64. It looks like you're doing that already, but I don't see what's preventing the vm_page address range from being allocated from the kernel map's vmem arena(s).

On Book-E it's already allocated out of KVA, as part of the bootstrap carving. On AIM it looks like I could just adjust virtual_avail as you mention in your comment in D21491. There would be up to 32MB (less 2 pages) of slop (16MB less 1 page on either side), which is probably fine on NUMA machines. But we only have 32GB of KVA available currently, and if the vm_page_array is ~3% of total memory, on a 256GB machine that's ~8GB, which is pretty big. I think I'd rather hold off on it until we either increase KVA or get actual working dump support.
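
The figures in this comment can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: `max_slop()` and `page_array_bytes()` are hypothetical helpers, the 4KB base page size is an assumption, and the 3% ratio is taken from the comment at face value.

```c
#include <assert.h>
#include <stdint.h>

#define KiB	1024ULL
#define MiB	(1024 * KiB)
#define GiB	(1024 * MiB)

/*
 * Worst-case alignment slop: one large page less one base page,
 * on each side of the array.
 */
static uint64_t
max_slop(uint64_t large_page, uint64_t base_page)
{
	assert(large_page > base_page);
	return (2 * (large_page - base_page));
}

/* Page array size, using the "~3% of total memory" estimate. */
static uint64_t
page_array_bytes(uint64_t physmem)
{
	return (physmem / 100 * 3);
}
```

With 16MB large pages and an assumed 4KB base page, `max_slop()` gives 32MB less 8KB. For 256GB of memory, `page_array_bytes()` gives a little under 8GB, roughly a quarter of a 32GB KVA.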

I don't see a benefit in burning KVA on this on AIM64 (already being done on Book-E, which is much more limited in physmem). @luporl has the minidump scanner dumping all mapped memory, which would include this block anyway, so we free up the precious KVA for other stuff.

Ok. I am not familiar with the power kernel's memory layout so I'm not sure what other problems might come up. You might for example verify whether kernacc() DTRT for addresses within the page array.

sys/powerpc/include/vmparam.h
254 ↗(On Diff #61387)

s/KRENEL/KERNEL/

Overall the changes look ok to me.

I agree with @jhibbits, that, at least while we're limited to 32GB KVA, it is better to leave the page array in a separate address range.

sys/powerpc/aim/mmu_oea64.c
665 ↗(On Diff #61387)

It would be better to use "large page" or "huge page" in the panic message.

690 ↗(On Diff #61387)

pte_lo may be used uninitialized here, at first loop iteration.

694 ↗(On Diff #61387)

pte_lo may be used uninitialized here, at first loop iteration.

696 ↗(On Diff #61387)

pte_lo will be used uninitialized here, at first loop iteration.

697 ↗(On Diff #61387)

I guess this should be at the beginning of the loop body, shouldn't it?

jhibbits added inline comments.
sys/powerpc/aim/mmu_oea64.c
697 ↗(On Diff #61387)

Yes. Don't know how/why I moved it.
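
The ordering issue raised in these inline comments can be illustrated with a stand-alone sketch. This is not the code from mmu_oea64.c: `calc_pte_lo()`, `map_large_pages()` and the flag value are hypothetical stand-ins. The point is only that the per-iteration value is computed at the top of the loop body, before anything reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE	(16ULL * 1024 * 1024)

/* Hypothetical stand-in for deriving the low PTE word from an address. */
static uint64_t
calc_pte_lo(uint64_t pa)
{
	return (pa | 0x10);	/* 0x10 is an arbitrary example flag */
}

/*
 * Record the low PTE word for each large page in [pa, pa + size).
 * Returns the number of entries written to out.
 */
static size_t
map_large_pages(uint64_t pa, uint64_t size, uint64_t *out, size_t nout)
{
	uint64_t off, pte_lo;
	size_t n = 0;

	assert(out != NULL);

	for (off = 0; off < size && n < nout; off += LARGE_PAGE_SIZE) {
		/* Compute first, so no later use sees a stale or unset value. */
		pte_lo = calc_pte_lo(pa + off);
		out[n++] = pte_lo;
	}
	return (n);
}
```

If the assignment sat at the end of the loop body instead, the first iteration would read `pte_lo` before it was ever set, which is what the comments on lines 690 through 696 point out.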

jhibbits marked an inline comment as done.

Address comments. Update diff.

This revision is now accepted and ready to land. Dec 2 2019, 12:51 PM
This revision was automatically updated to reflect the committed changes.