moea64 Performance improvements
Needs Review, Public

Authored by jhibbits on Sep 30 2020, 3:00 AM.

Details

Reviewers
markj
Summary
  • Cache the vm page in the PVO instead of looking it up on every action.
  • Search pmap's ESID tree lockless

VSIDs are only created, never destroyed, during a process's lifetime.
Rather than hold the PMAP lock for the entire duration of lookup and
create-if-needed, only take the lock if the first lookup fails, and
release as necessary.

Cache the vm page in the PVO instead of looking it up on every action

Usually we have the page already on PVO insert, so keep that around to
avoid calling PHYS_TO_VM_PAGE(), which could be costly.
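To illustrate the idea, here is a minimal user-space sketch, not the code in this revision. All names (struct page, struct phys_seg, struct pvo, phys_to_page(), pvo_insert(), pvo_page()) are hypothetical stand-ins, and phys_to_page() only assumes that the lookup is a linear search over physical segments. The page pointer is stored when the PVO is inserted, and the segment search runs only if no page was supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* hypothetical 4 KB pages */

/* Hypothetical stand-ins; not the kernel's real definitions. */
struct page { int dummy; };

struct phys_seg {
	uint64_t start, end;		/* physical range [start, end) */
	struct page *first_page;	/* page array backing the range */
};

struct pvo {
	uint64_t pa;			/* physical address mapped */
	struct page *page;		/* cached at insert; NULL if unknown */
};

static unsigned long seg_searches;	/* counts slow-path lookups */

/* Slow path: linear search of the segment list. */
static struct page *
phys_to_page(const struct phys_seg *segs, size_t nsegs, uint64_t pa)
{
	seg_searches++;
	for (size_t i = 0; i < nsegs; i++) {
		if (pa >= segs[i].start && pa < segs[i].end)
			return (&segs[i].first_page[
			    (pa - segs[i].start) >> TOY_PAGE_SHIFT]);
	}
	return (NULL);
}

/* The caller usually has the page in hand already, so remember it. */
static void
pvo_insert(struct pvo *pvo, uint64_t pa, struct page *m)
{
	pvo->pa = pa;
	pvo->page = m;
}

/* Use the cached page; search only if none was recorded. */
static struct page *
pvo_page(struct pvo *pvo, const struct phys_seg *segs, size_t nsegs)
{
	if (pvo->page == NULL)
		pvo->page = phys_to_page(segs, nsegs, pvo->pa);
	return (pvo->page);
}

/* Returns the number of segment searches performed by six lookups. */
static unsigned long
demo_cached_lookup(void)
{
	static struct page pages[4];
	struct phys_seg segs[] = { { 0x10000, 0x14000, pages } };
	struct pvo with_page, without_page;

	seg_searches = 0;
	pvo_insert(&with_page, 0x11000, &pages[1]);
	pvo_insert(&without_page, 0x12000, NULL);
	for (int i = 0; i < 3; i++) {
		assert(pvo_page(&with_page, segs, 1) == &pages[1]);
		assert(pvo_page(&without_page, segs, 1) == &pages[2]);
	}
	return (seg_searches);
}
```

In this sketch an address that belongs to no segment is searched for again on every call, because NULL doubles as "not cached".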

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 33895
Build 31098: arc lint + arc unit

Event Timeline

jhibbits created this revision.

When I did some very basic lock profiling I saw that there is a lot of contention on the PVO zone lock. Basically, UMA's per-CPU caches are not large enough to contain a working set even at their maximum size. This makes some sense, since for short-lived processes the pmap is going to allocate a whole bunch of PVOs as the process is starting up and faulting, and when it exits it will free all of them at once.

As an experiment you could try increasing the maximum cache size by adding entries to bucket_zones[].

sys/powerpc/aim/mmu_oea64.c:118

How many vm_phys segments do you have that this is a significant optimization? sysctl vm.phys_segs will show you.

sys/powerpc/aim/slb.c:302

BTW, this function acquires a global mutex but it looks like its return value is discarded if the ESID has an entry in the tree.

sys/powerpc/aim/slb.c:317

I'm probably missing something but this doesn't really match the review description since we hold the pmap lock during the initial lookup.

Both callers of allocate_user_vsid() already try to look up the ESID first.

This is mostly a series of experiments to see what can reduce the locking contention on the 'page pv' lock (PV_LOCK()/PV_PAGE_LOCK()) since those seem to be pretty heavily contended, as measured with lockstat(1).

sys/powerpc/aim/mmu_oea64.c:118

Mine has 9 segments (0-9). I'm not seeing any significant improvement yet. It looked like low-hanging fruit: if every lookup has to search the segment list while holding a lock, that is extra work done while blocking other users.

sys/powerpc/aim/slb.c:317

You're right. I should have returned the new ESID to the pool.

The purpose of this is to first look up the ESID without taking the lock, since VSIDs are never destroyed within a pmap once created. If that lookup fails, generate a new VSID, then take the lock and check whether we lost a race with another thread adding the same entry. Since the vast majority of lookups will find an existing entry, there's no point in holding the lock while searching: we already know the pmap isn't going away, by the nature of its callers, so we only need to protect against concurrent additions to the ESID/VSID tree.
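The lookup-then-lock pattern can be sketched in user-space C roughly as follows. This is an illustration under stated assumptions, not the code in slb.c: struct toy_pmap, vsid_alloc(), vsid_release(), vsid_insert() and vsid_get() are hypothetical names, and a small array of C11 atomics stands in for the ESID tree so that reading without the lock is well defined. The demo is single-threaded and simulates losing the race by calling the slow path directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSEG		16	/* hypothetical tiny ESID space */
#define VSID_NONE	0	/* slot not yet populated */

struct toy_pmap {
	pthread_mutex_t lock;		/* serializes insertions only */
	_Atomic uint64_t vsid[NSEG];	/* set once, never cleared */
};

/* Hypothetical global VSID pool with its own lock. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t pool_next = 1;
static uint64_t pool_free[64];
static int pool_nfree;

static uint64_t
vsid_alloc(void)
{
	uint64_t v;

	pthread_mutex_lock(&pool_lock);
	v = (pool_nfree > 0) ? pool_free[--pool_nfree] : pool_next++;
	pthread_mutex_unlock(&pool_lock);
	return (v);
}

static void
vsid_release(uint64_t v)
{
	pthread_mutex_lock(&pool_lock);
	assert(pool_nfree < 64);
	pool_free[pool_nfree++] = v;
	pthread_mutex_unlock(&pool_lock);
}

/* Slow path: publish 'mine' unless another thread got there first. */
static uint64_t
vsid_insert(struct toy_pmap *pm, unsigned esid, uint64_t mine)
{
	uint64_t cur;

	pthread_mutex_lock(&pm->lock);
	cur = atomic_load_explicit(&pm->vsid[esid], memory_order_relaxed);
	if (cur == VSID_NONE) {
		atomic_store_explicit(&pm->vsid[esid], mine,
		    memory_order_release);
		cur = mine;
	}
	pthread_mutex_unlock(&pm->lock);
	if (cur != mine)
		vsid_release(mine);	/* lost the race: give it back */
	return (cur);
}

static uint64_t
vsid_get(struct toy_pmap *pm, unsigned esid)
{
	uint64_t cur;

	/* Fast path: entries are never removed, so no lock is needed. */
	cur = atomic_load_explicit(&pm->vsid[esid], memory_order_acquire);
	if (cur != VSID_NONE)
		return (cur);
	/* Miss: allocate outside the pmap lock, then insert under it. */
	return (vsid_insert(pm, esid, vsid_alloc()));
}

/* Returns 1 if every step behaved as described in the comments. */
static int
demo_lockless_lookup(void)
{
	struct toy_pmap pm;
	uint64_t a, b, c, d, loser;

	pool_next = 1;
	pool_nfree = 0;
	pthread_mutex_init(&pm.lock, NULL);
	for (int i = 0; i < NSEG; i++)
		atomic_init(&pm.vsid[i], VSID_NONE);

	a = vsid_get(&pm, 3);		/* miss: allocates and publishes */
	b = vsid_get(&pm, 3);		/* hit on the lockless fast path */
	loser = vsid_alloc();		/* a racer that missed earlier */
	c = vsid_insert(&pm, 3, loser);	/* loses; 'loser' is released */
	d = vsid_get(&pm, 5);		/* reuses the released VSID */
	pthread_mutex_destroy(&pm.lock);
	return (a == b && b == c && d == loser);
}
```

The sketch allocates the new VSID before taking the pmap lock, so the pool lock and the pmap lock are never held together; the cost is an occasional wasted allocation when the race is lost.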