
amd64 pmap: fine-grained pv list locking
Needs Review (Public)

Authored by kib on Mar 29 2020, 3:36 PM.

Details

Reviewers
alc
markj
jeff
Summary

The patch adds an individual blockable lock, an OBM (one-byte mutex), for each small page's pv list. This way there is no aliasing and no false blocking on another page's pv list, and the lock needs only a single byte in md_page. The turnstiles used for blocking are still shared per superpage/hash function, as in the stock kernel, i.e. a blocked thread gets wakeups from its siblings. This seems to occur rarely (I do not have numbers).
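
For illustration only, the fast path of such a one-byte mutex might look roughly like the sketch below; the bit layout and the obm_lock_slow()/obm_unlock_slow() helpers are my assumptions for this sketch, not names taken from the diff.

#include <sys/param.h>
#include <sys/systm.h>
#include <machine/atomic.h>

typedef uint8_t obm_t;

#define OBM_UNLOCKED    0x00
#define OBM_LOCKED      0x01
#define OBM_CONTESTED   0x02    /* some thread sleeps on the shared turnstile */

void obm_lock_slow(obm_t *);    /* blocks on the per-superpage turnstile */
void obm_unlock_slow(obm_t *);  /* wakes up blocked waiters */

static inline void
obm_lock(obm_t *o)
{
        /* Uncontested acquire: flip a single byte with one atomic op. */
        if (__predict_false(!atomic_cmpset_acq_8(o, OBM_UNLOCKED, OBM_LOCKED)))
                obm_lock_slow(o);
}

static inline void
obm_unlock(obm_t *o)
{
        /* Fails only if a waiter set OBM_CONTESTED; then wake it up. */
        if (__predict_false(!atomic_cmpset_rel_8(o, OBM_LOCKED, OBM_UNLOCKED)))
                obm_unlock_slow(o);
}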

For superpage promotion and demotion, I lock the whole run of small pages that constitutes the superpage. This was my largest hesitation, but I convinced myself that the code touches 512 pv entries anyway, so also touching 512 lock bytes in vm_pages that are hot anyway is fine. For the somewhat obscure semantics of the pde pv lock, see the herald comment above pmap_pv_list_lock_pde().
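
Conceptually, the whole-run locking amounts to something like the sketch below, as it might appear inside pmap.c; the pv_obm field name is hypothetical, and the real pmap_pv_list_lock_pde() has the subtler semantics described in its herald comment.

static void
pmap_pv_list_lock_pde_sketch(vm_paddr_t pa)
{
        vm_page_t m;
        int i;

        /* vm_page of the first base page in the 2M run. */
        m = PHYS_TO_VM_PAGE(pa & ~PDRMASK);
        /* Lock all 512 pv lists in ascending order to avoid deadlock. */
        for (i = 0; i < NPTEPG; i++)
                obm_lock(&m[i].md.pv_obm);
}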

It is unfortunate that I cannot easily implement a shared mode for OBM; at the least, it would need a 'saturated' shared mode for when there are more active readers than there is space to count them. This is a pity because I think that with fine-grained locking there is a real opportunity to see shared mode put to good use.
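
To make the 'saturated' idea concrete: one hypothetical layout spends the remaining six bits of the byte on a reader count and forces readers into the slow path once the count is full. None of these names exist in the diff; this only continues the fast-path sketch above.

#define OBM_READER_SHIFT        2
#define OBM_READER_UNIT         (1u << OBM_READER_SHIFT)
#define OBM_READERS_MAX         (0xffu >> OBM_READER_SHIFT)    /* 63 readers */

static inline bool
obm_try_rlock(obm_t *o)
{
        uint8_t v;

        v = *(volatile uint8_t *)o;
        /* Fail if write-locked, contested, or the reader count is saturated. */
        if ((v & (OBM_LOCKED | OBM_CONTESTED)) != 0 ||
            (v >> OBM_READER_SHIFT) == OBM_READERS_MAX)
                return (false);
        return (atomic_cmpset_acq_8(o, v, v + OBM_READER_UNIT) != 0);
}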

OBM would also benefit from lock debugging and profiling support.

This is extracted from my WIP branch reworking pv list handling. Apparently, making UMA outperform the highly optimized, specialized pv chunk allocator is hard, and arguably not needed.

Next, I eliminate the global (per-domain) pv chunk list locks. The key observation is that we can get away with a global pmap list, rotating both the pmap list and the per-pmap chunk list to approximate the pc_lru ordering. My reasonable belief is that strict LRU ordering does not matter once we are in a situation where chunk reclamation is needed.
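
A sketch of how the reclaim walk could pick a victim pmap under this scheme; pmap_list, pmap_list_lock, and the pm_list linkage are hypothetical names, and the resident-count check reflects the discussion below about released pmaps staying enqueued.

static struct mtx pmap_list_lock;       /* hypothetical leaf mutex */
static TAILQ_HEAD(, pmap) pmap_list = TAILQ_HEAD_INITIALIZER(pmap_list);

static pmap_t
reclaim_pick_pmap_sketch(void)
{
        pmap_t pm, next;

        mtx_lock(&pmap_list_lock);
        TAILQ_FOREACH_SAFE(pm, &pmap_list, pm_list, next) {
                if (!PMAP_TRYLOCK(pm))
                        continue;
                /* Released pmaps may remain on the list; skip empty ones. */
                if (pm->pm_stats.resident_count == 0) {
                        PMAP_UNLOCK(pm);
                        continue;
                }
                /* Rotate so the next reclaim starts from another pmap. */
                TAILQ_REMOVE(&pmap_list, pm, pm_list);
                TAILQ_INSERT_TAIL(&pmap_list, pm, pm_list);
                break;
        }
        mtx_unlock(&pmap_list_lock);
        return (pm);    /* locked, or NULL if the walk found nothing */
}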

I do not have a large machine. On a 16-CPU / 32-thread box, with buildkernel -s -j 40 over a tmpfs objdir, I got:

stock
      237.53 real      6782.41 user       567.74 sys
      238.08 real      6769.18 user       569.86 sys
      237.39 real      6783.97 user       570.91 sys

patched
      227.86 real      6730.43 user       307.43 sys
      226.40 real      6737.99 user       304.76 sys
      227.33 real      6733.44 user       305.36 sys

(note the sys time).

On pig1 with NUMA enabled, the results are not as dramatic. This needs to be checked on a large box.

pig1 (stock):
      116.11 real      7901.83 user       446.20 sys
      116.03 real      7945.72 user       447.52 sys
      117.55 real      7902.05 user       450.55 sys
pig1 (OBM only):
      114.19 real      7879.51 user       438.05 sys
      116.39 real      7882.03 user       436.02 sys
pig1 (OBM + removal of pv chunk locks):
      115.82 real      7911.22 user       429.60 sys
      116.64 real      7898.30 user       433.64 sys
      115.68 real      7918.47 user       430.82 sys

Diff Detail

Repository
rS FreeBSD src repository

Event Timeline

kib created this revision. Mar 29 2020, 3:36 PM
kib edited the summary of this revision. Mar 29 2020, 3:37 PM
kib added reviewers: alc, markj, jeff.
kib edited the summary of this revision.
kib added a subscriber: mjg.
kib edited the summary of this revision. Mar 29 2020, 3:41 PM
markj added inline comments. Mar 30 2020, 12:47 PM
sys/amd64/amd64/pmap.c:450

Duplicate field?

sys/kern/kern_obm.c:76

Do you plan to add WITNESS integration?

sys/kern/kern_obm.c:86

Presumably this is just for testing?

sys/sys/obm.h:51

Did you consider making a structure to wrap the uint8_t lock state, instead of passing a naked uint8_t around?
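
For illustration, such a wrapper could be as small as the following, replacing the plain "typedef uint8_t obm_t" from the earlier sketch (hypothetical; the later "Add type for obm locks" update presumably introduces something along these lines). It costs no extra space but lets the compiler reject an unrelated uint8_t * passed to the lock routines.

typedef struct obm {
        uint8_t obm_state;      /* the single lock byte */
} obm_t;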

kib marked an inline comment as done. Mar 30 2020, 3:39 PM
kib added inline comments.
sys/amd64/amd64/pmap.c:450

Indeed, the field got duplicated while I moved the patch around, and then I did not test with NUMA.

sys/kern/kern_obm.c:76

I do not have strong plans right now. The pv list locks are leaf locks, so I think they could get by with much simpler checks.

But if the patch is going to be productized, then sure.

sys/kern/kern_obm.c:86

It should probably go under the lock profiling option, or something similar. BTW, right now, after around a dozen buildkernels on this machine, I see

debug.obm.slow_unlock: 22920
debug.obm.slow_lock: 111444
kib updated this revision to Diff 70002. Mar 30 2020, 3:41 PM
kib marked an inline comment as done.

Remove extra member.

markj added inline comments. Mar 30 2020, 4:24 PM
sys/amd64/amd64/pmap.c:509

I don't see how this assertion is true when pmap_pv_list_pde() is called with lockp == NULL.

kib added inline comments. Mar 30 2020, 5:45 PM
sys/amd64/amd64/pmap.c:509

Yes, in practice we always demote or promote with the pv list lock held. Also, I think it is safe to assume that PHYS_TO_PAGE(pa) in pmap_pv_list_lock_pde() never returns NULL, since we can only be called for managed pages.

kib updated this revision to Diff 70009. Mar 30 2020, 5:47 PM

Fix/improve assertions for pde pv locks.

markj added inline comments. Mar 30 2020, 9:21 PM
sys/amd64/amd64/pmap.c:539

The return value appears to be unused.

sys/amd64/amd64/pmap.c:7616

Now, pmap_demote_pde_locked() may release the lock if pmap_pv_list_lock_pde1() fails to lock the first page in the superpage (i.e., the fast attempt fails). Don't we need to restart the loop in that case?

BTW I think there is a rare existing bug here, since pmap_demote_pde_locked() may also release the lock when reclaiming PV entries. In particular, I think the assertion below can fail.
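
In terms of the stock rwlock-based code, the restart being asked for would look roughly like this fragment (a sketch of the control flow only, not the actual diff):

        struct rwlock *lock, *old_lock;

retry:
        lock = NULL;
        /* ... look up the 2M mapping, taking the pv list lock via &lock ... */
        old_lock = lock;
        if (pmap_demote_pde_locked(pmap, pde, va, &lock)) {
                /*
                 * Demotion may have dropped and switched the pv list lock;
                 * the pv lists may have changed underneath us, so the walk
                 * must start over.
                 */
                if (lock != old_lock)
                        goto retry;
        }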

kib updated this revision to Diff 70037. Mar 30 2020, 10:57 PM
kib marked 2 inline comments as done.

Remove return value from pmap_pv_list_lock_pde().
Handle possible pv list lock change after demote_locked().

markj added inline comments. Mar 31 2020, 2:36 PM
sys/amd64/amd64/pmap.c:7811

I think we need to retry in this case too.

kib updated this revision to Diff 70060. Mar 31 2020, 3:16 PM
kib marked an inline comment as done.

Retry in pmap_ts_referenced() in case demotion unlocked pv lock.

markj added inline comments. Mar 31 2020, 4:25 PM
sys/amd64/amd64/pmap.c:7593

Wouldn't it be simpler to lock the base page of the superpage when removing large mappings, here and in pmap_remove_all()? Then switch to the page's PV list lock for small mappings below.

kib added inline comments. Mar 31 2020, 4:42 PM
sys/amd64/amd64/pmap.c:7593

I am not sure, because of the 'then switch to the page's PV list lock for small mappings' part. Having a superpage mapping is actually quite rare. As a result, we would typically need to unlock and relock quite often.

kib updated this revision to Diff 70133. Apr 3 2020, 12:06 AM

Add type for obm locks.
Inline fast path for lock and unlock.
Add TD_LOCKS_INC.
Make global counters conditional under OBM_DEBUG.

kib updated this revision to Diff 70200. Apr 4 2020, 5:54 PM

Eliminate pv chunk locks.
This is not properly tested, but the patch is stable enough to allow for buildkernel benchmarks.

kib edited the summary of this revision. Apr 4 2020, 5:55 PM
markj added inline comments. Apr 6 2020, 3:53 PM
sys/amd64/amd64/pmap.c:3784

pmap_release() also needs to be modified to remove the queue entry.

kib added inline comments. Apr 6 2020, 4:16 PM
sys/amd64/amd64/pmap.c:3784

I left the released pmap on the list on purpose. I want to be able to walk the list in the reclaim function without adding a marker pmap, because a pmap placeholder would be rather large and I do not want to allocate it when low on chunks (it cannot live on the stack due to its size).

This is why I check the pmap's resident page count after taking the pmap lock, to quickly skip such pmaps.

markj added inline comments. Apr 6 2020, 4:18 PM
sys/amd64/amd64/pmap.c:3784

But then we cannot unconditionally enqueue here, since the pmap may already reside on the list.

kib updated this revision to Diff 70263. Apr 6 2020, 5:20 PM
kib marked an inline comment as done.

Insert the new pmap into the global list in vmspace constructor.

kib edited the summary of this revision. Apr 22 2020, 4:43 AM
markj added inline comments. Apr 23 2020, 2:50 PM
sys/amd64/amd64/pmap.c:327

Can't it be static?

sys/amd64/amd64/pmap.c:3903

I still do not understand how this can work. pmap_pinit() is called every time a vmspace is allocated. The insertion has to be done in a vmspace zone init method instead.
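
A sketch of what that would look like, reusing the hypothetical pmap_list names from the earlier sketch (vmspace_zinit_sketch is not the real function in the diff): UMA's zone init runs once when an item is first imported from a slab, while pmap_pinit() also runs when a cached vmspace is reused, so inserting there would double-enqueue pmaps that deliberately stay on the list after release.

static int
vmspace_zinit_sketch(void *mem, int size __unused, int flags __unused)
{
        struct vmspace *vm = mem;
        pmap_t pmap = vmspace_pmap(vm);

        PMAP_LOCK_INIT(pmap);
        mtx_lock(&pmap_list_lock);
        TAILQ_INSERT_TAIL(&pmap_list, pmap, pm_list);
        mtx_unlock(&pmap_list_lock);
        return (0);
}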

kib marked 2 inline comments as done. Apr 23 2020, 5:11 PM
kib added inline comments.
sys/amd64/amd64/pmap.c:3903

Oops, this is a slightly out-of-date version of the patch. Look for pmap_lock_init(); I uploaded a version where I had not yet removed this chunk.

kib updated this revision to Diff 70915. Apr 23 2020, 5:23 PM
kib marked an inline comment as done.

Remove forgotten chunk.
Staticize pmap list.