Page MenuHomeFreeBSD

amd64: use INVLPGB for kernel pmap invalidations
AcceptedPublic

Authored by kib on Mon, May 13, 11:34 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, May 21, 8:06 PM
Unknown Object (File)
Tue, May 21, 2:36 PM
Unknown Object (File)
Sun, May 19, 4:10 PM
Unknown Object (File)
Sun, May 19, 4:09 PM
Unknown Object (File)
Sun, May 19, 2:49 PM
Unknown Object (File)
Thu, May 16, 5:10 PM
Unknown Object (File)
Tue, May 14, 11:24 AM
Unknown Object (File)
Tue, May 14, 10:11 AM
Subscribers

Details

Reviewers
emaste
alc
markj
Summary

Newest AMD CPUs have the new broadcast invpg instruction INVLPGB. It performs invalidation locally and broadcasts the request for same invalidation to all logical CPUs. I tried to apply it to our pmap, but it is not convenient due to our use of the sliding PCID algorithm.

But I do think that one use of the instruction could be very profitable, esp. on large machines. We can do kernel pmap invalidations without IPIs. These invalidations always use PCID 0, and are always totally broadcast. This seems to be an ideal application.

I suspect that intense kmalloc/malloc(9) loads like ZFS might get significant speedup due to elimination of IPIs.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Mon, May 13, 11:34 PM

Not tested at all, machine is not yet available. I want to discuss this before debugging.

sys/amd64/amd64/mp_machdep.c
735

I don't understand the subtraction here.

740

Should the second parameter logically be PMAP_PCID_KERN?

sys/amd64/include/cpufunc.h
544

The parameter names should reflect how they are actually used.

kib marked 2 inline comments as done.Tue, May 14, 11:27 PM
kib added inline comments.
sys/amd64/amd64/mp_machdep.c
740

We request the global invalidation, and do not request PCID-specific. So it should not be PMAP_PCID_KERN even formally, I believe.

sys/amd64/include/cpufunc.h
544

The doc calls both rax and edx descriptors, unfortunately. And ecx is not a counter as such.

If you have suggestions I will apply it, for me the direct register names are easiest to match with the documentation.

kib marked an inline comment as done.

Fix cnt calculation for 4k step in invlrng.
Flip default for invlpgb_works to 1.

Correct cnt, one more time.

On amd64, translation invalidation routines don't specify whether intermediate TLB entries need to be invalidated, but on arm64 they do. Now that we have INVLPGB_FIN, I wonder if (in a separate patch certainly) it would be worth augmenting amd64's pmap_invalidate_page() etc. to specify whether intermediate entries need to be invalidated.

sys/amd64/amd64/mp_machdep.c
736

How can cnt == 0 be true? va is not required to be page-aligned?

739

This is the maximum count of additional pages; since we subtract 1 below, from my reading of AMD's spec we can technically use cnt = invlpgb_maxcnt + 1 here.

sys/amd64/amd64/pmap.c
555
sys/amd64/include/cpufunc.h
544

I see, maybe va, desc, and count. But yes, the reader will have to look at invlpgb documentation to fully understand what is going on.

kib marked 3 inline comments as done.Fri, May 17, 4:27 PM
kib added inline comments.
sys/amd64/amd64/mp_machdep.c
736

That was left from the previous invalid formula.

739

I initialized maxcnt by the cpuid val + 1.

sys/amd64/amd64/pmap.c
555

Note the above two sysctl descr line styles.

kib marked 3 inline comments as done.

+1 for maxcnt
remove cnt == 0 check

sys/amd64/amd64/mp_machdep.c
745

Although it makes no functional difference, I would contend that NPTEPG is the correct constant here, as it represents the number of PTEs in a page table page referenced by a 2MB PDE.

751

I think multiplication by the same constant that we divided by above makes this easier to understand. I suspect that the compiler will implement this as a shift anyway.

kib marked 2 inline comments as done.

Use NPTEPG

markj added inline comments.
sys/amd64/amd64/mp_machdep.c
739

IMO this is a bit confusing. It would be cleaner to keep the global invlpgb_maxcnt equal to the "official" value, and handle the bias locally.

This revision is now accepted and ready to land.Mon, May 20, 2:44 PM
kib marked an inline comment as done.

Use raw maxcnt value

This revision now requires review to proceed.Mon, May 20, 4:54 PM
This revision is now accepted and ready to land.Mon, May 20, 4:55 PM