
amd64 pmap: reorder IPI send and local TLB flush in TLB invalidations.
ClosedPublic

Authored by kib on Jun 8 2020, 1:37 PM.

Details

Summary

Right now the code first flushes all local TLB entries that need to be flushed, then sends an IPI to the remote cores, and then spins idle while waiting for acknowledgements. The VMware article 'Don’t shoot down TLB shootdowns!' notes that the time spent spinning is wasted and could more usefully be spent doing the local TLB invalidation.

We could use the same invalidation handler for the local TLB as for remote TLBs, but typically, when pmap == curpmap, we can use INVLPG locally, while remote CPUs must use INVPCID because we cannot control context switches on them. For that reason, keep the local invalidation code and provide callbacks to be called from smp_targeted_tlb_shootdown() after the IPIs are fired but before the spin wait starts.

Tested by: pho

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision. Jun 8 2020, 1:37 PM
cem added inline comments.
sys/x86/x86/mp_x86.c
1743–1745 ↗(On Diff #72825)

Bike-shed: The bool return feels like a weird abstraction to me. We seem to be avoiding uglying the non-IPI case of smp_targeted_tlb_shootdown_pinned with curcpu_cb().

You could replace the return (false)s with goto noipi_out; and conclude the function something like:

...
    mtx_unlock_spin(&smp_ipi_mtx);
    return;

noipi_out:
    if (curcpu_cb != NULL)
        curcpu_cb(pmap, addr1, addr2);
}

That has two return paths, but isn't unbearably ugly; IMO it is less ugly than the new construction, where we are obliged to call the callback iff the return value was false.

Functionally, the change is fine as-is.

This revision is now accepted and ready to land. Jun 8 2020, 2:21 PM
markj added inline comments.
sys/x86/x86/mp_x86.c
1797 ↗(On Diff #72825)

Why not perform wbinvd() the same way? Then you can eliminate a couple of conditional branches on curcpu_cb != NULL, I believe.

Looks reasonable to me. Good idea.

alc added inline comments.
sys/x86/x86/mp_x86.c
1743–1745 ↗(On Diff #72825)

I agree with cem.

kib edited the summary of this revision. (Show Details)

Review feedback:

  • handle cache invalidation.
  • drop wrapper, use goto.

Handle i386 with dummy curcpu_cb.

This revision now requires review to proceed. Jun 9 2020, 9:40 AM
This revision is now accepted and ready to land. Jun 9 2020, 1:39 PM
cem added inline comments.
sys/i386/i386/pmap.c
1313–1318 ↗(On Diff #72851)

Could this one be the same as amd64 (wbinvd callback), rather than using the dummy?

kib marked 4 inline comments as done.

Handle i386 invalidate_cache() slightly more optimally.

This revision now requires review to proceed. Jun 9 2020, 5:02 PM
This revision is now accepted and ready to land. Jun 9 2020, 5:13 PM
sys/x86/x86/mp_x86.c
1679–1681 ↗(On Diff #72871)

One thing about this code that might surprise or confuse newcomers who try to understand or use it is that the callback function is called unconditionally on the caller's underlying processor, even when the caller's underlying processor is not set in the mask. So, the callback function must be prepared to handle such spurious invocations. I think there should be a comment at the head of this function explaining this.

kib edited the summary of this revision. (Show Details)

Add herald comment for smp_targeted_tlb_shootdown().
i386 build fixes.

This revision now requires review to proceed. Jun 9 2020, 10:13 PM
sys/x86/x86/mp_x86.c
1682 ↗(On Diff #72895)

"which are to be signalled"

1684 ↗(On Diff #72895)

I would simplify a bit and write, "The curcpu_cb callback is invoked on the calling CPU while waiting for remote CPUs to complete the operation."

sys/x86/x86/mp_x86.c
1688 ↗(On Diff #72895)

You might replace the second "caller's underlying processor" with "this processor"

kib marked 3 inline comments as done.

Reword the comment.

This revision is now accepted and ready to land. Jun 10 2020, 8:59 PM