Details

Reviewers

alc
markj
jhb
royger

Commits

rS363195: amd64: allow parallel shootdown IPIs

Summary

Stop using smp_ipi_mtx to protect global shootdown state, and move/multiply the global state into pcpu. Now each CPU can initiate shootdown IPI independently from other CPUs. Initiator enters critical section, then fills its local PCPU shootdown info (pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu, my_cpuid) for each target cpu. After that IPI is sent to all targets which scan for zeroed scoreboard generation words. Upon finding such word the shootdown data is read from corresponding cpu' pcpu, and generation is set. Meantime initiator loops waiting for all zeroed generations in scoreboard to update.

Initiator does not disable interrupts, which should allow non-invalidation IPIs from deadlockin, it only needs to disable preemption to pin itself to the instance of the pcpu smp_tlb data.

Handlers loop until they do not see zeroed scoreboard generations. This, together with hardware keeping one pending IPI in LAPIC IRR should prevent lost shootdowns.

Notes.

The generation is set before the actual invalidation is performed in handler. It is safe because target CPU cannot return to userspace before handler finishes. In principle only NMI can preempt the handler, but NMI would see the kernel handler frame and not touch not-invalidated user page table.

The code does touch LAPIC ICR without exclusion. I believe this is fine because we in fact do not send IPIs from interrupt handlers. More for !x2APIC mode where ICR access for write requires two registers write, we disable interrupts around it. If considered incorrect, I can add per-cpu spinlock around ipi_send().

Scoreboard lines owned by given target CPU can be padded to the cache line, to reduce ping-pong. This is not done in the prototype.

Tested by: pho

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Jun 29 2020, 9:51 PM

Herald added subscribers: rpokala, imp. · View Herald TranscriptJun 29 2020, 9:51 PM

kib requested review of this revision.Jun 29 2020, 9:51 PM

Harbormaster completed remote builds in B32046: Diff 73886.Jun 29 2020, 9:51 PM

kib edited the summary of this revision. (Show Details)Jun 29 2020, 10:01 PM

kib edited the summary of this revision. (Show Details)Jun 29 2020, 10:04 PM

kib edited the summary of this revision. (Show Details)

emaste added a subscriber: emaste.Jun 30 2020, 2:35 PM

markj accepted this revision.Jul 6 2020, 6:10 PM

markj added inline comments.

sys/amd64/amd64/mp_machdep.c
649 ↗	(On Diff #73886)	Can we assert that the old value is non-zero?
1017 ↗	(On Diff #73886)	__assert_unreachable()? You have an anonymous enum for the op type. The type could be named and used for `smp_tlb_op` in this function, so you'll get a warning if an op type is missed.
1039 ↗	(On Diff #73886)	Aren't "orig" and "req" referring to the same thing, the initiating CPU? The terminology should be consistent. Initiator/target seems best to me but I have no strong preference.
1047 ↗	(On Diff #73886)	Why two fences?
1048 ↗	(On Diff #73886)	atomic_store_int()

This revision is now accepted and ready to land.Jul 6 2020, 6:10 PM

kib marked 5 inline comments as done.Jul 6 2020, 8:27 PM

kib added inline comments.

sys/amd64/amd64/mp_machdep.c
1047 ↗	(On Diff #73886)	I added a comment explaining. It is formal fence, I need to prevent the write to scoreboard to become visible until we read all pc_smp_tlb values. It is formal (or rather, a compiler barrier) because x86 does not allow such reordering.

'initiator'
use enum in switch
add comments noting why interrupts must be enabled in shootdown initiator, what for are both fences in shootdown handler, and some other minor details.

This revision now requires review to proceed.Jul 6 2020, 8:28 PM

Harbormaster completed remote builds in B32169: Diff 74137.Jul 6 2020, 8:28 PM

Fix i386 and XEN/amd64 builds.

Harbormaster completed remote builds in B32180: Diff 74155.Jul 7 2020, 4:14 PM

kib added a reviewer: royger.Jul 7 2020, 4:16 PM

markj accepted this revision.Jul 8 2020, 1:58 PM

markj added inline comments.

sys/amd64/amd64/mp_machdep.c
643 ↗	(On Diff #74155)	This can be KASSERT().

This revision is now accepted and ready to land.Jul 8 2020, 1:58 PM

Use KASSERT.

BTW I initially considered only extracting internal part of smp_targeted_tlb_shootdown into mp_machdep.c<amd64,i386>, i.e. the part between critical_enter/exit and mtx_lock_spin/mtx_unlock_spin, and did that again now after changing the block to KASSERT. But in both times I convinced myself that it is better to not split the function.

This revision now requires review to proceed.Jul 8 2020, 5:44 PM

Harbormaster completed remote builds in B32204: Diff 74206.Jul 8 2020, 5:44 PM

kib mentioned this in D25483: amd64 pmap: microoptimize local shootdowns for PCID PTI configurations.Jul 10 2020, 9:35 AM

kib edited the summary of this revision. (Show Details)Jul 10 2020, 7:20 PM

kib edited the summary of this revision. (Show Details)Jul 10 2020, 7:48 PM

kib added a subscriber: pho.

alc added inline comments.Jul 11 2020, 10:43 PM

sys/amd64/amd64/apic_vector.S
175 ↗	(On Diff #74206)	"IPI handler for cache and TLB shootdowns"
sys/amd64/amd64/mp_machdep.c
206 ↗	(On Diff #74206)	"... for cache and TLB invalidations."
541–543 ↗	(On Diff #74206)	I would suggest a comment saying something along the lines of these variables are initialized at startup to reflect how each of the different kinds of invalidations should be performed on the current machine and environment.
546 ↗	(On Diff #74206)	It would be good to incorporate text from the summary that explains the scoreboard's operation.
594 ↗	(On Diff #74206)	"Used by the pmap to request cache or TLB invalidation on ..."
596 ↗	(On Diff #74206)	It's not clear what "by vector" refers to. "As an optimization, the curcpu_cb callback ..."
606 ↗	(On Diff #74206)	"which" -> "that" The use of "which" here implies that all IPIs are protected by spinlocks. In contrast, "that" says that you are talking about the subset of IPIs, which is what you intend. Here is a good rule of thumb: https://www.writersdigest.com/write-better-fiction/which-vs-that
643 ↗	(On Diff #74206)	It would be good to incorporate some of the text from the summary that explains the "synchronization" here, e.g., blocking preemption but not interrupts, avoiding deadlock.
664 ↗	(On Diff #74206)	It wouldn't hurt to remind people that the IPI itself acts as a fence between writing to the scoreboard above and reading from it below.
676 ↗	(On Diff #74206)	Shouldn't this now be spelled IPI_INVLOP on amd64?
719–720 ↗	(On Diff #74206)	I speculate that ordering the parameters to smp_targeted_tlb_shootdown() consistently with smp_masked_invlpg_range() with the "op" being last to the latter will eliminate some move instructions.
730 ↗	(On Diff #74206)	This blank line is no longer required.
840 ↗	(On Diff #74206)	smp_tlb_pmap is not used.
898 ↗	(On Diff #74206)	smp_tlb_pmap is not used.
1057 ↗	(On Diff #74206)	I was briefly confused by the use of the word "dual" here. Initially, I read it as relating to the fact that there are two acquire fences in this code. I would suggest that you instead say, "This acquire fence and its corresponding release fence in ..."
1072 ↗	(On Diff #74206)	Again, I would use "corresponding" here rather than "dual".
1073 ↗	(On Diff #74206)	"... in the IPI ..."
1074 ↗	(On Diff #74206)	"... before the APIC ..."
1078 ↗	(On Diff #74206)	It would be good to explicitly say that it is okay to ack the request before performing the invalidation.
sys/i386/i386/mp_machdep.c
482–486 ↗	(On Diff #74206)	See earlier comments under amd64.
600 ↗	(On Diff #74206)	This blank line is no longer required.
627–630 ↗	(On Diff #74206)	Dead code?
647 ↗	(On Diff #74206)	Redundant?
670 ↗	(On Diff #74206)	Redundant?

Handle Alan' notes.

Harbormaster completed remote builds in B32270: Diff 74350.Jul 12 2020, 10:32 AM

Cosmetics: initialize pc_smp_tlb_gen of BSP to 1, same as for AP. It is not too important because gen number is incremented before use.

Harbormaster completed remote builds in B32274: Diff 74356.Jul 12 2020, 2:50 PM

This revision was not accepted when it landed; it landed in state Needs Review.Jul 14 2020, 8:37 PM

Closed by commit rS363195: amd64: allow parallel shootdown IPIs (authored by kib). · Explain Why

This revision was automatically updated to reflect the committed changes.

kib added a commit: rS363195: amd64: allow parallel shootdown IPIs.

amd64: allow parallel shootdown IPIs
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 74459

head/sys/amd64/amd64/apic_vector.S

head/sys/amd64/amd64/db_interface.c

head/sys/amd64/amd64/machdep.c

head/sys/amd64/amd64/mp_machdep.c

head/sys/amd64/include/pcpu.h

head/sys/amd64/include/smp.h

head/sys/i386/i386/mp_machdep.c

head/sys/i386/include/smp.h

head/sys/x86/include/apicvar.h

head/sys/x86/include/x86_smp.h

head/sys/x86/x86/mp_x86.c

head/sys/x86/xen/xen_apic.c

amd64: allow parallel shootdown IPIsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 74459

head/sys/amd64/amd64/apic_vector.S

head/sys/amd64/amd64/db_interface.c

head/sys/amd64/amd64/machdep.c

head/sys/amd64/amd64/mp_machdep.c

head/sys/amd64/include/pcpu.h

head/sys/amd64/include/smp.h

head/sys/i386/i386/mp_machdep.c

head/sys/i386/include/smp.h

head/sys/x86/include/apicvar.h

head/sys/x86/include/x86_smp.h

head/sys/x86/x86/mp_x86.c

head/sys/x86/xen/xen_apic.c

amd64: allow parallel shootdown IPIs
ClosedPublic
Actions

Revision Contents
Changeset List