Page MenuHomeFreeBSD

Move i386 sysmaps to MD per-cpu region
ClosedPublic

Authored by jah on Dec 18 2016, 12:52 AM.

Details

Summary

Move the objects used to create temporary mappings for pmap zero and copy operations to the MD PCPU region. Change sysmap initialization to only allocate KVA pages for cpus that are actually present. As a minor optimization, this also prevents false sharing between adjacent sysmap objects as the PCPU region is cacheline-aligned.

While here, move pc_qmap_addr initialization for BSP into pmap_bootstrap, which allows use of pmap_quick* functions during early boot.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jah retitled this revision from to Move i386 sysmaps to MD per-cpu region.
jah updated this object.
jah edited the test plan for this revision. (Show Details)

I agree with the proposed move of CMAPs into pcpu area, but see comments about improving the implementation.

sys/i386/i386/pmap.c
526 ↗(On Diff #23041)

I suggest to split the initialization into per-cpu allocations for AP. The sys/i386/i386/mp_machdep.c:init_secondary() is the natural place to do the init, right after lidt().

529 ↗(On Diff #23041)

Why CADDR1 can be non-zero there ?

sys/i386/include/pcpu.h
72 ↗(On Diff #23041)

I do not see why do we need struct sysmaps, after the migration to pcpu. Before, it was needed to use it for array definition, now this purpose is gone. I suggest to inline the members into PCPU_MD_FIELDS.

75 ↗(On Diff #23041)

Please use explicit number for pad array size.

sys/i386/i386/pmap.c
529 ↗(On Diff #23041)

It would be non-zero for the BSP. This check won't be necessary if initialization is moved to init_secondary().

sys/i386/include/pcpu.h
72 ↗(On Diff #23041)

I'm fine with the idea of inlining, but I kept struct sysmaps around for 2 reasons:

--logical grouping of the related members in a single struct

--reduction in number of calls to PCPU_GET(), which looks like it's a bit more expensive than a simple struct dereference (e.g. it does some size checks and re-reads %fs on i386, it requires a cp15 access on arm). The sysmaps object can be obtained with a single PCPU_GET(), while use of inlined fields would require 3-5 such calls for a typical pmap copy operation.

I could still get the same reduction in PCPU_GET() calls using pcpu_find(cpuid) as is done in the rmlock implementation, but that seems more convoluted.

75 ↗(On Diff #23041)

The only reason I didn't leave it an explicit number is that this change adds an opaque element (struct mtx) to the pcpu data. Is there any concern about changes to struct mtx breaking the compile-time assertion on sizeof(pcpu) being the denominator of PAGE_SIZE?

kib edited edge metadata.

As I said, I agree with the overall direction, and my notes are stylistic or minor. Decide yourself which parts of the suggestions you disagree less.

sys/i386/include/pcpu.h
72 ↗(On Diff #23041)

This is i386, so arm cp15 is not relevant there. AFAIK, %fs/%gs prefixes provide 1 cycle penalty in prefetcher at worst.

Stylistically, we inline members of pcpu, this is why it strike the eye. I do not request the inlining strongly, but it feels somewhat unusual to me.

75 ↗(On Diff #23041)

Struct mtx currently should have constant size for any kernel config options. That keeps the KBI invariant under config changes, but also bloats struct mtx for obvious reasons. Using sizeof for pagesize pad allows the sloppiness in the pcpu maintenance, IMO

This revision is now accepted and ready to land.Dec 18 2016, 8:17 PM
sys/i386/i386/pmap.c
526 ↗(On Diff #23041)

I like your idea to move this to init_secondary(), but I don't know if I can get away with it.
I see a panic when I try it out because the kva_alloc() path takes a critical section, while scheduler and thread state haven't been initialized for the AP yet. VM layer should already be initialized by BSP, so I don't think bumping virtual_avail as is done in the bootstrap path is possible.
Is there some other allocation strategy I can use?

sys/i386/include/pcpu.h
75 ↗(On Diff #23041)

That makes sense. I'll change it to a constant.

sys/i386/i386/pmap.c
526 ↗(On Diff #23041)

You are absolutely right. Yes, we cannot take any sleepable locks or normal mutexes until scheduler takes over the AP initialization. Sorry.

I think that this could be fixed by providing some place for APs to perform the potentially blocking initialization before running any other code. E.g. we might create a runnable highest priority thread bound to AP just for that. I will think about this.

sys/i386/i386/pmap.c
526 ↗(On Diff #23041)

Would it be worth the trouble to do that though? If we only need this for per-CPU mappings, then it seems easier to just use the SI_SUB_CPU sysinit as done here. Are there other pieces of the system that could benefit from such a stage?

sys/i386/i386/pmap.c
529 ↗(On Diff #23041)

My main motivation for that is there we get non obvious hidden dependency in the initialization order. With the efforts to start APs much earlier than it is done now (see EARLY_AP_STARTUP recent work), this IMO becomes fragile. After I wrote this paragraph, I started wondering should asserts like CADDR1 != 0, CADDR2 != 0 added to consumers of sysmaps.

As I said above, do not postpone the change waiting for the initialization rework.

jah edited edge metadata.

Use constant for pcpu padding. Collapse sysmaps struct.

This revision now requires review to proceed.Dec 23 2016, 5:01 AM
kib edited edge metadata.
This revision is now accepted and ready to land.Dec 23 2016, 9:07 AM
This revision was automatically updated to reflect the committed changes.