Page MenuHomeFreeBSD

factor out PT page allocation/freeing
ClosedPublic

Authored by jah on Mar 9 2021, 5:55 PM.

Details

Summary

As follow-on work to e4b8deb222278b2a, move page table page
allocation and freeing into their own functions. Use these
functions to provide separate kernel vs. user page table page
accounting, and to wrap common tasks such as management of
zero-filled page state.

Requested by: markj, kib

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jah requested review of this revision.Mar 9 2021, 5:55 PM
sys/amd64/amd64/pmap.c
4209

I haven't rolled resident count management into these functions, for a couple of reasons:

  1. We don't seem to uniformly charge user PT pages to the resident count: it looks as though pmap_pinit() intentionally avoids accounting the top-level page(s).
  2. pmap_resident_count_inc[dec]() carries a requirement to hold the pmap lock. I'd rather avoid coupling this requirement to page table allocation, as it seems like this could make future changes to the locking scheme unnecessarily difficult.
sys/amd64/amd64/pmap.c
540

This removal should be split out and committed separately.

4209

For #1, nothing prevents us from starting accounting the root page.
You would need to change pmap_remove() and pmap_release(), of course.

For #2, you can try to switch to atomics and stop require pmap lock.

sys/amd64/amd64/pmap.c
540

Oops, I'd meant not to include this change in the review, but I accidentally added the commit to the new branch.

4209

We can definitely do those things, but are they worth doing for the small amount of cleanup?

The atomics especially would just be unnecessary overhead right now, since we already hold the lock (but they would be necessary if we started accounting the root table, unless pmap_pinit()/pmap_release() started holding the lock).

sys/amd64/amd64/pmap.c
4209

Really, neither atomics nor the lock would be necessary for pmap_pinit()/pmap_release() since nothing else should be using the pmap during those calls, but it would mean that pmap_alloc_pt_page()/pmap_free_pt_page() would need to do something different for resident count during pmap_pinit()/pmap_release().

sys/amd64/amd64/pmap.c
4209

So pmap_pinit() is the only places which change resident_count unlocked?
Why not initialize the counter with 1 instead of 0. pmap_remove() and pmap_release() would still need change.

sys/amd64/amd64/pmap.c
4209

We could do that, or we could keep doing what we're already doing and just not charge the top-level page(s) to resident_count. That would be simpler, since the top-level page count is either 1 or 2 depending on PTI, which would complicate the check in pmap_remove() and pmap_release().

In either case, we'd need a flag to pmap_alloc_pt_page()/pmap_free_pt_page() to tell them not to update resident_count when called from pmap_pinit()/pmap_release().

sys/amd64/amd64/pmap.c
4209

You can pass NULL pmap for the case where the counting is not needed. Or reuse some VM_ALLOC flag that is never passed to pmap_alloc_pt_page() otherwise.

sys/amd64/amd64/pmap.c
4209

I'd be ok with using a NULL pmap for this case; I like that somewhat better than adding yet another boolean. This ability to opt out of counting would also alleviate my concern over coupling resident_count's locking requirements to page allocation.

Revert accidentally-committed PCID counter removal, incorporate resident_count

sys/amd64/amd64/pmap.c
116

This is not needed. sys/param.h already includes sys/types.h. Was there a reason like a compiler error that caused you to add the include?

117

Why did you decided to add the include now?

1650

In principle, all early-allocated kernel page table pages should be accounted in the kernel pmap pages counter.

2048

LA57 root pages should be also accounted for kernel_pmap.

sys/amd64/amd64/pmap.c
116

This is not needed. sys/param.h already includes sys/types.h. Was there a reason like a compiler error that caused you to add the include?

Not a compile error, just a result of reading counter(9). I realized we were coincidentally getting counter.h from *somewhere* else, and I prefer to explicitly include dependency headers to avoid problems later. I'm fine removing sys/types.h if we can guarantee it'll always be included, but I'd prefer to keep the explicit include of sys/counter.h

1650

How important do you think it is to account for these? I've avoided tracking these early-boot allocations since they're fixed and unlikely to be useful in tracking runtime changes in kernel PT page footprint. Also, direct updates to kernel_pt_page_count at this point will be lost since pmap_bootstrap runs before SI_SUB_COUNTER.

If needed, we can save off the count and add it to kernel_pt_page_count later in boot.

sys/amd64/amd64/pmap.c
116

sys/param.h is guaranteed to include sys/types.h, we should not include sys/types.h then.

I am fine with explicitly adding sys/counter.h.

1650

Not too much important.

I thought that switching from bootstrap to proper counter preserves the accumulated value, but apparently I am wrong. Not sure if it worth fixing.

--remove unnecessary include
--add la57 bootstrap accounting
--further cleanup to improve code reuse

kib added inline comments.
sys/amd64/amd64/pmap.c
2077

It should be 1, not 5. Everything except the top level page is deallocated later in this

This revision is now accepted and ready to land.Sat, Mar 13, 6:54 AM
sys/amd64/amd64/pmap.c
1650

I wondered the same thing about fixing counter(9). It would be easy enough to make COUNTER_U64_DEFINE_EARLY() declare a regular uint64_t for early use and then copy that value to the per-CPU location during the sysinit. But callers would still either need to know to use the regular variable by name during early boot, or would need a COUNTER_U64_ADD_EARLY() wrapper to check the sysinit phase and use either the early variable or per-CPU variable as appropriate (making the regular counter_u64_add() handle this automatically would just penalize the 99+% case where the counters are used after SI_SUB_COUNTER). This didn't seem worth doing for the likely very few use cases.

2077

Thanks for catching this, will fix.

Fix la57 table accounting

This revision now requires review to proceed.Sat, Mar 13, 5:28 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
1650

Right now counter is single uintptr_t value. Assume we change it to two uintptr_t, first is the pointer, second is the early value accumulator. For early initialization, first word should be set to &(second word) - &__pcpu[0]. After actual counter initialization, we move the second word into counter[0].

Then one word per counter would be left unused. Might be, we could find some other use for it.

This revision is now accepted and ready to land.Sat, Mar 13, 5:46 PM
sys/amd64/amd64/pmap.c
1650

All counters used here are known to be there at compilation time and present for the duration of the kernel being up. A counter-like interface utilizing this fact would avoid the entire problem with supporting 'early' states at least in pmap and probably for all other users. An additional win would be cheaper increments for not having to load a pointer with the offset.

sys/amd64/amd64/pmap.c
1650

That seems like a better approach, since all the other options involve some form of waste in the common case. What would this look like? Would it be something like a 'PCPU' ELF segment that gets replicated at boot for each CPU (so assuming BSP is always cpu 0, the runtime pointer would be "<static_addr> + <cpu_num> * <segment_size>)?

That would also get these static fields out of the UMA per-CPU zone, which if I'm reading correctly is limited to a page for each CPU...

sys/amd64/amd64/pmap.c
1650

Counter's elements are per-cpu, and having a simple scheme where VA for curcpu is trivially calculated from VA for cpu 0, requires a lot of setup at runtime. This is why this EARLY_COUNTER business appeared at all.

Static configuration of known counters cannot be done before we at least enumerate CPUs we would have, before kernel pmap is fully set up, and perhaps before all the initial allocations (that abuse the BIOS/EFI memory maps, instead of being able to use normal allocators) are done. So I really not sure what is being proposed there.

sys/amd64/amd64/pmap.c
1650

My proposal would put them in struct pcpu, just like pc_pm_save_cnt. Making this into something resembling a facility instead of open-coded support may require some effort, unless a zero-tech solution is implemented. The most trivial take I have on this is to create a dedicated defining struct static_counter (or similar) header where all the counters get explicitly specified, then said struct would be a part of struct pcpu.

sys/amd64/amd64/pmap.c
1650

I'm still not convinced we need to do anything to make early counters work. SI_SUB_COUNTER is pretty early in boot. Are there really that many things that are likely to have counter values they really care about tracking that early? Code that runs that early in boot tends to be specialized anyway, so surely it could just stash the count into a regular variable and update the counter later.

I realize early counters would be a secondary benefit to the main idea of reworking static counters to make their access more efficient, but I'm not sure we need to worry about the early counter case at all.

This revision was automatically updated to reflect the committed changes.