
Fine grain lock reservations
ClosedPublic

Authored by jeff on Mar 15 2018, 11:38 PM.

Details

Summary

This eliminates the page free lock from use in reservations except immediately around physical allocator calls. I do this by protecting reservations with their own lock and using atomics for free_count. The allocation functions 'reserve' their free count before going off to find the pages and return it if they are unsuccessful. This may result in some false blocking, but it should be minimal and well tolerated.

I originally tried to protect reservations with the page lock but this resulted in page -> pmap -> page lock reversals.

The alloc/extend function split no longer makes sense; I did not correctly predict how this would all end up. I will make this more attractive in a later diff.

The majority of the contention is now on the lock that protects partpopq. I believe we can place some limit on how frequently we update the list and make it 'fuzzy': either by time, using ticks, or by requiring N page adjustments before shuffling the list position.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

vm/vm_phys.c
76–77 ↗(On Diff #40335)

These are mostly unrelated but were observed through profiling. I can do a separate commit for this.

vm/vm_reserv.c
361–362 ↗(On Diff #40335)

The KTR traces are not necessary, but they were very helpful for debugging.

vm/vm_page.c
1754 ↗(On Diff #40335)

Why did this come out of the previous if statement? vm_reserv_reclaim_inactive() won't touch vmd_free_count, so if vm_domain_allocate() fails we'll break a reservation for no gain.

3148 ↗(On Diff #40335)

Without this change, vm_page_free_prep() is idempotent. I don't think that's a crucial property of this function, but it might give rise to some subtle problems.

I would put a newline after the pmap_page_set_memattr() call.

vm/vm_reserv.c
1471 ↗(On Diff #40335)

Doesn't this mean that early allocations from kernel_object won't come from reservations? What does that affect?

vm/vm_page.c
1754 ↗(On Diff #40335)

I see. You are right. I can move this along with the freecnt_inc.

3148 ↗(On Diff #40335)

I'm not sure what you mean. If vm_reserv_free_page() handles the page, it tells free_prep() callers to ignore it because it is actually free. The current callers use the return value to determine whether or not they should free it. This really needs to happen in this context, and not a deferred one, because the object lock is held at this time and may not be later. It would take some new caller that maintained a page reference after the call returned false to behave badly.

vm/vm_reserv.c
1471 ↗(On Diff #40335)

Yes, it affects anything that cannot be allocated from the direct map, but only for a very short period: just long enough for witness to be set up. So I doubt there are many, if any, multi-page kmem allocations in this window; they will simply be slightly slower if so. We could change the witness bootstrap order if it affected something critical. We could also conditionalize the startup order on witness being compiled in, although that feels kind of gross.

Without this, the witness pending list needs to be huge: one entry for every superpage in the system.

vm/vm_page.c
3148 ↗(On Diff #40335)

I was really just thinking out loud; I don't think this change will cause problems for current consumers of the API.

vm/vm_reserv.c
1471 ↗(On Diff #40335)

It might also inhibit the ability of later allocations to benefit from superpages, no?

I'm a bit confused though: is the entire purpose of moving vm_reserv_init() later just to be able to use counter_u64_alloc()?

vm/vm_reserv.c
1471 ↗(On Diff #40335)

It is possible that you may make as much as 2MB of memory unavailable for superpages. I would have to look and see exactly what happens on startup.

It is because witness can't initialize the locks right now. There are a lot of them, one for every reservation. So it places these early locks on a pending list, but the pending list becomes unmanageably large.

Fix review feedback. Move witness initialization into the vm startup so that we can allocate large numbers of locks prior to bringing up malloc. Use fcmpset.

kern/subr_witness.c
746 ↗(On Diff #40580)

This needs to be in a header.

767 ↗(On Diff #40580)

This needs to be in a header.

vm/vm_page.c
134–137 ↗(On Diff #40580)

This needs to be in a header.

vm/vm_page.c
134–137 ↗(On Diff #40580)

I felt the same way, but oddly uma_startup and vmem_startup_count were not done that way.

vm/vm_page.c
134–137 ↗(On Diff #40580)

They should be too :)

What is pending_cnt after booting with this change? Can we drop the default value of WITNESS_PENDLIST?

kern/subr_witness.c
752 ↗(On Diff #40580)

Style: no space after "sizeof".

752 ↗(On Diff #40580)

I don't quite see why we need the roundup. sizeof(struct witness) should be a multiple of __alignof(struct witness).

779 ↗(On Diff #40580)

Style: no space after "sizeof".

This revision is now accepted and ready to land. Mar 22 2018, 5:41 PM
This revision was automatically updated to reflect the committed changes.