
Fine grain lock reservations
ClosedPublic

Authored by jeff on Mar 15 2018, 11:38 PM.
Details

Summary

This eliminates the page free lock from use in reservations except immediately around phys allocator calls. I do this by protecting reservations with their own lock and using atomics for free_count. The allocation functions 'reserve' their free count before going off to find the pages and return it if they are unsuccessful. This may result in some false blocking, but it should be minimal and well tolerated.
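The reserve-then-return dance described above can be sketched roughly as below. This is illustrative C with made-up names (free_count, domain_allocate, domain_freecnt_inc), not the actual vm_domain_allocate()/vm_domain_freecnt_inc() bodies; the compare-exchange loop plays the role fcmpset plays in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative stand-in for the per-domain free page count. */
static atomic_long free_count = 1024;

/*
 * Reserve npages from the free count before searching for pages.
 * Returns 1 on success, 0 if there are not enough free pages.
 */
static int
domain_allocate(long npages)
{
	long old = atomic_load(&free_count);

	do {
		if (old < npages)
			return (0);	/* Not enough free pages. */
	} while (!atomic_compare_exchange_weak(&free_count, &old,
	    old - npages));
	return (1);
}

/* Return the reserved count after an unsuccessful allocation. */
static void
domain_freecnt_inc(long npages)
{
	atomic_fetch_add(&free_count, npages);
}
```

The "false blocking" mentioned above corresponds to domain_allocate() failing even though a concurrent caller is about to return its unused reservation.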

I originally tried to protect reservations with the page lock but this resulted in page -> pmap -> page lock reversals.

The alloc/extend function split no longer makes sense; I did not correctly predict how this would all end up. I will make this more attractive in a later diff.

The majority of the contention is now on the lock that protects partpopq. I believe we can place some limit on how frequently we update the list and make it 'fuzzy', either by time (using ticks) or by requiring N page adjustments before shuffling the list position.
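The adjustment-count variant of that throttling could look something like the sketch below. The names (RESERV_FUZZ, fuzzy_reserv, reserv_should_requeue) are hypothetical; this is just the idea of only requeueing after the popcount has drifted by N pages, not committed code.

```c
#include <assert.h>

/* Hypothetical drift threshold before touching the partpop list. */
#define	RESERV_FUZZ	16

struct fuzzy_reserv {
	int	popcnt;		/* Current populated-page count. */
	int	lastpop;	/* popcnt at the last requeue. */
};

/*
 * Apply a population adjustment; return 1 only when the count has
 * drifted far enough that the list position should be updated.
 */
static int
reserv_should_requeue(struct fuzzy_reserv *rv, int adjust)
{
	rv->popcnt += adjust;
	if (rv->popcnt - rv->lastpop >= RESERV_FUZZ ||
	    rv->lastpop - rv->popcnt >= RESERV_FUZZ) {
		rv->lastpop = rv->popcnt;
		return (1);
	}
	return (0);
}
```

With this shape, N-1 out of every N single-page adjustments avoid the partpopq lock entirely, at the cost of a slightly stale ordering.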

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

vm/vm_phys.c
76–77

These are mostly unrelated but were observed through profiling. I can do a separate commit for this.

vm/vm_reserv.c
361–362

The KTR traces are not necessary, but they were very helpful for debugging.

vm/vm_page.c
1754

Why did this come out of the previous if statement? vm_reserv_reclaim_inactive() won't touch vmd_free_count, so if vm_domain_allocate() fails we'll break a reservation for no gain.

3148

Without this change, vm_page_free_prep() is idempotent. I don't think that's a crucial property of this function, but it might give rise to some subtle problems.

I would put a newline after the pmap_page_set_memattr() call.

vm/vm_reserv.c
1471

Doesn't this mean that early allocations from kernel_object won't come from reservations? What does that affect?

vm/vm_page.c
1754

I see. You are right. I can move this along with the freecnt_inc.

3148

I'm not sure what you mean. If vm_reserv_free_page() handles the page, it tells free_prep() callers to ignore it because it is actually free. The current callers use the return value to determine whether they should free it. It really needs to happen in this context, not a deferred one, because the object lock is held at this time and may not be later. It would take some new caller that maintained a page reference after the call returns false to behave badly.
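The contract being described is roughly the following. These are simplified stand-ins (fake_page, page_free_prep, reserv_free_page), not the real vm_page_free_prep()/vm_reserv_free_page() implementations; the point is only the return-value protocol.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_page {
	bool	in_reserv;	/* Page belongs to a reservation. */
	bool	on_freelist;	/* Page handed to the phys allocator. */
};

/* Stand-in for vm_reserv_free_page(): absorbs reservation pages. */
static bool
reserv_free_page(struct fake_page *p)
{
	return (p->in_reserv);
}

/*
 * Stand-in for vm_page_free_prep(): returns true when the caller must
 * still hand the page to the physical allocator, false when the
 * reservation layer already accounted for it (the page is free).
 */
static bool
page_free_prep(struct fake_page *p)
{
	if (reserv_free_page(p))
		return (false);
	return (true);
}

static void
page_free(struct fake_page *p)
{
	if (page_free_prep(p))
		p->on_freelist = true;	/* Only non-reservation pages. */
}
```

A caller that kept using the page after page_free_prep() returned false would be the "behave badly" case, since the page is already free from the reservation layer's point of view.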

vm/vm_reserv.c
1471

Yes, it affects anything that cannot be allocated from the direct map, but only for a very short period: just long enough for witness to be set up. So I doubt there are many, if any, multi-page kmem allocations in this window; if there are, they will simply be slightly slower. We could change the witness bootstrap order if it affected something critical. We could also conditionalize the startup order on witness being compiled in, although that feels kind of gross.

Without this, the witness pending list needs to be huge: one entry for every superpage in the system.
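For scale, one entry per superpage means one pending-list slot per 2MB of RAM (the 2MB figure matching the reservation size mentioned elsewhere in this review). A quick sketch of that arithmetic, with illustrative names:

```c
#include <assert.h>

/* Assumed reservation/superpage size: 2MB, as discussed above. */
#define	RESERV_SIZE	(2ULL * 1024 * 1024)

/* Pending-list entries needed for a machine with membytes of RAM. */
static unsigned long long
pendlist_entries(unsigned long long membytes)
{
	return (membytes / RESERV_SIZE);
}
```

So even a modest 16GB machine would need thousands of pending-list entries, which is why the list size becomes unmanageable.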

vm/vm_page.c
3148

I was really just thinking out loud, I don't think this change will cause problems for current consumers of the API.

vm/vm_reserv.c
1471

It might also inhibit the ability of later allocations to benefit from superpages, no?

I'm a bit confused though: is the entire purpose of moving vm_reserv_init() later just to be able to use counter_u64_alloc()?

vm/vm_reserv.c
1471

It is possible that this makes as much as 2MB of memory unavailable for superpages. I would have to look to see exactly what happens at startup.

It is because witness can't initialize the locks right now. There are a lot of them, one for every reservation, so it places these early locks on a pending list, but the pending list becomes unmanageably large.

Fix review feedback. Move witness initialization into the vm startup so that we can allocate large numbers of locks prior to bringing up malloc. Use fcmpset.

kern/subr_witness.c
746

this needs to be in a header.

767

this needs to be in a header.

vm/vm_page.c
134–137

this needs to be in a header.

vm/vm_page.c
134–137

I felt the same way, but oddly uma_startup and vmem_startup_count were not done that way.

vm/vm_page.c
134–137

They should be too :)

What is pending_cnt after booting with this change? Can we drop the default value of WITNESS_PENDLIST?

kern/subr_witness.c
752

Style: no space after "sizeof".

752

I don't quite see why we need the roundup. sizeof(struct witness) should be a multiple of __alignof(struct witness).
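That guarantee can be checked in a standalone form: C requires sizeof(T) to be a multiple of alignof(T) (arrays must pack with no gaps), so rounding the element size up to the alignment is a no-op. The struct below is a stand-in with a deliberately odd member size, not the real struct witness.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct witness_like {
	char	name[13];	/* Odd size to force tail padding. */
	long	refcount;
};
```

Because of the implied tail padding, sizeof(struct witness_like) already lands on an alignment boundary, so a roundup of the per-element size changes nothing.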

779

Style: no space after "sizeof".

This revision is now accepted and ready to land. Mar 22 2018, 5:41 PM
This revision was automatically updated to reflect the committed changes.