Page MenuHomeFreeBSD

Add a deferred mechanism for deleting swap space without the object lock held.
ClosedPublic

Authored by jeff on Dec 3 2019, 9:31 PM.

Details

Summary

This allows me to do more unlocked cases in fault and tmpfs. The idea is to check whether you're the first to dirty a swap object and free the space if so. This could be optimized with another page flag that says whether or not there is any swap space allocated.

Another approach would be to simply look for dirty pages with allocated swap space in the active/inactive queues. It makes me a little uncomfortable that every place that calls vm_page_dirty() needs to know if it should clear swap or not. I can't convince myself that are gaps but there are a lot of calls to dirty().

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jeff added reviewers: alc, dougm, kib, markj.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.
sys/vm/vm_fault.c
256 ↗(On Diff #65178)

Would the results of the atomic swap allow us to avoid vm_page_lock here?

I don't understand the case where we're not dirtying the page but we do need to set PGA_NOSYNC. Do we need to do this if !need_dirty? Can't we wait for the first dirtying fault?

sys/vm/vm_page.c
4614 ↗(On Diff #65178)

I don't really like the duplication of the atomic macros now that we have set, clear, and swap. I didn't see a clean way to consolidate them.

4748 ↗(On Diff #65178)

Could trylock here as well.

Don't unlock the page until the dirtying is complete.

I do believe it would be possible to omit the page lock here and instead use the result of the atomic swap to determine whether we should set nosync or not.

After discussion with kib I believe this permits the removal of the page lock in vm_fault_dirty(). Kib was also concerned about delayed freeing of swap space impacting behavior under memory pressure and requested counters. I believe the page daemon will be active enough in this time that the delay should not be a significant problem.

I beleive that we need a counter of the delayed-unswapped pages.

I think we might need to wake up pageout daemon when swap becomes full and there are delayed-unswapped pages. E.g. if we have a load where something large was swapped out and then left alone, with random pages marked as delayed-unswapped, then we want the swap space recovered as much as possible if we suddently get a load peak.

sys/vm/vm_pageout.c
1313 ↗(On Diff #65380)

Don't we need to recheck PGA_UNSWAPPED after locking the object ? It might be coincidental, but clearing happens only under the object lock, and this prevents unlikely race.

sys/vm/vm_page.c
4765–4766 ↗(On Diff #65415)

Most of the time, there won't be any allocated swap space for the page. So, I would suggest that you unconditionally call vm_pager_page_unswapped() and let it conditionally set the flag. Essentially, I'm suggesting that we combine D22324 with this change. In D22324, @dougm tries to upgrade the object lock so as to be able to free the swap space. Instead, of attempting to upgrade the lock, we set the flag.

sys/vm/vm_page.c
4765–4766 ↗(On Diff #65415)

I should have asked: Are you intending to call this function without any lock on the object? vm_pager_page_unswapped() needs a read lock, unless the radix tree used to manage an object's swap space is also made to support lock-free lookups.

sys/vm/vm_page.c
4765–4766 ↗(On Diff #65415)

I would prefer not to go down the rabbit hole of also making swap radix lockless for now. Although in principal it is very simple. I have a substantial patch queue to work down.

We could just use another bit to indicate that there is allocated swap space. We are not so pressed for page flag space.

How do you feel about a separate paging queue that is drained quickly for pages with both of these bits set? Kib & Mark feel this may create out of swap conditions because the backpressure won't work quickly enough.

sys/vm/vm_fault.c
291–292 ↗(On Diff #65415)

Was this not leaking swap space in the !excl && need_dirty case?

Did we not encounter this due to the conditions under which we call vm_fault_soft_fast?

sys/vm/vm_fault.c
291–292 ↗(On Diff #65415)

It's not precisely a leak, because the swap space can eventually be reclaimed, e.g., when the vm object is destroyed. However, I do agree that the current behavior is less than ideal, and it probably occurs more often than we would like. In such cases, your patch is already an improvement. Addressing this problem was also the objective of D22324.

sys/vm/vm_fault.c
291–292 ↗(On Diff #65415)

If you are comfortable with the 'two bit' solution (HASSWAP & UNSWAPPED) I can code that and push the locking/bit set/bit clear into swap_pager_unswapped().

Given that we were that imprecise about clearing it before do you think we need an additional page queue to lower the latency or should existing paging systems be adequate with the delayed bit detection?

sys/vm/vm_page.c
4765–4766 ↗(On Diff #65415)

My claim was that there is no backpressure, because we only scan the active queue aggressively when there is a shortage of pages in the inactive and laundry queues. After revisiting the code, I'm not sure that this will be a major problem since the laundry thread moves pages out of PQ_LAUNDRY and into PQ_UNSWAPPABLE when a swap pageout fails because we couldn't allocate blocks. In other words, if we run out of swap space and are under memory pressure we should rapidly run down the laundry queue and trigger an all-out scan of the active queue, which will hopefully reclaim enough swap space if any is available.

sys/vm/vm_page.h
428 ↗(On Diff #65178)

Can you describe what synchronization is needed to set and clear this flag?

sys/vm/vm_pageout.c
1561 ↗(On Diff #65415)

Don't we need to do this while laundering as well? Suppose a page in the laundry queue is dirty and has UNSWAPPED set. We may page out its contents and mark it clean, but I don't see how PGA_UNSWAPPED gets cleared at that point.

Use an extra bit of state to refine the mechanism. Handle the majority of the
work in the swap pager itself.

sys/vm/swap_pager.c
507 ↗(On Diff #65541)

accidental whitespace. I will remove before commit.

1149 ↗(On Diff #65541)

I prefer not to trylock here. In my experience if you trylock from a great number of threads you can end up with degenerate behavior. By pushing it into the page daemon we have only a couple of threads that are potentially adding contention to the object lock.

1478 ↗(On Diff #65541)

This is a bug. I am running stress2 now.

1628 ↗(On Diff #65541)

The page isn't exactly unswappable but the block is dead. On read both are. The comment above "If an error occurs I'd love to..." may be possible now.

sys/vm/vm_page.h
446 ↗(On Diff #65541)

Presumably these should not be the same value.

sys/vm/vm_pageout.c
1313 ↗(On Diff #65380)

I think you are right. There is also no guarantee that the object pointer is not stale, so that must be rechecked as well.

sys/vm/vm_pageout.c
1313 ↗(On Diff #65380)

I see I will fix this. Incidentally with recent changes it would be rendered harmless. We would see that object is not locked and just make sure the bits were right.

vm.stats.swap_free_completed: 3486066
vm.stats.swap_free_deferred: 3489914

My system seems to steadily have a small gap of unfreed pages. I can write a script to grab this over time and trace and see how large it gets.

sys/vm/swap_pager.c
179 ↗(On Diff #65541)

Could we put these under vm.stats.swap instead, so there is a little more harmony with vm.stats.page and vm.stats.object?

Address feedback. This version looks much cleaner IMO and passes stress2.

sys/vm/swap_pager.c
1149 ↗(On Diff #65609)

Assert that either the object is wlocked or the page is xbusied here? I believe that is true among existing callers, or else it is not safe to dereference obj.

sys/vm/vm_pager.h
182 ↗(On Diff #65609)

This comment isn't true anymore.

sys/vm/swap_pager.c
1149 ↗(On Diff #65609)

Yes that is true, I can add this assert.

sys/vm/vm_page.c
1588 ↗(On Diff #65609)

Now that I commit this I wonder if this should just be PGA_SWAP_SPACE not SWAP_FREE. It would catch potential leaks before the object was destroyed. replace/rename would remove valid swap but this should only happen with a valid page.

Any comments?

This revision was not accepted when it landed; it landed in state Needs Review.Dec 15 2019, 3:15 AM
This revision was automatically updated to reflect the committed changes.