Details

Reviewers

alc
dougm
kib
markj

Commits

rS355765: Add a deferred free mechanism for freeing swap space that does not require

Summary

This allows me to do more unlocked cases in fault and tmpfs. The idea is to check whether you're the first to dirty a swap object and free the space if so. This could be optimized with another page flag that says whether or not there is any swap space allocated.

Another approach would be to simply look for dirty pages with allocated swap space in the active/inactive queues. It makes me a little uncomfortable that every place that calls vm_page_dirty() needs to know if it should clear swap or not. I can't convince myself that are gaps but there are a lot of calls to dirty().

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

jeff created this revision.Dec 3 2019, 9:31 PM

Harbormaster completed remote builds in B27921: Diff 65178.Dec 3 2019, 9:31 PM

jeff edited the summary of this revision. (Show Details)Dec 3 2019, 9:53 PM

jeff added reviewers: alc, dougm, kib, markj.

jeff set the repository for this revision to rS FreeBSD src repository - subversion.

Herald added a subscriber: imp. · View Herald TranscriptDec 3 2019, 9:53 PM

jeff added inline comments.Dec 3 2019, 9:53 PM

sys/vm/vm_fault.c
256 ↗	(On Diff #65178)	Would the results of the atomic swap allow us to avoid vm_page_lock here? I don't understand the case where we're not dirtying the page but we do need to set PGA_NOSYNC. Do we need to do this if !need_dirty? Can't we wait for the first dirtying fault?
sys/vm/vm_page.c
4614 ↗	(On Diff #65178)	I don't really like the duplication of the atomic macros now that we have set, clear, and swap. I didn't see a clean way to consolidate them.
4748 ↗	(On Diff #65178)	Could trylock here as well.

Don't unlock the page until the dirtying is complete.

I do believe it would be possible to omit the page lock here and instead use the result of the atomic swap to determine whether we should set nosync or not.

Harbormaster completed remote builds in B28032: Diff 65380.Dec 7 2019, 11:01 PM

After discussion with kib I believe this permits the removal of the page lock in vm_fault_dirty(). Kib was also concerned about delayed freeing of swap space impacting behavior under memory pressure and requested counters. I believe the page daemon will be active enough in this time that the delay should not be a significant problem.

Harbormaster completed remote builds in B28038: Diff 65415.Dec 8 2019, 9:55 PM

I beleive that we need a counter of the delayed-unswapped pages.

I think we might need to wake up pageout daemon when swap becomes full and there are delayed-unswapped pages. E.g. if we have a load where something large was swapped out and then left alone, with random pages marked as delayed-unswapped, then we want the swap space recovered as much as possible if we suddently get a load peak.

sys/vm/vm_pageout.c
1313 ↗	(On Diff #65380)	Don't we need to recheck PGA_UNSWAPPED after locking the object ? It might be coincidental, but clearing happens only under the object lock, and this prevents unlikely race.

alc added inline comments.Dec 9 2019, 4:08 PM

sys/vm/vm_page.c
4765–4766 ↗	(On Diff #65415)	Most of the time, there won't be any allocated swap space for the page. So, I would suggest that you unconditionally call vm_pager_page_unswapped() and let it conditionally set the flag. Essentially, I'm suggesting that we combine D22324 with this change. In D22324, @dougm tries to upgrade the object lock so as to be able to free the swap space. Instead, of attempting to upgrade the lock, we set the flag.

alc added inline comments.Dec 9 2019, 4:27 PM

sys/vm/vm_page.c
4765–4766 ↗	(On Diff #65415)	I should have asked: Are you intending to call this function without any lock on the object? vm_pager_page_unswapped() needs a read lock, unless the radix tree used to manage an object's swap space is also made to support lock-free lookups.

jeff added inline comments.Dec 10 2019, 10:05 PM

sys/vm/vm_page.c
4765–4766 ↗	(On Diff #65415)	I would prefer not to go down the rabbit hole of also making swap radix lockless for now. Although in principal it is very simple. I have a substantial patch queue to work down. We could just use another bit to indicate that there is allocated swap space. We are not so pressed for page flag space. How do you feel about a separate paging queue that is drained quickly for pages with both of these bits set? Kib & Mark feel this may create out of swap conditions because the backpressure won't work quickly enough.

jeff added inline comments.Dec 10 2019, 10:07 PM

sys/vm/vm_fault.c
291–292 ↗	(On Diff #65415)	Was this not leaking swap space in the !excl && need_dirty case? Did we not encounter this due to the conditions under which we call vm_fault_soft_fast?

alc added inline comments.Dec 11 2019, 12:41 AM

sys/vm/vm_fault.c
291–292 ↗	(On Diff #65415)	It's not precisely a leak, because the swap space can eventually be reclaimed, e.g., when the vm object is destroyed. However, I do agree that the current behavior is less than ideal, and it probably occurs more often than we would like. In such cases, your patch is already an improvement. Addressing this problem was also the objective of D22324.

jeff added inline comments.Dec 11 2019, 12:46 AM

sys/vm/vm_fault.c
291–292 ↗	(On Diff #65415)	If you are comfortable with the 'two bit' solution (HASSWAP & UNSWAPPED) I can code that and push the locking/bit set/bit clear into swap_pager_unswapped(). Given that we were that imprecise about clearing it before do you think we need an additional page queue to lower the latency or should existing paging systems be adequate with the delayed bit detection?

markj added inline comments.Dec 12 2019, 2:07 AM

sys/vm/vm_page.c
4765–4766 ↗	(On Diff #65415)	My claim was that there is no backpressure, because we only scan the active queue aggressively when there is a shortage of pages in the inactive and laundry queues. After revisiting the code, I'm not sure that this will be a major problem since the laundry thread moves pages out of PQ_LAUNDRY and into PQ_UNSWAPPABLE when a swap pageout fails because we couldn't allocate blocks. In other words, if we run out of swap space and are under memory pressure we should rapidly run down the laundry queue and trigger an all-out scan of the active queue, which will hopefully reclaim enough swap space if any is available.
sys/vm/vm_page.h
428 ↗	(On Diff #65178)	Can you describe what synchronization is needed to set and clear this flag?
sys/vm/vm_pageout.c
1561 ↗	(On Diff #65415)	Don't we need to do this while laundering as well? Suppose a page in the laundry queue is dirty and has UNSWAPPED set. We may page out its contents and mark it clean, but I don't see how PGA_UNSWAPPED gets cleared at that point.

Use an extra bit of state to refine the mechanism. Handle the majority of the
work in the swap pager itself.

Harbormaster completed remote builds in B28099: Diff 65541.Dec 12 2019, 8:48 AM

jeff added inline comments.Dec 12 2019, 8:52 AM

sys/vm/swap_pager.c
507 ↗	(On Diff #65541)	accidental whitespace. I will remove before commit.
1149 ↗	(On Diff #65541)	I prefer not to trylock here. In my experience if you trylock from a great number of threads you can end up with degenerate behavior. By pushing it into the page daemon we have only a couple of threads that are potentially adding contention to the object lock.
1478 ↗	(On Diff #65541)	This is a bug. I am running stress2 now.
1628 ↗	(On Diff #65541)	The page isn't exactly unswappable but the block is dead. On read both are. The comment above "If an error occurs I'd love to..." may be possible now.

markj added inline comments.Dec 12 2019, 1:53 PM

sys/vm/vm_page.h
446 ↗	(On Diff #65541)	Presumably these should not be the same value.
sys/vm/vm_pageout.c
1313 ↗	(On Diff #65380)	I think you are right. There is also no guarantee that the object pointer is not stale, so that must be rechecked as well.

jeff added inline comments.Dec 12 2019, 5:05 PM

sys/vm/vm_pageout.c
1313 ↗	(On Diff #65380)	I see I will fix this. Incidentally with recent changes it would be rendered harmless. We would see that object is not locked and just make sure the bits were right.

vm.stats.swap_free_completed: 3486066
vm.stats.swap_free_deferred: 3489914

My system seems to steadily have a small gap of unfreed pages. I can write a script to grab this over time and trace and see how large it gets.

markj added inline comments.Dec 12 2019, 8:19 PM

sys/vm/swap_pager.c
179 ↗	(On Diff #65541)	Could we put these under vm.stats.swap instead, so there is a little more harmony with vm.stats.page and vm.stats.object?

Address feedback. This version looks much cleaner IMO and passes stress2.

Harbormaster completed remote builds in B28131: Diff 65609.Dec 13 2019, 7:06 PM

markj added inline comments.Dec 13 2019, 7:11 PM

sys/vm/swap_pager.c
1149 ↗	(On Diff #65609)	Assert that either the object is wlocked or the page is xbusied here? I believe that is true among existing callers, or else it is not safe to dereference `obj`.
sys/vm/vm_pager.h
182 ↗	(On Diff #65609)	This comment isn't true anymore.

jeff added inline comments.Dec 13 2019, 8:44 PM

sys/vm/swap_pager.c
1149 ↗	(On Diff #65609)	Yes that is true, I can add this assert.

jeff added inline comments.Dec 15 2019, 3:14 AM

sys/vm/vm_page.c
1588 ↗	(On Diff #65609)	Now that I commit this I wonder if this should just be PGA_SWAP_SPACE not SWAP_FREE. It would catch potential leaks before the object was destroyed. replace/rename would remove valid swap but this should only happen with a valid page. Any comments?

This revision was not accepted when it landed; it landed in state Needs Review.Dec 15 2019, 3:15 AM

Closed by commit rS355765: Add a deferred free mechanism for freeing swap space that does not require (authored by jeff). · Explain Why

This revision was automatically updated to reflect the committed changes.

jeff added a commit: rS355765: Add a deferred free mechanism for freeing swap space that does not require.

Add a deferred mechanism for deleting swap space without the object lock held.
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 65677

head/sys/dev/md/md.c

head/sys/fs/tmpfs/tmpfs_subr.c

head/sys/kern/uipc_shm.c

head/sys/vm/swap_pager.c

head/sys/vm/vm_fault.c

head/sys/vm/vm_page.h

head/sys/vm/vm_page.c

head/sys/vm/vm_pageout.c

head/sys/vm/vm_pager.h

Add a deferred mechanism for deleting swap space without the object lock held.ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 65677

head/sys/dev/md/md.c

head/sys/fs/tmpfs/tmpfs_subr.c

head/sys/kern/uipc_shm.c

head/sys/vm/swap_pager.c

head/sys/vm/vm_fault.c

head/sys/vm/vm_page.h

head/sys/vm/vm_page.c

head/sys/vm/vm_pageout.c

head/sys/vm/vm_pager.h

Add a deferred mechanism for deleting swap space without the object lock held.
ClosedPublic
Actions

Revision Contents
Changeset List