Page MenuHomeFreeBSD

contig allocs: Don't retry forever on M_WAITOK.
ClosedPublic

Authored by bdrewery on Dec 8 2020, 5:35 AM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 23 2023, 12:26 AM
Unknown Object (File)
Nov 3 2023, 6:40 PM
Unknown Object (File)
Nov 3 2023, 6:39 PM
Unknown Object (File)
Nov 3 2023, 6:39 PM
Unknown Object (File)
Nov 3 2023, 6:39 PM
Unknown Object (File)
Nov 3 2023, 6:39 PM
Unknown Object (File)
Nov 3 2023, 6:39 PM
Unknown Object (File)
Nov 2 2023, 11:51 AM
Subscribers

Details

Summary

This restores behavior from before domain iterators were added in
r327895 and r327896.

The vm_domainset_iter_policy() will do a vm_wait_doms() and then
restart its iterator when M_WAITOK is set. It will also force
the containing loop to have M_NOWAIT. So we get an unbounded
retry loop rather than the intended bounded retries that
kmem_alloc_contig_pages() already handles.

This also restores M_WAITOK to the vmem_alloc() call in
kmem_alloc_attr_domain() and kmem_alloc_contig_domain().

MFC after: 2 weeks
Sponsored by: Dell EMC

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/vm/vm_kern.c
351 ↗(On Diff #80430)

This was the trivial way. I was going the route of vm_domainset_iter_policy_nowait[_init]() at first. Let me know how the API should look as it is very clean and I don't want to muddy it up.

  • Set di_flags|=M_NOWAIT consistently
sys/vm/vm_domainset.c
269 ↗(On Diff #80431)

Rather than adding some additional complexity to this very frequently executed code, I think it would be better to modify kmem_alloc_contig_domainset() to just hide M_WAITOK from the domainset iterator. That is, add a separate oflags or so that gets updated by vm_domainset_iter_policy_init(), but pass the original wait flag to kmem_alloc_contig_domain().

sys/vm/vm_domainset.c
269 ↗(On Diff #80431)

I did that initially but felt it was a layer violation as it is working around an implementation detail of the code being called and was a bit obscure on *why* the flag was being unset. That's why I was thinking of the nowait API so it was self-documenting, like vm_domainset_iter_nowait_policy and vm_domainset_iter_nowait_init.
But I'm open to whichever way is preferred here.

  • Move the fix out of the iterator.
sys/vm/vm_kern.c
269 ↗(On Diff #80447)

It would be better to explain why we don't want the iterator to handle M_WAITOK.

sys/vm/vm_kern.c
269 ↗(On Diff #80447)

How about this?

Do not let the iterator sleep nor unset M_WAITOK.  This uses
kmem_alloc_contig_pages() which has special reclamation needs
for page shortage situations.

Unsure what is considered obvious and non-obvious here.

sys/vm/vm_kern.c
269 ↗(On Diff #80447)

It might be worth mentioning that the iterator restarts rather than gives up too.

sys/vm/vm_kern.c
269 ↗(On Diff #80447)

I would suggest something like, Do not allow the domainset iterator to override wait flags. The contiguous memory allocator defines special semantics for M_WAITOK that do not match the iterator's implementation.

bdrewery retitled this revision from contig allocs: Don't sleep forever on M_WAITOK. to contig allocs: Don't retry forever on M_WAITOK..Dec 8 2020, 11:34 PM
This revision is now accepted and ready to land.Dec 9 2020, 12:55 AM

I wonder if it is better to move the retry policy into iterators instead.

In D27507#615134, @kib wrote:

I wonder if it is better to move the retry policy into iterators instead.

I won't pretend to understand everything here but I had some thoughts.

  1. There is the vmem_alloc in kmem_alloc_contig_domain that seemingly supports M_WAITOK as it doesn't unset it. Should that be retried? Currently it's only tried once per domain and then it calls kmem_alloc_contig_pages which handles retries.
  2. I do wonder about which retry algorithm has the best performance or chance-of-success here. The change here does each domain 3 times while reclaiming before trying the next domain, but I would think if we moved on to the next domain before retry we might avoid the reclaiming. On the other hand staying in each domain for the reclaim/retry might help locality but I don't understand the full picture there. Performance testing on low contig memory may be a good idea to pick the best route.
  3. I think the code would be cleaner with a dedicated contig or reclaim iterator, rather than default wait iterator.
In D27507#615134, @kib wrote:

I wonder if it is better to move the retry policy into iterators instead.

I won't pretend to understand everything here but I had some thoughts.

  1. There is the vmem_alloc in kmem_alloc_contig_domain that seemingly supports M_WAITOK as it doesn't unset it. Should that be retried? Currently it's only tried once per domain and then it calls kmem_alloc_contig_pages which handles retries.

Right, this call allocates a virtual address space range used to map the contiguous pages. This may require allocating memory, and it's also possible to run out of address space and have to sleep waiting for something else to free some, though unlikely on any NUMA platform. So with this change, we may block in vmem_alloc() for a specific domain, which is contrary to the spirit of the iterators.

  1. I do wonder about which retry algorithm has the best performance or chance-of-success here. The change here does each domain 3 times while reclaiming before trying the next domain, but I would think if we moved on to the next domain before retry we might avoid the reclaiming. On the other hand staying in each domain for the reclaim/retry might help locality but I don't understand the full picture there. Performance testing on low contig memory may be a good idea to pick the best route.

I suspect it's hard to find a stable workload where contig reclaim will fail upon the first try but succeed upon the second or third tries. It does seem likely that trying all domains once is a better strategy that trying each domain three times before moving on to the next, so perhaps a dedicated iterator type is worth adding.

  1. I think the code would be cleaner with a dedicated contig or reclaim iterator, rather than default wait iterator.

I think a dedicated iterator would be reasonable. I just wanted to avoid polluting the common case with checks needed to handle rare contig allocations. You could add a vm_domainset_iter_contig() and use it here. Note that vm_page_alloc_contig() uses an iterator as well, but it takes VM_ALLOC_* flags rather than M_* flags. It doesn't handle reclaim though, so probably doesn't need to change.

I think a dedicated iterator would be reasonable. I just wanted to avoid polluting the common case with checks needed to handle rare contig allocations.

Certainly. I am going to commit this for now to at least fix the regression and have a smaller piece to MFC.

This revision was automatically updated to reflect the committed changes.