
Only update the domain cursor once in keg_fetch_slab().
Closed, Public

Authored by markj on Sep 17 2018, 8:32 PM.

Details

Summary

We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor. This can in principle cause us to
exit the round-robin loop before having attempted allocations from all
domains.

Suppose one domain, N, is depleted and its page daemon cannot reclaim
any memory (e.g., because virtually all of the memory in the domain is
wired). Suppose keg_fetch_slab() attempts to allocate from that domain
first, and fails, and that while the keg lock was dropped a different
thread advanced the cursor to N - 1. Upon re-acquiring the keg lock, we
will then advance the cursor and set domain = N. Since N is the domain
we started from, the loop exits without having tried any other domain,
and the retry is a blocking allocation from N which will never return.
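
The scenario above can be reproduced in a small user-space model. This is
only a sketch under stated assumptions, not the UMA code: the names
(cursor, try_alloc, scan_shared_cursor, scan_local_iteration), the
domain count, and the way the interfering thread is simulated are all
hypothetical. A return value of -1 stands for "the non-blocking pass
ended without success", after which the caller would fall back to a
blocking allocation from the starting domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NDOMAINS 4	/* hypothetical number of NUMA domains */
#define DEPLETED 2	/* the depleted domain, "N" in the summary */

/* Shared round-robin cursor; stands in for the keg's cursor. */
static int cursor;

/*
 * Model of a non-blocking slab allocation made with the keg lock dropped.
 * If "interfere" is set, another thread moves the cursor to N - 1 while
 * we are not holding the lock.  Only the depleted domain fails.
 */
static bool
try_alloc(int domain, bool interfere)
{
	if (interfere)
		cursor = (DEPLETED + NDOMAINS - 1) % NDOMAINS;
	return (domain != DEPLETED);
}

/* Old behavior (as modeled here): re-read the shared cursor each pass. */
static int
scan_shared_cursor(int *attempts)
{
	int domain, start;

	assert(attempts != NULL);
	cursor = (cursor + 1) % NDOMAINS;
	domain = start = cursor;
	*attempts = 0;
	do {
		(*attempts)++;
		if (try_alloc(domain, *attempts == 1))
			return (domain);
		/* The cursor may have been moved by another thread. */
		cursor = (cursor + 1) % NDOMAINS;
		domain = cursor;
	} while (domain != start);
	return (-1);
}

/* New behavior (as modeled here): update the cursor once, iterate locally. */
static int
scan_local_iteration(int *attempts)
{
	int domain, start;

	assert(attempts != NULL);
	cursor = (cursor + 1) % NDOMAINS;
	domain = start = cursor;
	*attempts = 0;
	do {
		(*attempts)++;
		if (try_alloc(domain, *attempts == 1))
			return (domain);
		/* Interference with the shared cursor no longer matters. */
		domain = (domain + 1) % NDOMAINS;
	} while (domain != start);
	return (-1);
}
```

With the cursor initialized to DEPLETED - 1, the shared-cursor variant
gives up after a single attempt on the depleted domain, while the
local-iteration variant moves on to the next domain and succeeds there.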

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

cem added a subscriber: cem.

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

This revision is now accepted and ready to land. (Sep 18 2018, 12:29 AM)
In D17209#366880, @cem wrote:

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

On my 32-core EPYC server, this simple change makes the difference between a stable system and one where, under load with NUMA enabled, ZFS hangs waiting to allocate memory for writes until the ZFS deadman switch triggers a panic after 1000 seconds.

In D17209#366880, @cem wrote:

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

Yes, but only in the case where the first allocation attempt fails.

This revision was automatically updated to reflect the committed changes.