
Only update the domain cursor once in keg_fetch_slab().
Closed, Public

Authored by markj on Sep 17 2018, 8:32 PM.
Tags
None

Details

Summary

We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor. This can in principle cause us to
exit the round-robin loop before having attempted allocations from all
domains.

Suppose one domain, N, is depleted and its page daemon cannot reclaim
any memory (e.g., because virtually all of the memory in the domain is
wired). Suppose keg_fetch_slab() attempts to allocate from that domain
first and fails, and that while the keg lock was dropped, a different
thread advanced the cursor to N - 1. Upon re-acquiring the keg lock, we
will then set domain = N; since that is the domain we started from, we
exit the round-robin loop having tried only N, and the retry is a
blocking allocation from N which will never return.

Diff Detail

Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 19631; Build 19202: arc lint + arc unit

Event Timeline

cem added a subscriber: cem.

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

This revision is now accepted and ready to land. (Sep 18 2018, 12:29 AM)
In D17209#366880, @cem wrote:

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

On my 32-core EPYC server, this simple change makes the difference between a stable system and one where, under load with NUMA enabled, ZFS hangs waiting to allocate memory for writes until the ZFS deadman switch triggers a panic after 1000 seconds.

In D17209#366880, @cem wrote:

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

Yes, but only in the case where the first allocation attempt fails.

This revision was automatically updated to reflect the committed changes.