Only update the domain cursor once in keg_fetch_slab().
Closed, Public

Authored by markj on Sep 17 2018, 8:32 PM.

Details

Reviewers

cem
alc
jeff
kib

Commits

rS338755: Only update the domain cursor once in keg_fetch_slab().

Summary

We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor. This can in principle cause us to
exit the round-robin loop before having attempted allocations from all
domains.

Suppose one domain, N, is depleted and its page daemon cannot reclaim
any memory (e.g., because virtually all of the memory in the domain is
wired). Suppose keg_fetch_slab() attempts to allocate from that domain
first, and fails, and that while the keg lock was dropped a different
thread advanced the cursor to N - 1. Upon re-acquiring the keg lock, we
will then set domain = N and retry the loop, resulting in a blocking
allocation which will never return.
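
The scenario above can be reproduced with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `scan_reread_cursor()`, `scan_update_once()`, `lock_dropped()` and `NDOMAINS` are hypothetical names, every allocation attempt is assumed to fail, and the other thread is modelled as a single overwrite of the cursor while the lock is dropped. Each function returns a bitmask of the domains it attempted in one pass.

```c
#include <assert.h>

#define NDOMAINS 4  /* hypothetical domain count for this model */

/*
 * Models the window in which the keg lock is dropped: another thread
 * may move the shared cursor once (moved_to < 0 means no interference).
 */
static void
lock_dropped(int *cursor, int *moved_to)
{
    if (*moved_to >= 0) {
        assert(*moved_to < NDOMAINS);
        *cursor = *moved_to;
        *moved_to = -1;
    }
}

/* Old behaviour in this model: re-read the shared cursor after each failure. */
static unsigned
scan_reread_cursor(int *cursor, int moved_to)
{
    unsigned tried = 0;
    int domain, start;

    *cursor = (*cursor + 1) % NDOMAINS;
    domain = start = *cursor;
    do {
        tried |= 1u << domain;          /* attempt fails in this model */
        lock_dropped(cursor, &moved_to);
        *cursor = (*cursor + 1) % NDOMAINS;
        domain = *cursor;               /* may skip domains */
    } while (domain != start);
    return (tried);
}

/* New behaviour in this model: update the cursor once, then iterate locally. */
static unsigned
scan_update_once(int *cursor, int moved_to)
{
    unsigned tried = 0;
    int domain, start;

    *cursor = (*cursor + 1) % NDOMAINS;
    domain = start = *cursor;
    do {
        tried |= 1u << domain;          /* attempt fails in this model */
        lock_dropped(cursor, &moved_to);
        domain = (domain + 1) % NDOMAINS;
    } while (domain != start);
    return (tried);
}
```

With the cursor initially at 1, the scan starts at domain N = 2. If the other thread resets the cursor to N - 1 = 1, the first variant attempts only domain 2 before leaving the loop, while the second still visits all four domains.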

Diff Detail

Repository

rS FreeBSD src repository - subversion

Event Timeline

markj created this revision. Sep 17 2018, 8:32 PM

Harbormaster completed remote builds in B19631: Diff 48138. Sep 17 2018, 8:32 PM

markj mentioned this in D17059: Enable options NUMA on amd64 GENERIC/MINIMAL:. Sep 17 2018, 8:34 PM

This changes keg cursor advancement behavior slightly. I'm not sure that matters.

This revision is now accepted and ready to land. Sep 18 2018, 12:29 AM

In D17209#366880, @cem wrote:

> This changes keg cursor advancement behavior slightly. I'm not sure that matters.

On my 32-core EPYC server with NUMA enabled, this simple change makes the difference between a stable system and one where ZFS hangs under load while waiting to allocate memory for writes, until the ZFS deadman switch triggers a panic after 1000 seconds.