
Improve I/O throttling during memory shortage.
Closed, Public

Authored by imp on Sep 1 2015, 5:57 PM.

Details

Summary

After the introduction of direct dispatch, the pacing code in g_down()
broke in two ways. One, the pacing variable was accessed from multiple
threads in an unsafe way. Two, since large numbers of I/Os could come
down from the buf layer at one time, large numbers of allocation
failures could happen all at once, resulting in a huge pace value that
would limit I/O to 10 IOPS for minutes (or even hours) at a
time. While a real solution to these problems requires substantial
work (either moving to a model with no allocations after the first, or
adding some way to wait for more memory with a reserve for pager and
swapper requests), it is relatively easy to make this simplistic
pacing less pathological.

Use atomic operations to cope with the first issue. Memory shortages
are rare enough that the additional atomic ops won't significantly
slow down the fast path. For the second, sleep for 1ms (or one tick,
whichever is larger) instead of 100ms. This removes the artificial
10 IOPS limit while still easing up on new I/Os during memory
shortages. Stop tying the duration of the slowdown to the number of
failed requests; instead, pace only for as long as requests keep
failing.
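
To make the change concrete, here is a hedged sketch of the pacing check in the g_down worker loop. Names are simplified and this is not the actual sys/geom/geom_io.c diff; as discussed later in this review, the atomics were ultimately replaced with a plain volatile store, which is the shape shown here.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>

    static volatile u_int pace;     /* set on ENOMEM, cleared on recovery */

    static void
    g_down_pacing_sketch(void)
    {
            /*
             * Old behavior: every allocation failure bumped pace, and
             * each pass slept 100ms and decremented it, so a burst of
             * thousands of failures pinned the queue at 10 IOPS for
             * minutes or hours:
             *
             *      if (pace > 0) {
             *              pause("g_down", hz / 10);
             *              pace--;
             *      }
             *
             * New behavior: sleep 1ms (or one tick, whichever is
             * larger), and keep pacing only while requests still fail.
             */
            if (pace != 0) {
                    pause("g_down", MAX(hz / 1000, 1));
                    pace = 0;       /* re-armed by the next ENOMEM */
            }
            /* ... dequeue and dispatch the next bio here ... */
    }

With hz = 1000 this pauses a single tick, so the former hard 10 IOPS ceiling becomes a brief back-off that ends as soon as allocations succeed again.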

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 347
Build 347: arc lint + arc unit

Event Timeline

imp retitled this revision to "Improve I/O throttling during memory shortage".
imp updated this object.
imp edited the test plan for this revision.
kib edited edge metadata.
This revision is now accepted and ready to land. Sep 1 2015, 6:04 PM
sys/geom/geom_io.c
792

s/have/has/

807

solved

While I agree with the reduction of the pause time (100ms is probably overkill now), the rest is less obvious to me:

  • The atomic increment of pace is IMHO pointless here, since all that matters to us in the end is whether the value is zero or non-zero. If you want an atomic there, it could be an atomic assignment.
  • As I understand it, this pacing now works only for queued dispatch. Maybe it would be reasonable to disable direct dispatch during the pace intervals, to make all consumers equal?
In D3546#72856, @mav wrote:

While I agree with the reduction of the pause time (100ms is probably overkill now), the rest is less obvious to me:

  • The atomic increment of pace is IMHO pointless here, since all that matters to us in the end is whether the value is zero or non-zero. If you want an atomic there, it could be an atomic assignment.

This was more of a mechanical translation. You are right; an atomic store would suffice.

  • As I understand it, this pacing now works only for queued dispatch. Maybe it would be reasonable to disable direct dispatch during the pace intervals, to make all consumers equal?

Correct. Pacing has only ever worked with g_down() and never with direct dispatch. I see some merit in expanding it to include direct dispatch, but I've not thought through all the consequences of doing so. Right now, if the g_clone_bio() call fails we recursively call g_io_request(), which slowly eats up the stack until the stack is deep enough that we queue the I/O, through start -> g_io_deliver() -> g_io_request() -> start ... iterations. This is somewhat non-obvious without careful study of the code, so there may be merit in this suggestion.
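
For readers without the code at hand, a hedged paraphrase of the retry path described above (simplified from g_io_deliver(), not a verbatim excerpt):

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <sys/bio.h>
    #include <geom/geom.h>

    static volatile u_int pace;

    void
    g_io_deliver_sketch(struct bio *bp, int error)
    {
            if (error == ENOMEM) {
                    pace = 1;       /* flag the shortage for g_down */
                    /*
                     * Retry instead of failing the bio.  With direct
                     * dispatch this re-enters the provider's start
                     * routine on the same stack, adding frames until
                     * a stack-depth check finally queues the I/O.
                     */
                    g_io_request(bp, bp->bio_from);
                    return;
            }
            /* ... normal completion delivery elided ... */
    }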

imp edited edge metadata.

Update to include checking pace for the direct dispatch case too. I'm
torn on it. This is still racy even using the atomic_load and
atomic_store operations, which will potentially slow down the fast
path (though these are just memory barriers, so it shouldn't be too
bad). While not strictly needed, it will stop us recursively calling
start -> io_deliver -> io_request -> start until we run out of stack.

Then again, maybe I'm worried too much and should just let it go and
focus on the real fix.
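
A hedged sketch of what this update does in g_io_request(): the direct-dispatch decision now also consults pace, so new I/O falls back to the queued path while we are easing off. The real condition checks several more flags, elided here; g_enqueue_bio() is a hypothetical stand-in for handing the bio to g_down's queue.

    #include <sys/param.h>
    #include <sys/bio.h>
    #include <geom/geom.h>

    static volatile u_int pace;

    static void g_enqueue_bio(struct bio *bp);  /* hypothetical helper */

    static void
    g_io_request_sketch(struct bio *bp, struct g_consumer *cp)
    {
            struct g_provider *pp = cp->provider;
            bool direct;

            direct = (cp->flags & G_CF_DIRECT_SEND) != 0 &&
                (pp->flags & G_PF_DIRECT_RECEIVE) != 0 &&
                pace == 0;      /* new: no direct dispatch while pacing */
            if (direct)
                    pp->geom->start(bp);    /* run on the caller's stack */
            else
                    g_enqueue_bio(bp);      /* queue the bio for g_down */
    }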

This revision now requires review to proceed. Sep 1 2015, 8:09 PM
sys/geom/geom_io.c
535

There is no sense in using _acq there, and the corresponding _rel in the other place. You do not need any happens-before relations for code after the acq or before the rel.

702

Atomic_set() does not do what you probably want it to do. A simple assignment is guaranteed to be atomic on all arches supported by FreeBSD (i.e., readers are guaranteed to read either the previous or the written value from the assigned location). See the short illustration after these inline comments.

792

Note that this is indeed racy, by allowing stores of '1' to be ignored.
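
To illustrate the atomic_set() point: FreeBSD's atomic_set_int() in <machine/atomic.h> is an atomic bitwise OR, not a store, while a plain assignment to an aligned word is already un-torn on every architecture FreeBSD supports. A minimal example:

    #include <sys/types.h>
    #include <machine/atomic.h>

    static volatile u_int pace;

    static void
    pace_store_example(void)
    {
            /* atomic_set_int() atomically ORs bits in: pace |= 1. */
            atomic_set_int(&pace, 1);

            /*
             * A plain assignment to an aligned word is atomic on all
             * architectures FreeBSD supports: readers observe either
             * the old or the new value, never a torn one.  That is
             * all a zero/non-zero flag needs, though it remains racy
             * in the benign sense noted above: concurrent stores can
             * collapse into one.
             */
            pace = 1;
    }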

imp edited edge metadata.

Per kib@, use volatile. Still on the fence about the direct case.
Scottl@ suggested on IRC that it's a bridge too far.

mav edited edge metadata.

Looks good to me.

This revision is now accepted and ready to land. Sep 2 2015, 7:59 AM
kib edited edge metadata.
kan edited edge metadata.

Note that the commit message still talks about the use of atomic operations. Please re-word before this change hits the tree.

Is this a candidate for MFC?

In D3546#75698, @brd wrote:

Is this a candidate for MFC?

Yes. In fact, I've done all my testing on Netflix's FreeBSD, which is currently based on 10-stable.

r287405 / 3f2e5b8584ec984cc36a80d06e0012c2ddf37a37 landed this fix.