Page MenuHomeFreeBSD

Mark swapgeom as direct dispatch safe and fix a reference leak
ClosedPublic

Authored by imp on Sep 1 2015, 11:07 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Apr 23, 12:43 PM
Unknown Object (File)
Mar 10 2024, 9:15 PM
Unknown Object (File)
Feb 8 2024, 2:25 AM
Unknown Object (File)
Jan 7 2024, 6:05 AM
Unknown Object (File)
Dec 26 2023, 2:44 AM
Unknown Object (File)
Dec 25 2023, 10:02 PM
Unknown Object (File)
Dec 20 2023, 3:56 AM
Unknown Object (File)
Dec 15 2023, 2:45 AM
Subscribers
None

Details

Reviewers
kib
scottl
jhb
mav
Summary

The swap pager (swapgeom at least) appears to be compatible with
direct dispatch. Flag the consumer we create as such. In addition,
decrement the in flight index when we have an out of memory
error. This would prevent the swap file from being able to be unloaded
since it would always be busy after that.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 389
Build 389: arc lint + arc unit

Event Timeline

imp retitled this revision from to Mark swapgeom as direct dispatch safe and fix a reference leak.
imp updated this object.
imp edited the test plan for this revision. (Show Details)
mav edited edge metadata.

The leak fix looks fine. Theoretically there can be a race with simultaneous close, but it is probably too rare to bother much.

What's for direct dispatch support -- I know little about the swapper code, but if you say it is good -- I have no objections.

This revision is now accepted and ready to land.Sep 2 2015, 7:50 AM
sys/vm/swap_pager.c
2607

As I understand, the index field of g_consumer counts in-flight transactions. When index == 0 and private is not NULL, the consumer should be 'closed' (I am not sure that this is right term).

This is not done there for case when index reaches 0.

Also, wouldn't it me cleaner to initialize index with 1 and do not touch private at all, just changing the construct in swapgeom_orphan to cp->index--; destroy = cp->index == 0; ?

2715

Note that swapgeom_strategy() uses sleepable bio allocation call g_alloc_bio() for read requests. I think this is incompatible with the direct dispatch, but I may be wrong.

sys/vm/swap_pager.c
2607

You may be right about this. Let me study the code further to see if I can simplify it and have things still work.

2715

Since this is the strategy routine, before we get into geom, I think it is OK. We aren't even dispatched yet and haven't sent the bio into geom yet.

imp edited edge metadata.

Update per kib's suggestion.

This revision now requires review to proceed.Sep 3 2015, 5:06 PM

Other then the issue at line 2584, the rest looks fine.

sys/vm/swap_pager.c
2607

This fragment still lacks handling of cp->index == 0.

sys/vm/swap_pager.c
2607

Why does this segment need cp->index == 0 handling? It should never be zero here because we can't start new I/O while closing. Se set sp->sw_id to NULL in swapgeom_orphan when we drop the last reference. This means that as soon as we get the lock in swapgeom_strategy, cp->index can't be zero when we increment it because sp->sw_id is not NULL. I can add a kassert to this effect befpre the cp->index++. That is, unless I've misunderstood something and you see some path I'm missing where this wouldn't be true.

imp edited edge metadata.

Add kassert because cp->index can never be 0 if sp->sw_id is non-NULL.

sys/vm/swap_pager.c
2607

Because there could be in-flight bios while we are in _orphan() ? Otherwise, why would we need to handle the case of cp->index != 0 in orphan or cp->index == 0 in _done() ?

sys/vm/swap_pager.c
2607

Sounds like it would be better to check for == 0 before we increment it then, since we're closing...

Add safety belt for entering swapgeom_strategy between the orphan call
and when it finally closes. I'm not sure it is possible, but I haven't
been able to prove it isn't.

sys/vm/swap_pager.c
2543

Please follow style, return (1);

2546

return (0);

2607

Why is it impossible for closing decrement to happen after the sw_dev_mtx is unlocked at line 2588 ? I.e., why not just put your _check_close() call there and be done ?

More refinement based on feedback from kib. Make the acquire/release
semantics of indext clearer. Simplify logic a bit based on
understanding the race kib was worried about better.

sys/vm/swap_pager.c
2607

That's a good idea. I understand the race you were worried about now. I didn't before.

kib edited edge metadata.
This revision is now accepted and ready to land.Sep 3 2015, 9:07 PM
imp edited edge metadata.

Minor last-minute tweaks. Fix a theoretical race setting sw_id, and combine two lines.

This revision now requires review to proceed.Sep 3 2015, 9:15 PM
kib edited edge metadata.

jhb preferences are to not put side-effects into conditions.
Anyway, new version is fine with me.

This revision is now accepted and ready to land.Sep 3 2015, 9:17 PM
sys/vm/swap_pager.c
2555

Heh, I hadn't realized my comments about side effects had made such a lasting impression. :)

In general I do prefer to avoid conditions on side effects. It is more obvious with long
expressions:

if ((error = call_some_thing(lots_of_arguments, and_other + expressions,
    even_more_args, this_gets_long)) != 0)

vs.

error = call_some_thing(....);
if (error != 0)

(Although assigning the result of getopt() in the while condition is one time I always break this.)

This one is probably fine, though I find that it requires the reader to exercise a few more brain cells (the -- is prefix, so does it happen before or after? oh, right before, so this will fire when going down to 0) whereas if you break it up you don't have to do that exercise in your head. It is plainly obvious what the order of the decrement is to the comparison:

cp->index--;
if (cp->index == 0)
imp edited edge metadata.

Minor style tweaks from jhb@ that improve readability.

This revision now requires review to proceed.Sep 3 2015, 11:07 PM

Thanks. I believe I somewhat understand the overall change and from what I understand it looks good to me.

scottl edited edge metadata.
scottl added inline comments.
sys/vm/swap_pager.c
2715

I think think the point is semantics; if you advertise G_CF_DIRECT_RECEIVE, then you shouldn't do anything that could block in the I/O path. This provider will always be at the top of the stack, and like you say it'll do the alloc before entering GEOM, so maybe the right semantic thing here is to drop the RECEIVE flag. However, keep in mind that doing a sleepable alloc in strategy might backfire if the system is out of memory and the biozone zone is empty (something that can easily happen to us at netflix because we run so much i/o in parallel). We've talked about having a reserve pool of bio's specifically set aside in this code; there's probably no other way to safely handle this until we get better asynchronous queueing in GEOM.

This revision is now accepted and ready to land.Sep 6 2015, 2:47 AM
sys/vm/swap_pager.c
2715

G_CF_RECEIVE on a consumer controls direct dispatch on the completion path. It is safe to do that. So I think that bit is OK. Am I misunderstanding the code in geom_io.c?

I totally agree that a sleepable alloc may backfire. That's a problem in the current system. That's a weakness I hope to address, but not here.

I think having a pool of bios set aside for when we can't allocate out of the normal pool is a good idea. Maybe even one or two chains on that other work I've done would suffice (we can allocate them statically, and then on the completion path have a special case to return the chain to a good state, possibly for the next I/O). This would limit the I/O we could do to one read and one right (in my thinking), but that would guarantee progress when there's no memory for a BIO which would solve the current worst-case scenario where we need to do I/O to recover memory.

I'm not sure I understand the last bit, though. Why is better asynch queueing in GEOM required for us to safely do direct dispatch of down and up? I'm not sure I understand the point well enough since I'm having trouble drawing the connection. Can you clarify perhaps?

sys/vm/swap_pager.c
2715

The swap geom strategy does not sleep on the write, i.e. when the swap is the producer of the free memory. When b_iocmd is BIO_WRITE, g_new_bio() is used, which uses M_NOWAIT.

When that change (to not sleep for bio allocation) in swap geom was introduced, M_NOWAIT combined the semantic of not sleepable allocation and hint to UMA that the request should drain the reservation of free pages deeper than other requests. This was later split into M_USE_RESERVE malloc flag, which is currently not used by g_new_bio().

I fully agree with the note that swap geom bio allocation should be allowed to drain reserve deeper, that the bio zone should have a reserve itself (but I do not see a good way to prevent the reserve drain from the non-priority bio allocations), and ultimately that the whole bio allocation on the write path is just architectural bug.

r287567 / 9e3e3fe5b3300b392b417268eb253b0623e25713 landed this, but didn't close it.