Page MenuHomeFreeBSD

Make swapoff reliable.
ClosedPublic

Authored by kib on Aug 29 2016, 8:02 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 24, 9:30 PM
Unknown Object (File)
Tue, Nov 19, 11:37 AM
Unknown Object (File)
Mon, Nov 18, 5:05 PM
Unknown Object (File)
Mon, Nov 18, 2:51 PM
Unknown Object (File)
Sun, Nov 17, 9:31 PM
Unknown Object (File)
Sun, Nov 17, 4:14 PM
Unknown Object (File)
Thu, Nov 7, 1:32 AM
Unknown Object (File)
Thu, Nov 7, 1:32 AM
Subscribers

Details

Summary

swap_pager_swapoff() does trylock for pagein, which means that either io to md(4) over swap, or intensive page faults can prevent swapoff() from any progress. Then the retry < 100 check fails and machine panics.

If trylock fails, acquire the object lock in the blockable way and restart the hash bucket walk. I kept retries logic for now.

Another improvement there would be to check the ENOMEM conditions inside swap_pager_swapoff() before each page-in attempt. similar to the ENOMEM check in swapoff_one(), but this is non-trivial because swapoff_one() fills sw_blist, which must be undone on failure.

Test Plan

Peter has specific test that triggers the panic in swap_pager_swapoff(): failed to locate XXX swap blocks. The patched system survives it.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib retitled this revision from to Make swapoff reliable..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added reviewers: alc, markj.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
kib added a subscriber: pho.
markj edited edge metadata.

This seems ok to me. Is there a specific reason to keep the retry logic?

sys/vm/swap_pager.c
1691 ↗(On Diff #19782)

Missing period.

This revision is now accepted and ready to land.Aug 29 2016, 5:36 PM
In D7688#159797, @markj wrote:

This seems ok to me. Is there a specific reason to keep the retry logic?

One vague reason is that I am not completely convinced that retry per bucket logic is enough.

But more concrete thought is that this logic would allow for more reliability later. Please look at the swp_pager_force_pagein(). Right now this function simply panic if any error is returned by getpages(). By passing the error and retrying if needed we could improve the case.

On my previous job we did see "swap_pager: indefinite wait buffer" on heavily loaded machines.

kib marked an inline comment as done.Aug 29 2016, 6:24 PM
alc edited edge metadata.
This revision was automatically updated to reflect the committed changes.