
Suspend all writeable local filesystems on power suspend.
ClosedPublic

Authored by kib on Nov 2 2020, 5:19 PM.

Details

Summary

This ensures that no writes, either metadata or user data, are pending in memory; dirty pages that have not yet been converted to filesystem writes are not covered.

Only filesystems declared local are suspended.

Note that this does not guarantee the absence of metadata errors or leaks if resume never happens: for instance, on UFS unlinked-but-open inodes are leaked and require fsck to garbage-collect them.
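The mechanism can be sketched roughly as follows. This is an illustrative reconstruction, not the committed code; the exact flag handling and error paths in sys/kern/vfs_mount.c differ in detail, though vfs_busy(), vfs_write_suspend(), MBF_MNTLSTLOCK, and MBF_NOWAIT are real kernel interfaces:

```c
/*
 * Sketch: write-suspend every writeable local filesystem before the
 * machine powers down.  Walking the mountlist in reverse order handles
 * mounts stacked on vnode-backed md(4) devices before their backing
 * filesystems.
 */
void
suspend_all_fs(void)
{
	struct mount *mp;
	int error;

	mtx_lock(&mountlist_mtx);
	TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) {
		/*
		 * On success, MBF_MNTLSTLOCK makes vfs_busy() drop the
		 * mountlist mutex; the mount stays busied until resume.
		 */
		error = vfs_busy(mp, MBF_NOWAIT | MBF_MNTLSTLOCK);
		if (error != 0)
			continue;	/* mp is being unmounted */
		if ((mp->mnt_flag & (MNT_RDONLY | MNT_LOCAL)) != MNT_LOCAL) {
			/* Read-only or remote: nothing to suspend. */
			mtx_lock(&mountlist_mtx);
			vfs_unbusy(mp);
			continue;
		}
		error = vfs_write_suspend(mp, 0);
		if (error != 0)
			printf("suspend of %s failed, error %d\n",
			    mp->mnt_stat.f_mntonname, error);
		mtx_lock(&mountlist_mtx);
	}
	mtx_unlock(&mountlist_mtx);
}
```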

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib requested review of this revision. Nov 2 2020, 5:19 PM
kib created this revision.
sys/dev/xen/control/control.c
208 ↗(On Diff #79071)

Reading the commit log message for r314841, I don't think suspend_all_fs() needs to be done under the xs lock.

sys/kern/vfs_mount.c
2522 ↗(On Diff #79071)

How is safe traversal guaranteed if we are dropping the mountlist lock?

2539 ↗(On Diff #79071)

It might be nice to print an error message here, since the most likely scenario is that fsync() failed.

kib marked 3 inline comments as done. Nov 2 2020, 6:08 PM
kib added inline comments.
sys/kern/vfs_mount.c
2522 ↗(On Diff #79071)

It is enough to guarantee that mp is not unmounted (it is type-stable, but we want it to stay on the list); vfs_busy() provides this.

Additional safety is provided by the fact that I suspended all usermode processes before suspending fs.

BTW, I only unbusy on resume; this might be an over-reaction, but it is easier to structure the code this way.

kib marked an inline comment as done.

Print error on suspend failure.
Move suspension out of sx lock (Xen).
Systematically unbusy only after re-acquiring mnt list mtx.

sys/kern/vfs_mount.c
2521 ↗(On Diff #79078)

With user processes suspended, what is the risk that one of them holds this lock but is waiting?

Is that also enough to guard against a filesystem that goes away due to automatic unmount, which might still race this?

2526 ↗(On Diff #79078)

Why just the local filesystems? Not necessarily a problem, but I am wondering about the reason.

kib marked 2 inline comments as done. Nov 2 2020, 6:36 PM
kib added inline comments.
sys/kern/vfs_mount.c
2521 ↗(On Diff #79078)

Unless there is a bug in the process suspension, it should be safe. For the ALL_PROC case, suspension only claims that a process is suspended when all its threads are either at the kernel->user boundary or in an interruptible safe sleep (so, for instance, an NFS intr sleep while owning a vnode or buffer lock is not safe).

Do we already have async unmount in the tree? That said, vfs_busy() synchronizes with unmount; that is the reason it is there.

2526 ↗(On Diff #79078)

Filesystem consistency for a non-local fs is the remote machine's problem, not ours.

That said, non-local filesystems take a non-deterministic time to sync, and really nobody has tried to suspend e.g. NFS. Let's work out the most important case first.

kib marked 2 inline comments as done. Nov 2 2020, 6:43 PM

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

markj added inline comments.
sys/sys/mount.h
466 ↗(On Diff #79078)

"all-fs"

This revision is now accepted and ready to land. Nov 2 2020, 7:29 PM
kib marked an inline comment as done. Nov 2 2020, 8:49 PM
In D27054#603878, @imp wrote:

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

It might not work in its first version; I did not test it.

In D27054#603920, @kib wrote:
In D27054#603878, @imp wrote:

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

It might not work in its first version; I did not test it.

The PoC that I posted wasn't the first version I wrote... it was the first to not panic when I suspended half a dozen times :)

In D27054#603920, @kib wrote:
In D27054#603878, @imp wrote:

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

It might not work in its first version; I did not test it.

My testing results: I booted and started X running GNOME, but used only a terminal into which I typed sudo xxx:

With your patches: 4 times in a row it survived suspend. However, after the crash on resume, all 4 times ~4-5 unreferenced inodes were cleared.
With my patches: 4 out of 4 times I just got a report that the filesystem was dirty, but fsck was otherwise quiet. I'll have to repeat my testing to confirm my results...

Without any patches: dozens of messages from fsck fixing things...

So it's quite a bit better. And I'm guessing it's the open-and-unlinked issue, but I've not confirmed that, nor do I have a good explanation for why I didn't see issues in my testing.

So I guess 'worked as designed' is my testing result.

Reverse suspend order. See comment for explanation.

This revision now requires review to proceed. Nov 3 2020, 12:04 PM

More testing shows my kludge acting the same, so I'm doubly on board. Thanks.

sys/kern/vfs_mount.c
2517 ↗(On Diff #79123)

"where a filesystem mounted from a md(4) vnode-backed device should be suspended before the filesystem that owns the vnode" is a bit clearer to me.

kib marked an inline comment as done.

Update comment.

This revision is now accepted and ready to land. Nov 4 2020, 6:36 PM

Weaken assert. Not all filesystems are UFS and support SUSPEND2 suspend.

Patch was tested by pho.

This revision now requires review to proceed. Nov 5 2020, 3:01 PM
This revision is now accepted and ready to land. Nov 5 2020, 7:17 PM

Two questions.

head/sys/kern/vfs_mount.c
2518

How are you ensuring that it handles the case where a filesystem mounted from a md(4) vnode-backed device is suspended before the filesystem that owns the vnode?

2530

If this vfs_busy() fails, then this filesystem will not be suspended. Doesn't this result in some local filesystems not being suspended?

kib marked 2 inline comments as done. Nov 6 2020, 6:26 PM
kib added inline comments.
head/sys/kern/vfs_mount.c
2518

nmount(2) adds a new mount at the end of the mountlist. Because configuring a vnode-backed md(4) device requires access to the backing file, a mount over md(4) must appear after its backing filesystem in the mountlist. Iterating in reverse order therefore handles the md-mount before the backing mount.
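As a concrete illustration of the ordering argument (device and path names are hypothetical):

```
mountlist (oldest first):
  /           UFS on ada0p2            <- backing filesystem, mounted first
  /usr        UFS on ada0p4
  /mnt/image  UFS on md0 (vnode-backed by a file on /usr), mounted last

reverse traversal suspends /mnt/image first, so its final writes can
still pass through md(4) into the not-yet-suspended /usr.
```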

2530

A failing vfs_busy() means that mp is currently being unmounted. It cannot be a user-mode unmount request, because all user processes were suspended by suspend_all_procs(), done before suspend_all_fs(). As such, vfs_busy() either cannot fail at all, or, if we actually have 'unmount on failure' code (which must use forced unmount), we could not care less about this mount for suspension purposes.

head/sys/kern/vfs_mount.c
2530

The goal here is to make a best effort at persisting as much filesystem state as is practical, in case we never resume for some reason. If that fails for some filesystems due to edge cases, we are still better off than before, when no effort was made at all. As long as we deal properly with the case where the storage we are flushing to fails in a way that causes a forced unmount, we are OK. "Properly" here is, I think, covered: the filesystem fails at some point, and its unmounting doesn't corrupt the list in a way that prevents us from finishing the walk.

But now I see lots of mtx_lock(&mountlist_mtx) below, with no unlocks in the loop. Is it implicitly unlocked somewhere, or am I overlooking something? I do see the unlock/lock dance in resume_all_fs().

kib marked 3 inline comments as done. Nov 6 2020, 6:55 PM
kib added inline comments.
head/sys/kern/vfs_mount.c
2530

vfs_busy() with MBF_MNTLSTLOCK assumes that mountlist_mtx is locked and drops it.
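The resulting lock/unlock pairing can be sketched like this (illustrative only; it assumes, per the semantics described above, that a successful vfs_busy() consumes the mutex while a failed one leaves it held -- check vfs_busy() in sys/kern/vfs_subr.c for the authoritative behavior):

```c
/*
 * Each iteration enters with mountlist_mtx held.  A successful
 * vfs_busy(mp, MBF_MNTLSTLOCK) drops the mutex, so the loop body
 * re-takes it at the bottom; on failure the mutex stays held and a
 * plain `continue` preserves the invariant.
 */
mtx_lock(&mountlist_mtx);
TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) {
	if (vfs_busy(mp, MBF_NOWAIT | MBF_MNTLSTLOCK) != 0)
		continue;		/* mutex still held */
	/* ... mutex dropped here; suspend or skip mp ... */
	mtx_lock(&mountlist_mtx);	/* restore loop invariant */
}
mtx_unlock(&mountlist_mtx);
```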