
Suspend all writeable local filesystems on power suspend.
ClosedPublic

Authored by kib on Nov 2 2020, 5:19 PM.

Details

Summary

This ensures that no writes, either metadata or user data, are pending in memory; dirty pages that have not yet been converted to filesystem writes are not covered.

Only filesystems declared local are suspended.

Note that this does not guarantee the absence of metadata errors or leaks if resume never happens: for instance, on UFS unlinked-but-open inodes are leaked and require fsck to garbage-collect them.
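The mechanism can be sketched roughly as follows. This is an illustrative reconstruction, not the committed code; the exact flag handling and error paths in sys/kern/vfs_mount.c differ in detail, though vfs_busy(), vfs_write_suspend(), MBF_MNTLSTLOCK, and MBF_NOWAIT are real kernel interfaces:

```c
/*
 * Sketch: write-suspend every writeable local filesystem before the
 * machine powers down.  Walking the mountlist in reverse order handles
 * mounts stacked on vnode-backed md(4) devices before their backing
 * filesystems.
 */
void
suspend_all_fs(void)
{
	struct mount *mp;
	int error;

	mtx_lock(&mountlist_mtx);
	TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) {
		/*
		 * On success, MBF_MNTLSTLOCK makes vfs_busy() drop the
		 * mountlist mutex; the mount stays busied until resume.
		 */
		error = vfs_busy(mp, MBF_NOWAIT | MBF_MNTLSTLOCK);
		if (error != 0)
			continue;	/* mp is being unmounted */
		if ((mp->mnt_flag & (MNT_RDONLY | MNT_LOCAL)) != MNT_LOCAL) {
			/* Read-only or remote: nothing to suspend. */
			mtx_lock(&mountlist_mtx);
			vfs_unbusy(mp);
			continue;
		}
		error = vfs_write_suspend(mp, 0);
		if (error != 0)
			printf("suspend of %s failed, error %d\n",
			    mp->mnt_stat.f_mntonname, error);
		mtx_lock(&mountlist_mtx);
	}
	mtx_unlock(&mountlist_mtx);
}
```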

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib requested review of this revision. Nov 2 2020, 5:19 PM
kib created this revision.
sys/dev/xen/control/control.c
208 ↗(On Diff #79071)

Reading the commit log message for r314841, I don't think suspend_all_fs() needs to be done under the xs lock.

sys/kern/vfs_mount.c
2522 ↗(On Diff #79071)

How is safe traversal guaranteed if we are dropping the mountlist lock?

2539 ↗(On Diff #79071)

It might be nice to print an error message here, since the most likely scenario is that fsync() failed.

kib marked 3 inline comments as done. Nov 2 2020, 6:08 PM
kib added inline comments.
sys/kern/vfs_mount.c
2522 ↗(On Diff #79071)

It is enough to guarantee that mp is not unmounted (it is type-stable, but we want it to stay on the list); vfs_busy() provides this.

Additional safety is provided by the fact that I suspended all usermode processes before suspending fs.

BTW, I only unbusy on resume; this might be an over-reaction, but it is easier to structure the code this way.

kib marked an inline comment as done.

Print error on suspend failure.
Move suspension out of sx lock (Xen).
Systematically unbusy only after re-acquiring mnt list mtx.

sys/kern/vfs_mount.c
2521 ↗(On Diff #79078)

With user processes suspended, what is the risk that one of them holds this lock but is waiting?

Is that also enough to guard against a filesystem that goes away due to automatic unmount, which might still race this?

2526 ↗(On Diff #79078)

Why just the local filesystems? Not necessarily a problem, but I am wondering about the reason.

kib marked 2 inline comments as done. Nov 2 2020, 6:36 PM
kib added inline comments.
sys/kern/vfs_mount.c
2521 ↗(On Diff #79078)

Unless there is a bug in the process suspension, it should be safe. For the ALL_PROC case, suspension only claims that a process is suspended when all its threads are either at the kernel->user boundary or in an interruptible safe sleep (so, for instance, an NFS intr sleep while owning a vnode or buffer lock is not safe).

Do we already have async unmount in the tree? That said, vfs_busy() synchronizes with unmount; that is the reason it is there.

2526 ↗(On Diff #79078)

Filesystem consistency for a non-local fs is the remote machine's problem, not ours.

That said, non-local filesystems take a non-deterministic time to sync, and really nobody has tried to suspend e.g. NFS. Let's work out the most important case first.

kib marked 2 inline comments as done. Nov 2 2020, 6:43 PM

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

markj added inline comments.
sys/sys/mount.h
466 ↗(On Diff #79078)

"all-fs"

This revision is now accepted and ready to land. Nov 2 2020, 7:29 PM
kib marked an inline comment as done. Nov 2 2020, 8:49 PM
In D27054#603878, @imp wrote:

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

It might not work in its first version; I did not test it.

In D27054#603920, @kib wrote:
In D27054#603878, @imp wrote:

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

It might not work in its first version; I did not test it.

The PoC that I posted wasn't the first version I wrote... it was the first to not panic when I suspended half a dozen times :)

In D27054#603920, @kib wrote:
In D27054#603878, @imp wrote:

Great! I'll test this to see if it gives similar results to my super-ugly kludge.

It might not work in its first version; I did not test it.

My testing results: I booted and started X running GNOME, but used only a terminal into which I typed sudo xxx:

With your patches: 4 times in a row it survived suspend. However, after the crash on resume, all 4 times ~4-5 unreferenced inodes were cleared.
With my patches: 4 out of 4 times I just got a report that the filesystem was dirty, but fsck was otherwise quiet. I'll have to repeat my testing to confirm my results...

Without any patches: dozens of messages from fsck fixing things...

So it's quite a bit better. And I'm guessing it's the open-and-unlinked issue, but I've not confirmed that, nor do I have a good explanation for why I didn't see issues in my testing.

So I guess 'worked as designed' is my testing result.

Reverse suspend order. See comment for explanation.

This revision now requires review to proceed. Nov 3 2020, 12:04 PM

More testing shows my kludge acting the same, so I'm doubly on board. Thanks.

sys/kern/vfs_mount.c
2517 ↗(On Diff #79123)

"where a filesystem mounted from a md(4) vnode-backed device should be suspended before the filesystem that owns the vnode" is a bit clearer to me.

kib marked an inline comment as done.

Update comment.

This revision is now accepted and ready to land. Nov 4 2020, 6:36 PM

Weaken assert. Not all filesystems are UFS and support SUSPEND2 suspend.

Patch was tested by pho.

This revision now requires review to proceed. Nov 5 2020, 3:01 PM
This revision is now accepted and ready to land. Nov 5 2020, 7:17 PM

Two questions.

head/sys/kern/vfs_mount.c
2518

How are you ensuring that it handles the case where a filesystem mounted from a md(4) vnode-backed device is suspended before the filesystem that owns the vnode?

2530

If this vfs_busy() fails, then this filesystem will not be suspended. Doesn't this result in some local filesystems not being suspended?

kib marked 2 inline comments as done. Nov 6 2020, 6:26 PM
kib added inline comments.
head/sys/kern/vfs_mount.c
2518

nmount(2) adds a new mount at the end of the mountlist. Because configuring a vnode-backed md(4) device requires access to the backing file, a mount over md(4) must appear after its backing filesystem in the mountlist. Iterating in reverse order therefore handles the md-mount before the backing mount.
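As a concrete illustration of the ordering argument (device and path names are hypothetical):

```
mountlist (oldest first):
  /           UFS on ada0p2            <- backing filesystem, mounted first
  /usr        UFS on ada0p4
  /mnt/image  UFS on md0 (vnode-backed by a file on /usr), mounted last

reverse traversal suspends /mnt/image first, so its final writes can
still pass through md(4) into the not-yet-suspended /usr.
```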

2530

A failing vfs_busy() means that mp is currently being unmounted. It cannot be a user-mode unmount request, because all user processes were suspended by suspend_all_procs(), done before suspend_all_fs(). As such, vfs_busy() either cannot fail at all, or, if we actually have 'unmount on failure' code (which must use forced unmount), we could not care less about this mount for suspension purposes.

head/sys/kern/vfs_mount.c
2530

The goal here is to make a best effort at persisting as much filesystem state as is practical, in case we never resume for some reason. If that fails for some filesystems due to edge cases, we are still better off than before, when no effort was made at all. As long as we deal properly with the case where the storage we are flushing to fails in a way that causes a forced unmount, we are OK. "Properly" here is, I think, covered: the filesystem fails at some point, and its unmounting doesn't corrupt the list in a way that prevents us from finishing the walk.

But now I see lots of mtx_lock(&mountlist_mtx) below, with no unlocks in the loop. Is it implicitly unlocked somewhere, or am I overlooking something? I do see the unlock/lock dance in resume_all_fs().

kib marked 3 inline comments as done. Nov 6 2020, 6:55 PM
kib added inline comments.
head/sys/kern/vfs_mount.c
2530

vfs_busy() with MBF_MNTLSTLOCK assumes that mountlist_mtx is locked and drops it.
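The resulting lock/unlock pairing can be sketched like this (illustrative only; it assumes, per the semantics described above, that a successful vfs_busy() consumes the mutex while a failed one leaves it held -- check vfs_busy() in sys/kern/vfs_subr.c for the authoritative behavior):

```c
/*
 * Each iteration enters with mountlist_mtx held.  A successful
 * vfs_busy(mp, MBF_MNTLSTLOCK) drops the mutex, so the loop body
 * re-takes it at the bottom; on failure the mutex stays held and a
 * plain `continue` preserves the invariant.
 */
mtx_lock(&mountlist_mtx);
TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) {
	if (vfs_busy(mp, MBF_NOWAIT | MBF_MNTLSTLOCK) != 0)
		continue;		/* mutex still held */
	/* ... mutex dropped here; suspend or skip mp ... */
	mtx_lock(&mountlist_mtx);	/* restore loop invariant */
}
mtx_unlock(&mountlist_mtx);
```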