Page MenuHomeFreeBSD

vn_start_write(): consistently set *mpp to NULL on error or after failed sleep
ClosedPublic

Authored by kib on Apr 5 2023, 9:02 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, May 14, 6:45 AM
Unknown Object (File)
Thu, May 14, 5:48 AM
Unknown Object (File)
Thu, May 14, 5:42 AM
Unknown Object (File)
Thu, May 14, 5:33 AM
Unknown Object (File)
Thu, May 14, 2:43 AM
Unknown Object (File)
Wed, May 13, 11:17 PM
Unknown Object (File)
Wed, May 13, 11:17 PM
Unknown Object (File)
Mon, May 11, 8:11 PM

Details

Summary
This ensures that *mpp != NULL iff vn_finished_write() should be called,
regardless of the returned error.  The only exception that must be
maintained is the case where vn_start_write(V_NOWAIT) is called with the
intent of later dropping other locks and then doing
vn_start_write(V_XSLEEP), which needs the mp value calculated from the
non-waitable call above it.  Add V_FORXSLEEP flag, which could be
combined with V_NOWAIT, and use it in such situations.

Also note that V_XSLEEP is not supported by vn_start_secondary_write().

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Apr 5 2023, 9:02 PM

i have a forgotten branch to sort out this api, i'll refresh it

in the meantime this is probably good

This revision is now accepted and ready to land.Apr 5 2023, 9:04 PM
kib edited the summary of this revision. (Show Details)

The previous version was too simple-minded, it breaks the common pattern of vn_start_write(&mp, V_NOWAIT) and then vn_start_write(&mp, V_XSLEEP). V_NOWAIT call in such cases must return mp.

Currently being tested by Peter.

This revision now requires review to proceed.Apr 6 2023, 6:21 PM

I don't understand what problem this is solving. Why is it so important to avoid setting *mpp in the failure case that we need to add a new flag?

sys/kern/vfs_vnops.c
1947

This could still be return (vn_start_write_refed(mpp, flags, false));?

I don't understand what problem this is solving. Why is it so important to avoid setting *mpp in the failure case that we need to add a new flag?

I'm kinda the guilty party here. In D39419 pjd@ noted that I
checked mp != NULL to decide to call vn_finished_write().
I countered that when the vp argument != NULL, that this
was sufficient (you can actually call it unconditionally, since
it checks for mp == NULL and just returns).
This inspired kib@ to make vn_start_write() always set mp non-NULL
if vn_finished_write() needs to be called and mp == NULL when it does
not need to be called.
--> However, it has turned out that this cannot be done for the V_XSLEEP

case with the vp argument == NULL.

To be honest, if I was kib@, I'd just abandon the exercise, but that is
obviously up to him.

kib marked an inline comment as done.

Remove unneeded chunk

I don't understand what problem this is solving. Why is it so important to avoid setting *mpp in the failure case that we need to add a new flag?

I'm kinda the guilty party here. In D39419 pjd@ noted that I
checked mp != NULL to decide to call vn_finished_write().
I countered that when the vp argument != NULL, that this
was sufficient (you can actually call it unconditionally, since
it checks for mp == NULL and just returns).
This inspired kib@ to make vn_start_write() always set mp non-NULL
if vn_finished_write() needs to be called and mp == NULL when it does
not need to be called.
--> However, it has turned out that this cannot be done for the V_XSLEEP

case with the vp argument == NULL.

To be honest, if I was kib@, I'd just abandon the exercise, but that is
obviously up to him.

The V_XSLEEP is a very special thing, I do not believe that it is needed anywhere outside the very core VFS, and even there it could be done differently (but not in this patch).
The goal is to reduce confusion for VFS consumers, in particular, by maintaining the invariant that *mpp != NULL after vn_start_write() implies the need to call vn_finish_write().
IMO it is worth a single flag.

Doesn't the call in kern_sync() also need adjustment? If vn_start_write() fails, the rest of the loop body expects mp != NULL. Same in linux_syncfs(), I believe.

The call in vm_pageout_clean() no longer needs to set mp = NULL after vn_start_write() fails.

The V_XSLEEP is a very special thing, I do not believe that it is needed anywhere outside the very core VFS, and even there it could be done differently (but not in this patch).
The goal is to reduce confusion for VFS consumers, in particular, by maintaining the invariant that *mpp != NULL after vn_start_write() implies the need to call vn_finish_write().

vn_start_write(V_NOWAIT) is also not used much outside of the core VFS. IMHO it would be a bit simpler and more helpful to preserve *mpp unconditionally when that flag is set, instead of requiring a new one.

Only zero *mpp for !V_NOWAIT calls.

This revision is now accepted and ready to land.Apr 10 2023, 2:09 PM
bdrewery added inline comments.
sys/kern/vfs_vnops.c
1943–1947

Is this leaking the vfs_ref(mp) on error from vn_start_write_reffed?

Doesn't the call in kern_sync() also need adjustment? If vn_start_write() fails, the rest of the loop body expects mp != NULL. Same in linux_syncfs(), I believe.

In fact this is a good example of the V_NOWAIT use alone, same as vm_pageout.

The call in vm_pageout_clean() no longer needs to set mp = NULL after vn_start_write() fails.

The V_XSLEEP is a very special thing, I do not believe that it is needed anywhere outside the very core VFS, and even there it could be done differently (but not in this patch).
The goal is to reduce confusion for VFS consumers, in particular, by maintaining the invariant that *mpp != NULL after vn_start_write() implies the need to call vn_finish_write().

vn_start_write(V_NOWAIT) is also not used much outside of the core VFS. IMHO it would be a bit simpler and more helpful to preserve *mpp unconditionally when that flag is set, instead of requiring a new one.

sys/kern/vfs_vnops.c
1943–1947

vn_start_write_reffed() does the following on error path:

	if (error != 0 || (flags & V_XSLEEP) != 0)
		MNT_REL(mp);

Do you mean that the cleanup above is not enough?

sys/kern/vfs_vnops.c
1943–1947

Thanks for checking. It's not obvious at quick glance that MNT_REL is the same reference as vfs_ref.
Looking at it further vfs_ref is using a per-cpu ref count (commit 4cace859c2a) while MNT_REL is not. It's still not obvious to me if it is safe to use vfs_ref -> MNT_REL here.

sys/kern/vfs_vnops.c
1943–1947

There is an IPI dance in vfs_op_thread_enter() that ensures that we are either using pcpu counters (when there is no 'slow' ops like suspend) or global counter protected by mnt interlock, when some slow op was requested. I think it is better to read the comment in the code and overlook its structure than to try to repeat it there trying to explain.

sys/kern/vfs_vnops.c
1943–1947

Agreed. Appreciate your response, thank you.