Details

Reviewers

cem
kib

Commits

rS349309: MFC r348737:
rS349307: MFC r348737:
rS348737: Add a testing facility to manually reclaim a vnode

Summary

Add a debugging facility to manually reclaim a vnode

Add the debug.try_reclaim_vnode sysctl when INVARIANTS is defined. When a
pathname is written to it, it will be reclaimed, as long as it isn't in use
or doomed. The purpose is to gain test coverage for vnode reclamation,
which is otherwise hard to achieve.

Test Plan

Used by the fusefs test suite

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 24723
Build 23485: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Herald added a subscriber: imp. · View Herald TranscriptJun 4 2019, 8:13 PM

Harbormaster completed remote builds in B24687: Diff 58248.Jun 4 2019, 8:13 PM

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

sys/kern/vfs_subr.c
430	Why only under INVARIANTS ?
436	We do not allocate so large buffers on stack.
442	I think this is off-by-one since you zero byte at req->newlen. You must check some privilege to do the operation.
452	Indent should be +4 spaces. Previous line is under-filled.
459	Why disallowing v_usecount > 1 ? I think for this tool to be useful, it needs to allow reclaiming open vnodes. Result of '&' is not boolean, you should use != 0.
464	What is the reason to vhold() the vnode for which you already bumped usecount ?

In D20519#443184, @asomers wrote:

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

asomers marked 2 inline comments as done.Jun 4 2019, 9:25 PM

asomers added inline comments.

sys/kern/vfs_subr.c
430	Because it's strictly a facility for testing. It probably isn't even useful for debugging.
442	I left out a privilege check because only root has the ability to write to sysctls. Or is there some case where non-root users can do it too?
459	My intent was to simulate what the vnlru thread might do anyway. But I can remove that restriction if you think it would be useful.
464	Because that's what `devfs_revoke` and `vlrureclaim` do. Are they wrong? Or am I missing something?

In D20519#443203, @markj wrote:

In D20519#443184, @asomers wrote:

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

Respond to kib's comments

Harbormaster completed remote builds in B24688: Diff 58249.Jun 4 2019, 9:34 PM

In D20519#443214, @asomers wrote:

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

sys/kern/vfs_subr.c
430	It might be useful for debugging, I did not tried it to state this definitely. Currently we achieve the same effect with umount -f.
459	Yes, it would be more useful if there is no such restriction, see above about forced umount.
464	You already took the use count on the vnode, which implies hold count. For vlrureclaim, the function tries hard not to bump usecount to not unfree the vnode. So it needs a hold reference at least. For devfs, devfs_close() unlocks the vnode, again without extra ref, so the vhold() prevent race with a parallel reclaim. Nothing of that is relevant after vget() inside namei(). You own both refs.

asomers added inline comments.Jun 4 2019, 10:04 PM

sys/kern/vfs_subr.c
464	Do you mean `devfs_revoke`? It doesn't unlock the vnode until after the `vhold`/`vdrop`, so how is that relevant?

kib added inline comments.Jun 4 2019, 10:24 PM

sys/kern/vfs_subr.c
464	No, I mean devfs_close(). vgone() calls VOP_CLOSE() when needed. vgone() might also unlock the vnode if there are layered mounts over the vnode' mp like nullfs, but it is not typical for devfs.

cem added inline comments.Jun 5 2019, 12:25 AM

sys/kern/vfs_subr.c
438	Maybe priv check for root here?
442–449	This overflows the buffer when `req->newlen == sizeof(buf)`.
452	You can probably use LOCKSHARED for the lookup to avoid invalidating / blocking on any existing shared holders in the case that the vnode is active.
457	The latter two can be shortened to `NDF_NO_VP_PUT`.
459	Might be more clear to use `atomic_load(vp->v_usecount)`, but no difference on any freebsd arch. I'd pick different errors for v_usecount>1 <=> doomed cases.

In D20519#443220, @kib wrote:

In D20519#443214, @asomers wrote:

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

Accepting a file descriptor and requiring open(2) still entails an extra VOP_OPEN and VOP_CLOSE that wouldn't otherwise be necessary, for example for directories that were traversed by lookup but never actually opened.

sys/kern/vfs_subr.c
430	Ok, I'll enable it unconditionally.

In D20519#443213, @asomers wrote:

In D20519#443203, @markj wrote:

In D20519#443184, @asomers wrote:

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

I think that would be overkill. There are other sysctls we might want to write.

asomers marked 3 inline comments as done.Jun 5 2019, 12:35 AM

asomers added inline comments.

sys/kern/vfs_subr.c
438	Since this is a sysctl handler, the thread must be privileged.
452	But the vnode needs to be locked exclusively for `VOP_RECLAIM`.
459	Like what, `EAGAIN`? I don't want to return 0, because a test program may want to know whether the sysctl had any effect.

cem added inline comments.Jun 5 2019, 1:45 AM

sys/kern/vfs_subr.c
452	You can upgrade the lock after performing usecount check. This avoids unnecessarily xlocking vnodes the routine isn’t going to act on anyway.
459	Like Ebusy for usecount and ebadf for doomed.

asomers marked an inline comment as done.Jun 5 2019, 3:34 PM

asomers added inline comments.

sys/kern/vfs_subr.c
452	I'm removing the usecount check at kib's suggestion. And the doomed check should very rarely fail. So there will no longer be any reason to do the lookup with a shared lock and then upgrade.
459	I think that `EBADF` is not a good choice in this case, since a doomed vnode is one that is already being recycled. That's an argument for returning 0. However, I do think that a test case might want to know whether the sysctl had any effect at all, so I don't want to return 0 in that case. I think `EAGAIN` is appropriate, because once the doomed vnode is fully reclaimed a subsequent call will succeed (because namei will create a new vnode if necessary).

Remove INVARIANTS check, remove vhold, and add comments.

Harbormaster completed remote builds in B24697: Diff 58266.Jun 5 2019, 3:35 PM

cem added inline comments.Jun 5 2019, 4:27 PM

sys/kern/vfs_subr.c
452	Yes, the optimization was only valid if we were checking usecount.
459	`EBADF` is consistent with the spirit / status of existing doomed vnodes, i.e., `dead_vnodeops` mostly return `EBADF`. Arguably, there should be a distinct error code between "that fd isn't valid in the fd table" and "the referenced vnode has actually been torn down/revoked under you", but that's outside the scope of this revision. I don't believe EAGAIN makes sense in this context.

asomers added inline comments.Jun 5 2019, 4:32 PM

sys/kern/vfs_subr.c
459	It's probably impossible to get into that situation, anyway. Even if the vnode were doomed when the sysctl started, `namei` would allocate a new one. And it can't become doomed after `namei` returns since it's locked and referenced.

cem added inline comments.Jun 5 2019, 6:51 PM

sys/kern/vfs_subr.c
459	Sure; hypothetically you could have two APIs here. One that takes a path and does namei, and another that takes an fd. Both are interesting cases. The fd case could be DOOMED, although you're correct that the current lookup LOCKLEAF cannot succeed and return a doomed vnode (AFAIK).

kib added inline comments.Jun 5 2019, 7:25 PM

sys/kern/vfs_subr.c
473	I do not see how error can be non-zero here.

asomers added inline comments.Jun 5 2019, 7:26 PM

sys/kern/vfs_subr.c
473	Oops. That's leftover from the `LK_UPGRADE` that I briefly added, but then removed.

Delete dead code

Harbormaster completed remote builds in B24705: Diff 58279.Jun 5 2019, 7:33 PM

I think this version of code is technically correct. I still would prefer the fd-based sysctl instead.

sys/kern/vfs_subr.c
441	I do not see a need in this blank line.
454	Note that vnode can be alive but the path not cached in namecache, so this comment is misleading, unless made more precise.

This revision is now accepted and ready to land.Jun 6 2019, 11:00 AM

At @kib's request and @cem's suggestion, add a second sysctl that operates on file descriptors instead of paths.

This revision now requires review to proceed.Jun 6 2019, 2:06 PM

Herald added a subscriber: jonathan. · View Herald TranscriptJun 6 2019, 2:06 PM

Harbormaster completed remote builds in B24721: Diff 58307.Jun 6 2019, 2:06 PM

kib added inline comments.Jun 6 2019, 2:24 PM

sys/kern/vfs_subr.c
405	For me, ioctl_rights check does not sound right. There is no specific rights which would match this operation, of course, but IMO fcntl or unlink are somewhat closer in spirit to what is done.
410	The flag must be checked under the vnode lock.
416	If you not pass LK_RETRY, then vn_lock() returns error for doomed vnode, which removes the need to manually check for VI_DOOMED.

asomers marked 3 inline comments as done.Jun 6 2019, 2:31 PM

asomers added inline comments.

sys/kern/vfs_subr.c
405	I don't think that unlink rights are correct, because unlink affects the data on disk. I'll use fcntl.

Respond to @kib's comments RE ftry_reclaim_vnode

Harbormaster completed remote builds in B24723: Diff 58309.Jun 6 2019, 2:35 PM

kib accepted this revision.Jun 6 2019, 2:49 PM

kib added inline comments.

sys/kern/vfs_subr.c
397	Do not use initialization in local's declaration.

This revision is now accepted and ready to land.Jun 6 2019, 2:49 PM

Closed by commit rS348737: Add a testing facility to manually reclaim a vnode (authored by asomers). · Explain WhyJun 6 2019, 3:05 PM

This revision was automatically updated to reflect the committed changes.

asomers added a commit: rS349307: MFC r348737:.Jun 23 2019, 1:44 PM

asomers added a commit: rS349309: MFC r348737:.Jun 23 2019, 3:09 PM

emaste added inline comments.Oct 2 2019, 12:55 PM

head/sys/kern/vfs_subr.c
357 ↗	(On Diff #58311)	why the `+1` here?

asomers added inline comments.Oct 2 2019, 1:07 PM

head/sys/kern/vfs_subr.c
357 ↗	(On Diff #58311)	So the argument can be as long as PATH_MAX, without nul-termination. nul-termination is ensured 5 lines lower.

emaste added inline comments.Oct 2 2019, 1:36 PM

head/sys/kern/vfs_subr.c
357 ↗	(On Diff #58311)	But `PATH_MAX` / `MAXPATHLEN` includes space for the NUL. (I added the original comment above while trying to understand the situation here, and have now submitted D21876 which I think makes this more clear.)

asomers mentioned this in rG19ab36104534: fusefs: in the tests, always assume debug.try_reclaim_vnode is available.Jan 1 2022, 3:39 AM

asomers mentioned this in rGbfffd3510807: fusefs: in the tests, always assume debug.try_reclaim_vnode is available.Jan 18 2022, 1:13 AM

asomers mentioned this in rG542dbacf596e: fusefs: in the tests, always assume debug.try_reclaim_vnode is available.Jan 18 2022, 2:29 AM

Add a debugging facility to manually reclaim a vnode
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 58309

sys/kern/vfs_subr.c

Add a debugging facility to manually reclaim a vnodeClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 58309

sys/kern/vfs_subr.c

Add a debugging facility to manually reclaim a vnode
ClosedPublic
Actions

Revision Contents
Changeset List