Page MenuHomeFreeBSD

Add a debugging facility to manually reclaim a vnode
ClosedPublic

Authored by asomers on Jun 4 2019, 8:13 PM.

Details

Summary

Add a debugging facility to manually reclaim a vnode

Add the debug.try_reclaim_vnode sysctl when INVARIANTS is defined. When a
pathname is written to it, it will be reclaimed, as long as it isn't in use
or doomed. The purpose is to gain test coverage for vnode reclamation,
which is otherwise hard to achieve.

Test Plan

Used by the fusefs test suite

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

kib added a comment.Jun 4 2019, 8:49 PM

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

sys/kern/vfs_subr.c
342 ↗(On Diff #58248)

Why only under INVARIANTS ?

348 ↗(On Diff #58248)

We do not allocate so large buffers on stack.

354 ↗(On Diff #58248)

I think this is off-by-one since you zero byte at req->newlen.

You must check some privilege to do the operation.

364 ↗(On Diff #58248)

Indent should be +4 spaces. Previous line is under-filled.

371 ↗(On Diff #58248)

Why disallowing v_usecount > 1 ? I think for this tool to be useful, it needs to allow reclaiming open vnodes.

Result of '&' is not boolean, you should use != 0.

376 ↗(On Diff #58248)

What is the reason to vhold() the vnode for which you already bumped usecount ?

markj added a comment.Jun 4 2019, 9:19 PM

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

asomers marked 2 inline comments as done.Jun 4 2019, 9:25 PM
asomers added inline comments.
sys/kern/vfs_subr.c
342 ↗(On Diff #58248)

Because it's strictly a facility for testing. It probably isn't even useful for debugging.

354 ↗(On Diff #58248)

I left out a privilege check because only root has the ability to write to sysctls. Or is there some case where non-root users can do it too?

371 ↗(On Diff #58248)

My intent was to simulate what the vnlru thread might do anyway. But I can remove that restriction if you think it would be useful.

376 ↗(On Diff #58248)

Because that's what devfs_revoke and vlrureclaim do. Are they wrong? Or am I missing something?

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

asomers updated this revision to Diff 58249.Jun 4 2019, 9:34 PM

Respond to kib's comments

kib added a comment.Jun 4 2019, 9:55 PM
In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

sys/kern/vfs_subr.c
342 ↗(On Diff #58248)

It might be useful for debugging, I did not tried it to state this definitely. Currently we achieve the same effect with umount -f.

371 ↗(On Diff #58248)

Yes, it would be more useful if there is no such restriction, see above about forced umount.

376 ↗(On Diff #58248)

You already took the use count on the vnode, which implies hold count.

For vlrureclaim, the function tries hard not to bump usecount to not unfree the vnode. So it needs a hold reference at least.

For devfs, devfs_close() unlocks the vnode, again without extra ref, so the vhold() prevent race with a parallel reclaim.

Nothing of that is relevant after vget() inside namei(). You own both refs.

asomers added inline comments.Jun 4 2019, 10:04 PM
sys/kern/vfs_subr.c
376 ↗(On Diff #58248)

Do you mean devfs_revoke? It doesn't unlock the vnode until after the vhold/vdrop, so how is that relevant?

kib added inline comments.Jun 4 2019, 10:24 PM
sys/kern/vfs_subr.c
376 ↗(On Diff #58248)

No, I mean devfs_close(). vgone() calls VOP_CLOSE() when needed. vgone() might also unlock the vnode if there are layered mounts over the vnode' mp like nullfs, but it is not typical for devfs.

cem added inline comments.Jun 5 2019, 12:25 AM
sys/kern/vfs_subr.c
350 ↗(On Diff #58248)

Maybe priv check for root here?

354–361 ↗(On Diff #58248)

This overflows the buffer when req->newlen == sizeof(buf).

364 ↗(On Diff #58248)

You can probably use LOCKSHARED for the lookup to avoid invalidating / blocking on any existing shared holders in the case that the vnode is active.

369 ↗(On Diff #58248)

The latter two can be shortened to NDF_NO_VP_PUT.

371 ↗(On Diff #58248)

Might be more clear to use atomic_load(vp->v_usecount), but no difference on any freebsd arch.

I'd pick different errors for v_usecount>1 <=> doomed cases.

asomers marked 2 inline comments as done.Jun 5 2019, 12:27 AM
In D20519#443220, @kib wrote:
In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

Accepting a file descriptor and requiring open(2) still entails an extra VOP_OPEN and VOP_CLOSE that wouldn't otherwise be necessary, for example for directories that were traversed by lookup but never actually opened.

sys/kern/vfs_subr.c
342 ↗(On Diff #58248)

Ok, I'll enable it unconditionally.

markj added a comment.Jun 5 2019, 12:34 AM

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

I think that would be overkill. There are other sysctls we might want to write.

asomers marked 3 inline comments as done.Jun 5 2019, 12:35 AM
asomers added inline comments.
sys/kern/vfs_subr.c
350 ↗(On Diff #58248)

Since this is a sysctl handler, the thread must be privileged.

364 ↗(On Diff #58248)

But the vnode needs to be locked exclusively for VOP_RECLAIM.

371 ↗(On Diff #58248)

Like what, EAGAIN? I don't want to return 0, because a test program may want to know whether the sysctl had any effect.

cem added inline comments.Jun 5 2019, 1:45 AM
sys/kern/vfs_subr.c
364 ↗(On Diff #58248)

You can upgrade the lock after performing usecount check. This avoids unnecessarily xlocking vnodes the routine isn’t going to act on anyway.

371 ↗(On Diff #58248)

Like Ebusy for usecount and ebadf for doomed.

asomers marked an inline comment as done.Jun 5 2019, 3:34 PM
asomers added inline comments.
sys/kern/vfs_subr.c
364 ↗(On Diff #58248)

I'm removing the usecount check at kib's suggestion. And the doomed check should very rarely fail. So there will no longer be any reason to do the lookup with a shared lock and then upgrade.

371 ↗(On Diff #58248)

I think that EBADF is not a good choice in this case, since a doomed vnode is one that is already being recycled. That's an argument for returning 0. However, I do think that a test case might want to know whether the sysctl had any effect at all, so I don't want to return 0 in that case. I think EAGAIN is appropriate, because once the doomed vnode is fully reclaimed a subsequent call will succeed (because namei will create a new vnode if necessary).

asomers updated this revision to Diff 58266.Jun 5 2019, 3:34 PM
asomers marked an inline comment as done.

Remove INVARIANTS check, remove vhold, and add comments.

cem added inline comments.Jun 5 2019, 4:27 PM
sys/kern/vfs_subr.c
364 ↗(On Diff #58248)

Yes, the optimization was only valid if we were checking usecount.

371 ↗(On Diff #58248)

EBADF is consistent with the spirit / status of existing doomed vnodes, i.e., dead_vnodeops mostly return EBADF.

Arguably, there should be a distinct error code between "that fd isn't valid in the fd table" and "the referenced vnode has actually been torn down/revoked under you", but that's outside the scope of this revision.

I don't believe EAGAIN makes sense in this context.

asomers added inline comments.Jun 5 2019, 4:32 PM
sys/kern/vfs_subr.c
371 ↗(On Diff #58248)

It's probably impossible to get into that situation, anyway. Even if the vnode were doomed when the sysctl started, namei would allocate a new one. And it can't become doomed after namei returns since it's locked and referenced.

cem added inline comments.Jun 5 2019, 6:51 PM
sys/kern/vfs_subr.c
371 ↗(On Diff #58248)

Sure; hypothetically you could have two APIs here. One that takes a path and does namei, and another that takes an fd. Both are interesting cases. The fd case could be DOOMED, although you're correct that the current lookup LOCKLEAF cannot succeed and return a doomed vnode (AFAIK).

kib added inline comments.Jun 5 2019, 7:25 PM
sys/kern/vfs_subr.c
385 ↗(On Diff #58266)

I do not see how error can be non-zero here.

asomers added inline comments.Jun 5 2019, 7:26 PM
sys/kern/vfs_subr.c
385 ↗(On Diff #58266)

Oops. That's leftover from the LK_UPGRADE that I briefly added, but then removed.

asomers updated this revision to Diff 58279.Jun 5 2019, 7:33 PM

Delete dead code

kib accepted this revision.Jun 6 2019, 11:00 AM

I think this version of code is technically correct. I still would prefer the fd-based sysctl instead.

sys/kern/vfs_subr.c
353 ↗(On Diff #58279)

I do not see a need in this blank line.

366 ↗(On Diff #58279)

Note that vnode can be alive but the path not cached in namecache, so this comment is misleading, unless made more precise.

This revision is now accepted and ready to land.Jun 6 2019, 11:00 AM
asomers updated this revision to Diff 58307.Jun 6 2019, 2:06 PM

At @kib's request and @cem's suggestion, add a second sysctl that operates on file descriptors instead of paths.

This revision now requires review to proceed.Jun 6 2019, 2:06 PM
kib added inline comments.Jun 6 2019, 2:24 PM
sys/kern/vfs_subr.c
405 ↗(On Diff #58307)

For me, ioctl_rights check does not sound right. There is no specific rights which would match this operation, of course, but IMO fcntl or unlink are somewhat closer in spirit to what is done.

410 ↗(On Diff #58307)

The flag must be checked under the vnode lock.

416 ↗(On Diff #58307)

If you not pass LK_RETRY, then vn_lock() returns error for doomed vnode, which removes the need to manually check for VI_DOOMED.

asomers marked 3 inline comments as done.Jun 6 2019, 2:31 PM
asomers added inline comments.
sys/kern/vfs_subr.c
405 ↗(On Diff #58307)

I don't think that unlink rights are correct, because unlink affects the data on disk. I'll use fcntl.

asomers updated this revision to Diff 58309.Jun 6 2019, 2:35 PM
asomers marked an inline comment as done.

Respond to @kib's comments RE ftry_reclaim_vnode

kib accepted this revision.Jun 6 2019, 2:49 PM
kib added inline comments.
sys/kern/vfs_subr.c
397 ↗(On Diff #58309)

Do not use initialization in local's declaration.

This revision is now accepted and ready to land.Jun 6 2019, 2:49 PM
This revision was automatically updated to reflect the committed changes.
emaste added inline comments.Oct 2 2019, 12:55 PM
head/sys/kern/vfs_subr.c
357

why the +1 here?

asomers added inline comments.Oct 2 2019, 1:07 PM
head/sys/kern/vfs_subr.c
357

So the argument can be as long as PATH_MAX, without nul-termination. nul-termination is ensured 5 lines lower.

emaste added inline comments.Oct 2 2019, 1:36 PM
head/sys/kern/vfs_subr.c
357

But PATH_MAX / MAXPATHLEN includes space for the NUL.

(I added the original comment above while trying to understand the situation here, and have now submitted D21876 which I think makes this more clear.)