Page MenuHomeFreeBSD

Add a debugging facility to manually reclaim a vnode
ClosedPublic

Authored by asomers on Jun 4 2019, 8:13 PM.
Tags
None
Referenced Files
F103389089: D20519.id58249.diff
Sun, Nov 24, 9:33 AM
Unknown Object (File)
Sat, Nov 23, 10:29 AM
Unknown Object (File)
Thu, Nov 21, 7:05 AM
Unknown Object (File)
Tue, Nov 5, 8:42 PM
Unknown Object (File)
Oct 2 2024, 7:37 PM
Unknown Object (File)
Sep 24 2024, 4:09 AM
Unknown Object (File)
Sep 22 2024, 11:45 PM
Unknown Object (File)
Sep 22 2024, 11:28 AM

Details

Summary

Add a debugging facility to manually reclaim a vnode

Add the debug.try_reclaim_vnode sysctl when INVARIANTS is defined. When a
pathname is written to it, it will be reclaimed, as long as it isn't in use
or doomed. The purpose is to gain test coverage for vnode reclamation,
which is otherwise hard to achieve.

Test Plan

Used by the fusefs test suite

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 24723
Build 23485: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

sys/kern/vfs_subr.c
430

Why only under INVARIANTS ?

436

We do not allocate so large buffers on stack.

442

I think this is off-by-one since you zero byte at req->newlen.

You must check some privilege to do the operation.

452

Indent should be +4 spaces. Previous line is under-filled.

459

Why disallowing v_usecount > 1 ? I think for this tool to be useful, it needs to allow reclaiming open vnodes.

Result of '&' is not boolean, you should use != 0.

464

What is the reason to vhold() the vnode for which you already bumped usecount ?

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

asomers added inline comments.
sys/kern/vfs_subr.c
430

Because it's strictly a facility for testing. It probably isn't even useful for debugging.

442

I left out a privilege check because only root has the ability to write to sysctls. Or is there some case where non-root users can do it too?

459

My intent was to simulate what the vnlru thread might do anyway. But I can remove that restriction if you think it would be useful.

464

Because that's what devfs_revoke and vlrureclaim do. Are they wrong? Or am I missing something?

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

Respond to kib's comments

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

sys/kern/vfs_subr.c
430

It might be useful for debugging, I did not tried it to state this definitely. Currently we achieve the same effect with umount -f.

459

Yes, it would be more useful if there is no such restriction, see above about forced umount.

464

You already took the use count on the vnode, which implies hold count.

For vlrureclaim, the function tries hard not to bump usecount to not unfree the vnode. So it needs a hold reference at least.

For devfs, devfs_close() unlocks the vnode, again without extra ref, so the vhold() prevent race with a parallel reclaim.

Nothing of that is relevant after vget() inside namei(). You own both refs.

sys/kern/vfs_subr.c
464

Do you mean devfs_revoke? It doesn't unlock the vnode until after the vhold/vdrop, so how is that relevant?

sys/kern/vfs_subr.c
464

No, I mean devfs_close(). vgone() calls VOP_CLOSE() when needed. vgone() might also unlock the vnode if there are layered mounts over the vnode' mp like nullfs, but it is not typical for devfs.

sys/kern/vfs_subr.c
438

Maybe priv check for root here?

442–449

This overflows the buffer when req->newlen == sizeof(buf).

452

You can probably use LOCKSHARED for the lookup to avoid invalidating / blocking on any existing shared holders in the case that the vnode is active.

457

The latter two can be shortened to NDF_NO_VP_PUT.

459

Might be more clear to use atomic_load(vp->v_usecount), but no difference on any freebsd arch.

I'd pick different errors for v_usecount>1 <=> doomed cases.

In D20519#443220, @kib wrote:
In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

Accepting a file descriptor and requiring open(2) still entails an extra VOP_OPEN and VOP_CLOSE that wouldn't otherwise be necessary, for example for directories that were traversed by lookup but never actually opened.

sys/kern/vfs_subr.c
430

Ok, I'll enable it unconditionally.

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

I think that would be overkill. There are other sysctls we might want to write.

asomers added inline comments.
sys/kern/vfs_subr.c
438

Since this is a sysctl handler, the thread must be privileged.

452

But the vnode needs to be locked exclusively for VOP_RECLAIM.

459

Like what, EAGAIN? I don't want to return 0, because a test program may want to know whether the sysctl had any effect.

sys/kern/vfs_subr.c
452

You can upgrade the lock after performing usecount check. This avoids unnecessarily xlocking vnodes the routine isn’t going to act on anyway.

459

Like Ebusy for usecount and ebadf for doomed.

asomers added inline comments.
sys/kern/vfs_subr.c
452

I'm removing the usecount check at kib's suggestion. And the doomed check should very rarely fail. So there will no longer be any reason to do the lookup with a shared lock and then upgrade.

459

I think that EBADF is not a good choice in this case, since a doomed vnode is one that is already being recycled. That's an argument for returning 0. However, I do think that a test case might want to know whether the sysctl had any effect at all, so I don't want to return 0 in that case. I think EAGAIN is appropriate, because once the doomed vnode is fully reclaimed a subsequent call will succeed (because namei will create a new vnode if necessary).

asomers marked an inline comment as done.

Remove INVARIANTS check, remove vhold, and add comments.

sys/kern/vfs_subr.c
452

Yes, the optimization was only valid if we were checking usecount.

459

EBADF is consistent with the spirit / status of existing doomed vnodes, i.e., dead_vnodeops mostly return EBADF.

Arguably, there should be a distinct error code between "that fd isn't valid in the fd table" and "the referenced vnode has actually been torn down/revoked under you", but that's outside the scope of this revision.

I don't believe EAGAIN makes sense in this context.

sys/kern/vfs_subr.c
459

It's probably impossible to get into that situation, anyway. Even if the vnode were doomed when the sysctl started, namei would allocate a new one. And it can't become doomed after namei returns since it's locked and referenced.

sys/kern/vfs_subr.c
459

Sure; hypothetically you could have two APIs here. One that takes a path and does namei, and another that takes an fd. Both are interesting cases. The fd case could be DOOMED, although you're correct that the current lookup LOCKLEAF cannot succeed and return a doomed vnode (AFAIK).

sys/kern/vfs_subr.c
473

I do not see how error can be non-zero here.

sys/kern/vfs_subr.c
473

Oops. That's leftover from the LK_UPGRADE that I briefly added, but then removed.

I think this version of code is technically correct. I still would prefer the fd-based sysctl instead.

sys/kern/vfs_subr.c
441

I do not see a need in this blank line.

454

Note that vnode can be alive but the path not cached in namecache, so this comment is misleading, unless made more precise.

This revision is now accepted and ready to land.Jun 6 2019, 11:00 AM

At @kib's request and @cem's suggestion, add a second sysctl that operates on file descriptors instead of paths.

This revision now requires review to proceed.Jun 6 2019, 2:06 PM
sys/kern/vfs_subr.c
405

For me, ioctl_rights check does not sound right. There is no specific rights which would match this operation, of course, but IMO fcntl or unlink are somewhat closer in spirit to what is done.

410

The flag must be checked under the vnode lock.

416

If you not pass LK_RETRY, then vn_lock() returns error for doomed vnode, which removes the need to manually check for VI_DOOMED.

asomers added inline comments.
sys/kern/vfs_subr.c
405

I don't think that unlink rights are correct, because unlink affects the data on disk. I'll use fcntl.

asomers marked an inline comment as done.

Respond to @kib's comments RE ftry_reclaim_vnode

kib added inline comments.
sys/kern/vfs_subr.c
397

Do not use initialization in local's declaration.

This revision is now accepted and ready to land.Jun 6 2019, 2:49 PM
This revision was automatically updated to reflect the committed changes.
head/sys/kern/vfs_subr.c
357 ↗(On Diff #58311)

why the +1 here?

head/sys/kern/vfs_subr.c
357 ↗(On Diff #58311)

So the argument can be as long as PATH_MAX, without nul-termination. nul-termination is ensured 5 lines lower.

head/sys/kern/vfs_subr.c
357 ↗(On Diff #58311)

But PATH_MAX / MAXPATHLEN includes space for the NUL.

(I added the original comment above while trying to understand the situation here, and have now submitted D21876 which I think makes this more clear.)