Page MenuHomeFreeBSD

Add a debugging facility to manually reclaim a vnode
ClosedPublic

Authored by asomers on Jun 4 2019, 8:13 PM.
Tags
None
Referenced Files
F103389089: D20519.id58249.diff
Sun, Nov 24, 9:33 AM
Unknown Object (File)
Sat, Nov 23, 10:29 AM
Unknown Object (File)
Thu, Nov 21, 7:05 AM
Unknown Object (File)
Tue, Nov 5, 8:42 PM
Unknown Object (File)
Oct 2 2024, 7:37 PM
Unknown Object (File)
Sep 24 2024, 4:09 AM
Unknown Object (File)
Sep 22 2024, 11:45 PM
Unknown Object (File)
Sep 22 2024, 11:28 AM

Details

Summary

Add a debugging facility to manually reclaim a vnode

Add the debug.try_reclaim_vnode sysctl when INVARIANTS is defined. When a
pathname is written to it, it will be reclaimed, as long as it isn't in use
or doomed. The purpose is to gain test coverage for vnode reclamation,
which is otherwise hard to achieve.

Test Plan

Used by the fusefs test suite

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 24688
Build 23456: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

sys/kern/vfs_subr.c
342

Why only under INVARIANTS ?

348

We do not allocate so large buffers on stack.

354

I think this is off-by-one since you zero byte at req->newlen.

You must check some privilege to do the operation.

364

Indent should be +4 spaces. Previous line is under-filled.

371

Why disallowing v_usecount > 1 ? I think for this tool to be useful, it needs to allow reclaiming open vnodes.

Result of '&' is not boolean, you should use != 0.

376

What is the reason to vhold() the vnode for which you already bumped usecount ?

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

asomers added inline comments.
sys/kern/vfs_subr.c
342

Because it's strictly a facility for testing. It probably isn't even useful for debugging.

354

I left out a privilege check because only root has the ability to write to sysctls. Or is there some case where non-root users can do it too?

371

My intent was to simulate what the vnlru thread might do anyway. But I can remove that restriction if you think it would be useful.

376

Because that's what devfs_revoke and vlrureclaim do. Are they wrong? Or am I missing something?

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

Respond to kib's comments

In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

sys/kern/vfs_subr.c
342

It might be useful for debugging, I did not tried it to state this definitely. Currently we achieve the same effect with umount -f.

371

Yes, it would be more useful if there is no such restriction, see above about forced umount.

376

You already took the use count on the vnode, which implies hold count.

For vlrureclaim, the function tries hard not to bump usecount to not unfree the vnode. So it needs a hold reference at least.

For devfs, devfs_close() unlocks the vnode, again without extra ref, so the vhold() prevent race with a parallel reclaim.

Nothing of that is relevant after vget() inside namei(). You own both refs.

sys/kern/vfs_subr.c
376

Do you mean devfs_revoke? It doesn't unlock the vnode until after the vhold/vdrop, so how is that relevant?

sys/kern/vfs_subr.c
376

No, I mean devfs_close(). vgone() calls VOP_CLOSE() when needed. vgone() might also unlock the vnode if there are layered mounts over the vnode' mp like nullfs, but it is not typical for devfs.

sys/kern/vfs_subr.c
350

Maybe priv check for root here?

354–361

This overflows the buffer when req->newlen == sizeof(buf).

364

You can probably use LOCKSHARED for the lookup to avoid invalidating / blocking on any existing shared holders in the case that the vnode is active.

369

The latter two can be shortened to NDF_NO_VP_PUT.

371

Might be more clear to use atomic_load(vp->v_usecount), but no difference on any freebsd arch.

I'd pick different errors for v_usecount>1 <=> doomed cases.

In D20519#443220, @kib wrote:
In D20519#443196, @kib wrote:

I suspect it could be more useful to pass a file descriptor number to the sysctl, leaving path manipulation to open. E.g., this way it would be possible to hack-reclaim unlinked inodes.

OTOH, taking a file descriptor instead of a pathname means that it wouldn't be able to reclaim inactive vnodes. For my purposes that's more important. For example, I want to be able to reclaim a directory that doesn't currently have any open file descriptors.

You already cannot do that, because namei() reactivates. This was my point, leave path handling to open(2).

Accepting a file descriptor and requiring open(2) still entails an extra VOP_OPEN and VOP_CLOSE that wouldn't otherwise be necessary, for example for directories that were traversed by lookup but never actually opened.

sys/kern/vfs_subr.c
342

Ok, I'll enable it unconditionally.

@markj @andrew @tuexen @emaste would this be useful for the Syzkaller project?

Yes, I suspect so. I am not sure how to write a syscall description for it: sysctlbyname("debug.try_reclaim_vnode") invokes multiple syscalls under the hood to resolve the sysctl name to a MIB entry.

Would it be useful enough that I should convert it into a syscall?

I think that would be overkill. There are other sysctls we might want to write.

asomers added inline comments.
sys/kern/vfs_subr.c
350

Since this is a sysctl handler, the thread must be privileged.

364

But the vnode needs to be locked exclusively for VOP_RECLAIM.

371

Like what, EAGAIN? I don't want to return 0, because a test program may want to know whether the sysctl had any effect.

sys/kern/vfs_subr.c
364

You can upgrade the lock after performing usecount check. This avoids unnecessarily xlocking vnodes the routine isn’t going to act on anyway.

371

Like Ebusy for usecount and ebadf for doomed.

asomers added inline comments.
sys/kern/vfs_subr.c
364

I'm removing the usecount check at kib's suggestion. And the doomed check should very rarely fail. So there will no longer be any reason to do the lookup with a shared lock and then upgrade.

371

I think that EBADF is not a good choice in this case, since a doomed vnode is one that is already being recycled. That's an argument for returning 0. However, I do think that a test case might want to know whether the sysctl had any effect at all, so I don't want to return 0 in that case. I think EAGAIN is appropriate, because once the doomed vnode is fully reclaimed a subsequent call will succeed (because namei will create a new vnode if necessary).

asomers marked an inline comment as done.

Remove INVARIANTS check, remove vhold, and add comments.

sys/kern/vfs_subr.c
364

Yes, the optimization was only valid if we were checking usecount.

371

EBADF is consistent with the spirit / status of existing doomed vnodes, i.e., dead_vnodeops mostly return EBADF.

Arguably, there should be a distinct error code between "that fd isn't valid in the fd table" and "the referenced vnode has actually been torn down/revoked under you", but that's outside the scope of this revision.

I don't believe EAGAIN makes sense in this context.

sys/kern/vfs_subr.c
371

It's probably impossible to get into that situation, anyway. Even if the vnode were doomed when the sysctl started, namei would allocate a new one. And it can't become doomed after namei returns since it's locked and referenced.

sys/kern/vfs_subr.c
371

Sure; hypothetically you could have two APIs here. One that takes a path and does namei, and another that takes an fd. Both are interesting cases. The fd case could be DOOMED, although you're correct that the current lookup LOCKLEAF cannot succeed and return a doomed vnode (AFAIK).

sys/kern/vfs_subr.c
385

I do not see how error can be non-zero here.

sys/kern/vfs_subr.c
385

Oops. That's leftover from the LK_UPGRADE that I briefly added, but then removed.

I think this version of code is technically correct. I still would prefer the fd-based sysctl instead.

sys/kern/vfs_subr.c
353

I do not see a need in this blank line.

366

Note that vnode can be alive but the path not cached in namecache, so this comment is misleading, unless made more precise.

This revision is now accepted and ready to land.Jun 6 2019, 11:00 AM

At @kib's request and @cem's suggestion, add a second sysctl that operates on file descriptors instead of paths.

This revision now requires review to proceed.Jun 6 2019, 2:06 PM
sys/kern/vfs_subr.c
404

For me, ioctl_rights check does not sound right. There is no specific rights which would match this operation, of course, but IMO fcntl or unlink are somewhat closer in spirit to what is done.

409

The flag must be checked under the vnode lock.

415

If you not pass LK_RETRY, then vn_lock() returns error for doomed vnode, which removes the need to manually check for VI_DOOMED.

asomers added inline comments.
sys/kern/vfs_subr.c
404

I don't think that unlink rights are correct, because unlink affects the data on disk. I'll use fcntl.

asomers marked an inline comment as done.

Respond to @kib's comments RE ftry_reclaim_vnode

kib added inline comments.
sys/kern/vfs_subr.c
396

Do not use initialization in local's declaration.

This revision is now accepted and ready to land.Jun 6 2019, 2:49 PM
This revision was automatically updated to reflect the committed changes.
head/sys/kern/vfs_subr.c
357 ↗(On Diff #58311)

why the +1 here?

head/sys/kern/vfs_subr.c
357 ↗(On Diff #58311)

So the argument can be as long as PATH_MAX, without nul-termination. nul-termination is ensured 5 lines lower.

head/sys/kern/vfs_subr.c
357 ↗(On Diff #58311)

But PATH_MAX / MAXPATHLEN includes space for the NUL.

(I added the original comment above while trying to understand the situation here, and have now submitted D21876 which I think makes this more clear.)