Page MenuHomeFreeBSD

fusefs: implement FUSE_COPY_FILE_RANGE.

Authored by asomers on Dec 29 2020, 8:16 PM.



fusefs: fix an expectation in one of the tests

An order-of-operations problem caused an expectation intended for
FUSE_READ to instead match FUSE_ACCESS. Surprisingly, only one test
case was affected.

fusefs: update FUSE protocol to 7.28

None of the new optional features are implemented yet.

fusefs: implement FUSE_COPY_FILE_RANGE.

MFC after: 2 weeks
Relnotes: yes

Diff Detail

R10 FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem added inline comments.
1244–1246 ↗(On Diff #81325)

What's the locking protocol for this function's vnode? Should we have an ASSERT_VOP_LOCKED/ELOCKED/UNLOCKED?

Edit: and now I see this function was just moved from another file. I'll leave the questions in case they're interesting.

1256–1257 ↗(On Diff #81325)

For these kind of things, I would generally early-return to save the horizontal compression forced on the meat of the function. But that's just personal preference; this is fine.

1266–1269 ↗(On Diff #81325)

I'm not sure this makes sense. I think we want to do *something* if the setattr fails. (Side note: I forget what fuse_internal_setattr does or why it might fail. I might be imagining a mode that doesn't make sense. Just tell me if I'm wrong :-).)

I think we really don't want to leak a SUID vnode written by non-root into the global namespace. Right? If setattr fails, we need to nuke the vnode, or something extreme like that, to maintain the integrity of our security model. You could imagine a malicious FUSE server program that accepted writes to suid programs from certain unprivileged users and produced errors on subsequent setattrs, to enable a backdoor of the ordinary security model.

Whether or not this produces an error code at the userspace level is kind of less important to me than whether or not we actually have a half-completed operation. Does that make sense?

271–281 ↗(On Diff #81325)

Should this be conditional on if (err == 0)? Or maybe not — a write might partially succeed and still produce an error, in which case we'd want to clear SUID?

653 ↗(On Diff #81325)

Isn't this the same as fsess_not_impl now?

666–668 ↗(On Diff #81325)

Do we need to do anything to avoid leaking infufh here? I forget how these lifetimes work but see that we don't do any special cleanup at the bottom, either.

670–686 ↗(On Diff #81325)

There is sort of a nice subroutine for this already: vn_lock_pair() in kern/vfs_vnops.c. It does a little more than we want: handles one vnode being NULL; and a little less than we want: doesn't SHARE-lock one of the vnodes, doesn't handle vp1 == vp2.

But I'd suggest moving the dual-vnode logic we need to a subroutine, possibly also in vfs_vnops.c, and doing a similar thing as far as alternating which lock we start from. It may be usable by other filesystems implementing this VOP.

688–690 ↗(On Diff #81325)

These assignments and the io variable lifetime could be moved inside this condition.

(Do we ignore rlimits if fsizetd == NULL? I don't fully understand the semantics of this VOP.)

699–701 ↗(On Diff #81325)

It is pretty odd to me that FUSE chose the *read* file as sort of the primary subject of this operation. I would have picked the written file. Just an off-hand comment, not actionable.

This revision is now accepted and ready to land.Dec 30 2020, 4:22 PM
asomers added inline comments.
1266–1269 ↗(On Diff #81325)

The "malicious FUSE server" scenario doesn't work. The kernel already refuses to honor the setuid bit on any file system mounted by a non-root user. So the malicious FUSE server would need to already have root privileges in order for the attack to work.

The real problem is that there's no way to indicate an error of "the write succeeded but the permission bits are now wrong". Those two things are supposed to happen automatically.

The real solution is to implement FUSE_WRITE_KILL_PRIV (and its tweaks in later protocol versions), which instruct the server to kill suid at the same time that it does the write.

271–281 ↗(On Diff #81325)

Yeah, that's what I was thinking. Kill suid even if the write fails.

666–668 ↗(On Diff #81325)

No. fuse_filehandle_getrw doesn't allocate anything, it just looks up an entry in an existing table.

670–686 ↗(On Diff #81325)

I didn't know about vn_lock_pair. I copied this section from nfs_copy_file_range. However, fusefs's version is not identical to nfs's, because nfs needs to worry about vn_start_write/vn_finish_write, and fusefs does not. I don't think I should make fusefs's a subroutine right now, because there's only one consumer, and I don't know how the exact needs might vary between consumers.

688–690 ↗(On Diff #81325)

Yes. If a_fsizetd is NULL, that indicates that the write is coming from a privileged process, like the NFS server.

I'm planning to commit this (with a couple of changes from your final comments) if it completes a make tinderbox.