Page MenuHomeFreeBSD

fcntl(2): add F_KINFO operation
ClosedPublic

Authored by kib on Dec 5 2021, 6:54 PM.

Details

Summary
that returns struct kinfo_file for the given file descriptor.  Among
other data, it also returns kf_path, if file op was able to restore file
path.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib requested review of this revision.Dec 5 2021, 6:54 PM

Why not a sysctl-based interface, where the fd is a component of the MIB? Then info for a specific fd can be grabbed from a different process.

sys/kern/kern_descrip.c
858 ↗(On Diff #99492)

I suspect it should be at least cap_fcntl_rights. Maybe this should not be permitted in capability mode at all, it seems dubious to allow a capmode process to look up the path corresponding to an fd.

kib marked an inline comment as done.Dec 5 2021, 7:19 PM

Why not a sysctl-based interface, where the fd is a component of the MIB? Then info for a specific fd can be grabbed from a different process.

There is already kern.proc.filedesc, and I highly doubt that somebody needs a path for individual fd of remote process.

Disable F_KINFO in capability mode.
Use cap_fcntl_rights.

sys/kern/kern_descrip.c
858 ↗(On Diff #99492)

So I added this, but an after-thought is that I do not see anything wrong with resolving full path even in capability mode. What additional privilege does this resolution grant?

859 ↗(On Diff #99493)

Should this be ECAPMODE or ENOTCAPABLE?

In D33277#752192, @kib wrote:

Why not a sysctl-based interface, where the fd is a component of the MIB? Then info for a specific fd can be grabbed from a different process.

There is already kern.proc.filedesc, and I highly doubt that somebody needs a path for individual fd of remote process.

I'm not so sure, it could be useful for some kind of truss extension which prints a path corresponding to an fd.

The sysctl-based interface is also slightly more future-proof since it lets the caller and kernel detect size mismatches.

lib/libc/sys/fcntl.2
197
sys/kern/kern_descrip.c
858 ↗(On Diff #99492)

It doesn't grant any new privileges, but it leaks information about a namespace. KERN_PROC_FILEDESC is not permitted in capability mode, for example, and the fd may have been opened by a process with more privileges than the current (sandboxed) process. As another example, we have CAP_GETPEERNAME for a socket, but this operation returns a superset of the information provided by getpeername(2), so the right can be bypassed.

Hmm, I forgot that we have cap_fcntls_limit(2) and fget_fcntl(). So with a bit more work it could be possible to strip the right to call F_KINFO for a particular fd. If that is implemented, then I think it is ok to remove the IN_CAPABILITY_MODE() check. The code should also check for CAP_GETPEERNAME for a socket, though.

859 ↗(On Diff #99493)

ECAPMODE means, "this operation is not permitted in capability mode," whereas ENOTCAPABLE means "this fd has insufficient rights to perform the operation," so ECAPMODE is correct. If we added a fine-grained right(4) for this operation, like CAP_FCNTL_KINFO, and it was missing, then the error would be ENOTCAPABLE.

863 ↗(On Diff #99493)

This should be zeroed, the fo_fill_kinfo() implementations do not handle it and we'll copy out uninitialized pad bytes.

kib marked 3 inline comments as done.Dec 5 2021, 8:18 PM

I'm not so sure, it could be useful for some kind of truss extension which prints a path corresponding to an fd.

Our truss or a debugger should be fine using kern.proc.filedesc, it is not performance-important.

The sysctl-based interface is also slightly more future-proof since it lets the caller and kernel detect size mismatches.

It is not, unfortunately. What can be done, on the other hand, is to require kf_structsize to be pre-filled with the expected size before fcntl call.

Require userspace to fill kf_structsize for versioning.

In D33277#752192, @kib wrote:

Why not a sysctl-based interface, where the fd is a component of the MIB? Then info for a specific fd can be grabbed from a different process.

There is already kern.proc.filedesc, and I highly doubt that somebody needs a path for individual fd of remote process.

In Wine's case, other processes would be ok, because they share all file descriptors for "Windows files" with wineserver (with sendmsg() + SCM_RIGHTS), and this F_KINFO would only get queried in wineserver.

Even if sysutils/lsof takes the route of doing more from user space, kern.proc.filedesc should be enough for its needs. In fact its upstream developers were already in discussion about just that. [Its current approach of probing kernel data structures with kvm_read() is racy and unmaintainable, "struct namecache" changes ABI once a year, zfs broke in FreeBSD >= 13, msdosfs returns corrupt filenames, isofs was broken from FreeBSD 6.]

I like this patch.

markj added inline comments.
sys/kern/kern_descrip.c
873 ↗(On Diff #99499)

Should this be using export_file_to_kinfo() instead of calling the file method directly? It fills out some generic fields.

This revision is now accepted and ready to land.Dec 6 2021, 4:11 PM
kib marked an inline comment as done.

Use export_file_for_kinfo.
Lock fdp around export_file.
Use fget_cap_locked().

This revision now requires review to proceed.Dec 6 2021, 5:55 PM

Slightly streamline flow.

This revision is now accepted and ready to land.Dec 6 2021, 6:36 PM
jhb added a subscriber: brooks.

@brooks so he is aware of of this when it comes down to merging to CheriBSD. (Maybe sys/fcntl.h should be a default pause path in CheriBSD's mergify config?)

lib/libc/sys/fcntl.2
56

Maybe just say the type of the 3rd arg depends on the operation (and document the type for each operation that takes the third arg) instead of trying to assume it's a single type? In CheriBSD we have a wrapper in libc to deal with this and have fcntl() as a truly var-args function. On existing platforms var-args is close enough to work for passing syscall arguments, but not all ABIs require this. (var-args are passed as a block in a bounded pointer on CHERI platforms and not just a continuation of register arguments).

kib marked an inline comment as done.Dec 6 2021, 8:19 PM
kib added inline comments.
lib/libc/sys/fcntl.2
56

I pushed everything but this change, the review is updated with the man page changes.

kib marked an inline comment as done.

fcntl(2): be more precise about third arg type

kib added a reviewer: emaste.
lib/libc/sys/fcntl.2
56

maybe "with type dependent on the specific operation"

or maybe

can take an additional third argument
.Fa arg.
Unless otherwise noted below for a specific operation,
.Fa arg
has type
.Vt int .
293

Is "command" or "operation" the better term, in general? I think we should be consistent between this sentence and the added text above/below

304

These commands take a pointer to struct flock as the third argument arg.
or
These advisory file locking commands take a pointer to struct flock as the third argument arg.

kib marked 3 inline comments as done.

Add Ed suggestions.

emaste added inline comments.
lib/libc/sys/fcntl.2
304

Maybe "operations" here too for consistency?
But from the context this one is clear so I don't mind either way.

This revision is now accepted and ready to land.Dec 6 2021, 9:41 PM
kib marked an inline comment as done.

One more "operations".

This revision now requires review to proceed.Dec 6 2021, 9:45 PM
This revision is now accepted and ready to land.Dec 6 2021, 9:59 PM
This revision was automatically updated to reflect the committed changes.