Page MenuHomeFreeBSD

Implement O_BENEATH flag for openat(2).
AbandonedPublic

Authored by jonathan on Jun 14 2015, 4:41 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 27, 1:49 PM
Unknown Object (File)
Thu, Dec 26, 11:16 PM
Unknown Object (File)
Wed, Dec 25, 5:13 AM
Unknown Object (File)
Sun, Dec 22, 1:16 PM
Unknown Object (File)
Dec 4 2024, 2:27 AM
Unknown Object (File)
Dec 2 2024, 12:00 PM
Unknown Object (File)
Nov 30 2024, 1:59 PM
Unknown Object (File)
Nov 27 2024, 6:32 PM

Details

Reviewers
pjd
rwatson
sson
Summary

Capsicum restricts capability-mode and capability-relative path lookups
to be "strictly relative": we do not permit absolute paths or ".." in
path resolution for these cases. This functionality could be useful in
non-Capsicum applications as well, and it has been proposed for inclusion
in Linux as a step along the way to Capsicum in Linux.

This commit would add the O_BENEATH flag for use in openat(2) to enable
"strict relative" lookups with unrestricted file descriptors (i.e., not
Capsicum capabilities) outside of capability mode. The only difference
from the Capsicum behaviour is that O_BENEATH would cause errno to be
EPERM rather than ECAPMODE or ENOTCAPABLE. If O_BENEATH is used together
with capability mode or a directory capability, EPERM is returned, since
the usermode application has explicitly requested the new behaviour.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

jonathan retitled this revision from to Implement O_BENEATH flag for openat(2)..
jonathan updated this object.
jonathan edited the test plan for this revision. (Show Details)
jonathan added reviewers: rwatson, pjd.

I like the idea of this change.

I'd rename ni_strictrelative to nd_strictrelativeerror or such, since it's now an errno.

Does this change have the correct effect when combined with fchdir() and fchroot()? I.e., can I fchdir() to an O_BENEATH descriptor and then perform a lookup without beneath properties?

sys/kern/vfs_lookup.c
1170

I might move this comment up to just above the function, and provide a little more explanation as to why this is needed, not just what it does.

1187

Possibly this should be a panic(), as should this occur at runtime, we want to fail closed, not open? Also, do include the proposed error number in the panic message.

Thanks for the feedback!

On the ni_strictrelative renaming: how about ni_nonrelativeerrno?

On your composition question:
The proposed mechanism would only affect the current system call: there is no change to the behaviour of the returned descriptor. If the dirfd was a capability then we still apply the usual capability restrictions to the opened fd. If the dirfd was not a capability, then we return a normal full-fledged file descriptor.

In your specific example, let's say we have a regular (non-capability) directory descriptor top and that we are not in capability mode. In that case, we could do:

int subdir = openat(top, "deep/subdir/path", O_DIRECTORY | O_BENEATH);
// subdir is now an unrestricted dirfd

int config = openat(subdir, "../config/my.conf", O_RDONLY);

fchdir(subdir);
int config2 = open("../config/my.conf", O_RDONLY);
int config3 = openat(FD_ATCWD, "../config/my.conf", O_RDONLY);

after which config, config2 and config3 should all be valid file descriptors for the configuration file.

If we are in capability mode or operating relative to a capability then the usual Capsicum rules apply.

sys/kern/vfs_lookup.c
1170

I can certainly do this. I was wondering, however, if this function would be considered useful or overly heavyweight: the call could be replaced by a conditional assignment in most places (and an unconditional assignment in one). I like the clarity of an explicit function call, but it can't really be inlined (unless I move the implementation to a header file, which I also don't like).

jonathan edited edge metadata.

Additional commit:

  • Rename nameidata::ni_strictrelative to ni_nonrelativeerrno.

Sounds reasonable with regard to fchdir() and friends -- and, indeed -- right now, CAPMODE forces O_BENEATH, but it's true we don't have a variant on O_BENEATH that forces it to be used for open()'d directories via other open()'d directories with O_BENEATH set. I think this is fine. There's a slight confused deputy problem if directories are passed from CAPMODE processes to non-CAPMODE processes anyway -- and we don't make it worse here.

Is there a way we can add David Drysdale as a reviewer for this patch?

sys/kern/vfs_lookup.c
1170

I would be OK with moving to various in situ -- especially of the form "... CAPMODE ? ENOTCAPABLE : EPERM".

Is there a way we can add David Drysdale as a reviewer for this patch?

Yes, David will need to check out https://wiki.freebsd.org/CodeReview which explains the procedure for getting an account. Short version: sign up at https://reviews.freebsd.org/auth/register/

sys/kern/vfs_syscalls.c
1000–1002

Extra braces

sys/sys/namei.h
172–173

This just seems to describe ndrequire_strict_relative_lookups's implementation, and I'm not sure it would help my understanding of this functionality without the full context of this change.

We should also add a test case -- @drysdale_google.com, is there a possibility the O_BENEATH tests you have could be made available under a BSD license?

@emaste, I updated the capsicum-test framework to include openat tests as well as capability-mode/capability-dfd tests, so that covers everything I've got.

sys/kern/vfs_lookup.c
1173

This seems superfluous with the second KASSERT below

sys/kern/vfs_syscalls.c
983

(Random aside: does FreeBSD not attempt to police unrecognized flags? In Linux, syscalls are (now) recommended to reject unknown flags with EINVAL.)

sys/sys/namei.h
153

I couldn't see where this is used - is it needed?

Thanks for these comments! I should be updating (and shrinking) the diff soon... just need a little more testing first.

sys/kern/vfs_lookup.c
1173

This function will go away anyhow based the above discussion with Robert.

sys/kern/vfs_syscalls.c
983

This strikes me as a fundamentally good idea, but others may disagree for compatibility reasons... maybe worth talking about for 11, but probably not in this hopefully-MFC'able change.

1000–1002

Ok, I will change this conform to style(9), but I officially protest that I do so under duress. :)

From now on, I will refer to this paragraph of style(9) as the "goto fail" paragraph.

sys/sys/namei.h
153

You are quite right: that was only needed in my very first patch version. I'll remove it.

172–173

I'm going to change this by removing the function and just setting the fields in the appropriate places. The logic will be unconditional in one place and if (foo != EPERM) in two, so it's probably not worth the cost of this only-inlineable-with-LTO function call.

@emaste, I updated the capsicum-test framework to include openat tests as well as capability-mode/capability-dfd tests, so that covers everything I've got.

Sounds good. I was thinking of the standalone O_BENEATH test posted to LKML and wasn't aware that equivalent test coverage now exists in the capsicum-test project - thanks.

Remove ndrequire_strict_relative_lookups().

  • Increase style(9) compliance.
  • Remove unused namei() flag.

Ok, so I believe that I have addressed all of the comments above... please let me know if I've missed anything! If I have indeed addressed everything, it might be appropriate to hit the LGTM button now. :)

Getting close, but a couple more comments :-).

sys/kern/vfs_lookup.c
658–660

Hmm. If we're limited by O_BENEATH, should we be using a different KTR trace function here?

sys/kern/vfs_syscalls.c
983

I think validating flags is a good idea -- but agree that it might be done on a different MFC schedule and hence should be a separate change.

1018–1020

Hmm. I think failing closed here is reasonable for capability mode -- but I'm not sure if that's true for ambient O_BENEATH. I believe td_dupfd is set for /dev/fd/[012] and /dev/{stderr,stdin,stdout}, which might reasonable be expected to work outside of capability mode. It also looks like it's used in another edge case or two (sysvr4 compatibility!?).

jonathan added inline comments.
sys/kern/vfs_lookup.c
658–660

How would you feel about using KTR_NAMEI instead? We could use KTR_CAPFAIL for Capsicum-caused failures and KTR_NAMEI for O_BENEATH-induced failures, but that's adding a lot of complexity and it's not entirely clear to me how we should handle the "both" case...

sys/kern/vfs_syscalls.c
1018–1020

Since we've said that O_BENEATHsupercedes the capability rules (in the sense that we want the O_BENEATH errno whether we're in or out of capability mode), we can't know at this point whether the original file was a capability: we just see EPERM.

If this code is related to fdopen(), I don't anticipate O_BENEATH being used much...

sys/kern/vfs_lookup.c
658–660

Hurum. Agreed it would make things more complex, but it probably is the right thing. How about we simply make the decision based on whether it's ENOTCAPABLE or EPERM?

sys/kern/vfs_syscalls.c
1018–1020

Maybe make this an XXXCAP or XXXJA or such, but otherwise, yes, I agree. We'll see if anyone ever runs into it!

jonathan added inline comments.
sys/kern/vfs_lookup.c
658–660

Ok, I can try that and see how it looks.

sys/kern/vfs_syscalls.c
1018–1020

Adding more XXX here sounds like a fine thing to do. :)

  • Let ktrace output depend on nonrelativeerrno.
  • Implement O_BENEATH flag for openat(2).
  • Rename nameidata::ni_strictrelative to ni_nonrelativeerrno.
  • Remove ndrequire_strict_relative_lookups().
  • Increase style(9) compliance.
  • Remove unused namei() flag.

Sorry for all of the recent noise: it seems that Arcanist requires extra care when doing distributed development!

To catch everybody up: I implemented @rwatson's ktrace change suggestion, but now it's been rolled back by the Arcanist fixups. Before I re-implement it, I'm going to merge in @mjg's namei refactoring work (which hit HEAD yesterday).

sys/kern/vfs_lookup.c
211

The caller is free to specify any error they want, yet this checks specifically for EPERM. I would say that's inconsequential for no good reason.

Instead, you can if (ndp->ni_nonrelativeerrno == 0) ndp->ni_nonrelativeerrno = ENOTCAPABLE;

while a no-op change with current consumers, may come in handy.

  • Let ktrace output depend on nonrelativeerrno.
jonathan added inline comments.
sys/kern/vfs_lookup.c
211

We actually want to honour EPERM above ENOTCAPABLE, since the user application code can always set the O_BENEATH flag and might expect EPERM when there is a failure. In all other cases, we preserve the existing ABI by returning ENOTCAPABLE.

658–661

Ok, @rwatson, how does this ktrace update look?

  • Add an XXXJA to the fdopen() with O_BENEATH case.

Marking request for XXXJA in the fdopen()+O_BENEATH case as done.

sys/kern/vfs_lookup.c
211

But proposed change would honour EPERM just fine. Further, when one day other caller will want to return EACCESS, it will just work, while with current patch whatever it sets will be converted to ENOTCAPABLE. That's only a nit though.

sys/kern/vfs_lookup.c
211

That's a fair point. Another development (not visible here on the issue tracker) is that we're considering just creating a brand new errno for this kind of failure. That would provide more information to the caller about the true nature of the error. It would also allow us to stay compatible between the FreeBSD and Linux implementations even though Linux doesn't have ENOTCAPABLE, and we could #define this value to ENOTCAPABLE on FreeBSD 10 to maintain ABI compatibility.

  • MFC r303406
  • Treat EPERM slightly less specially.

Address @mjg 's point about EPERM.

With D17547/rS339748 committed should we now close this?

Closing revision: this functionality was implemented in D17547.