Page MenuHomeFreeBSD

Allow some dotdot lookups in capability mode.
ClosedPublic

Authored by kib on Oct 1 2016, 11:25 AM.
Referenced Files
Unknown Object (File)
Tue, Mar 19, 12:25 AM
Unknown Object (File)
Sun, Mar 17, 4:56 AM
Unknown Object (File)
Sat, Mar 16, 8:24 PM
Unknown Object (File)
Feb 24 2024, 2:04 AM
Unknown Object (File)
Jan 14 2024, 6:28 PM
Unknown Object (File)
Jan 10 2024, 7:25 PM
Unknown Object (File)
Jan 7 2024, 9:20 PM
Unknown Object (File)
Dec 26 2023, 12:58 PM

Details

Summary

If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal.

This might be unnecessary strict with regard to the jail or real root traversal, but definitely it is required to avoid errors during unwind of the mount point stacked.

Minor style editing will be committed separately.

Test Plan

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 5458
Build 5670: CI src buildJenkins

Event Timeline

kib retitled this revision from to Allow some dotdot lookups in capability mode..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added reviewers: emaste, jonathan, rwatson, mjg, pjd.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
kib added a project: capsicum.
sys/kern/vfs_lookup.c
706–707

This check is not sufficient to provide security in the presence of colluding concurrent userspace threads (t1, t2, both in capability mode) operating on a shared directory subtree, "a/b/c" and directory descriptors for "a" (afd) and "b" (bfd). When t1 performs an openat(bfd, "c/.."), t2 might simultaneously perform a renameat(afd, "b/c", afd, "c"). If the attacker wins the race, t1's evaluation if "c/.." could reach "a" rather than being blocked at "b", at which point the further lookup of ".." would escape both afd's and bfd's subtrees. Assuming I've described the scenario correctly, anyway!

Construct a lookup tracker, which records the path by holding each directory vnode found during namei() operations. Simultaneously, all the directories are added to some global structure and marked by VOP_RENAME() implementations while tvp is still locked, as renamed.

After the lookup finished, assert that all vnodes in the lookup tracker were not renamed. If any was, and dotdot found during the path traversal, return EAGAIN. Of course, the code may be changed to restart the lookup instead.

The global tracker for VOP_RENAME() is a plain list, which might be converted by per-bucket locked hash table to reduce rename overhead. Only ufs_rename() was adjusted.

Implement Jonathan Anderson suggestion of checking the result of dotdot lookup against the recorded list of traversed vnodes. Drop rename notifications. Check for dotdot vnodes living on local fs.

In D8110#168999, @kib wrote:

Implement Jonathan Anderson suggestion of checking the result of dotdot lookup against the recorded list of traversed vnodes. Drop rename notifications. Check for dotdot vnodes living on local fs.

Possibly the Robert Watson suggestion. :-)

In D8110#168999, @kib wrote:

Implement Jonathan Anderson suggestion of checking the result of dotdot lookup against the recorded list of traversed vnodes. Drop rename notifications. Check for dotdot vnodes living on local fs.

Possibly the Robert Watson suggestion. :-)

Sorry.

Overall I like this approach, but there's an important experimental question as to whether this enables all the use cases we care about -- and, more generally, whether there are visible failure modes that might surprise application programmers. We also need to think quite hard to convince ourselves this maintains safe operation. Getting Jon Anderson, Ben Laurie, and David Drysdale to review the approach would be very useful.

sys/kern/vfs_lookup.c
138

perhaps s/resulted/resulting/?

141

I'm still pondering the rationale for this choice but feel that being conservative is safer.

272

With the new approach, we might need to clarify that this is not quite what we implement, although it is the effect of what we implement.

kib edited edge metadata.

Update comments.

A fairly cursory look didn't turn up any obvious problems for me, but I would like to take a deeper look over the next few days.

sys/kern/vfs_lookup.c
686

If this were earlier in dirloop, would we be able to remove the call at line 371?

743

We always do the same KTrace stuff when nameicap_check_dotdot fails, so could we move the KTrace bits to nameicap_check_dotdot?

kib marked an inline comment as done.Oct 7 2016, 1:23 PM
kib added inline comments.
sys/kern/vfs_lookup.c
743

I pondered this but I think I do not want to do the change now.

Current structure of nameicap_check_dotdot() is not suitable for the move, since there are two error exits. There are reservations about MNT_LOCAL already stated, I do not want to change the control flow without the decision about local.

Additionally, that function is a helper with the single purpose, KTR/probes stuff lives in lookup() itself.

kib edited edge metadata.

Remove nameicap_tracker_add() call from namei().

sys/kern/vfs_lookup.c
96

I wonder if "nameicap_tracker" or similar might be a better string than "rentr"?

153

My initial thinking was that this new approach is safe even for !MNT_LOCAL -- do we have reason to think we need to retain this check?

sys/kern/vfs_lookup.c
96

I highly dislike long (or worse, containing space-delimited words) names for zones. Both vmstat -z and ddb 'show uma' become hard to read and parse. Anybody who looks at the zone needs to grep the source anyway.

153

It depends on what you consider the adverse behavior on the server ? If you consider it fine to remove the check, I will just do it without arguing.

Add a knob to enable/disable dotdot lookups in cap mode. Restore pre-patch code to fail if dotdot parsed, and comments about that.

sys/kern/vfs_lookup.c
111

Maybe enable \"..\" components in path lookup in capability mode? I think just "enable" is canonical, and nice to be shorter.

You may want to also add a comment before this that the intent is this becomes unconditionally enabled, but that a sysctl is provided until verification efforts are complete.

Suggestions by emaste.
Use flags instead of two booleans.

Minor style editing will be committed separately.

If you wanted, you could go ahead and commit those now and rebase this patch.

On superficial review this patch looks good to me, but I don't have enough knowledge of name lookups to really be confident. I hope it will be reviewed by someone who does.

mjg edited edge metadata.
This revision is now accepted and ready to land.Oct 31 2016, 8:22 PM
This revision was automatically updated to reflect the committed changes.