Allow some dotdot lookups in capability mode.
ClosedPublic

Authored by kib on Oct 1 2016, 11:25 AM.

Details

Summary

If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal.

This might be unnecessary strict with regard to the jail or real root traversal, but definitely it is required to avoid errors during unwind of the mount point stacked.

Minor style editing will be committed separately.

Test Plan

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
kib retitled this revision from to Allow some dotdot lookups in capability mode..Oct 1 2016, 11:25 AM
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added reviewers: emaste, jonathan, rwatson, mjg, pjd.
kib set the repository for this revision to rS FreeBSD src repository.
kib added a project: capsicum.
rwatson added inline comments.Oct 1 2016, 4:00 PM
sys/kern/vfs_lookup.c
648 ↗(On Diff #20909)

This check is not sufficient to provide security in the presence of colluding concurrent userspace threads (t1, t2, both in capability mode) operating on a shared directory subtree, "a/b/c" and directory descriptors for "a" (afd) and "b" (bfd). When t1 performs an openat(bfd, "c/.."), t2 might simultaneously perform a renameat(afd, "b/c", afd, "c"). If the attacker wins the race, t1's evaluation if "c/.." could reach "a" rather than being blocked at "b", at which point the further lookup of ".." would escape both afd's and bfd's subtrees. Assuming I've described the scenario correctly, anyway!

kib updated this revision to Diff 20915.Oct 1 2016, 6:30 PM

Construct a lookup tracker, which records the path by holding each directory vnode found during namei() operations. Simultaneously, all the directories are added to some global structure and marked by VOP_RENAME() implementations while tvp is still locked, as renamed.

After the lookup finished, assert that all vnodes in the lookup tracker were not renamed. If any was, and dotdot found during the path traversal, return EAGAIN. Of course, the code may be changed to restart the lookup instead.

The global tracker for VOP_RENAME() is a plain list, which might be converted by per-bucket locked hash table to reduce rename overhead. Only ufs_rename() was adjusted.

kib updated this revision to Diff 21058.Oct 5 2016, 10:21 AM

Implement Jonathan Anderson suggestion of checking the result of dotdot lookup against the recorded list of traversed vnodes. Drop rename notifications. Check for dotdot vnodes living on local fs.

In D8110#168999, @kib wrote:

Implement Jonathan Anderson suggestion of checking the result of dotdot lookup against the recorded list of traversed vnodes. Drop rename notifications. Check for dotdot vnodes living on local fs.

Possibly the Robert Watson suggestion. :-)

kib added a comment.Oct 5 2016, 10:51 AM
In D8110#168999, @kib wrote:

Implement Jonathan Anderson suggestion of checking the result of dotdot lookup against the recorded list of traversed vnodes. Drop rename notifications. Check for dotdot vnodes living on local fs.

Possibly the Robert Watson suggestion. :-)

Sorry.

Overall I like this approach, but there's an important experimental question as to whether this enables all the use cases we care about -- and, more generally, whether there are visible failure modes that might surprise application programmers. We also need to think quite hard to convince ourselves this maintains safe operation. Getting Jon Anderson, Ben Laurie, and David Drysdale to review the approach would be very useful.

sys/kern/vfs_lookup.c
138 ↗(On Diff #21058)

perhaps s/resulted/resulting/?

141 ↗(On Diff #21058)

I'm still pondering the rationale for this choice but feel that being conservative is safer.

272 ↗(On Diff #21058)

With the new approach, we might need to clarify that this is not quite what we implement, although it is the effect of what we implement.

kib updated this revision to Diff 21061.Oct 5 2016, 12:32 PM

Update comments.

A fairly cursory look didn't turn up any obvious problems for me, but I would like to take a deeper look over the next few days.

sys/kern/vfs_lookup.c
690 ↗(On Diff #21061)

If this were earlier in dirloop, would we be able to remove the call at line 371?

747 ↗(On Diff #21061)

We always do the same KTrace stuff when nameicap_check_dotdot fails, so could we move the KTrace bits to nameicap_check_dotdot?

kib marked an inline comment as done.Oct 7 2016, 1:23 PM
kib added inline comments.
sys/kern/vfs_lookup.c
747 ↗(On Diff #21061)

I pondered this but I think I do not want to do the change now.

Current structure of nameicap_check_dotdot() is not suitable for the move, since there are two error exits. There are reservations about MNT_LOCAL already stated, I do not want to change the control flow without the decision about local.

Additionally, that function is a helper with the single purpose, KTR/probes stuff lives in lookup() itself.

kib updated this revision to Diff 21139.Oct 7 2016, 1:24 PM

Remove nameicap_tracker_add() call from namei().

rwatson added inline comments.Oct 7 2016, 3:11 PM
sys/kern/vfs_lookup.c
96 ↗(On Diff #21139)

I wonder if "nameicap_tracker" or similar might be a better string than "rentr"?

153 ↗(On Diff #21139)

My initial thinking was that this new approach is safe even for !MNT_LOCAL -- do we have reason to think we need to retain this check?

kib added inline comments.Oct 7 2016, 3:50 PM
sys/kern/vfs_lookup.c
96 ↗(On Diff #21139)

I highly dislike long (or worse, containing space-delimited words) names for zones. Both vmstat -z and ddb 'show uma' become hard to read and parse. Anybody who looks at the zone needs to grep the source anyway.

153 ↗(On Diff #21139)

It depends on what you consider the adverse behavior on the server ? If you consider it fine to remove the check, I will just do it without arguing.

kib updated this revision to Diff 21775.Oct 28 2016, 4:04 PM

Add a knob to enable/disable dotdot lookups in cap mode. Restore pre-patch code to fail if dotdot parsed, and comments about that.

kib updated this revision to Diff 21776.Oct 28 2016, 4:06 PM

Fix copy/paste.

emaste added inline comments.Oct 28 2016, 6:31 PM
sys/kern/vfs_lookup.c
111 ↗(On Diff #21776)

Maybe enable \"..\" components in path lookup in capability mode? I think just "enable" is canonical, and nice to be shorter.

You may want to also add a comment before this that the intent is this becomes unconditionally enabled, but that a sysctl is provided until verification efforts are complete.

kib updated this revision to Diff 21786.Oct 29 2016, 9:05 AM

Suggestions by emaste.
Use flags instead of two booleans.

Minor style editing will be committed separately.

If you wanted, you could go ahead and commit those now and rebase this patch.

On superficial review this patch looks good to me, but I don't have enough knowledge of name lookups to really be confident. I hope it will be reviewed by someone who does.

mjg accepted this revision.Oct 31 2016, 8:22 PM
This revision is now accepted and ready to land.Oct 31 2016, 8:22 PM
This revision was automatically updated to reflect the committed changes.