Paths

Table of Contentst

nullfs lookup: cn_flags is 64bit
ClosedPublic
Actions

Authored by kib on Sat, May 17, 5:52 AM.

Details

Reviewers

markj
jah
olce

Commits

rG89549c234817: nullfs lookup: cn_flags is 64bit

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Sat, May 17, 5:52 AM

Herald added a subscriber: imp. · View Herald TranscriptSat, May 17, 5:52 AM

kib requested review of this revision.Sat, May 17, 5:52 AM

This doesn't fix the problem. There are no mount points involved (except for a nullfs mount point).

The issue involves what happens if 1) a process is using a nullfs vnode as the cwd, 2) something moves the corresponding lower vnode outside of the directory hierarchy bound by the nullfs mount. Suppose that the process does chdir(".."). null_lookup() does the lower vnode lookup, then calls null_nodeget() to find the corresponding nullfs node. There is none (because the lower vnode isn't covered by nullfs anymore), so null_nodeget() instantiates a new vnode and returns it.

Now keep doing that until we reach the process' root directory (i.e., the jail root). We will return a nullfs vnode where the lower vnode is the process root dir. But then the root dir check in vfs_lookup() does not catch it, because it is using pointer equality to check whether the returned vnode is the root.

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

In D50390#1150039, @markj wrote:

This doesn't fix the problem. There are no mount points involved (except for a nullfs mount point).

The issue involves what happens if 1) a process is using a nullfs vnode as the cwd, 2) something moves the corresponding lower vnode outside of the directory hierarchy bound by the nullfs mount. Suppose that the process does chdir(".."). null_lookup() does the lower vnode lookup, then calls null_nodeget() to find the corresponding nullfs node. There is none (because the lower vnode isn't covered by nullfs anymore), so null_nodeget() instantiates a new vnode and returns it.

Now keep doing that until we reach the process' root directory (i.e., the jail root). We will return a nullfs vnode where the lower vnode is the process root dir. But then the root dir check in vfs_lookup() does not catch it, because it is using pointer equality to check whether the returned vnode is the root.

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

I am confused. Why is nullfs important in such scenario?
If you have a jailed process with cwd, then cwd is moved outside the jail root, the process already escaped the jail fs namespace. The layer of nullfs only adds some confusion but does not enhance the escape possibility.
Might be, more explicit description, with the written out hierarchy, cwd point, renames, and mounts, would be useful.

In fact, for nullfs + jails, I think that we should recommend to only use nullfs mounted over root of some fs,

OTOH, I believe that my patch in the review does fix a case where nullfs creates null vnode for wrong lower mp.

In D50390#1150078, @kib wrote:

In fact, for nullfs + jails, I think that we should recommend to only use nullfs mounted over root of some fs,

Yes, that nullfs is mounted over some FS root, *and* that the jail has the the mounted nullfs' root as its own root.

OTOH, I believe that my patch in the review does fix a case where nullfs creates null vnode for wrong lower mp.

Indeed.

In D50390#1150146, @olce wrote:

In D50390#1150078, @kib wrote:

OTOH, I believe that my patch in the review does fix a case where nullfs creates null vnode for wrong lower mp.

Indeed.

Or maybe not. Crossing a mount point involves calling the namei()/vfs_lookup() machinery, which IIRC is never the case by implementations of VOP_LOOKUP(), so how could this happen in practice here?

In D50390#1150039, @markj wrote:

(snip)

I agree with your analysis here, and your comment in the PR, except for:

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

which I think is unrelated.

In D50390#1150186, @olce wrote:

In D50390#1150146, @olce wrote:

In D50390#1150078, @kib wrote:

OTOH, I believe that my patch in the review does fix a case where nullfs creates null vnode for wrong lower mp.

Indeed.

Or maybe not. Crossing a mount point involves calling the namei()/vfs_lookup() machinery, which IIRC is never the case by implementations of VOP_LOOKUP(), so how could this happen in practice here?

Yes, you are right. OTOH the situation with nullfs mount with null root over non-covered vnode still makes me wonder of possible leaks of lover roots (I understand that this is unclear formulation, but for now I do not see it useful to spend others time).

I left the other bug fix from this review.

kib updated this revision to Diff 155645.Sun, May 18, 9:09 AM

kib retitled this revision from nullfs lookup: looked up vnode must be from the lower mount to nullfs lookup: cn_flags is 64bit.

kib edited the summary of this revision. (Show Details)

olce accepted this revision.Sun, May 18, 9:19 AM

This revision is now accepted and ready to land.Sun, May 18, 9:19 AM

Closed by commit rG89549c234817: nullfs lookup: cn_flags is 64bit (authored by kib). · Explain WhySun, May 18, 9:24 AM

This revision was automatically updated to reflect the committed changes.

kib added a commit: rG89549c234817: nullfs lookup: cn_flags is 64bit.

In D50390#1150078, @kib wrote:

In D50390#1150039, @markj wrote:

This doesn't fix the problem. There are no mount points involved (except for a nullfs mount point).

The issue involves what happens if 1) a process is using a nullfs vnode as the cwd, 2) something moves the corresponding lower vnode outside of the directory hierarchy bound by the nullfs mount. Suppose that the process does chdir(".."). null_lookup() does the lower vnode lookup, then calls null_nodeget() to find the corresponding nullfs node. There is none (because the lower vnode isn't covered by nullfs anymore), so null_nodeget() instantiates a new vnode and returns it.

Now keep doing that until we reach the process' root directory (i.e., the jail root). We will return a nullfs vnode where the lower vnode is the process root dir. But then the root dir check in vfs_lookup() does not catch it, because it is using pointer equality to check whether the returned vnode is the root.

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

I am confused. Why is nullfs important in such scenario?
If you have a jailed process with cwd, then cwd is moved outside the jail root, the process already escaped the jail fs namespace. The layer of nullfs only adds some confusion but does not enhance the escape possibility.

The example in the first comment of PR 262180 is enlightening.

There, the cwd is not moved outside of the jail root, it is just moved outside of the nullfs root. The cwd is a nullfs vnode, so chdir("..") looks up the lower vnode's parent, then creates a new nullfs vnode and returns that. That new nullfs vnode is also outside the nullfs root. So successive ".." lookups "shadow" the lower vnode, and let the process cross over the jail root.

Might be, more explicit description, with the written out hierarchy, cwd point, renames, and mounts, would be useful.

Let /j be a jail root. Assume it is not a mountpoint. Suppose the jail has allow.mount.nullfs permission.

With the jail (using full paths everywhere for clarity):

# mkdir -p /j/tmp/a/dir /j/tmp/b
# mount -t nullfs /j/tmp/a /j/tmp/b
# cd /j/tmp/b/dir
# mv /j/tmp/a/dir /j/tmp/c

Now if the process does chdir(".."), its cwd vnode will be a nullfs vnode with lower vnode equal to /j/tmp. If it does so again, its cwd vnode will be a nullfs vnode with lower vnode equal to /j. And so on and so on. And the vfs_lookup() check which makes sure that we don't chdir to /j/.. doesn't work, because we are chdir'ing to a nullfs alias of /j/...

In fact, for nullfs + jails, I think that we should recommend to only use nullfs mounted over root of some fs,

OTOH, I believe that my patch in the review does fix a case where nullfs creates null vnode for wrong lower mp.

To be clear, I don't claim the patch is wrong, just that it's not sufficient to fix the PR.

In D50390#1150187, @olce wrote:

In D50390#1150039, @markj wrote:

(snip)

I agree with your analysis here, and your comment in the PR, except for:

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

which I think is unrelated.

It is related. In the example I gave above, if /j is also a filesystem mountpoint, then the (ldvp->v_vflag & VV_ROOT) != 0 && (flags & ISDOTDOT) != 0 case in null_lookup() is true, and the user cannot escape the jail root.

In D50390#1150425, @markj wrote:

In D50390#1150078, @kib wrote:

In D50390#1150039, @markj wrote:

This doesn't fix the problem. There are no mount points involved (except for a nullfs mount point).

The issue involves what happens if 1) a process is using a nullfs vnode as the cwd, 2) something moves the corresponding lower vnode outside of the directory hierarchy bound by the nullfs mount. Suppose that the process does chdir(".."). null_lookup() does the lower vnode lookup, then calls null_nodeget() to find the corresponding nullfs node. There is none (because the lower vnode isn't covered by nullfs anymore), so null_nodeget() instantiates a new vnode and returns it.

Now keep doing that until we reach the process' root directory (i.e., the jail root). We will return a nullfs vnode where the lower vnode is the process root dir. But then the root dir check in vfs_lookup() does not catch it, because it is using pointer equality to check whether the returned vnode is the root.

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

nameidata is available as container (in the sense of containerof()) of the passed cnp. Might be worth adding a flag to cnp to state that containerof is safe for case when VOP_LOOKUP() is called bare. But see below for my evaluation.

I am confused. Why is nullfs important in such scenario?
If you have a jailed process with cwd, then cwd is moved outside the jail root, the process already escaped the jail fs namespace. The layer of nullfs only adds some confusion but does not enhance the escape possibility.

The example in the first comment of PR 262180 is enlightening.

There, the cwd is not moved outside of the jail root, it is just moved outside of the nullfs root. The cwd is a nullfs vnode, so chdir("..") looks up the lower vnode's parent, then creates a new nullfs vnode and returns that. That new nullfs vnode is also outside the nullfs root. So successive ".." lookups "shadow" the lower vnode, and let the process cross over the jail root.

I see. So it is the reformulation of the fact that if nullfs is mounted over some lower fs, than any vnode of the lower fs could end up covered by the nullfs vnode, not only vnodes beneath the nullfs root.

Again this is self-inflicted/assisted issue.

Might be, more explicit description, with the written out hierarchy, cwd point, renames, and mounts, would be useful.

Let /j be a jail root. Assume it is not a mountpoint. Suppose the jail has allow.mount.nullfs permission.

With the jail (using full paths everywhere for clarity):
# mkdir -p /j/tmp/a/dir /j/tmp/b
# mount -t nullfs /j/tmp/a /j/tmp/b
# cd /j/tmp/b/dir
# mv /j/tmp/a/dir /j/tmp/c
Now if the process does chdir(".."), its cwd vnode will be a nullfs vnode with lower vnode equal to /j/tmp. If it does so again, its cwd vnode will be a nullfs vnode with lower vnode equal to /j. And so on and so on. And the vfs_lookup() check which makes sure that we don't chdir to /j/.. doesn't work, because we are chdir'ing to a nullfs alias of /j/...

In fact, for nullfs + jails, I think that we should recommend to only use nullfs mounted over root of some fs,

OTOH, I believe that my patch in the review does fix a case where nullfs creates null vnode for wrong lower mp.

To be clear, I don't claim the patch is wrong, just that it's not sufficient to fix the PR.

In D50390#1150187, @olce wrote:

In D50390#1150039, @markj wrote:

(snip)

I agree with your analysis here, and your comment in the PR, except for:

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

which I think is unrelated.

It is related. In the example I gave above, if /j is also a filesystem mountpoint, then the (ldvp->v_vflag & VV_ROOT) != 0 && (flags & ISDOTDOT) != 0 case in null_lookup() is true, and the user cannot escape the jail root.

In D50390#1150425, @markj wrote:

It is related. In the example I gave above, if /j is also a filesystem mountpoint, then the (ldvp->v_vflag & VV_ROOT) != 0 && (flags & ISDOTDOT) != 0 case in null_lookup() is true, and the user cannot escape the jail root.

Yes, sorry, I didn't really want to mean that it was completely unrelated, and what you say is true, it's just that this behavior has not been introduced with the problem at hand in mind and effectively does not handle it, except *by accident* for the case of a filesystem mountpoint.

In D50390#1150490, @kib wrote:

I see. So it is the reformulation of the fact that if nullfs is mounted over some lower fs, than any vnode of the lower fs could end up covered by the nullfs vnode, not only vnodes beneath the nullfs root.
Again this is self-inflicted/assisted issue.

I think this issue is quite different from the well-known jail escape problem, and more to the point here also from its specific variant involving using a nullfs filesystem as a jail's root filesystem because of its specific scenario and security characteristics despite being similar technically.

Let me first recap the other ones in this paragraph, just to be sure we are talking about the same things. The well-known jail escape happens when an administrator (or some user allowed by the FS permissions) moves a directory within the jail's hierarchy outside of that hierarchy (nullfs need not be involved at all), which can be seen as self-inflicting pain. I think however that assisting/warning an administrator here could be nice but a priori seems quite impractical; it's a different discussion which I won't go into here. One of the practical stopgap recommendation is to have a jail's root be a filesystem mount point (other recommendations are to say "don't do that" for root, which can be considered weak, and "ensure that non-root users cannot access the hierarchy outside the jail"; admittedly, these are more error-prone). However, that's not even enough when the filesystem where a jail is grafted is a nullfs, because there the duplicated hierarchy's root itself must *also* be a filesystem's root, else it is still possible to move a nullfs vnode used in the jail out of the jail by just moving the underlying vnode in the duplicated filesystem (the alternative recommendations above apply equally here). The just-developed scenario of moving a nullfs vnode by moving its underlying vnode is a fundamental characteristic of our nullfs, so the escape scenario just described in fact applies to *any* nullfs mounts in a jail (and, again, the stopgap is to graft a nullfs mount on some FS root vnode). I fear that people are generally not well aware of the security implications of using nullfs. All that equally applies to unionfs (and all stackable FS if some other exist out-of-tree).

Back to the the difference with this scenario/bug: Here, there is no need for an *outside* administrator (or user) to actually move any file. The mere delegation of the ability to mount nullfs in a jail gives an opportunity for an administrator *inside the jail* to escape it. That seems much more severe, and much less understandable to administrators/users than a deliberate file "move-out" IMHO. And it also seems that we can actually fix it relatively easily.

In D50390#1150039, @markj wrote:

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

Passing this information to VFS_LOOKUP() is simpler and more efficient than the alternative of having a new VOP asking if some vnode is actually referencing in its lower layers some specific vnode (we would pass ndp->ni_rootdir or ndp->ni_topdir to it), but it forces the limiting behavior to be duplicated into all stackable filesystems. I'm fine with having some centralized helper function called from nullfs/unionfs, but I'd prefer calling it instead from a new vop_lookup_pre() (if that can work, I haven't worked out the details).

In D50390#1150490, @kib wrote:

nameidata is available as container (in the sense of containerof()) of the passed cnp. Might be worth adding a flag to cnp to state that containerof is safe for case when VOP_LOOKUP() is called bare. (snip)

Seems the simplest way indeed.

In D50390#1150490, @kib wrote:

In D50390#1150425, @markj wrote:

In D50390#1150078, @kib wrote:

In D50390#1150039, @markj wrote:

This doesn't fix the problem. There are no mount points involved (except for a nullfs mount point).

The issue involves what happens if 1) a process is using a nullfs vnode as the cwd, 2) something moves the corresponding lower vnode outside of the directory hierarchy bound by the nullfs mount. Suppose that the process does chdir(".."). null_lookup() does the lower vnode lookup, then calls null_nodeget() to find the corresponding nullfs node. There is none (because the lower vnode isn't covered by nullfs anymore), so null_nodeget() instantiates a new vnode and returns it.

Now keep doing that until we reach the process' root directory (i.e., the jail root). We will return a nullfs vnode where the lower vnode is the process root dir. But then the root dir check in vfs_lookup() does not catch it, because it is using pointer equality to check whether the returned vnode is the root.

In fact this is the same case that is already handled by null_lookup ("Renames in the lower mounts might create an inconsistent configuration..."), but that check is not sufficient unless the root directory is also a mount point.

To fix it, I think null_lookup() needs to check whether the /lower/ vnode is equal to ndp->ni_rootdir or ndp->ni_topdir. But, VOP_LOOKUP does not get the nameidata. Maybe we should start passing that as an extra parameter?

nameidata is available as container (in the sense of containerof()) of the passed cnp. Might be worth adding a flag to cnp to state that containerof is safe for case when VOP_LOOKUP() is called bare. But see below for my evaluation.

I am confused. Why is nullfs important in such scenario?
If you have a jailed process with cwd, then cwd is moved outside the jail root, the process already escaped the jail fs namespace. The layer of nullfs only adds some confusion but does not enhance the escape possibility.

The example in the first comment of PR 262180 is enlightening.

There, the cwd is not moved outside of the jail root, it is just moved outside of the nullfs root. The cwd is a nullfs vnode, so chdir("..") looks up the lower vnode's parent, then creates a new nullfs vnode and returns that. That new nullfs vnode is also outside the nullfs root. So successive ".." lookups "shadow" the lower vnode, and let the process cross over the jail root.

I see. So it is the reformulation of the fact that if nullfs is mounted over some lower fs, than any vnode of the lower fs could end up covered by the nullfs vnode, not only vnodes beneath the nullfs root.

Again this is self-inflicted/assisted issue.

The only assistance required is that the jail have allow.mount.nullfs set. I do not think we can consider that to be a case of operator error.

My patch for solving this is D50418.

Revision Contents
Changeset List

Path

Size

sys/

fs/

nullfs/

null_vnops.c

6 lines

Diff 155646

View Options

sys/fs/nullfs/null_vnops.c

nullfs lookup: cn_flags is 64bitClosedPublicActions