Details

Reviewers

mjg
kib

Commits

rG823dfd17e27c: Add support for mounting single files in nullfs
rGa3f714c4ff8c: Add support for mounting single files in nullfs

Summary

My main use-case for this is to support mounting config files and secrets
into OCI containers. My current workaround copies the files into the
container which is messy and risks secrets leaking into container images
if the cleanup fails.

Allow mounting files as well as directories

This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.

Test Plan

$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo

head -1 /mnt/foo
@(#)COPYRIGHT 8.2 (Berkeley) 3/21/94

The compilation of software known as FreeBSD is distributed under the
following terms:

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 48573
Build 45459: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Updated to include requested changes. I'm still working on a regular-file equivalent to vn_path_to_global_path.

Harbormaster completed remote builds in B48518: Diff 113553.Nov 28 2022, 3:02 PM

In D37478#852943, @mjg wrote:

The invocation suggested is definitely a problem: sudo mount -t nullfs /COPYRIGHT /mnt/foo

should someone try to mount a regular file by accident on stock system their attempt is going to fail, patched one will succeed when it should not have.

I think there should be an explicit option (say -o regfile or similar) which would *require* the target to be VREG when used, VDIR otherwise.

I'm not sure what the problem is here. Assuming that /mnt/foo is a regular file, why should the example mount fail? Surely the checks to enforce that the two objects are either both file or both directory is sufficient.

sbin/mount/getmntopts.c
155	I am mostly attempting to follow existing here but there is clearly some redundancy. This kind of checking is happening in mount, mount_nullfs and kernel. It would be nice to have only kernel checking but it seems like that is best left until we can get better error messages from kernel.
sys/kern/vfs_mount.c
1564	I will work on adding a vn_path_to_global_path_hardlink with similar semantics, ideally with a simpler interface to vn_fullpath_hardlink.

Add vn_path_to_global_path_hardlink. This works for trivial tests and makes
the code in vfs_domount more readable. I'll update my podman test VM and
stress test it a little.

Harbormaster completed remote builds in B48519: Diff 113555.Nov 28 2022, 4:06 PM

dfr marked an inline comment as done.Nov 28 2022, 4:07 PM

In D37478#852960, @dfr wrote:

In D37478#852943, @mjg wrote:

The invocation suggested is definitely a problem: sudo mount -t nullfs /COPYRIGHT /mnt/foo

should someone try to mount a regular file by accident on stock system their attempt is going to fail, patched one will succeed when it should not have.

I think there should be an explicit option (say -o regfile or similar) which would *require* the target to be VREG when used, VDIR otherwise.

I'm not sure what the problem is here. Assuming that /mnt/foo is a regular file, why should the example mount fail? Surely the checks to enforce that the two objects are either both file or both directory is sufficient.

I mean they made a mistake and with your patch it will happen to "work" even though it is not what was meant.

sys/kern/vfs_mount.c
1558	this needs to test on VREG? a switch statement is probably best here

zlei added a subscriber: zlei.Nov 29 2022, 3:57 AM

In D37478#853071, @mjg wrote:

In D37478#852960, @dfr wrote:

In D37478#852943, @mjg wrote:

The invocation suggested is definitely a problem: sudo mount -t nullfs /COPYRIGHT /mnt/foo

should someone try to mount a regular file by accident on stock system their attempt is going to fail, patched one will succeed when it should not have.

I think there should be an explicit option (say -o regfile or similar) which would *require* the target to be VREG when used, VDIR otherwise.

I'm not sure what the problem is here. Assuming that /mnt/foo is a regular file, why should the example mount fail? Surely the checks to enforce that the two objects are either both file or both directory is sufficient.

I mean they made a mistake and with your patch it will happen to "work" even though it is not what was meant.

That kind of makes sense - the closest equivalent elsewhere is on Linux where bind mounts ignore the mount type but require the MS_BIND flag. I'm not sure gating this with an option is the best interface but don't feel that strongly. A possible alternative would be a sysctl to globally enable/disable this new feature?

sys/kern/vfs_mount.c
1558	I don't see the need for a switch - the only reason I needed something other than vn_path_to_global_path here was that non-directories need a little help to resolve, e.g. parent directory plus leaf component. This would be the same for types other than VREG. Perhaps this can be reduced to just vn_path_to_global_path_hardlink since that should worok just as well for directories. I can't think of a scenario where something other than VDIR or VREG is useful here but I would prefer to leave that kind of policy enforcement to vfs_domount_first which is where all the other sanity checks are applied.

Added a couple of comments above

you should ask pho@ to test this, I would not be surprised if an assert tripped over

addition of vn_path_to_global_path_hardlink has to be a standalone commit

sys/kern/vfs_mount.c
1558	my point is that that you may find yourself with neither VDIR nor VREG here and pass it down to vfs_domount_first. bare minimum this requires a comment that the type is checked in that routine.

Added a comment to note that vnode type validation happens in vfs_domount_first. Also fixed a couple of line-too-long formatting issues.

Harbormaster completed remote builds in B48573: Diff 113682.Nov 30 2022, 4:59 PM

I will also reach out to pho@ for advice on stress testing.

I'm going to have to take a closer look. v_mountedhere is in a union which can be used for other purposes for non-VDIR vnodes, chances are decent this trips over.

Split out vn_path_to_global_path_hardlink into its own commit.

Harbormaster completed remote builds in B48600: Diff 113774.Dec 2 2022, 2:14 PM

Added an ugly workaround for a panic caused by running realpath on a file mount.

Harbormaster completed remote builds in B48608: Diff 113801.Dec 4 2022, 3:56 PM

The proposed hack in kern___realpathat is definitely not the way to do it -- by the time the second lookup is performed the name may be different.

I figured this corner case should have an obvious solution: the target vnode has v_mountedhere, from there you should be able to reliably find the vnode from the mounted on filesystem, and that should be unique and consequently easily sorted out with reverse path resolution.

Except I realized there is another issue: hardlinks to the mount point losing the uniqueness which sounds like a huge can of worms. You can definitely check there are no hardlinks at mount time and deny creating them when mounted.

Another potential can of worms is stacked mounts, one after another. Should probably also get disallowed.

That said, I'm going to have to chew on this, there are more corner cases to consider that I initially thought.

In D37478#854930, @mjg wrote:

The proposed hack in kern___realpathat is definitely not the way to do it -- by the time the second lookup is performed the name may be different.

I figured this corner case should have an obvious solution: the target vnode has v_mountedhere, from there you should be able to reliably find the vnode from the mounted on filesystem, and that should be unique and consequently easily sorted out with reverse path resolution.

Except I realized there is another issue: hardlinks to the mount point losing the uniqueness which sounds like a huge can of worms. You can definitely check there are no hardlinks at mount time and deny creating them when mounted.

Another potential can of worms is stacked mounts, one after another. Should probably also get disallowed.

That said, I'm going to have to churn on this, there are more corner cases to consider that I initially thought.

I realised my change to kern___realpathat was broken shortly after I updated the diff. For one thing, it returns the wrong path if the input path is relative. At the moment, I'm conditionally disabling the vp_crossmp replacement for lookups where the mount point is the last path element but that probably opens up the same LOR from commit 7f92c4e which crossmp was introduced to fix.

Thanks for the hint about v_mountedhere - I will try to make that work. Also, restricting these mounts to files without hardlinks etc. makes sense to me so I will add a check for that.

I'm not confident the LOR is really there and even if it is, crossmp is a hack solution. unmount can instead be patched around to not suffer the problem, but I don't have the time right now to look into it.

In D37478#854931, @dfr wrote:

Thanks for the hint about v_mountedhere - I will try to make that work. Also, restricting these mounts to files without hardlinks etc. makes sense to me so I will add a check for that.

Erm, I meant v_mount. The idea being you can test this is a nullfs file mount, and if so, v_mount can be used to walk up as it contains a pointer to the covered vnode.

Second attempt at fixing realpath for file mounts - use the covered vnode
with vn_fullpath. This assumes that the covered vnode is in the name cache
which I'm not convinced is a good assumption.

This also prevents mounting over files with more than one hardlink.
Attempting to create a hardlink to a file mount after its mounted is already
forbidden since it would be a cross-device mount. We should also disable
stacking file mounts but I'm not sure the right way to detect this.

Harbormaster completed remote builds in B48610: Diff 113831.Dec 5 2022, 11:06 AM

Slightly better logic when testing for file mounts in realpath

Harbormaster completed remote builds in B48611: Diff 113832.Dec 5 2022, 2:01 PM

I get a panic with D37478.id113832.diff : https://people.freebsd.org/~pho/stress/log/log0394.txt

In D37478#855162, @pho wrote:

I get a panic with D37478.id113832.diff : https://people.freebsd.org/~pho/stress/log/log0394.txt

This one looks like an attempt to mount a file onto an existing file mount. I'm going to disable that.

Don't allow stacking of file mounts

Harbormaster completed remote builds in B48624: Diff 113872.Dec 6 2022, 1:30 PM

Second attempt - we need to check for file mount stacking before the
call to vn_path_to_global_path_hardlink to avoid a panic

Harbormaster completed remote builds in B48625: Diff 113874.Dec 6 2022, 2:11 PM

Third attempt at stopping stacks of file mounts. With this one, pho@'s
n30.sh test runs without panicing the first time it attempts to stack
files.

Harbormaster completed remote builds in B48629: Diff 113884.Dec 6 2022, 5:23 PM

I have not observed any problems while testing with D37478.113884.patch

Thanks for all your work Peter!

Assuming that we complete the review process and get this change into -current, does anyone have concerns merging this into 13-stable? It would be really nice to have this feature available in 13.2-RELEASE.

kib added a subscriber: kib.Dec 9 2022, 5:34 AM

kib added inline comments.

sbin/mount_nullfs/mount_nullfs.8
73
sys/kern/vfs_cache.c
3163	Should be a blank line above this one. That said, what does prevent reclaiming of nd.ni_vp? It is not locked, and this allows both unmount and reclaim, which causes v_mount to become NULL.
3166
3831	Since you unlock the vnode right after the assert, should you also assert that the vnode has use-ref on it?

BTW, I remember Solaris had the ability to mount a regular file over a regular file, without stacking helpers.

I mean, why requiring nullfs there? Just keep a reference to the lower vnode as v_mountedthere and make lookup() handle it. This is probably as close to Linux' bind mount as it is possible.

In D37478#856005, @kib wrote:

BTW, I remember Solaris had the ability to mount a regular file over a regular file, without stacking helpers.

I mean, why requiring nullfs there? Just keep a reference to the lower vnode as v_mountedthere and make lookup() handle it. This is probably as close to Linux' bind mount as it is possible.

I think using nullfs here simplifies the change quite a bit - doing things the bind way would involve inventing a way of expressing a non-filesystem mount to nmount, plumbing that through in /sbin/mount etc.

One thing we do get from nullfs is that the mounted file gets a new st_dev which prevents hardlinks if thats useful.

dfr added inline comments.Dec 9 2022, 2:57 PM

sys/kern/vfs_cache.c
3163	I can lock nd.ni_vp before dereferencing v_mount. Would that be enough and do I still need to call vfs_busy? I'm not sure about the ordering of these locks.

kib added inline comments.Dec 9 2022, 3:20 PM

sys/kern/vfs_cache.c
3163	vfs_busy() is before any vnode lock, see a long explanation right above it in vfs_subr.c. I think that locking ni_vp and checking for doomed state should be enough to guarantee the path to covered vnode reference.

Whitespace fix, add locking to kern___realpathat

Harbormaster completed remote builds in B48692: Diff 114033.Dec 10 2022, 2:04 PM

Style fix 'if (error != 0)'

Harbormaster completed remote builds in B48693: Diff 114034.Dec 10 2022, 2:07 PM

dfr marked an inline comment as done.Dec 10 2022, 2:10 PM

dfr added inline comments.

sys/kern/vfs_cache.c
3831	I don't quite understand what a 'use-ref' is in this context. The assert here is mostly copying the logic in vn_path_to_global_path.

kib added inline comments.Dec 10 2022, 2:25 PM

sys/kern/vfs_cache.c
3167	LK_RETRY implies the check for VN_IS_DOOMED(), this block is not needed
3831	vref() increments v_usecount, which I sort of mentioned by using the 'use-ref' abbreviation.

kib added inline comments.Dec 10 2022, 4:42 PM

sys/kern/vfs_cache.c
3167	Sorry, I mean "remove LK_RETRY", then vn_lock() does the check for VN_IS_DOOMED()

Simplify locking in kern___realpathat

Harbormaster completed remote builds in B48699: Diff 114064.Dec 12 2022, 12:05 PM

dfr added inline comments.Dec 12 2022, 12:05 PM

sys/kern/vfs_cache.c
3831	I could add KASSERT(vrefcnt(vp) > 0) but it doesn't match the logic in vn_path_to_global_path and both functions document their pre-conditions fairly clearly.

kib accepted this revision.Dec 12 2022, 1:31 PM

This revision is now accepted and ready to land.Dec 12 2022, 1:31 PM

Thanks Konstantin, Mateusz for all your help in the review process. Are there any other areas which need work here? I remember that Mateusz was interested in ensuring that existing erroneous attempts to mount files would succeed with this change. How serious is this potential problem? Perhaps we can mitigate by gating this with a sysctl? I plan to merge this to stable/13 so perhaps have this feature opt in for 13 and opt out for 14 and later?

In D37478#856652, @dfr wrote:

I remember that Mateusz was interested in ensuring that existing erroneous attempts to mount files would succeed with this change. How serious is this potential problem? Perhaps we can mitigate by gating this with a sysctl? I plan to merge this to stable/13 so perhaps have this feature opt in for 13 and opt out for 14 and later?

I personally do not think that this is a serious issue. But if you want to provide a gatekeeping, I would suggest adding a mount option (like mount -o allowfile -t nullfs /from /to). This option is not even needs to be passed to kernel, check for it in mount_null(8). IMO it is more explicit than a hidden sysctl.

I do not see why this patch, in any form, cannot be merged to 13, as is.

Personally, I don't think its necessary to be bug-for-bug compatible in most cases. If Mateusz feels strongly about it I will add a check in mount_nullfs as suggested but if not I'm happy with to move forwards as-is.

What "bug to bug compatiblity"?

I noted mount -t nullfs /some-file /some-other-file would fail on stock kernel and succeed on the patched one. If someone messes up and specifies regular files by accident, this is going to work out when it should not have from their perspective and this is why I suggested a switch to denote what they want, but I'm not going to insist on it.

code-wise, I'm deeply unsatisfied with this crossmp vnode business. I'll look around, perhaps it is fixable, in which case the above patch will be able to drop the realpath kludge.

Past that I don't see problems with the code and I'll ack in a day or two if the crossmp endeavor wont show anything promising.

In D37478#856863, @mjg wrote:

What "bug to bug compatiblity"?

I noted mount -t nullfs /some-file /some-other-file would fail on stock kernel and succeed on the patched one. If someone messes up and specifies regular files by accident, this is going to work out when it should not have from their perspective and this is why I suggested a switch to denote what they want, but I'm not going to insist on it.

I used the wrong term. I was just trying to point out that if an application has a bug (mistakenly mounting a file) or is trying to verify that FreeBSD does not have the ability to mount files, it should not stop us from implementating that feature.

code-wise, I'm deeply unsatisfied with this crossmp vnode business. I'll look around, perhaps it is fixable, in which case the above patch will be able to drop the realpath kludge.

Past that I don't see problems with the code and I'll ack in a day or two if the crossmp endeavor wont show anything promising.

I am also unhappy with the crossmp parts but did not want to attempt to change that in this diff - working around it seemed like the lower-risk option. Looking forward to seeing if you can find a cleaner way.

I ran a full stress2 test with D37478.114064.patch. No problems seen.

@markj is there an easy way to convince syzkaller to play with file mounts?

In D37478#857428, @mjg wrote:

@markj is there an easy way to convince syzkaller to play with file mounts?

I'm not sure about "easy", but it should be pretty straightforward. You'd add a pseudo-syscall probably modeled after syz_mount_image() in executor/common_linux.h. That'd be a C function which wraps nmount(2), and from syzkaller's perspective it's just another system call. Then add some syzlang definitions which bind specific parameter values for syz_mount_image, e.g., by specifying "nullfs" as the filesystem type. These would go in sys/freebsd/mount.txt or so. Again, Linux has some examples here in sys/linux/filesystem.h.

If anyone would like to take this on, let me know and I can provide more details.

well I don't have time to do the syzkaller bit, but someone(tm) should. i'm still not convinced the crossmp business is necessary, but this should not be holding up the patch either.

Thanks everyone. I'm going to rebase and land this today.

Closed by commit rGa3f714c4ff8c: Add support for mounting single files in nullfs (authored by dfr). · Explain WhyDec 19 2022, 4:50 PM

This revision was automatically updated to reflect the committed changes.

dfr added a commit: rGa3f714c4ff8c: Add support for mounting single files in nullfs.

dfr added a commit: rG823dfd17e27c: Add support for mounting single files in nullfs.Jan 6 2023, 9:46 AM

Add support for mounting single files in nullfs
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 113682

sbin/mount/getmntopts.c

sbin/mount/mntopts.h

sbin/mount/mount.c

sbin/mount_nullfs/mount_nullfs.8

sbin/mount_nullfs/mount_nullfs.c

sys/fs/nullfs/null_vfsops.c

sys/kern/vfs_cache.c

sys/kern/vfs_mount.c

sys/sys/mount.h

sys/sys/vnode.h

Add support for mounting single files in nullfsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 113682

sbin/mount/getmntopts.c

sbin/mount/mntopts.h

sbin/mount/mount.c

sbin/mount_nullfs/mount_nullfs.8

sbin/mount_nullfs/mount_nullfs.c

sys/fs/nullfs/null_vfsops.c

sys/kern/vfs_cache.c

sys/kern/vfs_mount.c

sys/sys/mount.h

sys/sys/vnode.h

Add support for mounting single files in nullfs
ClosedPublic
Actions

Revision Contents
Changeset List