Page MenuHomeFreeBSD

Unmout devfs when killing a jail.
AbandonedPublic

Authored by robak on Mar 24 2015, 8:56 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Oct 3, 6:09 PM
Unknown Object (File)
Aug 21 2024, 11:06 PM
Unknown Object (File)
Aug 21 2024, 10:53 PM
Unknown Object (File)
Aug 7 2024, 7:57 AM
Unknown Object (File)
Jul 31 2024, 1:35 AM
Unknown Object (File)
May 9 2024, 9:17 PM
Unknown Object (File)
May 9 2024, 9:17 PM
Unknown Object (File)
May 9 2024, 1:28 PM

Details

Summary

When creating jails that are using devfs it mounts them in to the jail.
Upon killing such jail these devfs mounts are left intact mounted, cluttering
the host system.

This patch has been written and sent by Pawel Biernacki <pawel.biernacki@gmail.com>

Test Plan

Apply the patch, compile, create jail using devfs mount and kill it - the devfs
should be unmounted.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

robak retitled this revision from to Unmout devfs when killing a jail..
robak updated this object.
robak edited the test plan for this revision. (Show Details)
robak added reviewers: dim, jamie, hrs, bz.

Hi,

The concept and mechanism look good to me, however I see a few potential issues:

(1) unmount() should either always be passed the MNT_FORCE flag, or a flag should be added to jail(8) to allow a forced shutdown. I guess the argument could be made that this isn't necessary if you kill all processes in the jail.
(2) The path filter needs to check for trailing slashes, in case multiple jails share the same path string before the trailing slash.
(3) Are paths always listed in dependency order? E.g. would <jailroot>/var/log always get unmounted before <jailroot>/var?
(4) This seems like the sort of thing that would be relatively trivial to create tests for; I strongly encourage adding them to test the above issues and the original goal.

Thanks!

robak added a subscriber: kaktus.

As noted by others, this patch unmounts more than just the devfs filesystem. Which is fine, since it serves the purpose. Jail shutdown should clean up all resources associated with it.

usr.sbin/jail/jail.c
484

do you need to check errno here too (in addition to ret value 0)?

In D2130#8, @will wrote:

As noted by others, this patch unmounts more than just the devfs filesystem. Which is fine, since it serves the purpose. Jail shutdown should clean up all resources associated with it.

Would it be a better idea to parse the jail.conf and instead unmount everything that's found in there, assuming that this is what the jail should really have? This creates a problem when someone changed the jail config while the jail was running, without recycling the jail properly but it shouldnt be done this way, I think.

This may work better as a "command", i.e. IP_* thing. For one, unmounting parts of the jail and then running a stop command inside the jail is not a good idea. The code already has a concept of dependencies between commands and that should be taken advantage of.

If you care just about devfs, IP_MOUNT_DEVFS appears to be capable of an umount command already.

In D2130#11, @jilles wrote:

If you care just about devfs, IP_MOUNT_DEVFS appears to be capable of an umount command already.

Indeed - a mount.devfs parameter in the jail will do the right thing:

boromir# jail -vr afcm
afcm: run command in jail: /bin/sh /etc/rc.shutdown
Stopping inetd.
Waiting for PIDS: 27302.
Stopping cron.
Waiting for PIDS: 27286.
Stopping apache22.
Waiting for PIDS: 27274.
Stopping sshd.
Waiting for PIDS: 27269.
Stopping mysql.
Waiting for PIDS: 27267.
.
Terminated
afcm: sent SIGTERM to: 27282 27279 27079
afcm: removed
afcm: run command: /sbin/umount /jail/afcm/dev
afcm: run command: /sbin/ifconfig em0 inet XXX.XXX.XXX.XXX netmask 255.255.255.255 -alias

Adding "-f" to the umount command seems a reasonable addition, but other than that I don't see what this patch is supposed to accomplish that isn't already there.

And if it's not just devfs - anything that's mounted in jail created (mount.*) is unmounted (in reverse order) on jail deletion.

dim removed a reviewer: dim.
In D2130#13, @jamie wrote:
In D2130#11, @jilles wrote:

If you care just about devfs, IP_MOUNT_DEVFS appears to be capable of an umount command already.

Indeed - a mount.devfs parameter in the jail will do the right thing:

boromir# jail -vr afcm
afcm: run command in jail: /bin/sh /etc/rc.shutdown
Stopping inetd.
Waiting for PIDS: 27302.
Stopping cron.
Waiting for PIDS: 27286.
Stopping apache22.
Waiting for PIDS: 27274.
Stopping sshd.
Waiting for PIDS: 27269.
Stopping mysql.
Waiting for PIDS: 27267.
.
Terminated
afcm: sent SIGTERM to: 27282 27279 27079
afcm: removed
afcm: run command: /sbin/umount /jail/afcm/dev
afcm: run command: /sbin/ifconfig em0 inet XXX.XXX.XXX.XXX netmask 255.255.255.255 -alias

Adding "-f" to the umount command seems a reasonable addition, but other than that I don't see what this patch is supposed to accomplish that isn't already there.

And if it's not just devfs - anything that's mounted in jail created (mount.*) is unmounted (in reverse order) on jail deletion.

I've made some more testing to figure out why you say it works, when I clearly see it doesnt, and I think I've found a problem: when jail -r jailname is used, then indeed, the jail command cleans up the mounts it created. However, when jail -r jid is used, the mounts are staying after killing the jail and cluttering the host by future mounts in the same place.

I'd suggest adding a functionality to jail -r jid which seems to be not reading the config when jailname is not used, to check the mounts in killed jail and remove them, in some similar way that the patch is doing it, to clean up after the jail. In addition to that, creating a jail -c could verify if there are any leftover mounts in the jail's target mountpoint to do something about them (cleanup? warning? error?).

In D2130#16, @robak wrote:

I've made some more testing to figure out why you say it works, when I clearly see it doesnt, and I think I've found a problem: when jail -r jailname is used, then indeed, the jail command cleans up the mounts it created. However, when jail -r jid is used, the mounts are staying after killing the jail and cluttering the host by future mounts in the same place.

I'd suggest adding a functionality to jail -r jid which seems to be not reading the config when jailname is not used, to check the mounts in killed jail and remove them, in some similar way that the patch is doing it, to clean up after the jail. In addition to that, creating a jail -c could verify if there are any leftover mounts in the jail's target mountpoint to do something about them (cleanup? warning? error?).

True, specifying a jail by jid doesn't read the config file, which is keyed by name (unless you actually use the jid as the name). It would make sense to call jail_getname(3) read the config entry. It might be a bit of a confusion to users who expect the current behavior, but the -R flag has been the proper and documented way to not use the config file and they could depend on that. It's not quite as simple as that though, since for backward compatibility purposes, -r jid should still work when the jail isn't in the config file.

But that's a completely different direction than this patch. If a mount point isn't in the config file (or if there is no config file entry), I don't think anything should be unmounted without specific user direction. I would assume that if something is mounted outside of jail.conf, the user had his own reason to mount it, and may well have his own reason to keep it that way.

As for mount points that are in jail.conf, a failed unmount will give an error message (well, umount(8) will).

In D2130#17, @jamie wrote:

True, specifying a jail by jid doesn't read the config file, which is keyed by name (unless you actually use the jid as the name). It would make sense to call jail_getname(3) read the config entry. It might be a bit of a confusion to users who expect the current behavior, but the -R flag has been the proper and documented way to not use the config file and they could depend on that. It's not quite as simple as that though, since for backward compatibility purposes, -r jid should still work when the jail isn't in the config file.

it is possible to start 'standalone' (i.e. without an entry in the config) jails and have jail(8) mount stuff for you. As it is there is no mechanism to automatically clean this up.

But that's a completely different direction than this patch. If a mount point isn't in the config file (or if there is no config file entry), I don't think anything should be unmounted without specific user direction. I would assume that if something is mounted outside of jail.conf, the user had his own reason to mount it, and may well have his own reason to keep it that way.

Unmounting random filesystems which were mounted somewhere under the jail is definitely not an option.

However, the bigger problem in current patch is the fact that it runs unmount while jailed processes are still running, allowing them to race unmounter to force it to unmount filesystems outside of the jail (rename + symlink stuff after getmntinfo returns).

I believe I already wrote about this general problem.

My proposed solution is to keep a list of cleanup tasks associated with given jail in the kernel. We can start with something as simple as just tracking mounts done by jail(8). Would be trivially extendable to track mounts done by jailed root and cleaning them up automatically.

Note this would not touch filesystems not mounted by jailed root.

This has some caveats (e.g. would you track a vnode or a path which can no longer be a mount point)?, but I belive is the safest option.

If going with keeping the path, we can trivially assure lookups being done against jail root.

Tracking a vnode has the problem that currently there is no nice callback mechanism to tell us someone unmounted it. (there is a general on umount hook, but it is executed after unmount completed, we would need to additionaly hold the vnode and would be called for *all* unmount requests to check what's up. or iow: yuck).

Thank you all for the constructive feedback. A bit of history. I wrote this more that two year ago as a quick hack to solve a problem of leftovers after removing jails using jids. From the two years perspective I see that now I'd write it differently.

So my proposition is to add another command/flag that'd explicitly clean underlaying path from all mounts. The flag could be specified during creation and deletion of a jail.
I found the idea of keeping list of mounts done by local root in kernel a bit overkill for now. The first thing I'd like to achieve is support for unmounting all fs in jail during its removal, regardless of using name (and reading config) or jid.

Comments welcome.

In D2130#20, @pawel.biernacki-gmail.com wrote:

So my proposition is to add another command/flag that'd explicitly clean underlaying path from all mounts. The flag could be specified during creation and deletion of a jail.
I found the idea of keeping list of mounts done by local root in kernel a bit overkill for now. The first thing I'd like to achieve is support for unmounting all fs in jail during its removal, regardless of using name (and reading config) or jid.

Blind unmount of all filesystems from given tree is unsuitable for general use and still leaves users without the a way to just clean up after jail destruction.

My proposal is to keep the list of filesystems to unmount in the kernel, which is easy to achieve.

Tracking mounts created by jailed root will be necessary at some point, but is completely optional for now.

There is no way to achieve similar correctness with a userspace tool without modifying the kernel in some way anyway.

So my proposition is to add another command/flag that'd explicitly clean underlaying path from all mounts. The flag could be specified during creation and deletion of a jail.

Ideally I would like to see jail cleaning after itself as a default behavior instead of an extra flag because that promotes a 'healthier' and cleaner OS, so perhaps instead a flag could be added to NOT do that and preserve original behavior for those that need it?

A flag/whatever to prevent automagic unmounting is fine, but it has to be specified at jail creation (with an option to change it later).

jail(8) mounts filesystem with current jail credentials (e.g. the 'host' if you are not jailed at the moment). A jail can die at any time (unless 'persistent' is set) and as such relying on jail -r to clean up after it is not really user-friendly (and how would you know what filesystems to unmount in the first place if a jail as created wihout an entry in config file?).

Or to re-state, I consider userspace approach inherently flawed. I also consider a kernel one relatively easy to achieve (pass filesystem list along when the jail is created, then let the kernel act on it when it decides to kill the jail). Note this does not deal with mounts by in-jail root very well without further tinkering, but neither does userspace approach. It definitely solves the problem in question though.

In D2130#23, @mjg wrote:

Or to re-state, I consider userspace approach inherently flawed. I also consider a kernel one relatively easy to achieve (pass filesystem list along when the jail is created, then let the kernel act on it when it decides to kill the jail).

What with filesystems mounted by in-jail root?

In D2130#24, @pawel.biernacki-gmail.com wrote:
In D2130#23, @mjg wrote:

Or to re-state, I consider userspace approach inherently flawed. I also consider a kernel one relatively easy to achieve (pass filesystem list along when the jail is created, then let the kernel act on it when it decides to kill the jail).

What with filesystems mounted by in-jail root?

As a total hack we can just traverse mountlist and check what mountpoints have credentials with given jail. As time goes the implementation will have to avoid such behaviour (and track paths/vnodes instead for scalability reasons), but the change will be transparent to users.

Note this information is not exported to userspace, so you cannot know what filesystems were mounted prior to jail creation (and should be left untouched). It could be beneficial to export this information, but the issue of jails just dying remains.

(strictly speaking a jail with jailed root's mountpoints would linger until relevant filesystems are unmounted, but other jails would disappear so leaving this case for jail -r would be quite weird)

Paweł reported he's not going to work in this anytime soon, and asked me to close this revision.