Page MenuHomeFreeBSD

sys/kern/kern_jail: Improve subsystem flag handling
Needs ReviewPublic

Authored by ivy on Wed, May 27, 5:19 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jun 8, 8:09 PM
Unknown Object (File)
Fri, Jun 5, 6:10 PM
Unknown Object (File)
Fri, Jun 5, 6:07 PM
Unknown Object (File)
Fri, Jun 5, 6:52 AM
Unknown Object (File)
Fri, Jun 5, 3:33 AM
Unknown Object (File)
Thu, Jun 4, 11:56 PM
Unknown Object (File)
Thu, Jun 4, 3:57 PM
Unknown Object (File)
Thu, Jun 4, 2:07 AM
Subscribers

Details

Reviewers
kevans
markj
netchild
olce
des
Group Reviewers
Jails
Summary

The logic for handling "host", "vnet", "ip4" and "ip6" in kern_jail
is somewhat confused: it tries to define a "new" and "disable" flag
for each subsystem, but the disable flag is either 0 or the same as
the enable flag. Then, when checking valid flags, it unconditionally
sets the 'new' flag in ch_flags.

This means that if, say, 'vnet' is set to 'inherit', the jail will
fail to start, because PR_VNET is set in ch_flags, even though the
jail should not have PR_VNET enabled.

There is additionally some confused handling of pr_flags that tries
to determine if an option is 'disable', 'new' or 'inherit' based on
the subsystem flags, but this can never work, because it can only
indicate either enabled or not enabled. This leads to 'jls -n'
showing incorrect values, e.g. a jail with 'ip4.addr' explicitly
set will show as 'ip4=disable'.

Fix this up within the limits of what's possible without refactoring
the entire thing. jailsys_flags now has name, immutable, can_disable
and flag members. Disallow changing immutable flags, or disabling
can_disable flags. Where we try to report the value of a subsystem,
return either "new" or "inherit" based on whether the flag is set.

This is probably still not entirely consistent with how it's meant
to work, but it fixes some obvious issues with jail(8).

MFC after: 4 weeks

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 73484
Build 70367: arc lint + arc unit

Event Timeline

ivy requested review of this revision.Wed, May 27, 5:19 PM

Yes, the current setup is an ad-hoc mess, and could use some work. But maybe we *should* refactor the whole thing. Not to the point of ABI change with different parameters, but at least with enough specification inside to be able to know and report the state of subsystems. The current setup of a single bit works for the majority of systems, which can't be disabled. But the truly three-way systems could be standardized with separate "enabled" and "new" flags. There's usually a "real" state that uses something aside from a bit, such as the existence of the pr_addrs array, but these could exist along with the bitmask.

Since this fix still uses only the one bit, it can still give a wrong answer in jail_get, this time returning JAIIL_SYS_DISABLE for "inherit" systems, even for can_disable=false.

I can only find two places where all three states are possible: IP4/6, and SYSV IPC. The latter lives outside of kern_jail.c, and uses a control block pointer to differentiate (null is DISABLE, same as parent's block is INHERIT, different is NEW). IP4/6 are an admittedly bad fit, given that "new" takes freedom away rather than adding it, like the new namespace used by SYSV IPC.

I've talked too much. Short answer: how about we add a second bit, even if just for proper reporting in jail_get?

in the mean time, this diff is still wrong: while it fixes my original problem, it breaks poudriere somehow. so i'll have another look at this and see if we can handle tri-state options better.