This revision is part of a series. Click on the Stack tab below to see the context.
This series has also been squashed into D47633 to provide an overall view.
Commit message:
TL;DR:
Now monitor setcred() calls, and reject or grant them according to the
new rules specification.
Drop monitoring setuid() and setgroups(). As previously explained in
the commit introducing the setcred() system call, MAC/do must know the
entire new credentials while the old ones are still available to be able
to approve or reject the requested changes. To this end, the chosen
approach was to introduce a new system call, setcred(), instead of
modifying existing ones to be able to participate in a "prepare then
commit"-like protocol.
The MAC framework typically calls several hooks of its registered
policies as part of the privilege checking/granting process. Each
system call calls some dedicated hook early, to which it usually passes
the same arguments it received, whose goal is to forcibly deny access to
the functionality when needed (i.e., a single deny by any policy
globally denies the access). Then, the system call usually calls
priv_check() or priv_check_cred() an unspecified number of times, each
of which may trigger calls to two generic MAC hooks. The first such
call is to mac_priv_check(), and always happens. Its role is to deny
access early and forcibly, as can also be done in system calls'
dedicated early hooks (with different reach, however). The second,
mac_priv_grant(), is called only if the priv_check*() and
prison_priv_check() generic code doesn't handle the request by itself,
i.e., doesn't explicitly grant access (to the super user, or to all
users for a few specific privileges). It allows any single policy to
grant the requested access (regardless of whether the other policies do
so or not).
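The decision order described above can be sketched as a small userland
model. This is only an illustration, not the kernel's code: 'struct
policy', model_priv_check(), deny_all() and grant_42() are hypothetical
names, and the generic priv_check*()/prison_priv_check() logic is
reduced to a single boolean (in particular, denials coming from that
generic code are not modeled).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model of a MAC policy; not the kernel's types. */
struct policy {
	/* Returns 0 to stay neutral, or an errno value to forcibly deny. */
	int (*priv_check)(int priv);
	/* Returns 0 to grant, or an errno value to decline. */
	int (*priv_grant)(int priv);
};

/*
 * 'generic_grants' stands for the outcome of the generic code (e.g.,
 * the request comes from the super user).
 */
static int
model_priv_check(const struct policy *pols, size_t n, int priv,
    bool generic_grants)
{
	size_t i;
	int error;

	assert(pols != NULL || n == 0);

	/* First pass: a single denial by any policy is final. */
	for (i = 0; i < n; i++) {
		if (pols[i].priv_check == NULL)
			continue;
		error = pols[i].priv_check(priv);
		if (error != 0)
			return (error);
	}

	/* The generic code may grant access by itself. */
	if (generic_grants)
		return (0);

	/* Second pass: a single grant by any policy is enough. */
	for (i = 0; i < n; i++) {
		if (pols[i].priv_grant == NULL)
			continue;
		if (pols[i].priv_grant(priv) == 0)
			return (0);
	}

	return (EPERM);
}

/* Example policies: one denies everything, one grants privilege 42. */
static int
deny_all(int priv)
{
	(void)priv;
	return (EACCES);
}

static int
grant_42(int priv)
{
	return (priv == 42 ? 0 : EPERM);
}
```

In this model, a policy whose priv_check is deny_all() wins over any
other policy's grant, whereas grant_42() only matters when no policy
denies and the generic code has not already granted the request.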
MAC/do currently only has an effect on processes spawned from the
'/usr/bin/mdo' executable. It implements all setcred() hooks, called
via mac_cred_setcred_enter(), mac_cred_check_setcred() and
mac_cred_setcred_exit(). In the first one, implemented in
mac_do_setcred_enter(), it checks if MAC/do has to apply to the current
process, allocates (or re-uses) per-thread data to be later used by the
other hooks (those of setcred() and the mac_priv_grant() one, called by
priv_check*()) and fills them with the current context (the rules to
apply). This is both because memory allocations cannot be performed
while holding the process lock and to ensure that all hooks called by
a single setcred() see the same rules to apply (not doing this would be
a security hazard if rules are concurrently changed by the
administrator, as explained in more detail below). In the second one
(implemented by mac_do_check_setcred()), it stores the new credentials
in MAC/do's per-thread data. Indeed, the next MAC/do hook
implementation to be called, mac_do_priv_grant() (implementing the
mac_priv_grant() hook) must have knowledge of the new credentials that
setcred() wants to install in order to validate them (or not), which the
MAC framework can't provide as the priv_check*() API only passes the
current credentials and a specific privilege number to the
mac_priv_check() and mac_priv_grant() hooks. By contrast, the very
point of MAC/do is to grant the privilege of changing credentials not
only based on the current ones but also on the sought-after ones.
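The hand-off of the new credentials between hooks can be illustrated
with the following userland sketch. It is not MAC/do's code: all names
('struct cred_model', 'struct setcred_data_model', the model_*()
functions, MODEL_PRIV_SETCRED) are hypothetical, thread-local storage
stands in for the per-thread data, and the single hard-coded rule
replaces the real rules specification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define	MODEL_PRIV_SETCRED	1	/* hypothetical privilege number */

struct cred_model {
	unsigned int uid;
};

/* Per-thread scratch area shared by the hooks of one setcred() call. */
struct setcred_data_model {
	const struct cred_model *new_cred;	/* NULL outside setcred() */
};

static _Thread_local struct setcred_data_model td_data;

static void
model_setcred_enter(void)
{
	td_data.new_cred = NULL;
}

/* Dedicated hook: receives the requested credentials and records them. */
static int
model_check_setcred(const struct cred_model *new_cred)
{
	assert(new_cred != NULL);
	td_data.new_cred = new_cred;
	return (0);	/* never denies; the decision is taken at grant time */
}

/* Generic hook: only receives the current credentials and a privilege. */
static int
model_priv_grant(const struct cred_model *cur, int priv)
{
	const struct cred_model *new_cred = td_data.new_cred;

	if (priv != MODEL_PRIV_SETCRED || new_cred == NULL)
		return (EPERM);
	/* Hypothetical rule: UID 1001 may become UID 0, nothing else. */
	if (cur->uid == 1001 && new_cred->uid == 0)
		return (0);
	return (EPERM);
}

static void
model_setcred_exit(void)
{
	td_data.new_cred = NULL;
}
```

Outside of an enter/exit pair, model_priv_grant() has no recorded target
credentials and thus declines to grant anything.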
The MAC framework's constraints that mac_priv_grant() hooks are called
without context and that MAC modules must compose (each module may
implement any of the available hooks, and in particular those of
setcred()) impose some aspects of MAC/do's design. Because MAC/do's
rules are tied to jails, accessing the current rules requires holding
the corresponding jail's lock. As other policies might try to grab the
same jail's lock in the same hooks, it is not possible to keep the
rules' jail's lock between mac_do_setcred_enter() and
mac_do_priv_grant() to ensure that the rules are still alive. We have
thus augmented 'struct rules' with a reference count, and its lifecycle
is now decoupled from being referenced or not by a jail. As a thread
enters mac_cred_setcred_enter(), it grabs a hold on the current rules
and keeps a pointer to them in the per-thread data. In its
mac_do_setcred_exit(), MAC/do just "frees" the per-thread data, in
particular by dropping the referenced rules (we put "frees" in quotes,
as the per-thread structure is in fact reused, and only freed when
a thread exits or the module is unloaded).
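The hold/drop discipline on the rules can be sketched as follows. This
is a simplified userland model under stated assumptions, not the actual
implementation: 'struct rules_model', 'struct jail_model' and all
functions are hypothetical, C11 atomics replace the kernel's reference
counting primitives, and the jail lock is omitted (the hold in
model_setcred_enter() would have to be taken with that lock held).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for 'struct rules' and a jail. */
struct rules_model {
	atomic_uint	refs;
	int		generation;	/* placeholder for the content */
};

struct jail_model {
	struct rules_model *rules;	/* jail lock omitted in this model */
};

static int rules_freed;		/* demo counter: rules destroyed so far */

static struct rules_model *
rules_alloc(int generation)
{
	struct rules_model *r = malloc(sizeof(*r));

	assert(r != NULL);
	atomic_init(&r->refs, 1);	/* reference owned by the creator */
	r->generation = generation;
	return (r);
}

static void
rules_hold(struct rules_model *r)
{
	atomic_fetch_add(&r->refs, 1);
}

static void
rules_drop(struct rules_model *r)
{
	/* Whoever drops the last reference destroys the rules. */
	if (atomic_fetch_sub(&r->refs, 1) == 1) {
		free(r);
		rules_freed++;
	}
}

/* Enter hook: pin the jail's current rules for this setcred() call. */
static struct rules_model *
model_setcred_enter(struct jail_model *j)
{
	rules_hold(j->rules);
	return (j->rules);	/* would be kept in the per-thread data */
}

/* Exit hook: release the rules this thread has been using. */
static void
model_setcred_exit(struct rules_model *held)
{
	rules_drop(held);
}

/* The administrator installs new rules; the jail lets go of the old. */
static void
jail_set_rules(struct jail_model *j, struct rules_model *new_rules)
{
	struct rules_model *old = j->rules;

	j->rules = new_rules;	/* takes over the creator's reference */
	rules_drop(old);
}
```

With this scheme, rules replaced in the middle of a setcred() call stay
alive until the thread that pinned them reaches its exit hook.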
Additionally, ensuring that all hooks have a consistent view of the
rules to apply might become crucial if we augment MAC/do with forceful
access denial policies in the future (i.e., policies that forcibly
disable access regardless of other MAC policies wanting to grant that
access). Indeed, without the above-mentioned design, if newly installed
rules start to forcibly deny some specific transitions, and some thread
is past the mac_cred_check_setcred() hook but before the
mac_priv_grant() one, the latter may grant some privileges that should
have been rejected first by the former (depending on the content of
user-supplied rules).
A previous version of this change used to implement access denial
mandated by the '!' and '-' GID flags in mac_do_check_setcred() with the
goal to have this rejection prevail over potential other MAC modules
authorizing the transition. However, this approach had two drawbacks.
First, it was incompatible both conceptually and in the current
implementation with multiple rules being treated as an inclusive
disjunction, where any single rule granting access is enough for MAC/do
to grant access. Explicit denial requested by one matching rule could
prevent another rule from granting access. The implementation could
have been fixed, but the inconsistency of treating rules as disjunctive
for explicit granting but conjunctive for forced denial would have
remained. Second, MAC/do applies only to processes spawned from
a particular executable, and imposing system-wide restrictions on only
these processes is conceptually strange and probably not very useful.
In the end, we moved the implementation of explicit access denial into
mac_do_priv_grant(), along with the interpretation of other target
clauses.
The separate definition of 'struct mac_do_data_header' may seem odd, as
it is only used in 'struct mac_do_setcred_data'. It is a remnant of an
earlier version that was not using setcred(), but rather implemented
hooks for setuid() and setgroups(). We however kept it, as it clearly
separates the machinery to pass data from dedicated system call hooks to
priv_grant() from the actual data that MAC/do needs to monitor a call to
setcred() specifically. It may be useful in the future if we evolve
MAC/do to also grant privileges through other system calls (each seen as
a complete credentials transition on its own).
The target supplementary groups are checked with merge-like algorithms
leveraging the fact that all supplementary groups in credentials
('struct ucred') and in each rule ('struct rule') are sorted, which
avoids a separate binary search for each considered GID, an
asymptotically more costly approach. Each access granting/denial
decision is thus at most linear in the sum of the numbers of requested
groups, currently held ones and those contained in the rule, per
applicable rule. This should be
enough in all practical cases. There is however still room for more
optimizations, with or without changes in rules' data structures, if the
need ever arises.
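As an illustration of the merge-like approach, the sketch below checks
in a single pass that every requested GID appears in a sorted list of
allowed GIDs. It is a deliberately reduced example: sorted_subset() and
'gid_model_t' are hypothetical names, and the GID flags and currently
held groups that the real checks take into account are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gid_model_t;	/* hypothetical stand-in for gid_t */

/*
 * Returns true if every GID of 'req' is present in 'allowed'.  Both
 * arrays must be sorted in ascending order.  The two cursors only move
 * forward, so the cost is linear in 'nreq + nallowed'.
 */
static bool
sorted_subset(const gid_model_t *req, size_t nreq,
    const gid_model_t *allowed, size_t nallowed)
{
	size_t i, j = 0;

	for (i = 0; i < nreq; i++) {
		/* Precondition: 'req' is sorted. */
		assert(i == 0 || req[i - 1] <= req[i]);
		/* Skip allowed GIDs smaller than the requested one. */
		while (j < nallowed && allowed[j] < req[i])
			j++;
		if (j == nallowed || allowed[j] != req[i])
			return (false);
	}
	return (true);
}
```

A per-GID binary search would instead cost on the order of
'nreq * log(nallowed)' comparisons, which is why walking both sorted
arrays together is preferable here.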