Page MenuHomeFreeBSD

Allow jail cpuset masks to expand
AbandonedPublic

Authored by pizzamig on Oct 3 2019, 9:04 PM.
Tags
None
Referenced Files
F106050972: D21890.id62905.diff
Tue, Dec 24, 1:06 PM
Unknown Object (File)
Fri, Dec 13, 4:49 PM
Unknown Object (File)
Mon, Dec 2, 9:27 AM
Unknown Object (File)
Mon, Dec 2, 9:27 AM
Unknown Object (File)
Mon, Dec 2, 9:26 AM
Unknown Object (File)
Mon, Dec 2, 9:07 AM
Unknown Object (File)
Sun, Dec 1, 5:06 PM
Unknown Object (File)
Nov 23 2024, 8:34 PM
Subscribers

Details

Reviewers
jhb
jeff
Summary

This patch aim to solve the bug reported at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240687

As phrased in the PR:
It will inherit the default cpuset from the parent jail. However, once you've shrunk the set, you can never expand it. The reason is that the jail set is its own root, so the check against the 'root' mask in cpuset_modify() fails with EINVAL. I think this is perhaps not the intended behavior. I think that when setting the cpuset of a jail you want to apply the check against the parent jail's mask, not the jail's own mask.

Test Plan

Create a jail, shrink its cpuset and extend it back:

# jail -c name=foo command=/bin/sh
# cpuset -g -j 1
jail 1 mask: 0, 1, 2, 3
# cpuset -g -j 1 -r
jail 1 mask: 0, 1, 2, 3
# cpuset -j 1 -l 0-1
# cpuset -g -j 1
jail 1 mask: 0, 1
# cpuset -j 1 -l 0-3
# cpuset -g -j 1
jail 1 mask: 0, 1, 2, 3

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 26864
Build 25183: arc lint + arc unit

Event Timeline

sys/kern/kern_cpuset.c
162–163

I would perhaps code this as a do-while loop, though I suppose you have to always check set->cs_parent against NULL. I think you can simplify this to remove the CPU_SET_ROOT check though:

if (set->cs_parent != NULL)
     set = set->cs_parent;
while ((set->cs_flags & CPU_SET_ROOT) == 0 && set->cs_parent != NULL)
     set = set->cs_parent;

or perhaps alternatively:

 if (set->cs_parent == NULL)
     return (set);
 do {
     set = set->cs_parent;
} while ((set->cs_flags & CPU_SET_ROOT) == 0 && set->cs_parent != NULL);

Not sure which of the various variants is easier to read. I'll defer to Jeff.

sys/kern/kern_cpuset.c
162–163

IMHO, the first solution is more readable.
We'll wait for Jeff to decide

  • Simplifying the check. adopting a do-while loop

Any news on this patch?
From the PR, some users noticed that this is a regression introduced in FreeBSD 12. I check on FreeBSD 11.X and jails can expand their cpu set, on FreeBSD 12.X and CURRENT is not possible. It would be great to include this fix before FreeBSD 12.1 is released

kib added inline comments.
sys/kern/kern_cpuset.c
162–163

Please update the patch with one of the variants suggested by John. For me the second snippet looks cleaner.

In fact I am not sure about the patch. It probably fixes the wrong spot, for jails CPU_SET_ROOT is intended to mark the root cpuset which must be not expandable, so this part of the code works as intended. What is wrong is that either root cpuset is shrunk directly (instead a copy of it should be created below and clamped), or this should be explained as the behavior change and users should create a child manually.

May be jail creation should do this automatically, installing root set which is additionally marked read-only and then create a child with the same set of cpus.

The primary negative consequence of this patch is that it inadvertently exposes information about the parent jail from within the jail.

$ jail -c path=/ command=cpuset -gi
6
$ jail -c path=/ command=cpuset -gir
0

This tells you the minimum nesting of your jail (-r == 0 implying "direct child of prison0", -r > 0 implying "further down"), which isn't stellar, but also makes it harder to identify the root set of the jail you're in. You can cpuset_getid(CPU_LEVEL_ROOT, CPU_WHICH_TID, -1, ...) to get the root, but with this patch you have to first confirm that you're not using the root with cpuset(2) and running the previous cpuset_getid(2) again, probably in conjunction with some fork() magic.

That said, I think this would be fine for now:

diff --git a/sys/kern/kern_cpuset.c b/sys/kern/kern_cpuset.c
index 6130dd32ed4..deae44968f4 100644
--- a/sys/kern/kern_cpuset.c
+++ b/sys/kern/kern_cpuset.c
@@ -686,8 +686,16 @@ cpuset_modify(struct cpuset *set, cpuset_t *mask)
         * Verify that we have access to this set of
         * cpus.
         */
-       root = cpuset_getroot(set);
        mtx_lock_spin(&cpuset_lock);
+       if ((set->cs_flags & (CPU_SET_ROOT | CPU_SET_RDONLY)) == CPU_SET_ROOT &&
+           set != cpuset_getroot(curthread->td_cpuset)) {
+               KASSERT(set->cs_parent != NULL,
+                   ("jail.cpuset=%d is not a proper child of parent jail's root.",
+                   set->cs_id));
+               root = cpuset_getroot(set->cs_parent);
+       } else {
+               root = cpuset_getroot(set);
+       }
        if (root && !CPU_SUBSET(&root->cs_mask, mask)) {
                error = EINVAL;
                goto out;

This should let you modify the jail's root cpuset however you want if you're outside of the jail, while only allowing you to further limit the mask if you're inside of the jail. I think this is closer to the semantics we want; if we hung a new cpuset off like we do in prison0 and make the root immutable, then we have to add some complexity to make it possible to specify the root cpuset of the jail upon creation and the administrator has no way of limiting the jail further later because the jail would be able to change the mutable child of the root freely.

My patch is also wrong, because the process could've been manually set to a cpuset hanging off the jail root set. Better to just check the thread's prison.

Edit #2: I missed the jailed() check already above, so that can be striked entirely and just check that it's not a read-only root (mostly so we can assert that it's properly connected to the parent, rdonly would get caught later anyways).

pizzamig marked an inline comment as not done.

This patch has been superseded by https://reviews.freebsd.org/D27352 that fixes the issue.
I'll abandon this revision