Page MenuHomeFreeBSD

kern: cpuset: allow jails to modify child jails' roots
ClosedPublic

Authored by kevans on Nov 24 2020, 12:22 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Dec 11, 8:15 AM
Unknown Object (File)
Mon, Dec 2, 9:07 AM
Unknown Object (File)
Mon, Dec 2, 9:07 AM
Unknown Object (File)
Mon, Dec 2, 9:06 AM
Unknown Object (File)
Mon, Dec 2, 9:06 AM
Unknown Object (File)
Mon, Dec 2, 8:46 AM
Unknown Object (File)
Oct 7 2024, 9:32 AM
Unknown Object (File)
Oct 6 2024, 6:44 AM
Subscribers

Details

Summary

This partially lifts a restriction imposed by rS191639 ("Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail") that's perhaps beneficial after rS192895 ("Add hierarchical jails."). Jails still cannot modify their own cpuset, but they can modify child jails' roots to further restrict them or widen them back to the modifying jails' own mask.

As a side effect of this, the system root may once again widen the mask of jails as long as they're still using a subset of the parent jails' mask. This was previously prevented by the fact that cpuset_getroot of a root set will return that root, rather than the root's parent -- cpuset_modify uses cpuset_getroot since it was introduced in rS327895, previously it was just validating against set->cs_parent which allowed the system root to widen jail masks.

I think the implementation's nearly trivial and accurate for the description, but I'd like a sanity check that this is a reasonable thing to do and within the spirit of hierarchical jails.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kevans added inline comments.
sys/kern/kern_cpuset.c
692 ↗(On Diff #79931)

Er, I guess this could move back outside of the lock.

Fix root grabbing for top-level jails, explain why we can't use getroot, and move back outside of the spinlock to reduce churn.

Well it certainly works as expected, but then you knew that :-).

I do wonder what the point was of hanging top-level jail cpusets off of set 0 instead of set 1. But for that matter, I'm not well versed enough in cpuset lore to understand the point of sets 0 and 1 anyway. I suspect I made it that way because of that lack of understanding, and it should have always been set 1.

This revision is now accepted and ready to land.Dec 14 2020, 12:45 AM

Well it certainly works as expected, but then you knew that :-).

:-D

I do wonder what the point was of hanging top-level jail cpusets off of set 0 instead of set 1. But for that matter, I'm not well versed enough in cpuset lore to understand the point of sets 0 and 1 anyway. I suspect I made it that way because of that lack of understanding, and it should have always been set 1.

Yeah, this is where it gets kinda weird -- they are actually parented to set 1, but set 1 isn't a root. I think the only decision that you could have made differently would have been to mark both 0 and 1 as CPU_SET_ROOT so that prison0 is consistent with all other prisons.

I'd /almost/ go as far as to say "should have," but I don't think it was (or is) all that clear that that's correct. Looking at the context in which the root is used, though, it might be a good idea to consider it now:

  • cpuset_modify{,_domain} currently validates against the system set rather than the prison0 set for non-jailed processes
  • cpuset(2) creates a new set based on the root, which should probably be prison0's root for non-jailed processes rather than the current cpuset0
  • cpuset_getid(2) just uses it to answer the CPU_LEVEL_ROOT inquiry
  • cpuset_{get,set}affinity(2) ditto
  • cpuset_{get,set}domain(2) ditto

The comment above cpuset_thread0 has this to say:

* 1 - The default set which all processes are a member of until changed.
*     This allows an administrator to move all threads off of given cpus to
*     dedicate them to high priority tasks or save power etc.

I think from that POV, we *do* want set 1 to be marked as a root so that cpuset(2) will DTRT. Right now you can restrict cpuset 1 and affect all userland processes except for those that have executed or will execute cpuset(2), since they'll get one inheriting from set0 instead.

That last bit was a misinterpretation on my part -- this is the way it's always worked.