[2/2] kern: cpuset: properly rebase when attaching to a jail
ClosedPublic
Actions

Authored by kevans on Nov 20 2020, 7:53 PM.

Details

Reviewers

markj
jamie

Commits

rS368011: kern: cpuset: properly rebase when attaching to a jail

Summary

The current logic is a fine choice for a system administrator modifying process cpusets or a process creating a new cpuset(2), but not ideal for processes attaching to a jail.

Currently, when a process attaches to a jail, it does exactly what any other process does and loses any mask it might have applied in the process of doing so because cpuset_setproc() is entirely based around the assumption that non-anonymous cpusets in the process can be replaced with the new parent set.

This approach slightly improves the jail attach integration by modifying cpuset_setproc() callers to indicate if they should rebase their cpuset to the indicated set or not (i.e. cpuset_setproc_update_set).

If we're rebasing and the new root set really is different from the current process's root (perhaps due to a race while we're dropping the proc lock), then allocate an extra cpuset to use as the process's new base set with its parent being the jail's set with the process base mask and domainset applied.

We avoid actually creating a new base set as long as all of the threads in the process are using either an anonymous set or the currently-containing prison's set, indicating that there's really nothing special happening.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kevans created this revision.Nov 20 2020, 7:53 PM

Herald added a subscriber: imp. · View Herald TranscriptNov 20 2020, 7:53 PM

Harbormaster completed remote builds in B34933: Diff 79815.Nov 20 2020, 7:53 PM

kevans requested review of this revision.Nov 20 2020, 7:53 PM

kevans added a parent revision: D27297: [1/2] kern: cpuset: allow cpuset_create() to take an allocated *setp.

Slightly less silly version after increased coffee intake; we don't need to inspect every thread in the process to decide, we just need to inspect the base of the first thread... I have a series of cpuset(2) tests that I'm finishing up that also provide sanity checks for the four scenarios we care about (along with some other non-jail-related):

1.) Process attaches to a jail with its own cpuset+mask (e.g. cpuset -c -l 1,2 jail -c path=/ command=/bin/sh)
2.) Process attaches to a jail with its own cpuset (e.g. cpuset -c jail -c path=/ command=/bin/sh)
3.) Process attaches to a jail with the containing jail's root cpuset (e.g. jail -c path=/ command=/bin/sh)
4.) Process attaches to a jail with the containing jail's root cpuset+mask.

#1 and #2 should yield a new base cpuset separate from the jail cpuset, since they were given a distinct cpuset for whatever reason -- we assume someone may already know they'll want to manipulate it, perhaps.

#3 and #4 should yield no new base cpuset, both take the new jail's cpuset and create the appropriate anonymous cpuset for the #4 case.

The tests also test for the small change in cpuset_setproc_setthread(), and I'll likely commit that fix separately. The consequence of using tdset instead of set as the new parent is that cpuset_getid(CPU_LEVEL_ROOT, CPU_WHICH_TID, -1, ...) could return the thread's previous jail's root set instead of the new jail's root set, as the parent chain leads back through the thread's original set. The reproducing circumstances should be just as easy as set an affinity, attach to the jail, then execute the above.

Harbormaster completed remote builds in B34942: Diff 79827.Nov 21 2020, 6:27 AM

kevans added a child revision: D27307: [3/2] tests: hook up some tests for cpuset(2) and CPUSET(9).Nov 21 2020, 5:29 PM

markj added inline comments.Nov 22 2020, 10:56 PM

sys/kern/kern_cpuset.c
1199 ↗	(On Diff #79827)	[I'm still wrapping my head around cpuset+jails as I'm not very familiar with how they interact and so might be asking some misguided questions.] If we're here because a process is attaching to a jail, `set` is the jail's cpuset. Jail cpusets are always root cpusets, so I'd expect `nroot == set`. Is that right? Then, how can `tdroot != nroot` ever be false in this scenario?

kevans added inline comments.Nov 22 2020, 11:11 PM

sys/kern/kern_cpuset.c
1199 ↗	(On Diff #79827)	Correct, `nroot == set` in that case. It could be the case that `tdroot != nroot` if the process was previously (or becomes, in the case of below where we're dropping the proc lock) assigned a cpuset that descends from the jail's root, for whatever reason. We don't actually forbid the possibility, just that a process inside the jail cannot see sibling or parent cpusets of the jail's root. Now that I say that out loud, I wonder if we shouldn't bail out just after this loop if `rebase && tdroot == nroot`. If the process already had a cpuset inside the jail, I don't necessarily see a reason we should change anything.

kevans added inline comments.Nov 22 2020, 11:12 PM

sys/kern/kern_cpuset.c
1199 ↗	(On Diff #79827)	Sorry, that should read "It could be the case that `tdroot == nroot` ..."

kevans mentioned this in rS367944: cpuset_setproc: use the appropriate parent for new anonymous sets.Nov 23 2020, 2:50 AM

As I had hoped, it took away the expected problem of attaching to a jail (when the process doesn't have its own visible cpuset), and ending up with the process still having its old root cpuset (though under a new anonymous masked bit). That was the real problem I see with the current setup (not that what I didn't see aren't problems as well, but at least there was something I noticed ;-).

But I was kind of thinking that when the process had its own cpuset, it would retain that number. I guess that's not particularly important, since you can find the new cpuset ID from the process, and it would only work if the process was the set's only ref. Maybe I'm just more hung on known numeric IDs than I need to be - a bad habit from jails that I need to lose.

The only thing I don't get is:

If we're rebasing and the new root set really is different from the current process's root (perhaps due to a race while we're dropping the proc lock), then allocate an extra cpuset to use as the process's new base set with its parent being the jail's set with the process base mask and domainset applied.

Isn't that what happens in this case though? It seems a fairly regular and expected thing: you give the process a cpuset for whatever reason you want, and then you attach it to the jail. It doesn't seem like the "perhaps a race" exceptional condition you mention, but then perhaps you're talking there about something else entirely.

Anyway, this seems to be mostly musing to myself about my own misunderstandings. It all works well.

In D27298#610431, @jamie wrote:

But I was kind of thinking that when the process had its own cpuset, it would retain that number. I guess that's not particularly important, since you can find the new cpuset ID from the process, and it would only work if the process was the set's only ref. Maybe I'm just more hung on known numeric IDs than I need to be - a bad habit from jails that I need to lose.

Right, I think we could make it work if it's the set's only ref, but then you run into the problem that you have to check the ID anyways unless you're 110% sure that that really was the only ref.

The only thing I don't get is:

If we're rebasing and the new root set really is different from the current process's root (perhaps due to a race while we're dropping the proc lock), then allocate an extra cpuset to use as the process's new base set with its parent being the jail's set with the process base mask and domainset applied.

Isn't that what happens in this case though? It seems a fairly regular and expected thing: you give the process a cpuset for whatever reason you want, and then you attach it to the jail. It doesn't seem like the "perhaps a race" exceptional condition you mention, but then perhaps you're talking there about something else entirely.

Ah, sorry, I should have noted it somewhere- the race that annoys me is when you're creating a non-persistent jail and you want to alter the cpuset of the process that's creating and attaching to the jail. Right now you have to either alter the process to do cpuset stuff after attaching, or independently cpuset(1) it after it's attached and hope that it hasn't created new processes and a game of whack-a-mole. Alternatively, you can just restrict the whole jail, but that's perhaps also reliant on the process being a long-lived one since you can't specify a mask for the jail up front (which is maybe another neat feature).

Both of the latter two scenarios have the additional problem that you can cpuset the process/jail mask down to whatever you want, but if it uses cpuset to collect the real number of CPUs available to it early on then it will be working with stale information.

jamie accepted this revision.Nov 23 2020, 9:36 PM

This revision is now accepted and ready to land.Nov 23 2020, 9:36 PM

Closed by commit rS368011: kern: cpuset: properly rebase when attaching to a jail (authored by kevans). · Explain WhyNov 25 2020, 3:14 AM

This revision was automatically updated to reflect the committed changes.

kevans added a commit: rS368011: kern: cpuset: properly rebase when attaching to a jail.

kevans mentioned this in rS368322: MFC r367944: cpuset_setproc: use the appropriate parent for new anon. sets.Dec 4 2020, 2:20 AM

jamie mentioned this in D28952: jail: allow root to implicitly widen its cpuset to attach.Feb 26 2021, 11:18 PM