sys/kern/kern_fork.c
#endif
	PROC_UNLOCK(p2);

	/*
	 * Tell any interested parties about the new process.
	 */
	knote_fork(p1->p_klist, p2->p_pid);
	/*
	 * See if the containing prison died while the process was still new.
	 */
	if (!prison_isalive(p2->p_ucred->cr_prison)) {
		/* Follow the prison into death. */
		PROC_LOCK(p2);
mjg: This adds avoidable overhead. Combined with prison_proc_hold the lock is taken and unlocked twice.
jamie:
> This will also open a way to move pr_ref to a per-cpu scheme and avoid taking the prison mutex altogether.

So something like a combination of refcount(9) (which is atomic-based and not per-cpu) and counter(9) (which has the caveat that a fetched value is "not guaranteed to reflect the real cumulative value for any moment"). Ideally both increment and "decrement and test for zero" would be atomic-less in the common case. Is there such a system currently in the kernel? It seems most things use either refcount(9) or mtx-protected non-atomic counters, like jail does.
mjg:
The kernel does not have a self-contained routine which does the trick, but I would argue it's not needed here. The general idea is to take the per-cpu lock for reading, verify per-cpu operation is still enabled, and then modify the per-cpu state in whatever manner is applicable. If it's not allowed, you fall back to the mutex-protected case. Then you have both per-cpu and a global counter, all of which can be modified in arbitrary manners in per-cpu mode. But if the mode gets disabled, you sum up the per-cpu state with the global counter and store it there. From that point on the only legal modification is to the global counter, which provides an exact value. I think moving refcounts around is not necessary for the purpose of this review, though.
jamie:
> pseudo-code wise it would be like this:
>
> ```
> int
> prison_fork_enter(struct prison *pr)…

This has a deadlock problem. allproc_lock has higher precedence than allprison_lock - I think for a number of reasons that predate my time here, but for certain that's built into the racct system, which has many places that wrap allprison_lock around allproc_lock. And allprison_lock needs higher precedence than this pr_forklock, since I would need it inside e.g. prison_find to avoid invalid/new prisons. A sample deadlock is:

A: fork / do_fork
1: A rlock pr_forklock

So I could still use this lock around pr_state, which I would check inside do_fork as I do now, and which would still save a pr_mtx lock. But the race case would have to be creating and killing the new process rather than avoiding it with EAGAIN.

I still like the rm lock idea, not just in fork, but because I think I may be able to avoid pr_mtx in prison_find and other such allprison loops (after accounting for pr_ref, which I discovered I need to move to lockless for other reasons). But does it still need to be a sleepable lock? Or should it even be? You mentioned the lack of atomics as a selling point, but a quick read of regular _rm_rlock suggests its common case is also atomic-free (though I don't know just how common that common case is).
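The ordering conflict jamie raises is a classic ABBA inversion. One possible interleaving, as a pseudocode sketch (my reconstruction under the ordering constraints stated above; only the lock names come from the thread, the exact call paths are hypothetical):

```
Thread A (fork path)               Thread B (jail removal)
rlock  pr_forklock
                                   xlock  allprison_lock
slock  allprison_lock  -> blocks   (B holds it exclusively)
                                   wlock  pr_forklock     -> blocks
                                   (A holds the read lock)
=> A waits on B, B waits on A: deadlock
```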
jamie:
It could be that by "higher precedence" I mean "lower precedence", but hopefully you get the idea.
mjg:
The deadlock at hand should not be present. Worst case, if code flow really induces it, you can play tricks like setting the flag at some point, dropping the locks, then rms_wlock/rms_wunlock to synchronize against fork.
mjg:
I have to stress that rms_rlock and rms_runlock don't do any atomics, which is the crux here.
jamie:
I'm not familiar with read-mostly sleepable locks; I'll read up on them, to make sure I at least begin to understand.
jamie:
True, the rms lock would remove one mtx overhead from do_fork. It would add a certain amount of complexity to kern_jail_set and prison_deref though (the places that set pr_state), because they will have to juggle the sleepable lock with the prison mutex. Still, fork is so much more common a code path than the jail lifecycle that it makes sense to keep that path clean.
		kern_psignal(p2, SIGKILL);
		PROC_UNLOCK(p2);
	}
	/*
	 * Now can be swapped.
	 */
	_PRELE(p1);
	PROC_UNLOCK(p1);

	SDT_PROBE3(proc, , , create, p2, p1, fr->fr_flags);

	if (fr->fr_flags & RFPROCDESC) {
		procdesc_finit(p2->p_procdesc, fp_procdesc);
mjg:
This adds avoidable overhead. Combined with prison_proc_hold the lock is taken and unlocked twice.

Instead you can add a read mostly-sleepable lock to protect against fork. It would scale perfectly in the common case and only be a burden when someone is racing to kill the jail.

Pseudo-code wise it would be like this:

fork:

then when you switch the state, you rms_wlock() around it.

As a result you have an invariant that, by the time you get the lock for writing, any pending forks in the jail wait, and anyone who completed managed to finish and can get a signal when you kill it.

This will also open a way to move pr_ref to a per-cpu scheme and avoid taking the prison mutex altogether.