- Filter out PRS_NEW procs as rufetch() tries taking the thread lock which may not yet be initialized.
- Hold PROC_LOCK to ensure stability of iterating the threads.
- p_rux fields are protected by the process statlock as well.
Sponsored by: Dell EMC
More details
This bug
proc0_post iterates FOREACH_PROC_IN_SYSTEM and calls rufetch(p) on each process, which in turn walks FOREACH_THREAD_IN_PROC and takes each thread's lock.
This page faults if the thread lock is not yet initialized, which is the case for PRS_NEW procs that are still in dofork(). In the case I hit, the forking process was in fdcopy(), long before the sched_fork() call that initializes the thread pointers.
None of this code is holding the PROC_LOCK or PROC_SLOCK.
The typical pattern I've seen for dealing with this is to simply filter out PRS_NEW procs in the caller, but my proposed patch feels very incomplete because rufetch() itself still does not check whether the process is PRS_NEW.
rS275121 changed from using the proc slock around the rufetch() call in proc0_post to the proc statlock. It may be enough to use the slock again in rufetch and filter out PRS_NEW procs there but I haven't analyzed it deeply yet.
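To make the locking order discussed above concrete, here is a minimal userspace sketch of the pattern, not the kernel patch itself. All of the model_* names and the struct layouts are hypothetical stand-ins that I made up for illustration; pthread mutexes play the role of PROC_LOCK, the process statlock, and the thread lock, and a NULL td_lock models a thread whose lock has not been set up yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the real kernel definitions. */
enum model_state { PRS_NEW, PRS_NORMAL };

struct model_thread {
	pthread_mutex_t *td_lock;	/* NULL until the "sched_fork()" step has run */
	long td_ticks;
	struct model_thread *td_next;
};

struct model_proc {
	enum model_state p_state;
	pthread_mutex_t p_mtx;		/* models PROC_LOCK */
	pthread_mutex_t p_statmtx;	/* models the process statlock */
	struct model_thread *p_threads;
	long p_rux_ticks;		/* models a p_rux field */
	struct model_proc *p_next;
};

void
model_proc_init(struct model_proc *p, enum model_state state,
    struct model_thread *threads, struct model_proc *next)
{
	p->p_state = state;
	pthread_mutex_init(&p->p_mtx, NULL);
	pthread_mutex_init(&p->p_statmtx, NULL);
	p->p_threads = threads;
	p->p_rux_ticks = 0;
	p->p_next = next;
}

/* Models rufetch(): fold per-thread stats into the proc under each thread lock. */
static void
model_rufetch(struct model_proc *p)
{
	struct model_thread *td;

	for (td = p->p_threads; td != NULL; td = td->td_next) {
		/* The caller must have filtered out procs with unset thread locks. */
		assert(td->td_lock != NULL);
		pthread_mutex_lock(td->td_lock);
		p->p_rux_ticks += td->td_ticks;
		td->td_ticks = 0;
		pthread_mutex_unlock(td->td_lock);
	}
}

/* Models the proc0_post loop; returns how many procs had their stats reset. */
int
model_reset_stats(struct model_proc *head)
{
	struct model_proc *p;
	int visited = 0;

	for (p = head; p != NULL; p = p->p_next) {
		pthread_mutex_lock(&p->p_mtx);		/* keeps the thread list stable */
		if (p->p_state == PRS_NEW) {		/* still forking: skip it */
			pthread_mutex_unlock(&p->p_mtx);
			continue;
		}
		pthread_mutex_lock(&p->p_statmtx);	/* protects the p_rux model field */
		model_rufetch(p);
		p->p_rux_ticks = 0;
		pthread_mutex_unlock(&p->p_statmtx);
		pthread_mutex_unlock(&p->p_mtx);
		visited++;
	}
	return (visited);
}
```

In this model the PRS_NEW check sits in the caller, which mirrors the proposed patch; moving the check into model_rufetch() would correspond to the alternative of filtering inside rufetch() mentioned above.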
More proc0_post issues?
The code is still racy in that microuptime(&p2->p_stats->p_start); is called for new forking processes but proc0_post may come along and trash that with a new value, as well as clearing all of its other stats.