Page MenuHomeFreeBSD

ddb ps: Use the pidhash to enumerate processes not in allproc.
ClosedPublic

Authored by jhb on Dec 29 2020, 11:47 PM.

Details

Summary

Exiting processes that have been removed from allproc but are still
executing are not yet marked PRS_ZOMBIE, so they were not listed (for
example, if a thread panics during exit1()). To detect these
processes, clear p_list.le_prev to NULL explicitly after removing a
process from the allproc list and check for this sentinel rather than
PRS_ZOMBIE when walking the pidhash.

While here, simplify the pidhash walk to use a single outer loop.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jhb requested review of this revision.Dec 29 2020, 11:47 PM

Can we leave the allproc list iteration as is, please ? My motivation is that having the list ordered by recent time of creation is actually useful behavior I accustomed to, it is useful when debugging very large dumps.

We can have some flag like P2_REMOVED to indicate that process was removed from the allproc list, and 'do zombies' loop of loops would check the flag instead of PRS_ZOMBIE.

Hmmm, I could do that. Note that for kgdb I've just given up on allproc altogether (including in my kgdb scripts which have their own ps).

I tried using a P2_ALLPROC flag instead, but that was a bit more awkward. With P2_REMOVED_ALLPROC there's still a small race between removing from the allproc list and setting the flag, partly because I didn't want to add a new PROC_LOCK/UNLOCK pair just for this case that only helps ddb's ps. The other option would be change the last loop here to just check for the process in allproc again so the last loop ends up O(n^3), but it's also for a rare case ('ps') in ddb, so maybe it's worth making that slow/more complicated in exchange for keeping the hot path simple.

I think it would be fine to report some process twice in ddb. After all we are interrogating inconsistent structure.

  • Add P2_ALLPROC_REMOVED and use that instead of PRS_ZOMBIE.
sys/kern/kern_exit.c
451

So to report twice, I'd have to move this earlier before the LIST_REMOVE, and then I'd have to add a separate PROC_LOCK/PROC_UNLOCK. The extra locking in the hot path solely for ddb 'ps' is a tradeoff I didn't think was probably worth it.

sys/kern/kern_exit.c
451

What about storing NULL into p->p_list.lh_first right after LIST_REMOVE, and checking it in pidhash iterations ?

sys/kern/kern_exit.c
451

What about storing NULL into p->p_list.lh_first right after LIST_REMOVE, and checking it in pidhash iterations ?

Hmm, that kind of conflicts with the TRASHIT stuff in the queue macros, but that's probably ok so long as we document the weirdness. I could even stick it under #ifdef DDB.

sys/kern/kern_exit.c
451

Then you can store not NULL, but the TRASHIT pattern, and compare with it.

  • Use p_list.le_prev == NULL as the sentinel instead.
jhb marked an inline comment as done.Dec 31 2020, 11:26 PM
jhb added inline comments.
sys/kern/kern_exit.c
451

Then you can store not NULL, but the TRASHIT pattern, and compare with it.

Oh, I went with NULL. I think we want to be explicit (and not depend on queue macros, etc.), and a kernel will panic for NULL just as it will panic for the TRASHIT pattern.

This revision is now accepted and ready to land.Dec 31 2020, 11:40 PM
jhb retitled this revision from Use the pidhash to enumerate all processes, not just zombies. to ddb ps: Use the pidhash to enumerate processes not in allproc..Dec 31 2020, 11:57 PM
jhb edited the summary of this revision. (Show Details)