Page MenuHomeFreeBSD

Remove deadlock in rc caused by pwait waiting for itself
ClosedPublic

Authored by lytboris_gmail.com on Jan 19 2021, 6:43 PM.
Tags
Referenced Files
Unknown Object (File)
Wed, Nov 13, 5:11 PM
Unknown Object (File)
Tue, Nov 12, 11:30 AM
Unknown Object (File)
Mon, Nov 11, 5:10 AM
Unknown Object (File)
Mon, Nov 11, 3:44 AM
Unknown Object (File)
Sep 25 2024, 1:24 AM
Unknown Object (File)
Sep 24 2024, 8:00 PM
Unknown Object (File)
Sep 24 2024, 8:00 PM
Unknown Object (File)
Sep 24 2024, 3:14 PM

Details

Summary

Imagine following situation:

  1. Long time ago a_service was started through rc.d
  2. We want to restart a_service and issue
service a_service restart
  1. rc.subr reads current process PID (via file or process), sends TERM signal and runs pwait with PID harvested
  2. a_service process dies very quickly so it's PID becomes available. It is possible that while original process was running PID counter overflowed and pwait can get a_service's PID as it's own PID trapping itself in a dealock.

This patch ignores pid(s) to wait that are equal to pwait PID.

Test Plan

Without patch proposed:

>while true; do pwait 24242 2>/dev/null; done
load: 4.60  cmd: pwait 19906 [running] 0.00r 0.00u 0.00s 11% 2120k
load: 4.60  cmd: zsh 20841 [running] 0.00r 0.00u 0.00s 12% 0k
load: 4.60  cmd: zsh 21720 [running] 0.00r 0.00u 0.00s 14% 0k
load: 4.60  cmd: zsh 6748 [running] 18810.01r 0.69u 6.41s 15% 7152k
load: 4.47  cmd: pwait 22730 [running] 0.00r 0.00u 0.00s 14% 2316k
load: 4.47  cmd: zsh 6748 [running] 18812.13r 0.72u 6.89s 15% 7152k
load: 4.47  cmd: pwait 23691 [running] 0.00r 0.00u 0.00s 16% 2172k
load: 4.47  cmd: pwait 23935 [running] 0.00r 0.00u 0.00s 17% 2172k
load: 4.47  cmd: zsh 24108 [running] 0.00r 0.00u 0.00s 16% 5624k
load: 4.47  cmd: pwait 24242 [kqread] 0.02r 0.00u 0.00s 17% 2300k
load: 4.47  cmd: pwait 24242 [kqread] 0.25r 0.00u 0.00s 17% 2300k
load: 4.47  cmd: pwait 24242 [kqread] 0.66r 0.00u 0.00s 15% 2300k

With patch proposed:

>while true; do ./pwait 64242 2>&1; done | fgrep -v 'No such process'
load: 11.65  cmd: zsh 60713 [running] 0.00r 0.00u 0.00s 12% 6960k
load: 11.65  cmd: zsh 55569 [running] 10.09r 0.17u 2.21s 15% 7152k
load: 11.36  cmd: pwait 62673 [running] 0.00r 0.00u 0.00s 15% 2236k
load: 11.36  cmd: zsh 55569 [running] 12.56r 0.20u 2.77s 16% 7152k
pwait: 64242: skip my own pid
load: 11.36  cmd: zsh 65110 [running] 0.00r 0.00u 0.00s 19% 7148k
load: 11.36  cmd: zsh 55569 [running] 16.07r 0.25u 3.56s 19% 7152k
load: 10.77  cmd: pwait 66246 [running] 0.00r 0.00u 0.00s 19% 2304k
load: 10.77  cmd: zsh 55569 [running] 17.96r 0.31u 3.95s 20% 7152k
load: 10.77  cmd: pwait 67423 [running] 0.00r 0.00u 0.00s 20% 2208k
^C

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

lytboris_gmail.com created this revision.
rpokala added inline comments.
bin/pwait/pwait.1
151
.Pp
To avoid deadlock,
.Nm
will ignore its own pid, if it is provided as a process id to wait for.

Remember to bump .Dd at the top, to reflect the commit date. Other than that, looks good from a manpage standpoint.

This was reported before as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218598 but I did not know for sure what to do so I left it alone. Since the patch is essentially the same, perhaps we should make this change. The wording of the error message in the patch in the PR seems a bit more formal, so better.

0mp added a project: rc.
0mp added a subscriber: 0mp.

This was reported before as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218598 but I did not know for sure what to do so I left it alone. Since the patch is essentially the same, perhaps we should make this change. The wording of the error message in the patch in the PR seems a bit more formal, so better.

+1, although what I like about the patch by @lytboris_gmail.com is that it does not introduce a new variable.

This revision is now accepted and ready to land.Jan 20 2021, 1:06 PM

suggestion to improve warning

bin/pwait/pwait.c
150

i would say "skipping" here, it sounds a bit more like an English sentence (well, sentence fragment)

this is because we are doing the skipping, we aren't instructing anyone to do it for us, and we're doing it right now

bin/pwait/pwait.c
150

+1

This revision now requires review to proceed.Jan 21 2021, 6:50 AM
This revision is now accepted and ready to land.Jan 21 2021, 7:41 AM