fix a case where the kernel nfsd threads do not terminate
ClosedPublic

Authored by rmacklem on Jul 1 2018, 6:16 PM.

Details

Summary

After terminating the master nfsd process/thread, I have intermittently observed that
the slave nfsd process and its threads have not terminated. The master process posts a
SIGKILL to the slave process to make it terminate.
A call to cv_wait_sig()/cv_timedwait_sig() then returns EINTR/ERESTART, which causes the
thread to call svc_exit(). svc_exit() sets SVCPOOL_CLOSING on all the thread groups and
wakes them up so that they all terminate.
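
The intended shutdown path described above can be sketched as a small user-space
model. This is not the krpc code: every name below (model_pool, model_svc_exit(),
model_after_wait(), the wait_result values and the group count) is hypothetical,
and the model only assumes what the summary states, namely that an interrupted
sleep leads to an exit call that flags and wakes every thread group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sleep's return values (0, EINTR, ERESTART). */
enum wait_result { WAIT_OK = 0, WAIT_EINTR, WAIT_ERESTART };

#define MODEL_NGROUPS 4		/* arbitrary group count for this model */

struct model_group {
	bool closing;		/* stands in for SVCPOOL_CLOSING */
	int wakeups;		/* number of wakeups delivered to the group */
};

struct model_pool {
	struct model_group groups[MODEL_NGROUPS];
};

/* Models svc_exit() as summarized above: flag every group, then wake it. */
static void
model_svc_exit(struct model_pool *pool)
{
	for (size_t i = 0; i < MODEL_NGROUPS; i++) {
		pool->groups[i].closing = true;
		pool->groups[i].wakeups++;
	}
}

/*
 * Models a service thread coming back from its interruptible sleep.
 * Returns true when the thread should leave its service loop.
 */
static bool
model_after_wait(struct model_pool *pool, enum wait_result res)
{
	if (res == WAIT_EINTR || res == WAIT_ERESTART) {
		model_svc_exit(pool);
		return (true);
	}
	return (false);
}

/* True only when every group has been told to close. */
static bool
model_all_closing(const struct model_pool *pool)
{
	for (size_t i = 0; i < MODEL_NGROUPS; i++)
		if (!pool->groups[i].closing)
			return (false);
	return (true);
}
```

In this model a normal wakeup leaves every group untouched, while a single
interrupted sleep in any one thread is enough to mark all groups as closing.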

When this fails to work, "ps axHl" shows:

0 48889     1   0   20  0  5884  812 svcexit  D     -   0:00.01 nfsd: server 
0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server

... more of the same

0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server 
0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   1:51.78 nfsd: server 
0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   2:27.75 nfsd: server

and the nfsd threads are still running and handling NFS RPCs.
From code inspection, the only way I can see that this can happen is if
the "ismaster" thread (which is the one created with the process
and not by kthread_start()) has returned from svc_run_internal() without
calling svc_exit().

There is only one place in svc_run_internal() where this can happen.
This patch changes this case so that it will not allow "ismaster" to
return from svc_run_internal() without first calling svc_exit().
This small change appears to be "safe" and should not break the krpc.
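
The invariant the patch aims for can be illustrated with another user-space
model. The actual diff is not reproduced here, so this is only one possible way
to enforce the rule: it assumes the master thread simply stays in its loop when
it reaches a return path that does not go through the exit call. All names
(model_state, model_try_return(), leave_reason, the patched flag) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_state {
	bool exit_called;	/* has the svc_exit() stand-in run? */
};

/* Why a thread wants to leave its service loop (hypothetical labels). */
enum leave_reason {
	LEAVE_SIGNALLED,	/* interrupted sleep; goes through the exit call */
	LEAVE_EARLY		/* any return path that skips the exit call */
};

/* Stand-in for svc_exit(): records that shutdown was requested. */
static void
model_exit(struct model_state *s)
{
	s->exit_called = true;
}

/*
 * Returns true if the thread actually returns from its service loop.
 * With patched == false the master may leave on LEAVE_EARLY without the
 * exit call ever running; with patched == true it keeps looping instead.
 */
static bool
model_try_return(struct model_state *s, bool ismaster,
    enum leave_reason why, bool patched)
{
	if (why == LEAVE_SIGNALLED) {
		model_exit(s);
		return (true);
	}
	if (patched && ismaster)
		return (false);	/* assumed fix: master stays in the loop */
	return (true);
}
```

In the unpatched model the master can return while exit_called is still false,
which corresponds to the state the summary suspects. In the patched model only
non-master threads take the early return, so the master leaves only after the
exit stand-in has run.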

Test Plan

I have terminated the nfsd daemon quite a few times without the problem
occurring.
Since the problem is intermittent and I don't know of a way to reliably
reproduce it, I cannot be sure that it is fixed.

Diff Detail

Repository
rS FreeBSD src repository - subversion