Fix a case where the kernel nfsd threads do not terminate
Closed, Public

Authored by rmacklem on Jul 1 2018, 6:16 PM.

Details

Reviewers

kib
mav
dfr

Commits

rS335867: Fix the server side krpc so that the kernel nfsd threads terminate.
rS335866: Fix the server side krpc so that the kernel nfsd threads terminate.

Summary

After terminating the master nfsd process/thread, I have intermittently observed that
the slave nfsd process and its threads have not terminated. The master process posts a
SIGKILL to the slave process to make it terminate.
A call to cv_wait_sig()/cv_timedwait_sig() then returns EINTR/ERESTART, which causes the
calling thread to call svc_exit(). svc_exit() sets SVCPOOL_CLOSING on all the thread groups and
wakes up their threads so that they all terminate.
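
The shutdown path described above can be sketched as a small userspace model. This is not the krpc code: the model_* names, the struct layout, and the group limit are all hypothetical, and a plain int stands in for the EINTR/ERESTART return from the interruptible sleep.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAXGROUPS 4	/* hypothetical limit, for this model only */

struct model_group {
	bool closing;		/* stands in for SVCPOOL_CLOSING */
	int sleeping;		/* threads asleep waiting for work */
	int running;		/* threads awake and able to see "closing" */
};

struct model_pool {
	int ngroups;
	struct model_group groups[MODEL_MAXGROUPS];
};

/* Models the described effect of svc_exit(): flag every group, wake every sleeper. */
static void
model_svc_exit(struct model_pool *pool)
{
	assert(pool->ngroups <= MODEL_MAXGROUPS);
	for (int g = 0; g < pool->ngroups; g++) {
		struct model_group *grp = &pool->groups[g];

		grp->closing = true;
		grp->running += grp->sleeping;
		grp->sleeping = 0;
	}
}

/*
 * Models one thread returning from an interruptible sleep. A nonzero
 * wait_error plays the role of EINTR/ERESTART. Returns true if the
 * calling thread should now terminate.
 */
static bool
model_after_wait(struct model_pool *pool, int group, int wait_error)
{
	if (wait_error != 0)
		model_svc_exit(pool);
	return (pool->groups[group].closing);
}
```

In this model a single interrupted wait is enough to flag every group and wake every sleeper. A thread that leaves without ever reaching model_svc_exit() does neither of those things for the others.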

When this fails to work, "ps axHl" shows:

0 48889     1   0   20  0  5884  812 svcexit  D     -   0:00.01 nfsd: server 
0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server

... more of the same

0 48889     1   0   40  0  5884  812 rpcsvc   I     -   0:00.00 nfsd: server 
0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   1:51.78 nfsd: server 
0 48889     1   0   -8  0  5884  812 rpcsvc   I     -   2:27.75 nfsd: server

and the nfsd threads are still working and handling NFS RPCs.
From code inspection, the only way I can see this happening is if
the "ismaster" thread (the one created with the process
and not by kthread_start()) has returned from svc_run_internal() without
calling svc_exit().

There is only one place in svc_run_internal() where this can happen.
This patch changes that case so that "ismaster" can no longer
return from svc_run_internal() without first calling svc_exit().
This small change appears to be "safe" and should not break the krpc.
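
The invariant the patch aims for can be illustrated with a minimal userspace model of one pass through a service loop. Everything here is hypothetical: the summary does not say which exit path was involved, so early_exit is a stand-in for it, and keeping the master in the loop is only one assumed way of enforcing the rule. The sim_* names are not krpc identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_pool {
	bool closing;		/* set only by sim_svc_exit() */
};

enum sim_result { SIM_STAY, SIM_LEFT };

/* Stand-in for svc_exit(): marks the pool as closing. */
static void
sim_svc_exit(struct sim_pool *pool)
{
	pool->closing = true;
}

/*
 * One pass of a simplified service loop.
 * early_exit:  hypothetical condition that lets a thread leave quietly.
 * interrupted: the sleep returned EINTR/ERESTART.
 * patched:     when true, the master may not take the quiet exit.
 */
static enum sim_result
sim_iteration(struct sim_pool *pool, bool ismaster, bool early_exit,
    bool interrupted, bool patched)
{
	assert(pool != NULL);
	if (pool->closing)
		return (SIM_LEFT);
	if (early_exit && (!patched || !ismaster))
		return (SIM_LEFT);	/* leaves without calling sim_svc_exit() */
	if (interrupted) {
		sim_svc_exit(pool);
		return (SIM_LEFT);
	}
	return (SIM_STAY);
}
```

With patched set to false, the master can leave while closing is still false, which corresponds to the stuck state in the ps output. With patched set to true, the master leaves only once closing is set.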

Test Plan

I have terminated the nfsd daemon quite a few times without the
problem occurring.
Since the problem is intermittent and I don't know of a way to reliably
reproduce it, I cannot be sure that it is fixed.