Page MenuHomeFreeBSD

gss_impl.c: Fix a nfsd hang when kgssapi.ko is loaded, but gssd not running
Needs ReviewPublic

Authored by rmacklem on Thu, Jun 4, 9:07 PM.
Tags
None
Referenced Files
F159193778: D57455.diff
Thu, Jun 11, 4:18 AM
F159169375: D57455.diff
Wed, Jun 10, 9:24 PM
F159169362: D57455.diff
Wed, Jun 10, 9:23 PM
Unknown Object (File)
Tue, Jun 9, 3:47 PM
Subscribers

Details

Reviewers
glebius
Summary

After the conversion to using netlink, the kgssapi had
no way of knowing if the gssd daemon was running.
As such, a boot where the kgssapi is loaded, but the
gssd is not enabled would hang the nfsd for a very
long time. (Many timeouts at 300sec each.)

This patch adds a Null RPC upcall with a 200msec
timeout to check to see if the gssd is running.
If the gssd is not running, the nfsd starts up
(without Kerberos support) with only a 200msec
delay.)

Test Plan

Tested with the kgssapi.ko loaded for boots with
both gssd_enable="YES" and gssd_enable="NO"
in /etc/rc.conf.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Sorry, can't understand why do we need a second CLIENT * for this NULL RPC.

Sorry, can't understand why do we need a second CLIENT * for this NULL RPC.

Because the first one is created at module load time
and I wanted to do the NULL RPC when the first
real upcall is needed.
(There is no guarantee that the gssd daemon will be started
right after kgssapi.ko is loaded. Yes, the gssd can load it at
startup, but the kgssapi.ko can be loaded early, via /boot/loader.conf
or compiled into the kernel.)

But why do you need a second pointer? You may have a static bool that marks the kgss_gssd_handle has been successfully checked with a NULL RPC. I would actually suggest not to store a second pointer neither a bool. Make kgss_gssd_client() always return a checked client.

I think this change also creates a potential process level race with gssd. The rc(8) can run nfsd before gssd instantiated the RPC server. Together with this change in the gssd the daemon(3) call needs to be moved down after svc_nl_create() and svc_reg().

But why do you need a second pointer? You may have a static bool that marks the kgss_gssd_handle has been successfully checked with a NULL RPC. I would actually suggest not to store a second pointer neither a bool. Make kgss_gssd_client() always return a checked client.

I don't think that we want to do a NULL RPC for every request?
(That would double the # of upcalls.)

If you don't like a second global kgss_gssd_handle, I suppose it
can be a "static bool" local to kss_gssd_clien(). (I don't see any
real difference, they're both just one static variable, but??)

I think this change also creates a potential process level race with gssd. The rc(8) can run nfsd before gssd instantiated the RPC server. Together with this change in the gssd the daemon(3) call needs to be moved down after svc_nl_create() and svc_reg().

/etc/rc.d/nfsd specifies gssd as REQUIRED, so the order
should always be /etc/rc.d/gssd before /etc/rc.d/nfsd.
So, I don't follow what your race is?

I think this change also creates a potential process level race with gssd. The rc(8) can run nfsd before gssd instantiated the RPC server. Together with this change in the gssd the daemon(3) call needs to be moved down after svc_nl_create() and svc_reg().

/etc/rc.d/nfsd specifies gssd as REQUIRED, so the order
should always be /etc/rc.d/gssd before /etc/rc.d/nfsd.
So, I don't follow what your race is?

The race is that gssd(8) calls daemon(3) and control returns to rc(8). Then rc(8) would call nfsd, but gssd(8) hasn't yet completed svc_nl_create() and svc_reg().

I don't think that we want to do a NULL RPC for every request?
(That would double the # of upcalls.)

My bad, I didn't realize kgss_gssd_client() is called for every request.

If you don't like a second global kgss_gssd_handle, I suppose it
can be a "static bool" local to kss_gssd_clien(). (I don't see any
real difference, they're both just one static variable, but??)

I think two pointers that point at the same address are a bit misleading.

What about other users of kgss_gssd_handle that use it bypassing kgss_gssd_client()? I mean kgss_transfer_context() and those functions in kgssapi that call kgss_transfer_context().

What about make it other way around:

  1. On module load set timeout/retries very low.
  2. In kgss_gssd_client() have static bool gssd_replied = false. As long as it is false, perform the null RPC. When null RPC succeeds, set the client timeout/retries to some normal value and set the bool to true.

Also, a cool addition on top of that would be to create some call back on the normal timeout/retries exceeding maximum and switch back to the init mode. That will cover a condition when gssd daemon died.