Page MenuHomeFreeBSD

Add a check for LCL_NEEDSCONFIRM to nfsrv_checkgetattr()
ClosedPublic

Authored by rmacklem on Sep 5 2020, 11:11 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 23 2023, 10:55 AM
Unknown Object (File)
Nov 15 2023, 11:01 AM
Unknown Object (File)
Nov 5 2023, 8:48 PM
Unknown Object (File)
Oct 4 2023, 7:42 PM
Unknown Object (File)
Sep 30 2023, 7:41 PM
Unknown Object (File)
Sep 26 2023, 8:25 PM
Unknown Object (File)
Aug 2 2023, 6:25 PM
Unknown Object (File)
Jul 15 2023, 10:25 PM
Subscribers

Details

Reviewers
asomers
Summary

A panic is described in PR#249127, which appears to have been
caused by nfsrv_checkgetattr() failing to check for and handle
the case where the ClientID is not confirmed.

If the client that holds the delegation were to crash/reboot,
there is a period of time just after rebooting where the
ClientID (SetClientID with same "id", but different "verifier")
will be unconfirmed (LCL_NEEDSCONFIRM set).
If another client did a Getattr for the delegated file,
it would call nfsrv_checkgetattr().

Without this patch, it would bogusly try to do a callback
and that would cause the panic.

Test Plan

Tested with delegations enabled for normal operation.

I do not know an easy way to get a client to do a
SetClientID at just the right time, however the patch
seems reasonable and safe to do.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 33416

Event Timeline

This seems wrong for two reasons:

  • nfsrv_docallback has other callers, too
  • How can we be sure that the previous client isn't still using its delegation?

Maybe instead of ignoring the delegation, we should treat it like an unreachable callback and return EHOSTUNREACH from nfsrv_docallback. Then the caller could handle the error like it does for other unreachable callbacks, by sleeping until the delegation expires. Except, it seems that nfsrv_checkgetattr doesn't do that? Should it?

Also, you mentioned on the mailing list that a client which changes its callback address might also trigger this bug. Do you still agree with that idea?

Well, you are correct, in that other callers may
need similar fixes. (ie. Should never be called if
the ClientID is unconfirmed.)
--> For NFSv4.0, the only two are CBRECALL, to

 recall a delegation. This occurs when a
 conflicting open occurs and the check in
 nfsrv_checkopen() should be sufficient.
CBNULL, which is done during confirmation
to check for a callback path. (Again handled
by the current code, since it is done after the
LCL_NEEDSCONFIRM flag is cleared with the
global exclusive lock held.

--> There are Layout related CBs in NFSv4.1, but

since those use an established connection
with a session, I think they are ok, but I will
take a look.

When the ClientID is in unconfirmed state, there is
no valid state related to it (it just happens to be
there because it isn't cleared out until after the confirm),
so there really is no delegation (just a structure that
makes it look that way).
--> As such, treating it as if no delegation exists is

correct, I think.

(You cannot do a callback without a confirmed ClientID
with a valid callback address. That is why the panic() is
there.)
--> The client doing a SetClientID with a different verifier

is saying "I have no state", so it cannot use old state,
unless it is broken. (If it is broken, all I care about is the
server not panic()'ng.)

If the client is not broken, the change of address should
not make the ClientID unconfirmed (I had not looked
at the code when I made that comment), because the
verifier should be the same.
For the verifier still the same, the ClientID is not set
unconfirmed. (See nfsrv_setclient().)

A simple summary.
Once in this state, the delegation no longer exists.
If the client is still trying to use it, it is broken.

Btw, all the code that panic()d is trying to do is get
up to date size and mtime info for the file, since the
write delegation allows local updates until the
delegation is returned.
--> In this case, it would normally be done for

a Getattr on the file done by a different
client and not the one holding the delegation.
--> The bug was that the delegation was found
       by FH and not stateid, then forgot to check
       for ClientID validity before using the state
       (which in this case is no longer valid).

If it makes a difference, one of the crashed server's clients _did_ recently do a change of address (though I'm unsure of the exact timing, and of course I can't guarantee that the client isn't buggy). But none of the clients rebooted within 4 days of the crash.

Interesting. SetClientID arguments are roughly:
A) - an identifier string (that should never change for a given client)
B) - a verifier that should change whenever the client looses state information

(usually a reboot)

C) - a host IP address for the server to use to do a connection to the

client for callbacks

Now, I haven't looked in a long time, but there was a time when the
Linux client just used its host IP address for (A). Not a great choice
and a really bad one if the client machine's address changes.

When I was talking about address change, I was referring to the
host IP address used to connect to the client for callbacks being
changed by a SetClientID, where (A) and (B) remain unchanged.
--> If (A) or (B) changes, it results in a new ClientID in an unconfirmed

state.

If you can capture packets when one of these client machines
does a mount and look at it in wireshark, you should be able
to find the SetClientID operation and see what it used for
the identifier string.
--> If that changes, it really says "I'm a different client now"

and could have caused this.
--> I would have only expected a short window of time
      between the SetClientID and SetClientIDConfirm,
      but if it did a Getattr on a file in between them
      for some reason, that would do it.

I'll admit I don't know quite what would happen to the
Linux client when the IP address changes. If you can
force that to happen while capturing packets, it would
be interesting to see the packet trace (and see if that
precipitated the panic). Maybe you can replace the panic()
with a printf() and return, so the server doesn't panic?

If the RFC does not allow unconfirmed clients to have delegations, then I think this change is correct. And I can't find any other problems with it.

This revision is now accepted and ready to land.Sep 6 2020, 7:12 PM

This patch has been committed as r365703. It didn't get automatically
closed for some reason.