Fix the NFSv4.1 client for recovery from NFS4ERR_BAD_SESSION server failures
Closed, Public

Authored by rmacklem on Dec 10 2016, 2:37 AM.

Details

Reviewers

Commits

rS310491: Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.

Summary

Testing done by cperciva@ of the NFSv4.1 client against an Amazon EFS server found
several problems during recovery from NFS4ERR_BAD_SESSION failures. Normally
NFS4ERR_BAD_SESSION failures are a rare occurrence for an NFSv4.1 server, but
this service fails frequently in this way.
Briefly, the problems fixed are:

- If more than 32 processes were attempting RPCs at the time of failure, some could be stuck forever waiting for a session slot on the failed session.
- If the reply to an RPC that succeeded on the old session just before it failed was processed after the new session was created, it incorrectly updated the new session with the slot used by the old session, corrupting the new session.
- RPCs that do not handle state (ones not using ClientIDs or StateIDs) would fail when they got NFS4ERR_BAD_SESSION instead of retrying the RPC with a new session.
- Handling of the session list was racy and could have failed if the pointer was used just as a new session was being added to the list. This patch protects all use of this TAILQ_LIST with the NFSLOCKMNT() mutex.
- RPCs that use ClientIDs/StateIDs no longer initiate recovery, since the code in the RPC handler (newnfs_request()) initiates recovery whenever an NFS4ERR_BAD_SESSION is received.
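
The first two problems can be illustrated with a small user-space sketch. This is not the code from the patch: every name below (sketch_session, sketch_get_slot(), sketch_reply_done()) is hypothetical, the 32-slot table is an assumption taken from the figure in the summary, and a real client would sleep and be woken up rather than return a "wait" status. The idea shown is that a slot request on a session marked defunct returns at once instead of waiting, and that a reply carrying a different session ID never touches the current session's slot table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SESSIONID_LEN 16	/* NFSv4.1 session IDs are 16 bytes */
#define SKETCH_MAX_SLOTS     32	/* assumed slot count, per the summary */

/* Hypothetical stand-in for a client session; not the kernel structure. */
struct sketch_session {
	uint8_t  id[SKETCH_SESSIONID_LEN];
	uint32_t slot_busy;			/* bit i set: slot i in use */
	uint32_t slot_seq[SKETCH_MAX_SLOTS];	/* per-slot sequence number */
	bool     defunct;			/* server reported a bad session */
};

enum sketch_slot_result { SLOT_OK, SLOT_WAIT, SLOT_DEFUNCT };

void
sketch_session_init(struct sketch_session *s,
    const uint8_t id[SKETCH_SESSIONID_LEN])
{
	memset(s, 0, sizeof(*s));
	memcpy(s->id, id, SKETCH_SESSIONID_LEN);
}

/*
 * Try to allocate a slot. A defunct session is reported immediately so
 * the caller can move to the new session instead of waiting forever.
 */
enum sketch_slot_result
sketch_get_slot(struct sketch_session *s, int *slotp)
{
	if (s->defunct)
		return (SLOT_DEFUNCT);
	for (int i = 0; i < SKETCH_MAX_SLOTS; i++) {
		if ((s->slot_busy & (UINT32_C(1) << i)) == 0) {
			s->slot_busy |= UINT32_C(1) << i;
			s->slot_seq[i]++;
			*slotp = i;
			return (SLOT_OK);
		}
	}
	return (SLOT_WAIT);	/* all slots busy; a real client would sleep */
}

/*
 * Process a reply. The slot is released only when the reply belongs to
 * the current session; a late reply from the old session is ignored.
 */
bool
sketch_reply_done(struct sketch_session *cur,
    const uint8_t reply_id[SKETCH_SESSIONID_LEN], int slot)
{
	assert(slot >= 0 && slot < SKETCH_MAX_SLOTS);
	if (memcmp(cur->id, reply_id, SKETCH_SESSIONID_LEN) != 0)
		return (false);
	cur->slot_busy &= ~(UINT32_C(1) << slot);
	return (true);
}
```

In this sketch the caller decides what to do with SLOT_DEFUNCT (typically look up the new session and try again), which keeps the slot table itself free of recovery logic.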

Test Plan

cperciva@ has been doing extensive testing on several patches leading up to this one.
I am doing tests via simulated failures (manual reboots of the NFSv4.1 server) with the
FreeBSD and Linux servers for both NFSv4.0 and NFSv4.1.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

rmacklem updated this revision to Diff 22794. Dec 10 2016, 2:37 AM

rmacklem retitled this revision to Fix the NFSv4.1 client for recovery from NFS4ERR_BAD_SESSION server failures.

rmacklem updated this object.

rmacklem edited the test plan for this revision.

rmacklem added a reviewer: cperciva.

Added a small fix so that mkdir won't fail when it gets a NFS4ERR_BAD_SESSION and will loop
to get a new session.
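
A minimal sketch of the "loop to get a new session" pattern follows. It is not the mkdir code from the patch: sketch_fake_server, sketch_fake_mkdir(), and sketch_mkdir_retry() are hypothetical names, the fake RPC simply simulates a server that fails a fixed number of times, and the retry bound is an assumption added so the example always terminates. The error value 10052 is the one RFC 5661 assigns to NFS4ERR_BADSESSION.

```c
#include <assert.h>

#define SKETCH_NFS4_OK            0
#define SKETCH_NFS4ERR_BADSESSION 10052	/* value from RFC 5661 */

/* Hypothetical fake server used only to drive the retry loop. */
struct sketch_fake_server {
	int failures_left;	/* how many more calls will fail */
	int sessions_created;	/* recoveries triggered so far */
};

/*
 * Stand-in for one RPC attempt. On a bad session the RPC layer is
 * assumed to start recovery itself (modelled here by counting a new
 * session), so the caller only has to retry.
 */
static int
sketch_fake_mkdir(struct sketch_fake_server *srv)
{
	if (srv->failures_left > 0) {
		srv->failures_left--;
		srv->sessions_created++;
		return (SKETCH_NFS4ERR_BADSESSION);
	}
	return (SKETCH_NFS4_OK);
}

/* Retry the operation while it keeps failing with a bad session. */
int
sketch_mkdir_retry(struct sketch_fake_server *srv, int max_tries)
{
	int error = SKETCH_NFS4ERR_BADSESSION;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries &&
	    error == SKETCH_NFS4ERR_BADSESSION; i++)
		error = sketch_fake_mkdir(srv);
	return (error);
}
```

Any error other than a bad session ends the loop right away, so ordinary failures are still reported to the caller unchanged.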

The patch I was testing eliminated a variety of hangs, panics, I/O errors, and suspicious error messages, without introducing any new problems; but it wasn't exactly the same as this patch here. Not sure if the differences are significant...

fs/nfs/nfs_commonsubs.c
830	I committed the 'fileid > 32bits' printf changes in r308708, so I'm not quite sure what they're doing here.
fs/nfsclient/nfs_clstate.c
2500–2516	This bit is also different from the patch I tested; again, I'm not sure about the significance.
fs/nfsclient/nfs_clvfsops.c
1323	This bit wasn't in the patch I tested. Did it slip in by accident? I have no idea what it does.

I have no idea how to do inline comments, but responding to cperciva@'s three comments:
#1 and #3 are code already in head. (I mistakenly did the diff against old code without it.)
#2 is for Data Servers (only pNFS) which cperciva@ isn't using.
--> So, for the purposes of cperciva@'s testing, it doesn't matter.

I'll try and update the patch so that #1 and #3 aren't there.

Redid the diff so that no code already in head is in it. (Basically cperciva@'s 1st and 3rd items.)

Closed by commit rS310491: Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. (authored by rmacklem). Dec 23 2016, 11:15 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

Path                                                      Size
fs/nfs/nfs.h (nfs.h.sav2)                                 3 lines
fs/nfs/nfs_commonkrpc.c (nfs_commonkrpc.c.sav)            113 lines
fs/nfs/nfs_commonport.c (nfs_commonport.c.sav2)           3 lines
fs/nfs/nfs_commonsubs.c (nfs_commonsubs.c.sav)            158 lines
fs/nfs/nfsclstate.h (nfsclstate.h.sav)                    1 line
fs/nfsclient/nfs_clcomsubs.c (nfs_clcomsubs.c.sav)        11 lines
fs/nfsclient/nfs_clport.c (nfs_clport.c.sav2)             4 lines
fs/nfsclient/nfs_clrpcops.c (nfs_clrpcops.c.sav)          166 lines
fs/nfsclient/nfs_clstate.c (nfs_clstate.c.sav)            70 lines
fs/nfsclient/nfs_clvfsops.c (nfs_clvfsops.c.sav)          1 line
fs/nfsclient/nfsmount.h (nfsmount.h.sav)                  13 lines

Diff 22794

