Page MenuHomeFreeBSD

pNFSd: Add a directory of newly created files for the pNFSd
Needs ReviewPublic

Authored by rmacklem on Fri, Jun 12, 9:25 PM.
Tags
None
Referenced Files
F159477935: D57554.diff
Sun, Jun 14, 4:30 PM
F159458390: D57554.diff
Sun, Jun 14, 10:04 AM
Unknown Object (File)
Sat, Jun 13, 4:14 PM
Subscribers

Details

Reviewers
kib
markj
Summary

When an NFSv4.1/4.2 server is configured as a pNFS server,
new file creation (via Open/Create from an NFS client) is
slow, due to the fact that the NFS server (MDS) must do
RPCs against the DS(s).

This patch precreates files in a directory called ".pnfshide/numfiles",
so that the NFS server can just rename them for the Open/Create.
A kernel process called a "replenisher" creates more files in
".pnfshide/numfiles" as required.

This is a fairly big patch, so I understand if you do not want to
review it, but it does do a bunch of VFS/VOP calls, so a review
of those would be nice.

It requires D57553.

This patch only affects the pNFS server and only if the directory
.pnfshide/numfiles exists in the exported file system.

Test Plan

Tested against a FreeBSD client and a Linux client
that supports pNFS.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

sys/fs/nfsserver/nfs_nfsdport.c
1328

You do not need a lock to read a single pointer. More, the result is no longer correct after the unlock, so there is even less reason to take a lock there.

2131

Should this and other lookups use NOCROSSMOUNT?

2157

Why specifying LOCKLEAF if you unlock the result right away?

BTW why not use namei() instead of VOP_LOOKUP() everywhere? Is this because there is no p_pd for the kernel thread? It probably can be initialized.

2210

No reason to take mnt_explock, the result of the if check is invalidated right after the unlock.

Thanks for the comments.

Once I hear from you w.r.t. what/if anything is needed
to get the "most up to date value" for a variable, I will
update the patch.

Thanks, rick

sys/fs/nfsserver/nfs_nfsdport.c
1328

Long ago (around FreeBSD7) alc@ told me that
to ensure you get the most up to date value for
a global variable, that some mutex (not necessarily
the one used to update that variable) must be
acquired.
--> The case I am concerned about is when
another thread (presumably running on a
different core) updates the variable just
before this read of the variable.

So, what is now needed (if anything) to guarantee
the variable read returns the most up to date value.

Btw, this list will never go from non-empty to empty.
That only happens when the nfsd is shutting down and,
since this code is always executed by an nfsd thread,
that doesn't happen.
--> The case I was concerned about was when the nfsd
has just been started and one thread on one core
has created the list, then this thread on this core checks
it.
(It is also just an optimization. If it sets try_pnfs true when
it should be false, the code will handle that later.)

2131

Yes. I'll do that when I update the patch.

2157

I'll need to take a look w.r.t. LOCKLEAF. I wrote
this code about 6months ago. I might have
wanted to acquire/unlock the vnode in this case
or it might be that I don't need LOCKLEAF.

I use VOP_LOOKUP() (and VOP_LINK()/VOP_REMOVE()
instead of VOP_RENAME() because I want to keep the
directory vnode locked. An exclusive lock on the driectory
is how I serialize creation and use of individual files in the
directory.

2210

Same story as above w.r.t. getting the most
up to date value. Since this process has a
reference count on the netexport structure
it is safe after mp->mnt_export is deleted,
but the sooner the process notices and exits,
the better. (So, yes, it could change as soon as
the mnt_explock is released, but it will be noticed
the next loop iteration.)

Once I've put it through some testing, I'll update the patch.

Thanks for the comments, rick

sys/fs/nfsserver/nfs_nfsdport.c
1328

Since you didn't suggest anything is needed to
get the "up-to-date" value for the variable,
I'll remove the locking for the next version.

2131

I've added it, but as far as I can see, NOCROSSMOUNT
is never checked by the VOP_LOOKUP()s.

2157

I checked and, at least for UFS and ZFS,
VOP_LOOKUP() always returns a locked vnode.
(The setting of LOCKLEAF is not checked.)

I did realize that LK_EXCLUSIVE is not needed
(I did it in an early rendition to serial startup,
but that is no longer the case), so I've changed
it to LK_SHARED for the next version.

2210

I've removed the locking, as you suggested.