Page MenuHomeFreeBSD

fix va_fsid in the NFSv4 client for a tree of server file systems
ClosedPublic

Authored by rmacklem on Sun, Jun 6, 2:40 PM.

Details

Summary

Pre-r318997 the code looked like:

                if (vp->v_mount->mnt_stat.f_fsid.val[0] !=
	               (uint32_t)np->n_vattr.na_filesid[0])
                       vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0];

Doing this assignment got lost by r318997 and, as such, NFSv4 mounts
of servers with trees of file systems on the server is broken, due to duplicate
fileno values for the same st_dev/va_fsid.

Although I could have re-introduced the assignment, since the value of
na_filesid[0] is not guaranteed to be unique across the server file systems,
I felt it was better to always do the hash for na_filesid[0,1].
Since dev_t (st_dev/va_fsid) is now 64bits, I switched to a 64bit hash.

Is there an obviously better way to attempt to create a unique 64bit value
from an array of 2 64bit values?

It is surprising that there have not been bug reports for this.
I noticed it in recent testing.

Test Plan

Built a tree of UFS file systems on the server and then did
"ls -R" on the mount and no directory cycles were reported.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

rmacklem created this revision.

Hm, but then there is no guarantee that the value of the hash does not match fsid of some existing fs. For instance, UFS does the following:

	mp->mnt_stat.f_fsid.val[0] = fs->fs_id[0];
	mp->mnt_stat.f_fsid.val[1] = fs->fs_id[1];
...
	if (fs->fs_id[0] == 0 || fs->fs_id[1] == 0 ||
	    (nmp = vfs_getvfs(&mp->mnt_stat.f_fsid))) {
		if (nmp)
			vfs_rel(nmp);
		vfs_getnewfsid(mp);
	}

Am I right that we cannot use vfs_getnewfsid() directly because we must ensure that va_fsid's are same for all files with the same na_filesid's?
Might be, we can use some translation table from na_filesid's to generated fsids?

sys/fs/nfsclient/nfs_clport.c
510

nm_fsid

Everything in your comment is correct.

The problem with a na_filesid->va_fsid mapping table
(or linked list tree or ..) is scale.
Peer Errikson runs a FreeBSD NFSv4 server with over 80,000
file systems on it and several others with over 20000 file
systems. He uses Linux clients, so he doesn't see this problem.
However, if he runs file servers with 20000+ file systems, others
will too, sooner or later.

The only thing that breaks when there is a collision (two na_filesid[]s
hash to same va_fsid) is fts(3) and its use by commands like "ls -R"
when done above the server mount points.

I suspect most servers are shallow, wide directory trees with file
systems mounted near the top. Something like:
/export/fs0, /export/fs1,..., /export/fs10000
and only an fts(3) at /export will run into trouble.

Right now, it's always broken and has been for four years, with
few if anyone noticing, so I don't think avoiding a rare hash collision
is worth all the malloc'd storage and search overhead it would take to
manage a 20000+ element na_filesid->va_fsid map.

Unfortunately, all a client knows is that the na_filesid[2] array defines
a unique value for the fsid, but it's 128bits long.
(If all servers were FreeBSD, the client could be more clever but...)

rmacklem added inline comments.
sys/fs/nfsclient/nfs_clport.c
510

Thanks, I'll fix this, but won't bother reloading the patch here.

I just did a little experiment where I had the server file systems
mounted in a shallow tree (side by side under the exported root)
and where all these server file systems (other than the root) were
assigned the same va_fsid.

When I ran "ls -R" it worked ok.
--> It appears fts(3) only notices a duplicate [st_dev, st_ino]

and reports a directory cycle if the file systems that are
assigned the same va_fsid/st_dev are mounted vertically
(one above the other) in the server's tree.

I suspect this is not a common server file system layout
and that might explain why there have not been bug reports
related to this?

I also suggests that having a hash collision (two different
na_filesid[2] tuples map to the same va_fsid) won't cause
many grief.
This revision is now accepted and ready to land.Mon, Jun 7, 5:42 PM