When a ZFS dataset has more than 1 billion files in it, NFS readdirplus operations start failing. The real fix would be a 64-bit ino_t, but this patch solves the problem at the expense of a bit more CPU time on the server.
Details
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Not Applicable
- Unit Tests: Not Applicable
Event Timeline
Indeed you might want to make this a sysctl or, even, detect that the vnode type is zfs and only then switch to readdir automatically, for now.
What about defining a mnt_kern_flag bit that a file system sets to say
"inode # wouldn't fit in 32bits"?
I don't know the ZFS code, but if you find where d_fileno is filled in,
it might be easy to check to see if the inode# gets truncated there and
set MNTK_FLAG_64BITINONUM (or whatever you want to call it).
The NFS server readdirplus code could check for this flag and
force use of VOP_LOOKUP() when it is set.
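A minimal userspace sketch of that flag check follows. It assumes a hypothetical flag bit called MNTK_FLAG_64BITINONUM (the name floated above, with an arbitrary value) and a stand-in struct in place of the kernel's struct mount. The enum and function names are illustrative only and are not taken from the NFS server code.

```c
#include <stdint.h>

/* Hypothetical flag bit: the name comes from the suggestion above, the value is arbitrary. */
#define MNTK_FLAG_64BITINONUM 0x00000001u

/* Stand-in for the kernel's struct mount; only the field this sketch needs. */
struct sketch_mount {
	uint32_t mnt_kern_flag;
};

/* How readdirplus would obtain each entry: by file number, or by name lookup. */
enum entry_method { ENTRY_BY_FILENO, ENTRY_BY_LOOKUP };

/*
 * If the file system has declared that its inode numbers may not fit
 * in 32 bits, force the name-lookup path (VOP_LOOKUP() in the kernel).
 * Otherwise keep the cheaper file-number path.
 */
static enum entry_method
readdirplus_entry_method(const struct sketch_mount *mp)
{
	if ((mp->mnt_kern_flag & MNTK_FLAG_64BITINONUM) != 0)
		return (ENTRY_BY_LOOKUP);
	return (ENTRY_BY_FILENO);
}
```

With the flag clear the function returns ENTRY_BY_FILENO. Once the file system sets the bit, every later call returns ENTRY_BY_LOOKUP, so the extra CPU cost is only paid on file systems that need it.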
Is there a way to check that at run time or would it just be a mount time option? In our use case a filesystem might start small and grow over time.
Well, I'm no ZFS guy, but I might try the following:
Change line#2530 in zfs_vnops.c to:
if ((objnum & 0xffffffff00000000ull) != 0) printf("inode# that won't fit in 32bits\n");
If this prints out when it happens, the printf() could be changed
to setting the mnt_kern_flag bit.
And, yes, I suggested it as a possible way to set this at runtime,
when a file system actually exceeds inode#s that fit in 32bits (not very often).
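The runtime detection described here could look roughly like the sketch below. It reuses the mask from the suggested printf() test. The flag name MNTK_FLAG_64BITINONUM, its value, and both helper functions are assumptions made for illustration; a plain uint32_t stands in for mnt_kern_flag, and locking is ignored.

```c
#include <stdint.h>

/* Hypothetical flag bit; name and value are assumptions for this sketch. */
#define MNTK_FLAG_64BITINONUM 0x00000001u

/* Returns nonzero if any of the upper 32 bits of objnum are set. */
static int
objnum_exceeds_32bits(uint64_t objnum)
{
	return ((objnum & 0xffffffff00000000ull) != 0);
}

/*
 * Called where d_fileno would be filled in. The flag is sticky: once an
 * oversized object number has been seen, it stays set.
 */
static void
note_objnum(uint32_t *kern_flag, uint64_t objnum)
{
	if (objnum_exceeds_32bits(objnum))
		*kern_flag |= MNTK_FLAG_64BITINONUM;
}
```

A file system that starts small never sets the bit. The first object number above 0xffffffff flips it, which matches the "start small and grow over time" case raised above.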
Btw, it has been a while, but I vaguely recall that "struct dirent64" in the
FreeBSD ZFS code is actually defined as "struct dirent" (which has 32bit ino_t),
just to make things confusing. ;-)
IMO it is acceptable to set a flag in vfs_conf.vfc_flags for ZFS unconditionally and use it to switch to readdir there. The problem should be fixed the right way shortly.
I am not qualified to make the requested changes to this patch, but I know someone who is and I'll add him to this review.
Oops, I just noted strcmp("zfs") and the is_zfs test around the block in the patch. So this change only affects zfs.
I think this can go in almost as is, just remove the #if/#else braces.
kib, I misunderstood you at first; I think this is what you meant.
I also updated the comment.
Thank you. Changes have been made. As soon as make universe completes I'll commit this.