Hackaround readdirplus not working on ZFS filesystems with > 1 billion files.
ClosedPublic

Authored by jpaetzel on Dec 31 2016, 10:16 AM.

Details

Summary

When a ZFS dataset has > 1 billion files in it, NFS readdirplus operations start failing. The real fix would be a 64-bit ino_t, but this patch solves the problem at the expense of a bit more CPU time on the server.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jpaetzel retitled this revision to Hackaround readdirplus not working on ZFS filesystems with > 1 billion files.
jpaetzel updated this object.
jpaetzel edited the test plan for this revision.

Indeed, you might want to make this a sysctl or even detect that the vnode type is zfs and only then switch to readdir automatically, for now.

What about defining a mnt_kern_flag bit that a file system sets to say
"inode # wouldn't fit in 32bits"?

I don't know the ZFS code, but if you find where d_fileno is filled in,
it might be easy to check to see if the inode# gets truncated there and
set MNTK_FLAG_64BITINONUM (or whatever you want to call it).

The NFS server readdirplus code could check for this flag and
force use of VOP_LOOKUP() when it is set.
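
A minimal userspace sketch of the flag idea described above, not the kernel code. The struct, the helper names and the flag value are all hypothetical stand-ins; only the flag name MNTK_FLAG_64BITINONUM comes from the suggestion itself. It shows the file system latching a per-mount bit the first time an inode number is truncated, and the readdirplus side checking that bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: the name is from the review comment, the value is invented. */
#define MNTK_FLAG_64BITINONUM 0x1u

/* Stand-in for the kernel's struct mount; only the one field used here. */
struct mock_mount {
	uint32_t mnt_kern_flag;
};

/*
 * Models the spot where the file system fills in d_fileno.
 * Returns the value that fits in a 32-bit field and latches the flag
 * on the mount if any high bits had to be dropped.
 */
static uint32_t
mock_fill_fileno(struct mock_mount *mp, uint64_t objnum)
{
	if ((objnum >> 32) != 0)
		mp->mnt_kern_flag |= MNTK_FLAG_64BITINONUM;
	return ((uint32_t)objnum);
}

/* Models the NFS server check: once the flag is set, look entries up by name. */
static bool
mock_must_use_lookup(const struct mock_mount *mp)
{
	return ((mp->mnt_kern_flag & MNTK_FLAG_64BITINONUM) != 0);
}
```

Because the bit is set at the point of truncation rather than at mount time, a file system that starts small would only change behaviour once it actually produces an inode number that does not fit.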

Is there a way to check that at run time or would it just be a mount time option? In our use case a filesystem might start small and grow over time.

Well, I'm no ZFS guy, but I might try the following:
Change line#2530 in zfs_vnops.c to:

if ((objnum & 0xffffffff00000000ull) != 0)
      printf("inode# that won't fit in 32bits\n");

If this prints out when it happens, the printf() could be changed
to setting the mnt_kern_flag bit.
And, yes, I suggested it as a possible way to set this at runtime,
when a file system actually exceeds inode#s that fit in 32bits (not very often).

Btw, it has been a while, but I vaguely recall that "struct dirent64" in the
FreeBSD ZFS code is actually defined as "struct dirent" (which has 32bit ino_t),
just to make things confusing.;-)
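
To make the truncation concern concrete, here is a small userspace sketch, assuming a 64-bit object number is stored into a 32-bit inode field by dropping the high bits. The helper names are hypothetical and nothing here is taken from zfs_vnops.c beyond the mask used in the check above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Narrow a 64-bit object number the way a 32-bit inode field would hold it. */
static uint32_t
narrow_objnum(uint64_t objnum)
{
	return ((uint32_t)(objnum & 0xffffffffull));
}

/* Same mask as the suggested check: true when no high bits are set. */
static bool
fits_in_32bits(uint64_t objnum)
{
	return ((objnum & 0xffffffff00000000ull) == 0);
}

/* True when two distinct object numbers become indistinguishable after narrowing. */
static bool
objnums_alias(uint64_t a, uint64_t b)
{
	return (a != b && narrow_objnum(a) == narrow_objnum(b));
}
```

For example, object numbers 7 and 0x100000007 both narrow to 7, so a consumer that only sees the 32-bit value cannot tell them apart.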

IMO it is acceptable to set a flag in vfs_conf.vfc_flags for ZFS unconditionally and use it to switch to readdir there. The problem should be fixed the right way shortly.

I am not qualified to make the requested changes to this patch, but I know someone who is and I'll add him to this review.

Oops, I just noted strcmp("zfs") and the is_zfs test around the block in the patch. So this change only affects zfs.
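
A rough userspace sketch of the kind of test being referred to, using mock structures rather than the kernel's. The field names vfc_name and mnt_vfc mirror the kernel's struct vfsconf and struct mount, but the mock types, the enum and the helper names are hypothetical, and the patch itself may differ. It assumes the server would otherwise resolve entries by file number and falls back to a by-name lookup only for ZFS.

```c
#include <stdbool.h>
#include <string.h>

/* Mock of the file system type record; only the name is modelled. */
struct mock_vfsconf {
	char vfc_name[16];
};

/* Mock of a mount point that refers to its file system type. */
struct mock_mount {
	const struct mock_vfsconf *mnt_vfc;
};

/* How readdirplus would obtain each entry's attributes in this sketch. */
enum entry_method {
	USE_VGET,	/* resolve by file number */
	USE_LOOKUP	/* resolve by name, as VOP_LOOKUP() would */
};

/* True when the mount's file system type name is exactly "zfs". */
static bool
mock_is_zfs(const struct mock_mount *mp)
{
	return (strcmp(mp->mnt_vfc->vfc_name, "zfs") == 0);
}

/* Only ZFS mounts take the slower by-name path; everything else is unchanged. */
static enum entry_method
mock_pick_method(const struct mock_mount *mp)
{
	return (mock_is_zfs(mp) ? USE_LOOKUP : USE_VGET);
}
```

Keying on the file system name keeps the extra CPU cost confined to ZFS exports, which matches the observation that the change only affects zfs.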

I think this can go in almost as is, just remove the #if/#else braces.

mav edited edge metadata.
This revision is now accepted and ready to land. Jan 1 2017, 3:12 PM
rmacklem edited edge metadata.

Basically looks fine to me.

jpaetzel edited edge metadata.

Changes based on feedback from kib

This revision now requires review to proceed. Jan 2 2017, 12:59 AM
jpaetzel edited edge metadata.

kib, I misunderstood you at first; I think this is what you meant.
I also updated the comment.

kib edited edge metadata.
kib added inline comments.
sys/fs/nfsserver/nfs_nfsdport.c
2021 ↗(On Diff #23523)

For now, ZFS requires VOP_LOOKUP as a workaround, until ino_t is changed to 64bit type, because a ZFS filesystem ....

2024 ↗(On Diff #23523)

No need for {} around 1-line block.

This revision is now accepted and ready to land. Jan 2 2017, 9:37 AM

Thank you. Changes have been made. As soon as make universe completes I'll commit this.