reimplement zfsctl (.zfs) support
ClosedPublic

Authored by avg on Aug 5 2016, 8:09 AM.

Details

Summary

The current code is written on top of GFS, a library with the generic
support for writing filesystems, which was ported from illumos.
Because of significant differences between illumos VFS and FreeBSD
VFS models, both the GFS and zfsctl code were heavily modified to
work on FreeBSD. Nonetheless, they still contain quite a few ugly
hacks and bugs.

This is a reimplementation of the zfsctl code where the VFS-specific
bits are written from scratch and only the code that interacts with
the rest of ZFS is reused.

Some highlights.

We use two types of nodes, static and on-demand. The static nodes
are used for permanent directories like .zfs, .zfs/snapshot, etc. The
on-demand nodes are used for ephemeral directories that act as snapshot
mount points.
Initially only static nodes are created. Their vnodes are instantiated
when they are looked up. The on-demand nodes and vnodes are instantiated
as needed and the nodes are destroyed as soon as the corresponding
vnodes are reclaimed.
We also try very hard to ensure that uncovered snapshot vnodes do not
linger. They are supposed to becomes inactive as soon as they are
uncovered and we try to recycle them immediately.
When a filesystem is unmounted all snapshots under .zfs are unmounted
first, then all vnodes are flushed and finally the static .zfs vnodes
are destroyed.

There are some changes outside of zfsctl code too.
z_ctldir is never used directly (as it is an opaque pointer),
zfsctl_root() has to be used instead. The function returns a locked
vnode now, so it accepts a lock flags parameter. The function can
also fail now, e.g. during force unmounting, whereas previously it
was infallible.
zfsctl_root_lookup() is retired, instead of it VOP_LOOKUP() on the .zfs
vnode (obtained with zfsctl_root) is used.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
There are a very large number of changes, so older changes are hidden. Show Older Changes
bcr added a subscriber: bcr.Aug 6 2016, 10:35 AM

Found a typo in a comment.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
749 ↗(On Diff #19058)

s/orinal/original/
Same for the next line...

avg updated this revision to Diff 19574.Aug 23 2016, 8:27 AM

rebase on the latest head

avg updated this revision to Diff 19575.EditedAug 23 2016, 8:29 AM

fix a typo spotted by bcr

avg marked an inline comment as done.Aug 23 2016, 8:32 AM
avg updated this revision to Diff 19576.Aug 23 2016, 8:35 AM

recycle an unused label

AFAIK zfs_ctldir.c is the only file that includes gfs.h. Since it no longer does, you should delete gfs.h and gfs.c. No point keeping dead code around.

avg added a comment.Aug 23 2016, 3:36 PM

AFAIK zfs_ctldir.c is the only file that includes gfs.h. Since it no longer does, you should delete gfs.h and gfs.c. No point keeping dead code around.

Yes, I am planning to do that.

IMHO it's sad to remove the ability to create a snapshot with mkdir. It's the only way to remotely create a snapshot over an NFS share. I would at least check with the FreeNAS guys before you remove that feature.

avg added a comment.Aug 23 2016, 4:07 PM

IMHO it's sad to remove the ability to create a snapshot with mkdir. It's the only way to remotely create a snapshot over an NFS share. I would at least check with the FreeNAS guys before you remove that feature.

Well, the feature is incomplete as it is impossible to destroy (or rename) a snapshot.
I understand that it might be useful even in that incomplete form, but a local access is still required unless infinite snapshots are okay.

avg added a comment.Aug 23 2016, 4:08 PM

I would at least check with the FreeNAS guys before you remove that feature.

Please feel free to point them to this review.

asomers added inline comments.Aug 25 2016, 9:32 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
253 ↗(On Diff #19576)

should be

return (SET_ERROR(EINVAL))

here and on lines 266 and 281

263 ↗(On Diff #19576)

should be

sizeof (entry)

(note the space) here and elsewhere in the file.

This patch causes EOVERFLOW errors when accessing the ctldir.

# zpool create foo da0
# zfs set snapdir=visible foo
# ls /foo/.zfs
ls: /foo/.zfs: Value too large to be stored in data type
# ls /foo/.zfs/snapshot
ls: /foo/.zfs/snapshot: Value too large to be stored in data type
avg marked an inline comment as done.Sep 5 2016, 3:49 PM

This patch causes EOVERFLOW errors when accessing the ctldir.

Thank you for testing and reporting this!
It looks like the problem is caused by uninitialized va_size.
A fix is coming.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263 ↗(On Diff #19576)

I prefer to use style(9) in this file.

avg updated this revision to Diff 20066.Sep 5 2016, 9:56 PM
  • rebase on the latest head
  • set va_size in getattr methods, reported by asomers
  • set va_nlink and va_size for .zfs/snapshot based on an actual snapshot count, code by will@
  • add missing SET_ERROR-s, reported by asomers
  • fix an edge case in sfs_readdir_common
In D7421#161576, @avg wrote:

This patch causes EOVERFLOW errors when accessing the ctldir.

Thank you for testing and reporting this!
It looks like the problem is caused by uninitialized va_size.
A fix is coming.

Thanks. I'll retest.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263 ↗(On Diff #19576)

Better to have consistent but nonstandard style than inconsistent style. If you're going to change a file's style, you should do it all at once. But we can't change this entire file's style, because that would make merges from OpenZFS harder. So please keep the whole file in Solaris style.

avg added inline comments.Sep 6 2016, 3:47 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263 ↗(On Diff #19576)

I agree in general, but this file is so much rewritten now that it would be extremely hard to merge upstream changes to it anyway.

Running r305480 with your patch, I get a mostly repeatable (7 tries out of 8) panic during zil_002_pos from the ZFS test suite. I cannot reproduce this crash on another machine running r305663 with your patch, so it's hard to say whether this change is responsible.

panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1817

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1817
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe20b3bd1790
vpanic() at vpanic+0x182/frame 0xfffffe20b3bd1810
panic() at panic+0x43/frame 0xfffffe20b3bd1870
assfail() at assfail+0x1a/frame 0xfffffe20b3bd1880
zil_close() at zil_close+0xf0/frame 0xfffffe20b3bd18b0
zfsvfs_teardown() at zfsvfs_teardown+0x51/frame 0xfffffe20b3bd1900
zfs_umount() at zfs_umount+0xf2/frame 0xfffffe20b3bd1940
dounmount() at dounmount+0x6f4/frame 0xfffffe20b3bd19c0
sys_unmount() at sys_unmount+0x35d/frame 0xfffffe20b3bd1ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe20b3bd1bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b3bd1bf0

avg added a comment.EditedSep 10 2016, 7:18 AM

@asomers it seems that you experienced the same panic even before my patch:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200592#c1

I wonder if we could add some assertion to see how zilog_dirty gets called either after `zil_commit called from zil_close (if that's possible at all) or concurrently with it.

asomers added inline comments.Sep 12 2016, 5:33 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263 ↗(On Diff #19576)

In that case, you should convert the entire file rather than leave it with mixed style.

asomers added inline comments.Sep 14 2016, 9:13 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
825 ↗(On Diff #20066)

This comment is obsolete.

@avg I notice you've recently made a few snapshot-related commits. They don't eliminate the need for this revision, do they?

avg added a comment.Oct 10 2016, 2:10 PM

@avg I notice you've recently made a few snapshot-related commits. They don't eliminate the need for this revision, do they?

Alan, not sure which commits you mean. I don't recall any recent commits in that area.

In D7421#170345, @avg wrote:

@avg I notice you've recently made a few snapshot-related commits. They don't eliminate the need for this revision, do they?

Alan, not sure which commits you mean. I don't recall any recent commits in that area.

I was thinking of 306801, 306665, and 306292. I guess they weren't snapshot-related, just vnops-related.

avg updated this revision to Diff 22396.Nov 21 2016, 8:15 AM

rebase

avg updated this revision to Diff 22397.Nov 21 2016, 8:17 AM

rebase

avg updated this revision to Diff 22398.Nov 21 2016, 8:20 AM

use consistent (FreeBSD) style for sizeof

avg marked an inline comment as done.Nov 21 2016, 8:23 AM
avg added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263 ↗(On Diff #19576)

Done.

asomers requested changes to this revision.Nov 22 2016, 6:16 PM
asomers added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
825 ↗(On Diff #20066)

Still need to fix this comment.

This revision now requires changes to proceed.Nov 22 2016, 6:16 PM
avg updated this revision to Diff 22459.Nov 23 2016, 11:32 AM
avg marked an inline comment as done.

remove a pointless comment

avg marked 2 inline comments as done.Nov 23 2016, 11:32 AM
asomers accepted this revision.Nov 23 2016, 3:59 PM
This revision is now accepted and ready to land.Nov 23 2016, 3:59 PM
smh added a comment.Nov 28 2016, 9:58 AM

Couple of little style nits and I'd like to understand why ENAMETOOLONG error gets turned into success in a few places.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
685 ↗(On Diff #22459)

Its not clear to me why we want to loose the ENAMETOOLONG error here and below could you explain?

847 ↗(On Diff #22459)

No need to assign to err, just return?
return (SET_ERROR(ENOENT));

1158 ↗(On Diff #22459)

No need for an else here and personally I would check for the error code so that the natural return is success which I find clearer e.g.

if (*zfsvfsp == NULL)
     return (SET_ERROR(EINVAL));

return (0);
avg marked 2 inline comments as done.Nov 28 2016, 10:25 AM
avg added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
685 ↗(On Diff #22459)

I think that the comment already explains it to a degree. If you look at vfs_read_dirent it should be clearer. Unfortunately, ENAMETOOLONG is used to signal "buffer full" rather than any name being too long.

847 ↗(On Diff #22459)

Yes.

1158 ↗(On Diff #22459)

Yes, this is better.

avg updated this revision to Diff 22569.Nov 28 2016, 10:31 AM
avg marked 2 inline comments as done.

suggestions from smh

This revision now requires review to proceed.Nov 28 2016, 10:31 AM
smh accepted this revision.Nov 28 2016, 11:41 AM
This revision is now accepted and ready to land.Nov 28 2016, 11:41 AM
avg updated this revision to Diff 25461.Feb 21 2017, 10:53 AM

pre-commit synchronization with head

This revision now requires review to proceed.Feb 21 2017, 10:53 AM
smh requested changes to this revision.Feb 21 2017, 11:23 AM

Just a few seemingly redundant assignments to error vars, sorry didn't spot them before

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
124 ↗(On Diff #25461)

err seems redundant here, could just return the result from vfs_hash_get?

137 ↗(On Diff #25461)

Redundant?

412 ↗(On Diff #25461)

Redundant?

430 ↗(On Diff #25461)

redundant?

772 ↗(On Diff #25461)

SET_ERROR or is this some sort of special case?

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
1641 ↗(On Diff #25461)

redundant assignment

This revision now requires changes to proceed.Feb 21 2017, 11:23 AM
avg added inline comments.Feb 21 2017, 5:26 PM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
124 ↗(On Diff #25461)

Well, it's just my personal preference. I do not like function calls within return statements. I don't like resulting longer lines too.
I guess that style(9) does not prohibit this?

137 ↗(On Diff #25461)

ditto

412 ↗(On Diff #25461)

ditto

430 ↗(On Diff #25461)

ditto

772 ↗(On Diff #25461)

Yeah, this is just a special return value, it does not mean that any error occurred.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
1641 ↗(On Diff #25461)

same as above :)

smh accepted this revision.Feb 21 2017, 5:30 PM
This revision is now accepted and ready to land.Feb 21 2017, 5:30 PM
This revision was automatically updated to reflect the committed changes.