Page MenuHomeFreeBSD

reimplement zfsctl (.zfs) support
ClosedPublic

Authored by avg on Aug 5 2016, 8:09 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Dec 11, 5:11 AM
Unknown Object (File)
Fri, Dec 6, 6:26 AM
Unknown Object (File)
Fri, Dec 6, 6:26 AM
Unknown Object (File)
Wed, Dec 4, 9:50 AM
Unknown Object (File)
Wed, Dec 4, 9:50 AM
Unknown Object (File)
Wed, Dec 4, 9:49 AM
Unknown Object (File)
Wed, Dec 4, 9:49 AM
Unknown Object (File)
Wed, Nov 27, 12:32 PM

Details

Summary

The current code is written on top of GFS, a library with the generic
support for writing filesystems, which was ported from illumos.
Because of significant differences between illumos VFS and FreeBSD
VFS models, both the GFS and zfsctl code were heavily modified to
work on FreeBSD. Nonetheless, they still contain quite a few ugly
hacks and bugs.

This is a reimplementation of the zfsctl code where the VFS-specific
bits are written from scratch and only the code that interacts with
the rest of ZFS is reused.

Some highlights.

We use two types of nodes, static and on-demand. The static nodes
are used for permanent directories like .zfs, .zfs/snapshot, etc. The
on-demand nodes are used for ephemeral directories that act as snapshot
mount points.
Initially only static nodes are created. Their vnodes are instantiated
when they are looked up. The on-demand nodes and vnodes are instantiated
as needed and the nodes are destroyed as soon as the corresponding
vnodes are reclaimed.
We also try very hard to ensure that uncovered snapshot vnodes do not
linger. They are supposed to becomes inactive as soon as they are
uncovered and we try to recycle them immediately.
When a filesystem is unmounted all snapshots under .zfs are unmounted
first, then all vnodes are flushed and finally the static .zfs vnodes
are destroyed.

There are some changes outside of zfsctl code too.
z_ctldir is never used directly (as it is an opaque pointer),
zfsctl_root() has to be used instead. The function returns a locked
vnode now, so it accepts a lock flags parameter. The function can
also fail now, e.g. during force unmounting, whereas previously it
was infallible.
zfsctl_root_lookup() is retired, instead of it VOP_LOOKUP() on the .zfs
vnode (obtained with zfsctl_root) is used.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 4891
Build 4954: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Found a typo in a comment.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
749

s/orinal/original/
Same for the next line...

rebase on the latest head

fix a typo spotted by bcr

avg marked an inline comment as done.Aug 23 2016, 8:32 AM

recycle an unused label

AFAIK zfs_ctldir.c is the only file that includes gfs.h. Since it no longer does, you should delete gfs.h and gfs.c. No point keeping dead code around.

AFAIK zfs_ctldir.c is the only file that includes gfs.h. Since it no longer does, you should delete gfs.h and gfs.c. No point keeping dead code around.

Yes, I am planning to do that.

IMHO it's sad to remove the ability to create a snapshot with mkdir. It's the only way to remotely create a snapshot over an NFS share. I would at least check with the FreeNAS guys before you remove that feature.

IMHO it's sad to remove the ability to create a snapshot with mkdir. It's the only way to remotely create a snapshot over an NFS share. I would at least check with the FreeNAS guys before you remove that feature.

Well, the feature is incomplete as it is impossible to destroy (or rename) a snapshot.
I understand that it might be useful even in that incomplete form, but a local access is still required unless infinite snapshots are okay.

I would at least check with the FreeNAS guys before you remove that feature.

Please feel free to point them to this review.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
253

should be

return (SET_ERROR(EINVAL))

here and on lines 266 and 281

263

should be

sizeof (entry)

(note the space) here and elsewhere in the file.

This patch causes EOVERFLOW errors when accessing the ctldir.

# zpool create foo da0
# zfs set snapdir=visible foo
# ls /foo/.zfs
ls: /foo/.zfs: Value too large to be stored in data type
# ls /foo/.zfs/snapshot
ls: /foo/.zfs/snapshot: Value too large to be stored in data type
avg marked an inline comment as done.Sep 5 2016, 3:49 PM

This patch causes EOVERFLOW errors when accessing the ctldir.

Thank you for testing and reporting this!
It looks like the problem is caused by uninitialized va_size.
A fix is coming.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263

I prefer to use style(9) in this file.

  • rebase on the latest head
  • set va_size in getattr methods, reported by asomers
  • set va_nlink and va_size for .zfs/snapshot based on an actual snapshot count, code by will@
  • add missing SET_ERROR-s, reported by asomers
  • fix an edge case in sfs_readdir_common
In D7421#161576, @avg wrote:

This patch causes EOVERFLOW errors when accessing the ctldir.

Thank you for testing and reporting this!
It looks like the problem is caused by uninitialized va_size.
A fix is coming.

Thanks. I'll retest.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263

Better to have consistent but nonstandard style than inconsistent style. If you're going to change a file's style, you should do it all at once. But we can't change this entire file's style, because that would make merges from OpenZFS harder. So please keep the whole file in Solaris style.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263

I agree in general, but this file is so much rewritten now that it would be extremely hard to merge upstream changes to it anyway.

Running r305480 with your patch, I get a mostly repeatable (7 tries out of 8) panic during zil_002_pos from the ZFS test suite. I cannot reproduce this crash on another machine running r305663 with your patch, so it's hard to say whether this change is responsible.

panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1817

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: !zilog_is_dirty(zilog), file: /usr/home/alans/freebsd/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c, line: 1817
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe20b3bd1790
vpanic() at vpanic+0x182/frame 0xfffffe20b3bd1810
panic() at panic+0x43/frame 0xfffffe20b3bd1870
assfail() at assfail+0x1a/frame 0xfffffe20b3bd1880
zil_close() at zil_close+0xf0/frame 0xfffffe20b3bd18b0
zfsvfs_teardown() at zfsvfs_teardown+0x51/frame 0xfffffe20b3bd1900
zfs_umount() at zfs_umount+0xf2/frame 0xfffffe20b3bd1940
dounmount() at dounmount+0x6f4/frame 0xfffffe20b3bd19c0
sys_unmount() at sys_unmount+0x35d/frame 0xfffffe20b3bd1ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe20b3bd1bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b3bd1bf0

@asomers it seems that you experienced the same panic even before my patch:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200592#c1

I wonder if we could add some assertion to see how zilog_dirty gets called either after `zil_commit called from zil_close (if that's possible at all) or concurrently with it.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263

In that case, you should convert the entire file rather than leave it with mixed style.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
824

This comment is obsolete.

@avg I notice you've recently made a few snapshot-related commits. They don't eliminate the need for this revision, do they?

@avg I notice you've recently made a few snapshot-related commits. They don't eliminate the need for this revision, do they?

Alan, not sure which commits you mean. I don't recall any recent commits in that area.

In D7421#170345, @avg wrote:

@avg I notice you've recently made a few snapshot-related commits. They don't eliminate the need for this revision, do they?

Alan, not sure which commits you mean. I don't recall any recent commits in that area.

I was thinking of 306801, 306665, and 306292. I guess they weren't snapshot-related, just vnops-related.

avg edited edge metadata.

rebase

use consistent (FreeBSD) style for sizeof

avg marked an inline comment as done.Nov 21 2016, 8:23 AM
avg added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
263

Done.

asomers requested changes to this revision.Nov 22 2016, 6:16 PM
asomers edited edge metadata.
asomers added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
824

Still need to fix this comment.

This revision now requires changes to proceed.Nov 22 2016, 6:16 PM
avg edited edge metadata.
avg marked an inline comment as done.

remove a pointless comment

avg marked 2 inline comments as done.Nov 23 2016, 11:32 AM
asomers edited edge metadata.
This revision is now accepted and ready to land.Nov 23 2016, 3:59 PM

Couple of little style nits and I'd like to understand why ENAMETOOLONG error gets turned into success in a few places.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
684

Its not clear to me why we want to loose the ENAMETOOLONG error here and below could you explain?

849

No need to assign to err, just return?
return (SET_ERROR(ENOENT));

1149

No need for an else here and personally I would check for the error code so that the natural return is success which I find clearer e.g.

if (*zfsvfsp == NULL)
     return (SET_ERROR(EINVAL));

return (0);
avg marked 2 inline comments as done.Nov 28 2016, 10:25 AM
avg added inline comments.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
684

I think that the comment already explains it to a degree. If you look at vfs_read_dirent it should be clearer. Unfortunately, ENAMETOOLONG is used to signal "buffer full" rather than any name being too long.

849

Yes.

1149

Yes, this is better.

avg edited edge metadata.
avg marked 2 inline comments as done.

suggestions from smh

This revision now requires review to proceed.Nov 28 2016, 10:31 AM
smh edited edge metadata.
This revision is now accepted and ready to land.Nov 28 2016, 11:41 AM
avg edited edge metadata.

pre-commit synchronization with head

This revision now requires review to proceed.Feb 21 2017, 10:53 AM
smh requested changes to this revision.Feb 21 2017, 11:23 AM
smh edited edge metadata.

Just a few seemingly redundant assignments to error vars, sorry didn't spot them before

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
124

err seems redundant here, could just return the result from vfs_hash_get?

137

Redundant?

412

Redundant?

430

redundant?

771

SET_ERROR or is this some sort of special case?

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
1644

redundant assignment

This revision now requires changes to proceed.Feb 21 2017, 11:23 AM
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
124

Well, it's just my personal preference. I do not like function calls within return statements. I don't like resulting longer lines too.
I guess that style(9) does not prohibit this?

137

ditto

412

ditto

430

ditto

771

Yeah, this is just a special return value, it does not mean that any error occurred.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
1644

same as above :)

smh edited edge metadata.
This revision is now accepted and ready to land.Feb 21 2017, 5:30 PM
This revision was automatically updated to reflect the committed changes.