Page MenuHomeFreeBSD

fusefs: merge from projects/fuse2
AbandonedPublic

Authored by asomers on Jul 12 2019, 8:18 PM.

Details

Reviewers
cem
ngie
scottl
Group Reviewers
manpages
Summary

fusefs: merge from projects/fuse2

This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:

  • Optional kernel-side permissions checks (-o default_permissions)
  • Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
  • Allow interrupting FUSE operations
  • Support named pipes and unix-domain sockets in fusefs file systems
  • Forward UTIME_NOW during utimensat(2) to the daemon
  • kqueue support for /dev/fuse
  • Allow updating mounts with "mount -u"
  • Allow exporting fusefs file systems over NFS
  • Server-initiated invalidation of the name cache or data cache
  • Respect RLIMIT_FSIZE
  • Try to support servers as old as protocol 7.4

Performance enhancements include:

  • Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
  • Cache file attributes
  • Cache lookup entries, both positive and negative
  • Server-selectable cache modes: writethrough, writeback, or uncached
  • Write clustering
  • Readahead
  • Use counter(9) for statistical reporting

PR: 199934 216391 233783 234581 235773 235774 235775 235775
PR: 236226 236231 236236 236291 236329 236379 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557 236560
PR: 236647 236844 237052 237181 237588 238565

Test Plan

Unit tests added. Additional testing with fsx and pjdfstest.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 25404
Build 24039: arc lint + arc unit

Event Timeline

asomers created this revision.Jul 12 2019, 8:18 PM

Do any of you intend to review this? I can understand if it's simply too much to review. But if so, I'd like to know. In that case I'll go ahead and commit, relying on the post-commit review that some of you have already given for my commits to the project branch.

markj added a subscriber: markj.Jul 16 2019, 9:22 PM

Shouldn't the VFS and VM changes be split into a separate review and committed separately?

lib/libc/gen/getvfsbyname.c
55

Extra newline.

83

Style's wrong here.

sys/vm/vnode_pager.c
467 ↗(On Diff #59711)

Why is this change needed?

kib added a subscriber: kib.Jul 16 2019, 9:27 PM
kib added inline comments.
lib/libc/gen/getvfsbyname.c
83

Style: '{' should be on prev line.

share/man/man9/VOP_FSYNC.9
11

This is a strange change.

sys/fs/fuse/fuse_ipc.c
505

As was discussed elsewere, this is racy. Also, even if ignoring the race, sig_isfatal() is not correct, for example consider the difference between trapsignal() and async signals.

sys/vm/vnode_pager.c
467 ↗(On Diff #59711)

this is wrong, for instance the pages for UFS indirect blocks are invalidated by the change, as well as the ext attr pages.

asomers marked 4 inline comments as done.Jul 16 2019, 10:05 PM
asomers added inline comments.
lib/libc/gen/getvfsbyname.c
83

Ok, I'll change it. I did it this way because I find multi-line if conditions to be confusing. It's easier to see where the condition ends by putting the { on its own line.

share/man/man9/VOP_FSYNC.9
11

The Foundation asked that I added this copyright notice to any file that I substantially modify. In r345677 I added a section to this man page but forgot to update the copyright, so I'm doing it now.

sys/fs/fuse/fuse_ipc.c
505

I believe your objection is based on the fact that trapsignal could end up delivering a signal to a thread that won't immediately return to userspace, if the daemon doesn't respond to FUSE_INTERRUPT. Is that correct? Could you suggest an alternate way to handle it? Does it make a difference to you that intr is a non-default mount option? (though weirdly, it's a "mount option" that's implemented in userspace within libfuse).

sys/vm/vnode_pager.c
467 ↗(On Diff #59711)

@kib: Are you sure? I though indirect blocks had negative offsets? At least, that's what the comment in ffs_blkpref_ufs1 says. And extended attributes look the same, judging by the code in ffs_extread.

@markj this change wasn't *required*, it's just a simplification I made while I was working in this area. AFAICT the special case handling for end == 0 was added in r32286, well after this call was written. So I think the author of 32286 simply didn't realize that this function could use that special case, too.

kib added inline comments.Jul 17 2019, 6:10 AM
share/man/man9/VOP_FSYNC.9
11

So why keep it on the large diff for review, when it should be trivially commitable ?

sys/fs/fuse/fuse_ipc.c
505

No, my objections are based on two facts:

  • after your checks, the disposition of the signal can be changed, or signal can be rescheduled to different thread at all.
  • even if we assume that neither disposition nor scheduling occur, the checks are still not correct (e.g. catchable traps can be delivered as fatal due to several factors, among them are stack state or signal mask state).
sys/vm/vnode_pager.c
467 ↗(On Diff #59711)

Yes, indirect and ext blocks have negative indexes. The vm_pindex_t is unsigned, which means that indexes of the indirect and ext blocks are converted into very large non-negative page indexes. Note that these page indexes cannot be instantiated by userspace.

asomers marked 2 inline comments as done.Jul 17 2019, 5:25 PM
asomers added inline comments.
sys/fs/fuse/fuse_ipc.c
505

Ok, so given the race potential maybe I should just remove sig_isfatal, and always wait for a response from the daemon, then. The downside is just increased signal latency.

sys/vm/vnode_pager.c
467 ↗(On Diff #59711)

Grr, stupid type punning. Ok, I'll revert that part.

kib added inline comments.Jul 17 2019, 7:36 PM
sys/fs/fuse/fuse_ipc.c
505

We already discussed this. I remember that I noted the "intr" mount option for NFS, and you can use it as an example. If "intr" is specified, then any signal interrupts the io, by means of specifying PCATCH to msleep(9). If it is not specified, you do not specify PCATCH.

There could be some refinement, as used by NFS, e.g. you can mask all signals except SIGINT/SIGTERM/SIGQUIT around msleep(PCATCH).

asomers added inline comments.Jul 17 2019, 7:47 PM
share/man/man9/VOP_FSYNC.9
11

Because it was made as part of a larger commit on the project branch, and the entire commit (r349502) shouldn't be merged to head before some of the other changes that are part of this review.

asomers updated this revision to Diff 59883.Jul 18 2019, 6:21 PM

Remerge from the projects branch.

Hopefully there shouldn't be any changes to the content of the review.

asomers updated this revision to Diff 59884.Jul 18 2019, 6:35 PM

Build fixes

Merge r349970 and r349977 from projects/fuse2

r349977 | asomers | 2019-07-13 15:41:12 -0600 (Sat, 13 Jul 2019) | 6 lines

fusefs: fix the build with some NODEBUG kernels

systm.h needs to be included before counter.h

Sponsored by: The FreeBSD Foundation


r349970 | asomers | 2019-07-13 08:42:09 -0600 (Sat, 13 Jul 2019) | 7 lines

projects/fuse2: build fixes

  • Fix the kernel build with gcc by removing a redundant extern declaration
  • In the tests, fix a printf format specifier that assumed LP64

Sponsored by: The FreeBSD Foundation

asomers updated this revision to Diff 59885.Jul 18 2019, 6:42 PM

Revert r346608

That change was intended to be cosmetic, but it inadvertenly caused
vnode_pager_setsize to discard cached indirect blocks and extended
attributes on UFS during truncation. The reason is because those blocks
have negative LBNs, which get sign-cast to positive VM indexes.

asomers updated this revision to Diff 59886.Jul 18 2019, 6:47 PM

style changes to getvfsbyname

asomers updated this revision to Diff 59887.Jul 18 2019, 6:52 PM

fusefs: multiple interruptility improvements

  1. Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be masked, anyway.
  1. Fix an infinite loop bug. If a process received both a maskable signal lower than 9 (like SIGINT) and then received SIGKILL, fticket_wait_answer would spin. msleep would immediately return EINTR, but cursig would return SIGINT, so the sleep would get retried. Fix it by explicitly checking whether SIGKILL has been received.
  1. Abandon the sig_isfatal optimization introduced by r346357. That optimization would cause fticket_wait_answer to return immediately, without waiting for a response from the server, if the process were going to exit anyway. However, it's vulnerable to a race:
    1. fatal signal is received while fticket_wait_answer is sleeping.
    2. fticket_wait_answer sends the FUSE_INTERRUPT operation.
    3. fticket_wait_answer determines that the signal was fatal and returns without waiting for a response.
    4. Another thread changes the signal to non-fatal.
    5. The first thread returns to userspace. Instead of exiting, the process continues.
    6. The application receives EINTR, wrongly believes that the operation was successfully interrupted, and restarts it. This could cause problems for non-idempotent operations like FUSE_RENAME.
asomers updated this revision to Diff 59889.Jul 18 2019, 6:57 PM

fusefs: fix another semi-infinite loop bug regarding signal handling

fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate. The reason is because msleep(9) would
return EINTR even for a masked signal. One reason is when the thread is
stopped, which happens for example during sigexit(). Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.

asomers updated this revision to Diff 59890.Jul 18 2019, 7:01 PM

fusefs: add a intr/nointr mount option

FUSE file systems can optionally support interrupting outstanding
operations. However, the file system does not identify to the kernel at
mount time whether it's capable of doing that. Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives. That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts. The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.

Fix the signal delivery logic by making interruptibility an opt-in mount
option. This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.

Bump __FreeBSD_version due to the new mount option.

Would anybody like more time to review the merge? If not, I'll commit in about 24 hours.

kib added a comment.Mon, Jul 22, 7:19 PM

Would anybody like more time to review the merge? If not, I'll commit in about 24 hours.

Please split out non-fs/fusefs changes into individual reviews and give me some time to read over them.

I think that the comment update in vfs_cache.c can be committed, but I did not read vfs_subr.c changes to vtruncbuf(), and I am not sure if I missed anything else.

emaste added inline comments.Tue, Jul 23, 8:55 PM
lib/libc/gen/getvfsbyname.c
49–51

presumably this part can be reviewed/committed independently?
also I find the are_ prefix somewhat odd

share/man/man9/VOP_FSYNC.9
11

it could be split out and committed independently pretty easily though

share/mk/bsd.compiler.mk
204–207

this part ought to be reviewed/committed separately

asomers marked 3 inline comments as done.Tue, Jul 23, 11:15 PM
asomers added inline comments.
lib/libc/gen/getvfsbyname.c
49–51

Ok, I moved it into D21043 . The "are_" prefix refers to the fact that this method is validating two arguments rather than one.

share/man/man9/VOP_FSYNC.9
11

Ok.

share/mk/bsd.compiler.mk
204–207

Pulled it out into D21044.

Would you rebase/regenerate this patch so that those pieces that have been split out and committed separately drop out of the review?

asomers marked 3 inline comments as done.Mon, Jul 29, 7:36 PM

Would you rebase/regenerate this patch so that those pieces that have been split out and committed separately drop out of the review?

Yeah. I'll rebase as soon as D21095 completes.

asomers abandoned this revision.Tue, Jul 30, 4:41 AM

Superseded by D21110. I created a new revision because in my experience Phabricator doesn't handle rebased revisions very well.