Page MenuHomeFreeBSD

fusefs: merge from projects/fuse2
AbandonedPublic

Authored by asomers on Jul 12 2019, 8:18 PM.
Tags
None
Referenced Files
F80143312: D20940.diff
Thu, Mar 28, 12:49 PM
Unknown Object (File)
Fri, Mar 15, 10:39 PM
Unknown Object (File)
Dec 23 2023, 10:55 PM
Unknown Object (File)
Dec 21 2023, 8:42 PM
Unknown Object (File)
Dec 20 2023, 5:17 AM
Unknown Object (File)
Nov 18 2023, 8:23 PM
Unknown Object (File)
Nov 18 2023, 4:48 PM
Unknown Object (File)
Nov 18 2023, 7:30 AM

Details

Reviewers
cem
ngie
scottl
Group Reviewers
manpages
Summary

fusefs: merge from projects/fuse2

This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:

  • Optional kernel-side permissions checks (-o default_permissions)
  • Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
  • Allow interrupting FUSE operations
  • Support named pipes and unix-domain sockets in fusefs file systems
  • Forward UTIME_NOW during utimensat(2) to the daemon
  • kqueue support for /dev/fuse
  • Allow updating mounts with "mount -u"
  • Allow exporting fusefs file systems over NFS
  • Server-initiated invalidation of the name cache or data cache
  • Respect RLIMIT_FSIZE
  • Try to support servers as old as protocol 7.4

Performance enhancements include:

  • Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
  • Cache file attributes
  • Cache lookup entries, both positive and negative
  • Server-selectable cache modes: writethrough, writeback, or uncached
  • Write clustering
  • Readahead
  • Use counter(9) for statistical reporting

PR: 199934 216391 233783 234581 235773 235774 235775 235775
PR: 236226 236231 236236 236291 236329 236379 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557 236560
PR: 236647 236844 237052 237181 237588 238565

Test Plan

Unit tests added. Additional testing with fsx and pjdfstest.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 25397
Build 24032: arc lint + arc unit

Event Timeline

Do any of you intend to review this? I can understand if it's simply too much to review. But if so, I'd like to know. In that case I'll go ahead and commit, relying on the post-commit review that some of you have already given for my commits to the project branch.

Shouldn't the VFS and VM changes be split into a separate review and committed separately?

lib/libc/gen/getvfsbyname.c
55

Extra newline.

84

Style's wrong here.

sys/vm/vnode_pager.c
467

Why is this change needed?

kib added inline comments.
lib/libc/gen/getvfsbyname.c
84

Style: '{' should be on prev line.

share/man/man9/VOP_FSYNC.9
11

This is a strange change.

sys/fs/fuse/fuse_ipc.c
505

As was discussed elsewere, this is racy. Also, even if ignoring the race, sig_isfatal() is not correct, for example consider the difference between trapsignal() and async signals.

sys/vm/vnode_pager.c
467

this is wrong, for instance the pages for UFS indirect blocks are invalidated by the change, as well as the ext attr pages.

asomers added inline comments.
lib/libc/gen/getvfsbyname.c
84

Ok, I'll change it. I did it this way because I find multi-line if conditions to be confusing. It's easier to see where the condition ends by putting the { on its own line.

share/man/man9/VOP_FSYNC.9
11

The Foundation asked that I added this copyright notice to any file that I substantially modify. In r345677 I added a section to this man page but forgot to update the copyright, so I'm doing it now.

sys/fs/fuse/fuse_ipc.c
505

I believe your objection is based on the fact that trapsignal could end up delivering a signal to a thread that won't immediately return to userspace, if the daemon doesn't respond to FUSE_INTERRUPT. Is that correct? Could you suggest an alternate way to handle it? Does it make a difference to you that intr is a non-default mount option? (though weirdly, it's a "mount option" that's implemented in userspace within libfuse).

sys/vm/vnode_pager.c
467

@kib: Are you sure? I though indirect blocks had negative offsets? At least, that's what the comment in ffs_blkpref_ufs1 says. And extended attributes look the same, judging by the code in ffs_extread.

@markj this change wasn't *required*, it's just a simplification I made while I was working in this area. AFAICT the special case handling for end == 0 was added in r32286, well after this call was written. So I think the author of 32286 simply didn't realize that this function could use that special case, too.

share/man/man9/VOP_FSYNC.9
11

So why keep it on the large diff for review, when it should be trivially commitable ?

sys/fs/fuse/fuse_ipc.c
505

No, my objections are based on two facts:

  • after your checks, the disposition of the signal can be changed, or signal can be rescheduled to different thread at all.
  • even if we assume that neither disposition nor scheduling occur, the checks are still not correct (e.g. catchable traps can be delivered as fatal due to several factors, among them are stack state or signal mask state).
sys/vm/vnode_pager.c
467

Yes, indirect and ext blocks have negative indexes. The vm_pindex_t is unsigned, which means that indexes of the indirect and ext blocks are converted into very large non-negative page indexes. Note that these page indexes cannot be instantiated by userspace.

asomers added inline comments.
sys/fs/fuse/fuse_ipc.c
505

Ok, so given the race potential maybe I should just remove sig_isfatal, and always wait for a response from the daemon, then. The downside is just increased signal latency.

sys/vm/vnode_pager.c
467

Grr, stupid type punning. Ok, I'll revert that part.

sys/fs/fuse/fuse_ipc.c
505

We already discussed this. I remember that I noted the "intr" mount option for NFS, and you can use it as an example. If "intr" is specified, then any signal interrupts the io, by means of specifying PCATCH to msleep(9). If it is not specified, you do not specify PCATCH.

There could be some refinement, as used by NFS, e.g. you can mask all signals except SIGINT/SIGTERM/SIGQUIT around msleep(PCATCH).

share/man/man9/VOP_FSYNC.9
11

Because it was made as part of a larger commit on the project branch, and the entire commit (r349502) shouldn't be merged to head before some of the other changes that are part of this review.

Remerge from the projects branch.

Hopefully there shouldn't be any changes to the content of the review.

Build fixes

Merge r349970 and r349977 from projects/fuse2

r349977 | asomers | 2019-07-13 15:41:12 -0600 (Sat, 13 Jul 2019) | 6 lines

fusefs: fix the build with some NODEBUG kernels

systm.h needs to be included before counter.h

Sponsored by: The FreeBSD Foundation


r349970 | asomers | 2019-07-13 08:42:09 -0600 (Sat, 13 Jul 2019) | 7 lines

projects/fuse2: build fixes

  • Fix the kernel build with gcc by removing a redundant extern declaration
  • In the tests, fix a printf format specifier that assumed LP64

Sponsored by: The FreeBSD Foundation

Revert r346608

That change was intended to be cosmetic, but it inadvertenly caused
vnode_pager_setsize to discard cached indirect blocks and extended
attributes on UFS during truncation. The reason is because those blocks
have negative LBNs, which get sign-cast to positive VM indexes.

style changes to getvfsbyname

fusefs: multiple interruptility improvements

  1. Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be masked, anyway.
  1. Fix an infinite loop bug. If a process received both a maskable signal lower than 9 (like SIGINT) and then received SIGKILL, fticket_wait_answer would spin. msleep would immediately return EINTR, but cursig would return SIGINT, so the sleep would get retried. Fix it by explicitly checking whether SIGKILL has been received.
  1. Abandon the sig_isfatal optimization introduced by r346357. That optimization would cause fticket_wait_answer to return immediately, without waiting for a response from the server, if the process were going to exit anyway. However, it's vulnerable to a race:
    1. fatal signal is received while fticket_wait_answer is sleeping.
    2. fticket_wait_answer sends the FUSE_INTERRUPT operation.
    3. fticket_wait_answer determines that the signal was fatal and returns without waiting for a response.
    4. Another thread changes the signal to non-fatal.
    5. The first thread returns to userspace. Instead of exiting, the process continues.
    6. The application receives EINTR, wrongly believes that the operation was successfully interrupted, and restarts it. This could cause problems for non-idempotent operations like FUSE_RENAME.

fusefs: fix another semi-infinite loop bug regarding signal handling

fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate. The reason is because msleep(9) would
return EINTR even for a masked signal. One reason is when the thread is
stopped, which happens for example during sigexit(). Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.

fusefs: add a intr/nointr mount option

FUSE file systems can optionally support interrupting outstanding
operations. However, the file system does not identify to the kernel at
mount time whether it's capable of doing that. Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives. That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts. The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.

Fix the signal delivery logic by making interruptibility an opt-in mount
option. This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.

Bump __FreeBSD_version due to the new mount option.

Would anybody like more time to review the merge? If not, I'll commit in about 24 hours.

Would anybody like more time to review the merge? If not, I'll commit in about 24 hours.

Please split out non-fs/fusefs changes into individual reviews and give me some time to read over them.

I think that the comment update in vfs_cache.c can be committed, but I did not read vfs_subr.c changes to vtruncbuf(), and I am not sure if I missed anything else.

lib/libc/gen/getvfsbyname.c
49–51

presumably this part can be reviewed/committed independently?
also I find the are_ prefix somewhat odd

share/man/man9/VOP_FSYNC.9
11

it could be split out and committed independently pretty easily though

share/mk/bsd.compiler.mk
204–207

this part ought to be reviewed/committed separately

asomers added inline comments.
lib/libc/gen/getvfsbyname.c
49–51

Ok, I moved it into D21043 . The "are_" prefix refers to the fact that this method is validating two arguments rather than one.

share/man/man9/VOP_FSYNC.9
11

Ok.

share/mk/bsd.compiler.mk
204–207

Pulled it out into D21044.

Would you rebase/regenerate this patch so that those pieces that have been split out and committed separately drop out of the review?

Would you rebase/regenerate this patch so that those pieces that have been split out and committed separately drop out of the review?

Yeah. I'll rebase as soon as D21095 completes.

Superseded by D21110. I created a new revision because in my experience Phabricator doesn't handle rebased revisions very well.