Add a virtiofs file system based on FUSE to interface with the virtio fs device. The file system communicates to a FUSE server on the host through this device.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 70332 Build 67215: arc lint + arc unit
Event Timeline
I'm still a little shaky on the details of virtio-fs, but I can't find anything obviously wrong here.
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 6 | I think we're not supposed to use "All rights reserved" in new files. | |
Thanks a lot for working on this!
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 420 | Can we use the "from" for the tag like p9fs does? It's a bit strange to have an unused arbitrary string as the "from", and I haven't managed to get root mount to work even though it seemingly is passing the options… | |
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 394 | The "strict access policy" is only breaking things for virtiofs, i.e. if you mount a home directory for a user, that user wouldn't be able to access it. | |
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 420 | ah, re: mountroot, virtiofs needs to be added to vfs_mountroot_wait_if_neccessary (ugh why don't we have proper registered flags for this)… And then, remount support is expected because it mounts as RO first. | |
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 420 | Actually that whole routine is bogus, as is waiting for the dev under the mountpoint. I've written it locally, but it was on a laptop that decided to die suddenly for reasons I've not been able to diagnose. | |
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 376 | You also have to *set* the size here with fiov_adjust like the local FUSE transport does when it copies data from userspace in fticket_aw_pull_uio right after fuse_body_audit. Took me a few hours to debug why getdirentries was completely busted (: It relies on the buffer size to stop iteration; without fiov_adjust the buffer size was always 4096 so it would iterate past the actual message size, hitting a wall of zeroes, interpreting that as an entry with a 0-length name and bailing with EINVAL. With that fixed, virtiofs is already looking a lot better than p9fs. Namely, I could install a package with pkg under a virtiofs root, while under p9fs I'm hitting rather strange errors. Reusing the FUSE protocol was a really clever idea! | |
Thanks for the feedback! I will update the diff accordingly.
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 376 | Great catch! I think this also fixes a bug I had been trying to triage for write back mode when heavily loading the VM. I will update and fix. | |
| 420 | Sounds good, just to make sure I've gotten it right: Changing the VFS option from "tag" to "from" should bring virtiofs mount options in line w/ other file systems. | |
| 420 | Re: remount, I can add it to this diff if it helps with testing. I omitted it to keep this diff set as small as possible, because it is already pretty large. | |
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 125 | This just causes all buffers to leak. | |
A couple more notes from testing:
There's some other place where FUSE layer can pick up zero-padded-to-power-of-two things, now with 1024 instead of 4096:
WARNING: FUSE protocol violation for server mounted at /: Returned an embedded NUL from FUSE_READLINK. This warning will not be repeated. Len 1024 0000 62 61 73 68 00 00 00 00 00 00 00 00 00 00 00 00 |bash............| 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| …
And trying to just find something in a src tree checkout, there are sporadic errors like
find: ./contrib/llvm-project/llvm/lib/DebugInfo/Symbolize: Bad file descriptor find: ./sys/dev/qat/qat_hw: Invalid argument find: ./sys/dev/qat/qat_api/common/stubs: Invalid argument // etc…
that seem completely random, on one invocation there are none, on another there's a couple…
| sys/fs/fuse/virtiofs_vfsops.c | ||
|---|---|---|
| 420 | yes, the "from" is the 'magic' one that's the source argument in the mount command that you have to pass anyway | |
| 473 | "virtiofs.virtio" (the last fs gets cut off) is kinda redundant, so this isn't necessary | |
Thank you for the testing! Are you using writethrough or writeback mode for FUSE? Also, what command line arguments are you using for virtiofsd on the host side?
Never adjusted any parameters on the client side. Adding --writeback to virtiofsd results in panic: FUSE: ticket still on answer delivery list.
I was using --inode-file-handles=prefer --allow-direct-io --allow-mmap mostly. But changing flags doesn't make the random errors go away.
(not surprising as write caching stuff shouldn't affect running find a bunch of times)
Will do, I will also update the diff with Val's feedback. Sorry about that, I have had no downtime since September to work on this.
I assume a script that spins up a VM and does reads/writes should be enough? Running the FUSE tests with this as originally planned would require D45370, but that feature is still very much a WIP.
There are two outstanding bugs that need fixing before this can be merged:
- Handling FUSE_DESTROY on unmount is incorrect. The virtiofsd client returns -ENOSYS and hangs the unmount operation.
- Reference counting for the FUSE messages is incorrect. The refcount_acquire() in line 123 does cause buffers to leak, but removing it causes read/write errors so I need to check the reference counting throughout the code.
- virtiofs: rename 'tag' VFS option to 'from'
- fs/fuse/virtiofs: adjust FUSE flags and disallow fsync
Just a note that KDE rolls out a new CI infrastructure and they would greatly benefit from the virtiofs support.
Hi Emil — thanks for this, I'm keen to get virtio-fs working on FreeBSD!! Anything I can do to help, please let me know!
I've been trying to build and test D46296 on an unusual target: a FreeBSD/arm64 guest under Apple's Virtualization.framework (macOS host), which shares host directories exclusively over virtio-fs — so this driver is exactly what I'm very keen to see working here! I rebased D46296 onto current main and tried to build the virtiofs module, and ran into a couple of things that I think currently block building/testing it standalone:
- The virtio-fs transport isn't included, and the glue needs a newer one than what's public. virtiofs_vfsops.c does #include <dev/virtio/fs/virtio_fs.h>, but sys/dev/virtio/fs/ isn't part of this revision, and it isn't in main either (only sys/dev/virtio/p9fs is). I couldn't find a companion review that adds it. I tried grafting the transport from your virtiofs-head branch, but the interface has moved on since then — virtiofs_vfsops.c calls:
vtfs_register_cb(vtfs, forget_ticket, complete_ticket, cancel_ticket, teardown, data); /* which is 6 args */
while the published vtfs_register_cb() takes 5 (no cancel_ticket callback). So the matching transport seems to be newer than anything that's public.
Is sys/dev/virtio/fs/ going up as a separate review? Is it available somewhere and I just missed it? (apologies if that's the case) If the version that matches this revision was pushed, I think it'd let reviewers (and (being selfish) me) actually build and exercise the whole stack.
- One other minor trouble, on rebase to current main: FSESS_VIRTIOFS 0x20000000 now collides — main added FSESS_WARN_LSEXTATTR_NUL 0x20000000 since this diff's base, so it needs the next free bit (0x40000000). Maybe this is all related.
One small build-infra note in case it might help someone else: building the module standalone needs SRCS+= vnode_if.h in sys/modules/virtiofs/Makefile (otherwise vnode.h's #include "vnode_if.h" isn't generated).
Once the transport is available I would be very happy to build and exercise the full stack on arm64 under Apple VZ and report back — it might be some platform coverage that's a bit different. Thanks again for the work on this!
Hi Stefan! Testing the patchset would be greatly helpful, thanks you for trying it out. I have been testing exclusively on an x86 Linux host so a macOS host is a great smoke test for it.
For testing, can you try out cherry-picking the patches from my tree here ? The tree has four patches that correspond to Diffs D46295, D46296, D55047, and D55046. The diffs add the virtio device and file system, and add two fixes to the FUSE code. D55046 is abandoned because it wasn't addressing the problem to its full extent. Still, please include it to avoid cache inconsistencies.
- The virtio-fs transport isn't included, and the glue needs a newer one than what's public. virtiofs_vfsops.c does #include <dev/virtio/fs/virtio_fs.h>, but sys/dev/virtio/fs/ isn't part of this revision, and it isn't in main either (only sys/dev/virtio/p9fs is). I couldn't find a companion review that adds it. I tried grafting the transport from your virtiofs-head branch, but the interface has moved on since then — virtiofs_vfsops.c calls:
vtfs_register_cb(vtfs, forget_ticket, complete_ticket, cancel_ticket, teardown, data); /* which is 6 args */while the published vtfs_register_cb() takes 5 (no cancel_ticket callback). So the matching transport seems to be newer than anything that's public.
I think the missing diff is D46295. The virtiofs-head branch is badly named and is behind virtiofs. The virtiofs branch should have the most up-to-date code.
Is sys/dev/virtio/fs/ going up as a separate review? Is it available somewhere and I just missed it? (apologies if that's the case) If the version that matches this revision was pushed, I think it'd let reviewers (and (being selfish) me) actually build and exercise the whole stack.
- One other minor trouble, on rebase to current main: FSESS_VIRTIOFS 0x20000000 now collides — main added FSESS_WARN_LSEXTATTR_NUL 0x20000000 since this diff's base, so it needs the next free bit (0x40000000). Maybe this is all related.
Thank you for the heads up! I will rebase to avoid the collision.
One small build-infra note in case it might help someone else: building the module standalone needs SRCS+= vnode_if.h in sys/modules/virtiofs/Makefile (otherwise vnode.h's #include "vnode_if.h" isn't generated).
Once the transport is available I would be very happy to build and exercise the full stack on arm64 under Apple VZ and report back — it might be some platform coverage that's a bit different. Thanks again for the work on this!
Thank you! I appreciate the kind words, if I can clarify anything further or there is any other issue please let me know, I'd be happy to help.
Thanks Emil — this is what I was looking for. I just built from the virtiofs branch and got the full stack compiling on main: fusefs.ko, the vtfs transport, and virtiofs.ko all build cleanly. The only needed fix is SRCS+= vnode_if.h in sys/modules/virtiofs/Makefile, so the standalone module build emits the generated header before compiling virtiofs_vfsops.c.
I also confirmed the target itself works: a FreeBSD 16.0-CURRENT/arm64 snapshot boots fine under Apple's Virtualization.framework, and VZ exposes both the virtio-fs device (0x105a, sitting unclaimed) and a vsock device — so everything's in place for a runtime test.
Where I got stuck is the branch's age. virtiofs is based on a main that's roughly 3,700 commits behind current -CURRENT, so the modules are KBI-incompatible with a current kernel — kldload panics. I tried rebasing the driver onto a current snapshot to get a matched build, but it conflicts in fuse_internal_cache_attrs(): main now takes CACHED_ATTR_LOCK(vp) there, which overlaps your "ensure called with the exclusive vnode lock" change (0aeffd6). That's a locking-semantics call I didn't want to start taking guesses at.
Just for the sake of clarity, this is a separate (and rather more demanding) request than the FSESS_VIRTIOFS bit you offered to fix in your post above. The blocker for testing is the branch's base being about 3700 commits behind today's -CURRENT: nothing built from it will load on a current kernel. Would it be possible for you to bring the whole virtiofs branch up onto a recent main? I think I could then finish off an arm64/VZ test fairly quickly — I've already got a snapshot that boots under VZ and the build working, so it'd just be loading the updated modules and mounting a share. Happy to report back what I find. Thanks again for the pointers!