Page MenuHomeFreeBSD

fusefs: fix list size calculation bug in fuse_vnop_listextattr
ClosedPublic

Authored by asomers on Aug 16 2019, 4:50 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Apr 11, 4:55 AM
Unknown Object (File)
Wed, Apr 10, 5:33 AM
Unknown Object (File)
Sat, Apr 6, 3:43 AM
Unknown Object (File)
Fri, Apr 5, 4:04 AM
Unknown Object (File)
Fri, Mar 29, 9:42 AM
Unknown Object (File)
Thu, Mar 21, 8:58 PM
Unknown Object (File)
Thu, Mar 21, 8:58 PM
Unknown Object (File)
Jan 16 2024, 12:48 AM
Subscribers

Details

Summary

fusefs: fix list size calculation bug in fuse_vnop_listextattr

A small error in r338152 let to the returned size always being exactly eight
bytes too large.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 26143
Build 24660: arc lint + arc unit

Event Timeline

Ohh, I just realized the purpose of line 2321. I'll fix the review in the morning.

sys/fs/fuse/fuse_vnops.c
2306

This was the error. list_xattr_in->size refers to the size of the attribute list, not the size of the FUSE transfer. Though neither fuse_kernel.h, the linux kernel, nor libfuse document it, the intention is clear from reading libfuse's passthrough example.

sys/fs/fuse/fuse_vnops.c
2312–2313

Do we not still need this?

Handle a race condition in fuse_vnop_listextattr

fuse_vnop_listextattr is fundamentally racy. Due to FreeBSD/Linux API
differences and the fact that FUSE was written for Linux, we must always
send FUSE_LISTXATTR twice; once to get the length of the attribute list, and
once to get the list. That presents an obvious race.

This update fixes three things:

  • Retry if the server returns ERANGE, indicating the list has grown
  • Save some RAM if the list has shrunk (prior to this DR the code already did that).
  • Warn if the server illegally lengthens the list

Sorry about the long delay — life has been busy.

This change looks like a step in the right direction, thanks! I added some suggestions below.

I didn't look at the test changes; I don't have any familiarity with googletest (or the C++ constructs used), so I'll take your word for those.

sys/fs/fuse/fuse_vnops.c
2325–2329

(1) Hm. This seems like a vector where an adversarial userspace FS could cause a kernel infinite loop, preventing victim process termination. Maybe we should limit the number of retries in some way?

If some threshold is exceeded, we could return ERESTART, which at least gets us out to userspace and allows signal delivery (like KILL/INT).

(2) In the slightly less adversarial case, sometimes a busy file might just have rapidly changing extattrs? (a) One common way we accommodate racy dynamic buffers is to size the request (fdi.indata->size) to double linux_list_len at each iteration.

(b) Another option would be to just move the retry: label down to line ca. 2320 and exponentially scale the provided buffer size (saving a FUSE IPC round-trip per attempt).

Both of (a) and (b) assume FUSE filesystems will not reject buffers that are not perfectly sized (on retry). Will FUSE filesystems reject that? If they are permitted to return short reads (handled below), I would be a little surprised if they rejected too large output buffers. But it is a possibility worth double-checking.

So my tl;dr suggestion is:

  1. Abort after N (5?) retries with some kind of return to userspace
  2. Allocate more than we need (at least on retry)
2340

suggest adding trailing "\n" to avoid mangling up some other kernel message

sys/fs/fuse/fuse_vnops.c
2325–2329

Good idea; I didn't think about the malicious case.

  1. Returning short data is allowed
  2. I don't think that skipping checking the required length on retry is a good idea, because the attribute could easily go from 1 to 1024 bytes, which would require many iterations if we blindly double the buffer size
  3. ERESTART seems like an appropriate error. I'll do that.
sys/fs/fuse/fuse_vnops.c
2325–2329

Cool.

(2) is a reasonable argument for that approach.

One refinement of pure exponential growth would be to just start the bidding at something like PAGE_SIZE. (Sure, extattr list could gow from PAGE_SIZE to 1024 * PAGE_SIZE too, but in the non-adversarial case it's probably less likely than 1->1000.) But I'm perfectly happy with either option.

Accomodate buggy or malicious fuse servers that always return ERANGE for
FUSE_LISTXATTR. Instead of restarting within the VOP; return ERESTART.
That way the blocked process will be killable.

This approach looks technically correct to me, and it simplifies the logic, but there is definitely some microbenchmark cost to exiting to userspace every retry instead of doing so "a few times" in the kernel (with some algorithm). But this is maybe just premature optimization on my part.

(Signals must be able to abort fdisp_wait_answ already, right? Otherwise a malicious FUSE filesystem could just not respond to tickets and tip over the kernel...)

This revision is now accepted and ready to land.Aug 28 2019, 12:50 AM
In D21287#466723, @cem wrote:

This approach looks technically correct to me, and it simplifies the logic, but there is definitely some microbenchmark cost to exiting to userspace every retry instead of doing so "a few times" in the kernel (with some algorithm). But this is maybe just premature optimization on my part.

Yeah, I choose the simpler solution because I didn't think that the race would be common enough to be worth optimizing for.

(Signals must be able to abort fdisp_wait_answ already, right? Otherwise a malicious FUSE filesystem could just not respond to tickets and tip over the kernel...)

SIGKILL is always allowed to interrupt fdisp_wait_answ. Whether other signals can depends on the -o intr mount option and whether the specific file system supports the FUSE_INTERRUPT operation.