Page MenuHomeFreeBSD

fusefs: fix list size calculation bug in fuse_vnop_listextattr
ClosedPublic

Authored by asomers on Aug 16 2019, 4:50 AM.

Details

Summary

fusefs: fix list size calculation bug in fuse_vnop_listextattr

A small error in r338152 let to the returned size always being exactly eight
bytes too large.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

asomers created this revision.Aug 16 2019, 4:50 AM
asomers planned changes to this revision.Aug 16 2019, 4:56 AM

Ohh, I just realized the purpose of line 2321. I'll fix the review in the morning.

sys/fs/fuse/fuse_vnops.c
2312 ↗(On Diff #60868)

This was the error. list_xattr_in->size refers to the size of the attribute list, not the size of the FUSE transfer. Though neither fuse_kernel.h, the linux kernel, nor libfuse document it, the intention is clear from reading libfuse's passthrough example.

cem added inline comments.Aug 16 2019, 4:56 AM
sys/fs/fuse/fuse_vnops.c
2320–2321 ↗(On Diff #60868)

Do we not still need this?

asomers updated this revision to Diff 60915.Aug 16 2019, 9:26 PM

Handle a race condition in fuse_vnop_listextattr

fuse_vnop_listextattr is fundamentally racy. Due to FreeBSD/Linux API
differences and the fact that FUSE was written for Linux, we must always
send FUSE_LISTXATTR twice; once to get the length of the attribute list, and
once to get the list. That presents an obvious race.

This update fixes three things:

  • Retry if the server returns ERANGE, indicating the list has grown
  • Save some RAM if the list has shrunk (prior to this DR the code already did that).
  • Warn if the server illegally lengthens the list
cem added a comment.EditedAug 26 2019, 9:04 PM

Sorry about the long delay — life has been busy.

This change looks like a step in the right direction, thanks! I added some suggestions below.

I didn't look at the test changes; I don't have any familiarity with googletest (or the C++ constructs used), so I'll take your word for those.

sys/fs/fuse/fuse_vnops.c
2326–2330 ↗(On Diff #60915)

(1) Hm. This seems like a vector where an adversarial userspace FS could cause a kernel infinite loop, preventing victim process termination. Maybe we should limit the number of retries in some way?

If some threshold is exceeded, we could return ERESTART, which at least gets us out to userspace and allows signal delivery (like KILL/INT).

(2) In the slightly less adversarial case, sometimes a busy file might just have rapidly changing extattrs? (a) One common way we accommodate racy dynamic buffers is to size the request (fdi.indata->size) to double linux_list_len at each iteration.

(b) Another option would be to just move the retry: label down to line ca. 2320 and exponentially scale the provided buffer size (saving a FUSE IPC round-trip per attempt).

Both of (a) and (b) assume FUSE filesystems will not reject buffers that are not perfectly sized (on retry). Will FUSE filesystems reject that? If they are permitted to return short reads (handled below), I would be a little surprised if they rejected too large output buffers. But it is a possibility worth double-checking.

So my tl;dr suggestion is:

  1. Abort after N (5?) retries with some kind of return to userspace
  2. Allocate more than we need (at least on retry)
2339 ↗(On Diff #60915)

suggest adding trailing "\n" to avoid mangling up some other kernel message

asomers added inline comments.Aug 26 2019, 9:16 PM
sys/fs/fuse/fuse_vnops.c
2326–2330 ↗(On Diff #60915)

Good idea; I didn't think about the malicious case.

  1. Returning short data is allowed
  2. I don't think that skipping checking the required length on retry is a good idea, because the attribute could easily go from 1 to 1024 bytes, which would require many iterations if we blindly double the buffer size
  3. ERESTART seems like an appropriate error. I'll do that.
cem added inline comments.Aug 26 2019, 9:24 PM
sys/fs/fuse/fuse_vnops.c
2326–2330 ↗(On Diff #60915)

Cool.

(2) is a reasonable argument for that approach.

One refinement of pure exponential growth would be to just start the bidding at something like PAGE_SIZE. (Sure, extattr list could gow from PAGE_SIZE to 1024 * PAGE_SIZE too, but in the non-adversarial case it's probably less likely than 1->1000.) But I'm perfectly happy with either option.

asomers updated this revision to Diff 61374.Aug 27 2019, 9:02 PM

Accomodate buggy or malicious fuse servers that always return ERANGE for
FUSE_LISTXATTR. Instead of restarting within the VOP; return ERESTART.
That way the blocked process will be killable.

cem accepted this revision.Aug 28 2019, 12:50 AM

This approach looks technically correct to me, and it simplifies the logic, but there is definitely some microbenchmark cost to exiting to userspace every retry instead of doing so "a few times" in the kernel (with some algorithm). But this is maybe just premature optimization on my part.

(Signals must be able to abort fdisp_wait_answ already, right? Otherwise a malicious FUSE filesystem could just not respond to tickets and tip over the kernel...)

This revision is now accepted and ready to land.Aug 28 2019, 12:50 AM
In D21287#466723, @cem wrote:

This approach looks technically correct to me, and it simplifies the logic, but there is definitely some microbenchmark cost to exiting to userspace every retry instead of doing so "a few times" in the kernel (with some algorithm). But this is maybe just premature optimization on my part.

Yeah, I choose the simpler solution because I didn't think that the race would be common enough to be worth optimizing for.

(Signals must be able to abort fdisp_wait_answ already, right? Otherwise a malicious FUSE filesystem could just not respond to tickets and tip over the kernel...)

SIGKILL is always allowed to interrupt fdisp_wait_answ. Whether other signals can depends on the -o intr mount option and whether the specific file system supports the FUSE_INTERRUPT operation.

This revision was automatically updated to reflect the committed changes.