Page MenuHomeFreeBSD

Implement linux_sendfile for the linuxulator

Authored by on Apr 15 2019, 6:01 PM.



Implement linux_sendfile for the linuxulator using FreeBSD's sendfile. FreeBSD's sendfile only works if the target descriptor is a socket stream. So linux_sendfile is only implemented for this case and currently doesn't work with any descriptor so it returns not implemented in the other cases.

Submitted by:
Bora Ozarslan
Yang Wang

Test Plan

Test linux_sendfile using linuxulator with these linux binaries both as 32 and 64 bit versions:

  • TCP client/server program: One uses sendfile to send file
  • socketpair: sends file from one end to another end
  • UDP client/server program: One uses sendfile to send file
  • Use sendfile to copy a file to another file.

In each case sendfile is tested multiple times using different parameters to check the correct error code is returned.

Tested it with linux binary of nginx which is the original use case.

DONE BY y2333wan

I repeated the tests above with my own (about 10) different C programs, and

Test linux_sendfile64. As far as I can understand this means have a 32-bit binary with Large File Support. I have a cross-compiled 32-bit linux binary with Large File Support.

I produced some cross-compiled versions of the programs that call sendfile64 and tested sending files by also repeating the procedures above.

Diff Detail

rS FreeBSD src repository - subversion
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

This generally looks good to me.

416 ↗(On Diff #56068)

I assume I'm missing something in the linux syscall implementation, but where is this version implemented?

1804 ↗(On Diff #56068)

It looks like there might be a whitespace consistency issue yet.

P.S. Not to be fixed in this commit, but this should really be using designated initializers (e.g. [LINUX_SENDFILE] = 4). added inline comments.
416 ↗(On Diff #56068)

I left this out to be corrected. I tried this and another version where I tried referencing linux_sendfile for sendfile64 but I can't seem to get it to work. It always crashes. I am open to any direction on how to properly implement the 64 bit version for 32 bit linux. There also shouldn't be any difference between the implementations, hence I tried just referencing linux_sendfile.

1804 ↗(On Diff #56068)

There are tabs before the second column in all instances. It looks fine using diff tools and on vim. This might just be phabricator not showing it correctly, unless my editor is doing something weird.

@dchagin Bora is back at school now but I will handle committing this after review is complete

@emaste, np )

1704 ↗(On Diff #56068)

int error prefered

1775 ↗(On Diff #56068)

missied fp, you need fdrop() marked 2 inline comments as done. edited the test plan for this revision. (Show Details) set the repository for this revision to rS FreeBSD src repository - subversion.

Implementing sendfile64. edited the summary of this revision. (Show Details) marked an inline comment as done.

Fixed a whitespace issue

We'll need to add support in other syscalls.master files too:

  • sys/amd64/linux/syscalls.master (amd64 64-bit)
  • sys/amd64/linux32/syscalls.master (amd64 32-bit, this one is done)
  • sys/arm/linux/syscalls.master (32-bit arm, not yet functional)
  • sys/arm64/linux/syscalls.master (64-bit arm)
  • sys/i386/linux/syscalls.master (i386 32-bit)

ah, so there are a number of linux_sendfiles / linux_sendfile64s in those already.
we need to get rid of the DUMMY(sendfile); for the archs though.

1612 ↗(On Diff #66808)

We don't seem to use the result of this call anywhere. I think it is unnecessary.

1612 ↗(On Diff #66808)

I will remove this call

added (edited) sys calls for amd64/linux, amd64/linux32, and especially arm/linux, arm64/linux, i386/linux. some of these folders already have the syscalls defined, in which case DUMMY were removed.

1578 ↗(On Diff #66850)

It seems we don't actually write to the offset?

1653 ↗(On Diff #66850)

this cast isn't right - we need something like:

l_loff_t offset;
int rv;

offset = arg->offset;
rv = linux_sendfile_common(..., &offset, ...);
// do something with offset
return (rv);
1639 ↗(On Diff #66850)

In Linux, sendfile() may also adjust the file's seek offset: "If offset is not NULL, then sendfile() does not modify the file offset of in_fd; otherwise the file offset is adjusted to reflect the number of bytes read from in_fd." In particular, if offset == NULL there is still some work that you need to do.

As a hint for how this should be fixed, read the manual page for the lseek() system call, which does nothing but adjust the seek offset for a file descriptor. Also take a look at its implementation in FreeBSD, in kern_lseek().

1663 ↗(On Diff #66850)

The way we are passing arg->offset (l_long) here is not right.

Suppose you are running a 32-bit Linux application on a 64-bit amd64 kernel. From the application's point of view, sizeof(offset) is 4 but from the kernel's point of view sizeof(offset) is 8. When linux_sendfile_common() uses copyout() to copy the returned offset back into userspace, it will always copy 8 bytes. 32-bit applications, which expect only 4 bytes, are going to experience memory corruption as a result.

1663 ↗(On Diff #66850)

arg->offset is l_loff_t*

417 ↗(On Diff #66850)

it looks like Linux's definition is sendfile64(..., __off64_t *offset, ...)

1663 ↗(On Diff #66850)

Sorry, I put the comment in the wrong place. The issue is with linux_sendfile(), where arg->offset is l_long_t *.

417 ↗(On Diff #66850)

l_loff_t * is equivalent. It would probably be clearer to define l_off64_t and use that instead, but l_loff_t is what we use elsewhere in this file.

1653 ↗(On Diff #66850)

ended up with some thing:

linux_sendfile(struct thread *td, struct linux_sendfile_args *arg)
	if (arg->offset == NULL)
		return linux_sendfile_common(td, arg->out, arg->in,
			NULL, arg->count);

	l_loff_t offset64;
	int ret;
	int error;
	l_long offset;

	error = copyin(arg->offset, &offset, sizeof(offset));
	if (error < 0)
		return (error);
	offset64 = (l_loff_t) offset;

	ret = linux_sendfile_common(td, arg->out, arg->in,
		&offset64, arg->count);

	offset = (l_long) offset64;
	error = copyout(&offset, arg->offset, sizeof(offset));
	if (error < 0)
		return (error);
	return (ret);
1578 ↗(On Diff #66850)

I will clarify the part according to mark's comment below

on 64-bit ABIs (amd64 linux, arm64) we have just linux_sendfile which has (64-bit) l_loff_t *offset.
on 32-bit ABIs (amd64 linux32, i386, arm) we have linux_sendfile which has a 32-bit *offset and linux_sendfile64 which has 64-bit *offset.

on 64-bit ABIs (amd64 linux, arm64) we have just linux_sendfile which has (64-bit) l_loff_t *offset.
on 32-bit ABIs (amd64 linux32, i386, arm) we have linux_sendfile which has a 32-bit *offset and linux_sendfile64 which has 64-bit *offset.

I think on 64bit platforms linux_sendfile has l_long offset, which is 64bit, which is same as l_loff_t
and yes on 32bit linux_sendfile has l_long, which is 32bit, but we force linux_sendfile64 to have a 64bit offser, which happens to be l_loff_t

Linux allows a maximum of (1<<31)-1 (MAX_NON_LFS) and will return EOVERFLOW if offset + count is greater.

Bora assumed offset to be 0 if no offset pointer is given. it is not true. in this case file descripter's reading offset will be used (in linux), which is the update. marked 6 inline comments as done and an inline comment as not done.Jan 17 2020, 7:51 PM

As a general note, please take a look at the FreeBSD C style manual. It is a man page: "man style".

1592 ↗(On Diff #66930)

kern_lseek() returns an error number or 0, not the offset.

It is an unusual interface in that it returns the offset by setting a field in the current thread's structure. Take a look at vn_seek() or shm_seek(), which implement the fo_seek() interface: the set td->td_uretoff.tdu_off to the offset, where td is the current thread (curthread, a magic global variable). You need to fetch the seek offset from there.

(In my opinion fo_seek() should return the offset in an output argument and kern_lseek() should set curthread->td_uretoff.tdu_off, but this is a separate discussion.)

1618 ↗(On Diff #66930)

Instead of using kern_lseek(), it is better to call fo_seek() directly, before fdrop()ing the file. Otherwise this function is doing three fget()s per call, which is unnecessary, and moreover it introduces some ugly error cases. For instance, suppose the process closes the fd after the fdrop() call but before the second kern_lseek() call. Then the second kern_lseek() call will fail, so sendfile() will return an error (EBADF) even though it technically succeeded. That is not a fatal problem, but it is better to avoid it.

In fact, using fo_seek() itself is a bit racy. Right now we have the following sequence of operations when offset == NULL:

  1. load current offset
  2. call fo_sendfile() with the current offset
  3. update the current offset

Suppose another thread changes the current offset between 1 and 3: the result of the call isn't really consistent. If you look at the fo_seek() implementations, like vn_seek(), you will see that foffset_lock() and foffset_unlock() are used to synchronize operations on the file's current offset. I think it is ok to ignore this problem, I am just mentioning it for completeness.

1657 ↗(On Diff #66930)

As a matter of style I think this function and the one below should contain only a single call to linux_sendfile_common(). That is, write:

if (offset != NULL)
ret = linux_sendfile_common()
if (offset != NULL)

At least, per FreeBSD style, local variable declarations always come before code.

1592 ↗(On Diff #66930)

Should I not use kern_lseek here as well and go with fo_seek?

1592 ↗(On Diff #66930)

Yes, I think you can fetch the current offset after the fget_read() call, using fo_seek().

This looks ok to me, just a few nits.

1589 ↗(On Diff #67051)

The indentation looks wrong here, like there's a mix of tabs and spaces being used.

1613 ↗(On Diff #67051)

fo_sendfile() returns a positive error number (see sys/sys/errno.h).

1625 ↗(On Diff #67051)

Style: there should be no space after the cast.

1680 ↗(On Diff #67051)

Style: there should be no space after the cast.

markj added inline comments.
1581 ↗(On Diff #67110)

I think this comment isn't useful.

1657 ↗(On Diff #67110)

I think these comments can be deleted or at least written more concisely.

This revision is now accepted and ready to land.Jan 24 2020, 6:47 PM

I edited the comments (without really making semantic changes) to:

 * Differences between FreeBSD and Linux sendfile:
 * - Linux doesn't send anything when count is 0 (FreeBSD uses 0 to
 *   mean send the whole file.)  In linux_sendfile given fds are still
 *   checked for validity when the count is 0.
 * - Linux can send to any fd whereas FreeBSD only supports sockets.
 *   The same restriction follows for linux_sendfile.
 * - Linux doesn't have an equivalent for FreeBSD's flags and sf_hdtr.
 * - Linux takes an offset pointer and updates it to the read location.
 *   FreeBSD takes in an offset and a 'bytes read' parameter which is
 *   only filled if it isn't NULL.  We use this parameter to update the
 *   offset pointer if it exists.
 * - Linux sendfile returns bytes read on success while FreeBSD
 *   returns 0.  We use the 'bytes read' parameter to get this value.

Perhaps still too verbose, will edit further.

This revision was automatically updated to reflect the committed changes.