socket: Implement SO_SPLICE
This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket. This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.
The interface is copied from OpenBSD, and this implementation aims to be
compatible. Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket. In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets. More concretely, when
setting the option one passes the following struct:
struct splice { int fd; off_t max; struct timveval idle; };
where "fd" refers to the socket to which the first socket is to be
spliced, and two setsockopt(SO_SPLICE) calls are required to set up a
bi-directional splice.
select(), poll() and kevent() do not return when data arrives in the
receive buffer of a spliced socket, as such data is expected to be
removed automatically once space is available in the corresponding send
buffer. Userspace can perform I/O on spliced sockets, but it will be
unpredictably interleaved with splice I/O.
A splice can be configured to unsplice once a certain number of bytes
have been transmitted, or after a given time period. Once unspliced,
the socket behaves normally from userspace's perspective. The number of
bytes transmitted via the splice can be retrieved using
getsockopt(SO_SPLICE); this works after unsplicing as well, up until the
socket is closed or spliced again. Userspace can also manually trigger
unsplicing by splicing to -1.
Splicing work is handled by dedicated threads, similar to KTLS. A
worker thread is assigned at splice creation time. At some point it
would be nice to have a direct dispatch mode, wherein the thread which
places data into a receive buffer is also responsible for pushing it
into the sink, but this requires tighter integration with the protocol
stack in order to avoid reentrancy problems.
Currently, sowakeup() and related functions will signal the worker
thread assigned to a spliced socket. so_splice_xfer() does the hard
work of moving data between socket buffers.
Co-authored by: gallatin
Reviewed by: brooks (interface bits)
MFC after: 3 months
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D46411