
unix: new implementation of unix/stream & unix/seqpacket
Needs Review, Public

Authored by glebius on Feb 10 2025, 10:21 PM.

Details

Reviewers
markj
Group Reviewers
network
Summary
unix: new implementation of unix/stream & unix/seqpacket

[this is an updated version of d80a97def9a1, which had been reverted]

Provide protocol-specific pr_sosend and pr_soreceive for PF_UNIX
SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension
of SOCK_STREAM.  The change meets three goals: get rid of unix(4)-specific
stuff in the generic socket code, provide faster and more robust
unix/stream sockets, and bring unix/seqpacket much closer to the
specification.  Highlights follow:

- The send buffer is now truly bypassed.  Previously it was always empty,
but send(2) still needed to acquire its lock and do a variety of tricks
to be woken up at the right time while sleeping on it.  Now the only two
things we care about in the send buffer are the I/O sx(9) lock that
serializes operations and the value of so_snd.sb_hiwat, which we can read
without obtaining a lock.  A send(2) now sleeps on the mutex of the
peer's receive buffer.  A bulk send/recv of data with large socket
buffers makes both syscalls just bounce between owning the receive buffer
lock and copyin(9)/copyout(9); no other locks are involved.  Since event
notification mechanisms such as select(2), poll(2) and kevent(2) use the
state of the send buffer to monitor writability, the new implementation
provides protocol-specific pr_sopoll and pr_kqfilter.  Support for
sendfile(2) over unix/stream is preserved by providing protocol-specific
pr_send and pr_sendfile_wait methods.

- The implementation uses the new mchain structure to manipulate mbuf
chains.  Note that this required converting to mchain two functions that
are shared with unix/dgram, unp_internalize() and unp_addsockcred(), as
well as adding a new shared one, uipc_process_kernel_mbuf().  This induces
some non-functional changes in the unix/dgram code as well.  There is room
for improvement here, as right now it is a mix of mchain and manually
managed mbuf chains.

- unix/seqpacket, previously marked as PR_ADDR & PR_ATOMIC and thus
treated as a datagram socket by the generic socket code, now becomes a
true stream socket with record markers.

- Note on aio(4).  The first problem with socket aio(4) is that it uses
socket buffer locks for queueing and, piggybacking on this locking, calls
soreadable() and sowriteable() directly.  Ideally it should use the
pr_sopoll() method.  The second problem is that, unlike a syscall, aio(4)
wants a consistent uio structure upon return.  This is incompatible with
our speculative read optimization, so in case of an aio(4) write we need
to restore consistency of the uio.  At this point we work around those
problems on the side of unix(4), but ideally those workarounds should be
a problem of socket aio(4) (not a first-class citizen) rather than a
problem of unix(4), definitely a primary facility.
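
The record-marker semantics of unix/seqpacket mentioned in the highlights
can be observed from userland.  The following is a minimal sketch and is
not part of this change: it uses only socketpair(2), send(2) and recv(2),
and the function name seqpacket_record_demo() is hypothetical.  It sends
two records and checks that each recv(2) returns exactly one of them, even
though the receive buffer is large enough to hold both.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Sends two records over a connected unix/seqpacket pair and reads them
 * back.  Returns 0 if each recv(2) returned exactly one record, -1 on any
 * error or unexpected result.
 */
int
seqpacket_record_demo(void)
{
	int sv[2];
	char buf[64];
	ssize_t n;
	int rv = -1;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
		return (-1);

	/* Two separate sends create two separate records. */
	if (send(sv[0], "first", 5, 0) != 5 ||
	    send(sv[0], "second", 6, 0) != 6)
		goto out;

	/* The buffer could hold both records, but only the first is returned. */
	n = recv(sv[1], buf, sizeof(buf), 0);
	if (n != 5 || memcmp(buf, "first", 5) != 0)
		goto out;

	/* The next call returns the second record. */
	n = recv(sv[1], buf, sizeof(buf), 0);
	if (n != 6 || memcmp(buf, "second", 6) != 0)
		goto out;

	rv = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return (rv);
}
```

A test program could call it as assert(seqpacket_record_demo() == 0).  This
checks only the record boundaries that SOCK_SEQPACKET promises; it says
nothing about how the kernel stores the markers internally.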

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 63811
Build 60695: arc lint + arc unit

Event Timeline

I believe this is the final version. All syzkaller finds are fixed. Writing
this from a desktop running the branch.

Updated with the actual commit message I plan to use.

Some performance data, measured on hardware that was not 100% idle:

bulk 512k buffer: -43.0088% +/- 1.5984%
bulk 128k buffer: -52.0892% +/- 2.13568%
bulk 8k buffer (default): -69.9878% +/- 1.31744%
ping/pong: -8.67937% +/- 3.41407%