unix: new implementation of unix/stream & unix/seqpacket

[This is an updated version of d80a97def9a1, which had been reverted.]

Provide protocol-specific pr_sosend and pr_soreceive for PF_UNIX SOCK_STREAM sockets and implement SOCK_SEQPACKET sockets as an extension of SOCK_STREAM. The change meets three goals: it gets rid of unix(4)-specific code in the generic socket layer, it provides faster and more robust unix/stream sockets, and it brings unix/seqpacket much closer to the specification. Highlights follow:

- The send buffer is now truly bypassed. Previously it was always empty, but send(2) still had to acquire its lock and do a variety of tricks to be woken up at the right time while sleeping on it. Now the only two things we care about in the send buffer are the I/O sx(9) lock that serializes operations and the value of so_snd.sb_hiwat, which we can read without obtaining a lock. When send(2) has to sleep, it sleeps on the mutex of the peer's receive buffer. A bulk send/recv of data with large socket buffers makes both syscalls simply bounce between owning the receive buffer lock and doing copyin(9)/copyout(9); no other locks are involved. Since event notification mechanisms such as select(2), poll(2) and kevent(2) use the state of the send buffer to monitor writability, the new implementation provides protocol-specific pr_sopoll and pr_kqfilter methods. sendfile(2) over unix/stream is preserved by providing protocol-specific pr_send and pr_sendfile_wait methods.
- The implementation uses the new mchain structure to manipulate mbuf chains. This required converting to mchain two functions that are shared with unix/dgram, unp_internalize() and unp_addsockcred(), as well as adding a new shared function, uipc_process_kernel_mbuf(). This induces some non-functional changes in the unix/dgram code as well. There is room for improvement here, as right now the code is a mix of mchain and manually managed mbuf chains.
- unix/seqpacket, previously marked PR_ADDR & PR_ATOMIC and thus treated as a datagram socket by the generic socket code, now becomes a true stream socket with record markers.
- A note on aio(4). The first problem with socket aio(4) is that it uses the socket buffer locks for queueing and, piggybacking on this locking, calls soreadable() and sowriteable() directly. Ideally it should use the pr_sopoll() method. The second problem is that, unlike a syscall, aio(4) wants a consistent uio structure upon return. This is incompatible with our speculative read optimization, so in the case of an aio(4) write we need to restore the consistency of the uio. For now we work around these problems on the unix(4) side, but ideally those workarounds should be the problem of socket aio(4) (not a first-class citizen) rather than of unix(4), which definitely is a primary facility.
Repository: rG FreeBSD src repository
Event Timeline
I believe this is the final version. All syzkaller findings are fixed. I am
writing this from a desktop running the branch.
Some performance data, measured on real hardware, though the machine was not 100% idle:
- bulk, 512k buffer: -43.0088% +/- 1.5984%
- bulk, 128k buffer: -52.0892% +/- 2.13568%
- bulk, 8k buffer (default): -69.9878% +/- 1.31744%
- ping/pong: -8.67937% +/- 3.41407%