Page MenuHomeFreeBSD

unix/stream: fix a race with MSG_PEEK on SOCK_SEQPACKET with MSG_EOR
ClosedPublic

Authored by glebius on Nov 7 2025, 2:22 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Dec 3, 3:44 PM
Unknown Object (File)
Mon, Dec 1, 10:54 AM
Unknown Object (File)
Tue, Nov 25, 5:18 AM
Unknown Object (File)
Mon, Nov 24, 10:22 PM
Unknown Object (File)
Mon, Nov 24, 9:07 PM
Unknown Object (File)
Mon, Nov 24, 4:42 PM
Unknown Object (File)
Sun, Nov 23, 10:40 AM
Unknown Object (File)
Sat, Nov 22, 7:26 PM
Subscribers

Details

Summary

The pr_soreceive method first scans the buffer holding the both I/O sx(9)
and socket buffer mutex(9) and after figuring out how much needs to be
copied out drops the mutex. Since the other side may only append to the
buffer, it is safe to continue the operation holding the sx(9) only.
However, the code had a bug that it used pointer in the very last mbuf as
marker of the place where to stop. This worked both in a case when we
drain a buffer completely (marker points at NULL) and in a case when we
wanted to stop at MSG_EOR (marker points at next mbuf after MSG_EOR).
However, this pointer is not consistent after we dropped the socket buffer
mutex.

Rewrite the logic to use the data length as bounds for the copyout cycle.

Provide a test case that reproduces the race. Note that the race is very
hard to hit, thus test will pass on unmodified kernel as well. In a
virtual machine I needed to add tsleep(9) for 10 nanoseconds into the
middle of function to be able to reproduce.

PR: 290658
Fixes: d15792780760ef94647af9b377b5f0a80e1826bc

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/kern/uipc_usrreq.c
1499

I suspect it's possible to have last == NULL here, if there is no data in the socket buffer.

sys/kern/uipc_usrreq.c
1499

Cycle starts with m = last = first, so last is always initialized.

sys/kern/uipc_usrreq.c
1499

But how do you know that first != NULL?

sys/kern/uipc_usrreq.c
1499

You a right, this can happen with presence with presence of control data. I will update the patch.

sys/kern/uipc_usrreq.c
1632

I think we only want to do this subtraction if !peek is true.

Fix pointer for partial mbuf in peek case.

sys/kern/uipc_usrreq.c
1642

Here we do not hold any locks... is it really safe to rely on this m != part test? That is, we are assuming that part will stay resident in the socket buffer, but I do not see anything which guarantees that.

1644

I would assert m->m_len <= datalen in the loop body.

  • Keep the error case freeing loop under the lock.
  • Assert before subtracting.
sys/kern/uipc_usrreq.c
1642

I still don't really like it. The mbuf pointer is formally protected by the sockbuf lock, not the I/O lock. The I/O lock is just there to serialize readers. Something else could free mbufs from the socket buffer in principle.

Is there a better way?

sys/kern/uipc_usrreq.c
1642

The formally is true for generic socket code. For this particular kind of socket holding I/O lock guarantees that existing mbufs, previously observed with sockbuf lock, they won't go away. This is not true for tail of the queue, it may grow, hence the race you discovered and this patch.

This revision is now accepted and ready to land.Thu, Nov 13, 7:28 PM