sockets: remove the socket-on-stack hack from sorflush()
ClosedPublic
Actions

Authored by glebius on May 4 2022, 6:45 PM.

Details

Reviewers

markj

Group Reviewers

network

Commits

rGa982ce04428e: sockets: remove the socket-on-stack hack from sorflush()

Summary

The hack can be tracked down to 4.4BSD, where copy was performed
under splimp() and then after splx() dom_dispose was called.
Stevens has a chapter on this function, but he doesn't answer why
this trick is necessary. Why can't we call into dom_dispose under
splimp()? Anyway, with multithreaded kernel the hack seems to be
necessary to avoid LORs between socket buffer lock and different
filesystem locks, especially network file systems.

The new socket buffers KPI sbcut() from 1d2df300e9b allow us to get
rid of the hack.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

glebius created this revision.May 4 2022, 6:45 PM

Herald added a subscriber: imp. · View Herald TranscriptMay 4 2022, 6:45 PM

glebius requested review of this revision.May 4 2022, 6:45 PM

glebius added a parent revision: D35124: sockets: don't call dom_dispose() on a listening socket.

Harbormaster completed remote builds in B45470: Diff 105682.May 4 2022, 6:45 PM

Seems ok, but I'm still not sure about LORs. I would love to get rid of this hack though.

sys/kern/uipc_usrreq.c
2785	Isn't this the same as `m = sbcut_locked(sb, sb->sb_ccc);`?
2788	sbrelease_locked() asserts this already.
2792	So we still hold the IO lock here, and unp_dispose_mbuf() can close arbitrary descriptors, so a lot of code is reachable from this point (i.e., most implementations of fo_close). How can we be certain that there are no LORs possible here?

glebius added inline comments.May 4 2022, 7:47 PM

sys/kern/uipc_usrreq.c
2785	sbflush_internal() has a comment that suggests a problem exists here. It could be outdated, though.
2792	Good concern. I will update the revision.

glebius added inline comments.May 4 2022, 7:51 PM

sys/kern/uipc_usrreq.c
2788	sbrelease_locked() asserts this already.
2788	sbrelease_locked() asserts this already. It does so after calling sbcut(). We want to make sure socket buffer is empty after sbcut done by us.

markj added inline comments.May 4 2022, 8:09 PM

sys/kern/uipc_usrreq.c
2785	I suspect it is outdated. Right now the code doesn't even panic in that case, unless INVARIANTS is on.
2786

Drop the I/O lock earlier
Don't be afraid to call sbcut_internal() with 0 length

Harbormaster completed remote builds in B45474: Diff 105692.May 4 2022, 9:24 PM

markj accepted this revision as: markj.May 4 2022, 9:47 PM

markj added inline comments.

sys/kern/uipc_usrreq.c
2792	Actually, I don't see why we need to hold the io lock here at all. Can't it be dropped before calling pru_dispose? We marked the buffer with CANTRCVMORE already. Hmm, no, it seems that soreceive_generic() removes mbufs from the recv socket buffer after copying them out to userspace, and to do that it needs to drop the socket buffer mutex. That's surprising to me, to be honest.