Page MenuHomeFreeBSD

PR 194264: Fix race between unp_dispose and unp_gc
ClosedPublic

Authored by cem on Jul 9 2015, 10:24 PM.

Details

Summary

Minor cleanup of MJG's patch on Current list:
https://lists.freebsd.org/pipermail/freebsd-current/2015-July/056481.html

unp_dispose and unp_gc can race to teardown the same mbuf chains, which can
lead to dereferencing freed pointers.

To serialize against unp_gc, unp_dispose needs the socket object. Change the dom_dispose() KPI to take a socket object instead of mbuf chain directly.

Unix domain socket code serializes against unp_gc
with a flag on the unpcb under UNP_LIST_LOCK.

Test Plan

https://people.freebsd.org/~mjg/reproducers/unp-gc-panic.c

$ kyua test -k Kyuafile.auto unix_seqpacket_test
...
41/41 passed (0 failed)

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem retitled this revision from to PR 194264: Fix race between unp_dispose and unp_gc.
cem updated this object.
cem edited the test plan for this revision. (Show Details)
cem added reviewers: markj, mjg.

Adding a unp_dispose2 hook which accepts a different parameter is the conservative approach. It would be nice if it could assert that the domain does not register both old and the new one.

Other thing to note here is that unp_dispose is the only in-tree handler we got, so it may make sense to just change struct domain so that dom_dispose grabs the socket instead and be done with it.

However, ownership of the code is unclear and I'm not making the call.

Finally, while the patch seems fine (apart from being a hack), a deeper review is in order. In particular I'm not confindent unp_gc wont be able to detect the socket as "unreachable" as we are closing it, making us race the other way.

In D3044#60027, @mjg wrote:

Adding a unp_dispose2 hook which accepts a different parameter is the conservative approach. It would be nice if it could assert that the domain does not register both old and the new one.

Other thing to note here is that unp_dispose is the only in-tree handler we got, so it may make sense to just change struct domain so that dom_dispose grabs the socket instead and be done with it.

Oh, ok. That seems reasonable to me. I'll change that.

However, ownership of the code is unclear and I'm not making the call.

Finally, while the patch seems fine (apart from being a hack), a deeper review is in order. In particular I'm not confindent unp_gc wont be able to detect the socket as "unreachable" as we are closing it, making us race the other way.

I think we are ok here. Because we bail out of unp_gc_process() early on UNPGC_IGNORE, we never increment unp_unreachable or set UNPGC_DEAD on the socket. unp_gc will leave such sockets alone.

cem edited edge metadata.

Just change existing dom_dispose KPI rather than adding dom_dispose2; only Unix
sockets use dom_dispose method in-tree.

s/unp_dispose2/unp_dispose_so/

cem edited the test plan for this revision. (Show Details)
sys/sys/unpcb.h
109 ↗(On Diff #6833)

This should probably be 0x8?

sys/sys/unpcb.h
109 ↗(On Diff #6835)

Yep, whoops.

cem edited the test plan for this revision. (Show Details)

Fix flag value to be distinct

cem marked an inline comment as done.Jul 10 2015, 1:17 AM
sys/kern/uipc_socket.c
2364 ↗(On Diff #6836)

This comment doesn't really apply anymore.

Now we're invoking dom_dispose without a lock held on the corresponding socket buffer, and that's presumably unsafe. And we probably can't just hold the buffer lock across the call to dom_dispose(), e.g. because unp_discard() does an M_WAITOK allocation.

sys/kern/uipc_socket.c
2364 ↗(On Diff #6836)

Now we're invoking dom_dispose without a lock held on the corresponding socket buffer, and that's presumably unsafe.

We were before too. I don't think it's any more unsafe than it was before.

The only new behavior in unp_dispose_so is (1) accessing so_pcb with sotounpcb. I don't see a lock annotation on that struct member and it seems reasonably something that might have the same lifetime as the socket itself.

And, (2) setting a unp_gcflag under the UNP_LIST_LOCK.

It seems safe so long as all users of dom_dispose are aware of what they're getting. In-tree, that's just Unix sockets.

cem planned changes to this revision.Jul 10 2015, 1:47 AM
cem added inline comments.
sys/kern/uipc_socket.c
2364 ↗(On Diff #6836)

This comment doesn't really apply anymore.

Hm... Do we need the SOCKBUF_LOCK or sblock to access sb_mb?

Anyway, you're right, this isn't quite right... the bzero(&sb->sb_startzero) will clear sb_mb on the original so object and unp_dispose won't have much to do.

Hmm.

sys/kern/uipc_socket.c
2364 ↗(On Diff #6836)

mjg proposes just faking the entire so instead of the recv sb: https://lists.freebsd.org/pipermail/freebsd-current/2015-July/056521.html

Fake the entire socket obj in sorflush() for dom_dispose API change.

sys/kern/uipc_usrreq.c
2197 ↗(On Diff #6845)

Are you sure it's also correct to skip scanning sockets in our accept queue (line 2225)?

sys/kern/uipc_usrreq.c
2197 ↗(On Diff #6845)

Yes. The IGNORE flag just means that someone else (unp_dispose) is taking over for cleanup on this socket and we don't want the GC to do it for us. GC will not tear down anything marked IGNORE.

sys/kern/uipc_usrreq.c
2197 ↗(On Diff #6845)

unp_dispose doesn't look at the accept queue, which is my reason for asking. This change is saying "also ignore sockets in my accept queue."

However, I think it's safe anyway - if a socket is in an accept queue, it should still be attached, in which case it'll be visited by the GC at some other point.

sys/kern/uipc_usrreq.c
2197 ↗(On Diff #6845)

I don't understand the concern here. dom_dispose is called by the socket teardown logic or by GC -> sorflush.

Are you concerned we will accidentally dispose of sockets in the accept queue? Or that we will accidentally not dispose of them? I don't follow.

If a socket is in an accept queue it probably should not be in sofree() -> dom_dispose().

sys/kern/uipc_usrreq.c
2197 ↗(On Diff #6845)

Let's say there's three sockets: A, B and C. Suppose B is in A's accept queue, and C is in B's receive buffer. Previously, when the GC visited A, it would look at B's rcvbuf and mark C. Now, if UNPGC_IGNORE is set on A, it won't. So the GC could conceivably call sorflush() -> dom_dispose() on C even though C is reachable via A -> B -> C, which seems like an unintentional side-effect of this change.

It's probably not an issue though, because if B is in A's accept queue, I believe B must still be in one of the global unp socket lists, so the GC will find B and mark C. So, sorry for the noise.

sys/kern/uipc_usrreq.c
2197 ↗(On Diff #6845)

If a socket is tagged IGNORE, it really isn't reachable — it's going away immediately. I think collecting C (and B) in the scenario described is a valid effect, if there are no other references to B or C.

Revise unp_gc_process to only skip scanning rights for UNPGC_IGNORE sockets.
Otherwise, chase references and tag them like before. This is needed because
accept sockets can get into a half-shutdown state where they still refer to
acceptable connections, which should not be collected.

Also, rename UNPGC_IGNORE to IGNORE_RIGHTS to more closely match the intended
behavior.

Be careful not to scan accept queue sockets for RIGHTS if they are
UNPGC_IGNORE_RIGHTS.

sys/kern/uipc_usrreq.c
2231 ↗(On Diff #6849)

This check is redundant - you can't connect to a unix domain socket on a non-PF_LOCAL socket.

Drop excess localdomain check, thanks markj@.

markj edited edge metadata.
This revision is now accepted and ready to land.Jul 12 2015, 11:58 PM
This revision was automatically updated to reflect the committed changes.

I dont see this patch in stable 10. Is there a reason this diff was not committed to stable 10 ? I am having a conflict when I try to merge https://reviews.freebsd.org/rS303855 to stable 10 ?

I dont see this patch in stable 10. Is there a reason this diff was not committed to stable 10 ? I am having a conflict when I try to merge https://reviews.freebsd.org/rS303855 to stable 10 ?

It breaks the ABI and so cannot be merged as-is. I will MFC a modified version of this change that does not break the ABI.