Page MenuHomeFreeBSD

Implement cycle-detecting garbage collector for AF_UNIX sockets
ClosedPublic

Authored by jah on Jan 12 2020, 4:58 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 4, 3:45 PM
Unknown Object (File)
Tue, Dec 17, 11:25 AM
Unknown Object (File)
Sat, Dec 14, 5:23 PM
Unknown Object (File)
Dec 5 2024, 12:14 AM
Unknown Object (File)
Nov 8 2024, 1:51 AM
Unknown Object (File)
Oct 14 2024, 9:52 PM
Unknown Object (File)
Oct 13 2024, 10:19 PM
Unknown Object (File)
Oct 3 2024, 3:28 PM
Subscribers

Details

Summary

The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS mssages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.

This change rewrites the garbage collector to precisely detect cycles:

  1. The existing check of msgcount==f_count is still used to determine whether the socket is potentially in a cycle.
  2. The socket is now placed on a local "dead list", which is used to reduce iteration time (and therefore contention on the global unp_link_rwlock).
  3. The first pass through the dead list removes each potentially-dead socket's outgoing references from the socket graph, using a gc-specific copy of the original reference count.
  4. The second series of passes through the dead list removes from the list any socket whose GC refcount is non-zero, as these sockets remain accessible outside of any possible cycle. Iteration is repeated until no further sockets are removed from the dead list.
  5. Sockets remaining in the dead list are destroyed as before.

PR: 227285
Submitted by: jan.kokemueller@gmail.com (prior version)

Test Plan

Test programs attached to bug 227285 and 239803 now run indefinitely.
net.local.sockcount indicates no leaked sockets.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 28630
Build 26662: arc lint + arc unit

Event Timeline

sys/kern/uipc_usrreq.c
2584–2585

This can be turned into a KASSERT on !(unp_gcflag & ~UNPGC_IGNORE_RIGHTS). Otherwise it can unnecessarily dirty a lot of cachelines.

Replace assignment of unp_gcflag with KASSERT

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

sys/kern/uipc_usrreq.c
2498

The preferred style is an explicit comparison with 0, i.e., (unp->unp_gcflag & UNPGC_DEAD) == 0.

2500

It looks like this update is serialized by virtue of the fact that it is only ever done by the one taskqueue thread. We should add an assertion to that effect.

2518

Ditto regarding an assertion that these updates aren't racy.

2580

This comment is a bit stale, there is no state to clear anymore.

2584–2585

Sorry, can you explain why? uipc_dispose is called before the socket is removed from the global lists.

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

Of course. I think @pho already integrated Jan's test program from PR 227285 (which does exercise the gc) into his stress-test branch. It seems like that should just go into tests/ now. Anything else you'd like to see?

sys/kern/uipc_usrreq.c
2500

Yes, combined with the fact that the link lock is held as a reader by the gc task, while all other paths that update these fields hold it as a writer.

2584–2585

We were unconditionally writing unp_gcflag in a cacheline-aligned struct for all active sockets. IMO it seems reasonably likely that the portion of still-still active sockets that have had unp_dispose() called on them will be fairly small, just as it seems likely the portion of potentially-cyclic sockets will be small.

sys/kern/uipc_usrreq.c
2584–2585

"still-still active" -> "still-active". Haven't had my coffee yet.
Also worth noting that unp_gcflag is still on a separate (64-byte) cacheline from everything else modified during the gc's scanning phase. I haven't changed that here, since the 64-bit version of struct unpcb is already at 248 bytes with these changes, and I'd like to leave a little room for further expansion.

sys/kern/uipc_usrreq.c
2500

Dumb question: what's the best way to check that a specific taskqueue callout is active on the current thread? I'd naively either use taskqueue_member(taskqueue_thread, curthread) or stash off curthread in unp_gc and check it in these callbacks for INVARIANTS builds, but is there a better way?

In D23142#507529, @jah wrote:

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

Of course. I think @pho already integrated Jan's test program from PR 227285 (which does exercise the gc) into his stress-test branch. It seems like that should just go into tests/ now. Anything else you'd like to see?

If there's already a test case in Peter's stress2, then that seems sufficient to me. IMO stress2 is already the right place for test cases which attempt to load the system and provoke race conditions. tests/ is used for component testing in practice and its tests should aim to run quickly. That said, it would be nice to have a simple tests/ test case that validates the scenario described in the review description, where a socket is closed after it is sent as part of an SCM_RIGHTS message but before it is externalized into the receiving process.

sys/kern/uipc_usrreq.c
2500

I was thinking that checking taskqueue_member(curthread) would be fine. I don't really have a better suggestion.

2584–2585

I see now. My concern was about correctness, but it was because I had missed the "~" in the assertion condition. Sorry for the noise.

In D23142#507529, @jah wrote:

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

Of course. I think @pho already integrated Jan's test program from PR 227285 (which does exercise the gc) into his stress-test branch. It seems like that should just go into tests/ now. Anything else you'd like to see?

If there's already a test case in Peter's stress2, then that seems sufficient to me. IMO stress2 is already the right place for test cases which attempt to load the system and provoke race conditions. tests/ is used for component testing in practice and its tests should aim to run quickly. That said, it would be nice to have a simple tests/ test case that validates the scenario described in the review description, where a socket is closed after it is sent as part of an SCM_RIGHTS message but before it is externalized into the receiving process.

I saved Jan's test program as https://svnweb.freebsd.org/base/user/pho/stress2/misc/socketpair3.sh?revision=333634&view=markup
I have been running tests with D23142.66673.diff for 3 hours without finding any problems.

Code review feedback from markj

jah marked 3 inline comments as done.Jan 14 2020, 6:13 PM
sys/kern/uipc_usrreq.c
2594

To be pedantic, we are looking for unreachable sockets in a cycle.

2612

I can't quite see why this is true. Suppose I create a socket pair <s1, s2> and send s2 over s1 to s2. Then I close s1 and s2. s2's receive buffer holds an internalized reference to s2 and the socket is orphaned so we need GC to clean it up, but unp_unreachable will be 1. Of course, as soon as a second socket is leaked that way we will find and dispose of both of them, so it should not really be a problem in practice.

2668

Perhaps assert that UNPGC_DEAD is set?

sys/kern/uipc_usrreq.c
2612

I think that's right. I very nearly deleted this check when writing the patch anyway, because it's not very clean. So let's just get rid of it.

LGTM aside from the pending comments. Thanks for picking this up and thanks to Jan for the investigation and initial patch.

This revision is now accepted and ready to land.Jan 15 2020, 6:21 PM
This revision now requires review to proceed.Jan 16 2020, 7:27 AM
This revision is now accepted and ready to land.Jan 16 2020, 2:23 PM