Page MenuHomeFreeBSD

Implement cycle-detecting garbage collector for AF_UNIX sockets
ClosedPublic

Authored by jah on Jan 12 2020, 4:58 AM.

Details

Summary

The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS mssages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.

This change rewrites the garbage collector to precisely detect cycles:

  1. The existing check of msgcount==f_count is still used to determine whether the socket is potentially in a cycle.
  2. The socket is now placed on a local "dead list", which is used to reduce iteration time (and therefore contention on the global unp_link_rwlock).
  3. The first pass through the dead list removes each potentially-dead socket's outgoing references from the socket graph, using a gc-specific copy of the original reference count.
  4. The second series of passes through the dead list removes from the list any socket whose GC refcount is non-zero, as these sockets remain accessible outside of any possible cycle. Iteration is repeated until no further sockets are removed from the dead list.
  5. Sockets remaining in the dead list are destroyed as before.

PR: 227285
Submitted by: jan.kokemueller@gmail.com (prior version)

Test Plan

Test programs attached to bug 227285 and 239803 now run indefinitely.
net.local.sockcount indicates no leaked sockets.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jah created this revision.Jan 12 2020, 4:58 AM
jah added inline comments.Jan 12 2020, 7:49 PM
sys/kern/uipc_usrreq.c
2587 ↗(On Diff #66643)

This can be turned into a KASSERT on !(unp_gcflag & ~UNPGC_IGNORE_RIGHTS). Otherwise it can unnecessarily dirty a lot of cachelines.

jah updated this revision to Diff 66673.Jan 13 2020, 6:15 AM

Replace assignment of unp_gcflag with KASSERT

markj added a comment.Jan 13 2020, 2:37 PM

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

sys/kern/uipc_usrreq.c
2587 ↗(On Diff #66643)

Sorry, can you explain why? uipc_dispose is called before the socket is removed from the global lists.

2498 ↗(On Diff #66673)

The preferred style is an explicit comparison with 0, i.e., (unp->unp_gcflag & UNPGC_DEAD) == 0.

2500 ↗(On Diff #66673)

It looks like this update is serialized by virtue of the fact that it is only ever done by the one taskqueue thread. We should add an assertion to that effect.

2518 ↗(On Diff #66673)

Ditto regarding an assertion that these updates aren't racy.

2580 ↗(On Diff #66673)

This comment is a bit stale, there is no state to clear anymore.

jah added a subscriber: pho.Jan 13 2020, 4:51 PM

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

Of course. I think @pho already integrated Jan's test program from PR 227285 (which does exercise the gc) into his stress-test branch. It seems like that should just go into tests/ now. Anything else you'd like to see?

sys/kern/uipc_usrreq.c
2587 ↗(On Diff #66643)

We were unconditionally writing unp_gcflag in a cacheline-aligned struct for all active sockets. IMO it seems reasonably likely that the portion of still-still active sockets that have had unp_dispose() called on them will be fairly small, just as it seems likely the portion of potentially-cyclic sockets will be small.

2500 ↗(On Diff #66673)

Yes, combined with the fact that the link lock is held as a reader by the gc task, while all other paths that update these fields hold it as a writer.

jah added inline comments.Jan 13 2020, 5:13 PM
sys/kern/uipc_usrreq.c
2587 ↗(On Diff #66643)

"still-still active" -> "still-active". Haven't had my coffee yet.
Also worth noting that unp_gcflag is still on a separate (64-byte) cacheline from everything else modified during the gc's scanning phase. I haven't changed that here, since the 64-bit version of struct unpcb is already at 248 bytes with these changes, and I'd like to leave a little room for further expansion.

jah added inline comments.Jan 13 2020, 6:36 PM
sys/kern/uipc_usrreq.c
2500 ↗(On Diff #66673)

Dumb question: what's the best way to check that a specific taskqueue callout is active on the current thread? I'd naively either use taskqueue_member(taskqueue_thread, curthread) or stash off curthread in unp_gc and check it in these callbacks for INVARIANTS builds, but is there a better way?

markj added a comment.Jan 13 2020, 7:56 PM
In D23142#507529, @jah wrote:

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

Of course. I think @pho already integrated Jan's test program from PR 227285 (which does exercise the gc) into his stress-test branch. It seems like that should just go into tests/ now. Anything else you'd like to see?

If there's already a test case in Peter's stress2, then that seems sufficient to me. IMO stress2 is already the right place for test cases which attempt to load the system and provoke race conditions. tests/ is used for component testing in practice and its tests should aim to run quickly. That said, it would be nice to have a simple tests/ test case that validates the scenario described in the review description, where a socket is closed after it is sent as part of an SCM_RIGHTS message but before it is externalized into the receiving process.

sys/kern/uipc_usrreq.c
2587 ↗(On Diff #66643)

I see now. My concern was about correctness, but it was because I had missed the "~" in the assertion condition. Sorry for the noise.

2500 ↗(On Diff #66673)

I was thinking that checking taskqueue_member(curthread) would be fine. I don't really have a better suggestion.

pho added a comment.Jan 13 2020, 8:12 PM
In D23142#507529, @jah wrote:

This mostly looks right to me. Would you be willing to add some basic regression tests for the garbage collector to tests/sys/kern/unix_passfd_test.c?

Of course. I think @pho already integrated Jan's test program from PR 227285 (which does exercise the gc) into his stress-test branch. It seems like that should just go into tests/ now. Anything else you'd like to see?

If there's already a test case in Peter's stress2, then that seems sufficient to me. IMO stress2 is already the right place for test cases which attempt to load the system and provoke race conditions. tests/ is used for component testing in practice and its tests should aim to run quickly. That said, it would be nice to have a simple tests/ test case that validates the scenario described in the review description, where a socket is closed after it is sent as part of an SCM_RIGHTS message but before it is externalized into the receiving process.

I saved Jan's test program as https://svnweb.freebsd.org/base/user/pho/stress2/misc/socketpair3.sh?revision=333634&view=markup
I have been running tests with D23142.66673.diff for 3 hours without finding any problems.

jah updated this revision to Diff 66747.Jan 14 2020, 6:11 PM

Code review feedback from markj

jah marked 3 inline comments as done.Jan 14 2020, 6:13 PM
markj added inline comments.Jan 15 2020, 6:04 PM
sys/kern/uipc_usrreq.c
2607 ↗(On Diff #66747)

To be pedantic, we are looking for unreachable sockets in a cycle.

2625 ↗(On Diff #66747)

I can't quite see why this is true. Suppose I create a socket pair <s1, s2> and send s2 over s1 to s2. Then I close s1 and s2. s2's receive buffer holds an internalized reference to s2 and the socket is orphaned so we need GC to clean it up, but unp_unreachable will be 1. Of course, as soon as a second socket is leaked that way we will find and dispose of both of them, so it should not really be a problem in practice.

2682 ↗(On Diff #66747)

Perhaps assert that UNPGC_DEAD is set?

jah added inline comments.Jan 15 2020, 6:17 PM
sys/kern/uipc_usrreq.c
2625 ↗(On Diff #66747)

I think that's right. I very nearly deleted this check when writing the patch anyway, because it's not very clean. So let's just get rid of it.

markj accepted this revision.Jan 15 2020, 6:21 PM

LGTM aside from the pending comments. Thanks for picking this up and thanks to Jan for the investigation and initial patch.

This revision is now accepted and ready to land.Jan 15 2020, 6:21 PM
jah updated this revision to Diff 66817.Jan 16 2020, 7:27 AM

More review feedback

This revision now requires review to proceed.Jan 16 2020, 7:27 AM
markj accepted this revision.Jan 16 2020, 2:23 PM
This revision is now accepted and ready to land.Jan 16 2020, 2:23 PM
This revision was automatically updated to reflect the committed changes.