Page MenuHomeFreeBSD

rpcsec_tls: Avoid a socket reference underflow in rpctls_server()
Needs ReviewPublic

Authored by markj on Fri, Jun 12, 9:40 PM.

Details

Reviewers
rmacklem
glebius
Summary

The upcall_sockets tree owns a ref on any resident socket. When a
socket is removed after a TLS handshake failure, rpctls_rpc_failed()
thus calls soclose().

rpctls_server() does not acquire an extra ref to compensate for this.
So, if the upcall fails, e.g., because rpc.tlsservd is not running,
we'll call soclose() to drop the reference, but this effectively
releases the xprt layer's reference.

Fix the problem by acquiring an extra ref in rpctls_server(). Set a
flag to ensure that svc_vc_destroy_common() does not try to close the
socket a second time.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 73855
Build 70738: arc lint + arc unit

Event Timeline

markj requested review of this revision.Fri, Jun 12, 9:40 PM
This revision is now accepted and ready to land.Fri, Jun 12, 10:19 PM

This fixes the refcount overflow, but introduces a socket leak when rpc.tlsservd is actually running: sys_rpctls_syscall() takes an extra ref on the socket. We can't simply stop doing that, as the client-side code in rpctls_connect() expects it. I think the cleanest thing to do is take an explicit ref in rpctls_connect() too.

This revision now requires review to proceed.Sat, Jun 13, 9:15 PM

I don't see why adding another soref() stops a socket
from being leaked?

I'll admit that, after clicking "reviewed" I thought the
old version might cause a leak, but discovered you
cannot "unreview" after clicking Reviewed.

Anyhow, I'm going to stick some printf()s in the code
and see what happens for both the daemon running and
daemon not running cases.

To be honest, if I understand the crash you reproduce,
it is caused by soclose() being called before svc_vc_stat()
and svc_vc_destroy(). (soclose() does sorele(), so calling it
twice isn't a good idea.)
--> How about just changing the soclose() in rpctls_rpc_failed()

to a soshutdown() and allow svc_vc_destroy_common() do
the soclose() later?

(That's a one line change from the unpatched code.)

I don't see why adding another soref() stops a socket
from being leaked?

I also removed an soref() from sys_rpctls_syscall().

I'll admit that, after clicking "reviewed" I thought the
old version might cause a leak, but discovered you
cannot "unreview" after clicking Reviewed.

I'm not in a rush. :)

Anyhow, I'm going to stick some printf()s in the code
and see what happens for both the daemon running and
daemon not running cases.

To be honest, if I understand the crash you reproduce,
it is caused by soclose() being called before svc_vc_stat()
and svc_vc_destroy(). (soclose() does sorele(), so calling it
twice isn't a good idea.)

That's right. rpctls_server() was not taking an extra ref on the xprt socket before inserting it into the upcall tree, but then if the RPC fails, it would call soclose(), releasing the xprt layer's reference.

--> How about just changing the soclose() in rpctls_rpc_failed()

to a soshutdown() and allow svc_vc_destroy_common() do
the soclose() later?

(That's a one line change from the unpatched code.)

That would revert 26ee059392094. I haven't tested the case where rpc.tlsservd is running but rpc.tlsclntd isn't. I'll give that a try too.

That said, I do somewhat prefer that we explicitly take a reference on the socket when inserting it into the upcall tree...

Ok, so now I see what you've done. You are now taking the
soref() when it is inserted in the RB tree instead of when it
is taken out.

I suspect the "else" case in rpctls_rpc_failed() needs a
sorele() along with the soshutdown(), to get rid of the
reference.

It does look like the commit I did that added the soclose()
in rpctls_rpc_failed() wasn't done correctly. (I didn't realize
that svc_vc_destroy() was going to soclose it later.)

Ok, so now I see what you've done. You are now taking the
soref() when it is inserted in the RB tree instead of when it
is taken out.

I suspect the "else" case in rpctls_rpc_failed() needs a
sorele() along with the soshutdown(), to get rid of the
reference.

Now that I am testing it, this does not seem to be needed.
(You get to the "else" case if rpc.tlsservd is running on the
server and you use your little tls_trigger.c test against it.)

It does look like the commit I did that added the soclose()
in rpctls_rpc_failed() wasn't done correctly. (I didn't realize
that svc_vc_destroy() was going to soclose it later.)