rpctls_impl.c: Add KRPC_CURVNET_SET() and KRPC_CURVNET_RESTORE()
Abandoned, Public

Authored by rmacklem on Jun 21 2025, 1:45 AM.
Tags
None
Referenced Files
F132459039: D50962.id157403.diff
Fri, Oct 17, 2:52 AM
Unknown Object (File)
Sat, Oct 4, 8:54 AM
Unknown Object (File)
Fri, Oct 3, 6:15 AM
Unknown Object (File)
Sat, Sep 27, 2:24 AM
Unknown Object (File)
Wed, Sep 24, 12:50 PM
Unknown Object (File)
Sep 16 2025, 8:02 AM
Unknown Object (File)
Sep 11 2025, 6:22 AM
Unknown Object (File)
Sep 10 2025, 5:34 AM
Subscribers

Details

Reviewers
glebius
Summary

Under some circumstances, the CLNT_CALL_MBUF()
in clnt_call_private() needs the correct vnet set.
Since the clnt_call_private() call in rpctls_impl.c
is always for the NFS client, it must always set
vnet0. (NFS client mounts can only be done in
vnet0.)

This patch adds KRPC_CURVNET_SET() and
KRPC_CURVNET_RESTORE() around the
clnt_call_private() call to fix this.
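
The set/call/restore pattern described above can be sketched in userspace C. This is only a model, not the kernel code: every `model_` or `MODEL_` name is hypothetical, the real CURVNET_SET()/CURVNET_RESTORE() macros operate on per-thread state, and the callee here reports a missing vnet with a return code instead of panicking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct vnet { int id; };

static struct vnet model_vnet0 = { 0 };
static struct vnet *model_curvnet = NULL;	/* models the current vnet */

/* Model of CURVNET_SET(): save the old value, install the new one. */
#define	MODEL_CURVNET_SET(v)					\
	struct vnet *model_saved_vnet = model_curvnet;		\
	model_curvnet = (v)

/* Model of CURVNET_RESTORE(): put the saved value back. */
#define	MODEL_CURVNET_RESTORE()					\
	model_curvnet = model_saved_vnet

/* Callee that needs a vnet: returns 0 if one is set, -1 otherwise. */
static int
model_clnt_call(void)
{
	return (model_curvnet != NULL ? 0 : -1);
}

/* The pattern from the summary: set, call, restore. */
static int
model_wrapped_call(void)
{
	int error;

	MODEL_CURVNET_SET(&model_vnet0);
	error = model_clnt_call();
	MODEL_CURVNET_RESTORE();
	return (error);
}

/* Small self-check that exercises the model. */
static void
model_selfcheck(void)
{
	assert(model_clnt_call() == -1);	/* no vnet set: callee fails */
	assert(model_wrapped_call() == 0);	/* wrapped: callee succeeds */
	assert(model_curvnet == NULL);		/* restore put the old value back */
}
```

Calling `model_selfcheck()` from a test harness runs all three checks. The model only shows that the wrapper protects the one call it surrounds.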

Test Plan

Tested during a recent IETF NFSv4 Bakeathon
testing event.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

What is the problem we are trying to fix here? What is the panic trace?

The function rpctls_connect() is always called with the socket's vnet set by clnt_reconnect_connect(). In general, setting vnet0 is never correct. I understand that NFS may not fully support VIMAGE right now, but we should slowly work towards correct support rather than just blindly plugging the panics.

The panic occurred because the CLNT_CALL_MBUF() called clnt_nl_call()
which asserts CURVNET_ASSERT_SET(). I don't have the stack trace. It
happened during Bakeathon testing over a month ago.
I fixed it by adding CURVNET_SET(TD_TO_VNET(curthread)), but that
won't work when in a jail.

The CURVNET_SET(so->so_vnet) just before rpctls_connect()
doesn't always work. My guess would be something like...

  • A readahead/writebehind thread (run from a taskqueue, so it is only a kernel thread) does a clnt_reconnect_connect(). The call chain is clnt_reconnect_connect()->__rpc_nconf2socket()->socreate()->soalloc(CRED_TO_VNET(cred)), and I have no idea what curthread->td_ucred is going to be for taskqueue threads.

Anyhow, I cannot recreate this in local testing (during the Bakeathon I was testing against
non-FreeBSD servers over a very slow VPN) and don't have the entire panic/crash.

All I know is that the NFS client mounts are always vnet0, since they cannot be
done in a jail, even though processes/threads running in a jail can use the mount.
(I suspect you are correct, in that the CURVNET_SET()/CURVNET_RESTORE() around
rpctls_connect() are broken and don't always work. If they can't use so->so_vnet and
they can't use CRED_TO_VNET(), setting vnet0 seems like the only option?)

Ignore the above paragraph. I found where I scribbled down the crash. The calls were:

rpc_gss_init()
gss_import_name()
gssd_import_name_1()
clnt_call_private()
clnt_nl_call()
--> panic

So, it is rpc_gss_init() that doesn't set CURVNET.

Because my previous fix was in clnt_call_private()
it fixed the problem, whereas this one would not.
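
A small userspace model may make the distinction clearer. It is a sketch under loud assumptions: all `sketch_` names are hypothetical, the missing-vnet assertion is modelled as a return code, and the placeholder vnet stands in for whichever vnet the real caller ought to set, which this thread does not settle. Path A plays the role of the call site wrapped by this patch, and path B plays the role of a different caller that reaches the same helper with no vnet set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct vnet { int id; };

static struct vnet sketch_vnet0 = { 0 };
static struct vnet *sketch_curvnet = NULL;	/* models the current vnet */

/* Leaf that needs a vnet: returns 0 if one is set, -1 otherwise. */
static int
sketch_leaf(void)
{
	return (sketch_curvnet != NULL ? 0 : -1);
}

/* Shared helper that both paths go through. */
static int
sketch_shared_call(void)
{
	return (sketch_leaf());
}

/* Path A: this caller wraps its own call to the shared helper. */
static int
sketch_path_a(void)
{
	struct vnet *saved = sketch_curvnet;
	int error;

	sketch_curvnet = &sketch_vnet0;
	error = sketch_shared_call();
	sketch_curvnet = saved;
	return (error);
}

/* Path B: a different caller with no vnet set; A's wrapper does not help. */
static int
sketch_path_b(void)
{
	return (sketch_shared_call());
}

/* Path B with set/restore added at its own entry point. */
static int
sketch_path_b_fixed(void)
{
	struct vnet *saved = sketch_curvnet;
	int error;

	sketch_curvnet = &sketch_vnet0;
	error = sketch_path_b();
	sketch_curvnet = saved;
	return (error);
}

/* Small self-check that exercises the three paths. */
static void
sketch_selfcheck(void)
{
	assert(sketch_path_a() == 0);		/* wrapped caller succeeds */
	assert(sketch_path_b() == -1);		/* unwrapped caller still fails */
	assert(sketch_path_b_fixed() == 0);	/* fixed at its own entry point */
	assert(sketch_curvnet == NULL);		/* every wrapper restored the value */
}
```

In the model, a wrapper inside the shared helper would cover both paths, while a wrapper in path A covers only path A.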

I'll abandon this patch and come up with another one.
(Sorry for the noise.)

This patch is bogus and will not fix the crash.
It is rpc_gss_init() that needs to set CURVNET.