Page MenuHomeFreeBSD

modify the nfsd so that it can be run in a vnet prison
Needs ReviewPublic

Authored by rmacklem on Nov 28 2022, 3:22 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 28, 11:10 PM
Unknown Object (File)
Sat, Jan 7, 5:21 AM
Unknown Object (File)
Fri, Jan 6, 11:58 PM
Unknown Object (File)
Dec 14 2022, 11:53 PM
Unknown Object (File)
Dec 2 2022, 10:02 AM

Details

Summary

This patch modifies the nfsd so that it can be run in a
vnet prison. The only outside of kernel change at this
time is a modification to the rc.d/nfsd script, so replacing
the kernel and putting the patched nfsd script in the prison's
etc/rc.d.

It is still very much a "work in progress" at this time.

Possible changes coming are:

  • Doing the nfsstats per prison. They are now global to all prisons.
  • Moving some of the "vfs.nfsd.*" sysctls into the prisons. These are now all global. Doing "sysctl -a | fgrep vfs,nfsd" will show you what these are. Note that, since minthreads, maxthreads are set on the nfsd command line, they can be adjusted per prison.
  • All global mutexes are not in the prisons. This might result in increased lock contention. Most (except one of them) can be moved into the prisons, if needed.
  • Nothing cleans up mutexes and kernel malloc'd space when a prison is shut down. I need to work on how to do this.
  • Guards against starting it up incorrectly need to be added. This should be straightforward, but I need to know if the nfsuserd will need to be run inside a prison and whether or not NFSv3 support will be added before doing so.

Please post if you see the need to changes.

There is a brief setup document here:
https://people.freebsd.org/~rmacklem/nfsd-vnet-prison-setup.txt

Test Plan

Sofar I have done minimal testing using a single vnet
prison and nfsd running within and outside the prison.
I will be setting up an additional prison soon.

Hopefully others will be able to test and make suggestions
for improvements.

If downloading the patch from here is awkward, just email
me and I'll send you the patch.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This patch is quite different from the last one.
This version allows mountd to run inside the jail.

It requires the jail to be a vnet on its own file
system (ie. the root of the jail must be a file
system mount point).

PR_ALLOW_NFSD is defined to allow it to work
and is enabled via:

allow.nfsd = "true";

in the jail's entry in /etc/jail.conf.

There is also an associated PRIV_NFSD_VIMAGE defined,
that is permitted if the above PR_ALLOW_NFSD is set
and it is called from within a vnet prison.

Configuration of the NFSv4 only server is done within
the prison in exactly the same way it would be done
outside of a prison. (I have not tried NFSv3 as yet.)

I will update the patch and setup document on
https://people.freebsd.org/~rmacklem in the next
day or so.

I have not yet done so, but I believe that, if/when this
goes into main, it should be under a new kernel build
option.

Added small patch missed in previous version.

This patch is quite different from the last one.
This version allows mountd to run inside the jail.

It requires the jail to be a vnet on its own file
system (ie. the root of the jail must be a file
system mount point).

This does not impose any significant restriction (as jails tend to have their own root filesystem anyway nowdays).

PR_ALLOW_NFSD is defined to allow it to work
and is enabled via:

allow.nfsd = "true";

in the jail's entry in /etc/jail.conf.

Is this because you want to avoid a mysterious nested mounts loops?

I have not yet done so, but I believe that, if/when this
goes into main, it should be under a new kernel build
option.

But why? Is is a breaking change preventing it from being built along with common NFS code?

In reply to otis@.

The allow.nfsd in /etc/jail.conf doesn;t work.
I thought it was how you set the PR_ALLOW_NFSD flag,
but for some reason the flag is set all the time right now.
I have asked bz@ about this, since it needs to be fixed.

As far as a new kernel buidl option, it was mainly to
avoid bloat of the vnet for people who do not want
to run nfsd in their vnet jails.
--> Chris didn't like the idea of putting nfsd in the

vnet jail for this reason, plus concerns w.r.t.
introducing security issues.

In the end, it will be up to the "collective" to decide.
If you have an opinion, please post it on the mailing
list thread.

Added a function I called prison_check_nfsd() to
verify that it is being run in an appropriate prison.

  • vnet type
  • PR_ALLOW_NFSD set
  • in its own file system

I also moved three sysctls into the vnet so that the
rc scripts can set them:

vfs.nfsd.server_min_nfsvers
vfs.nfsd.server_max_nfsvers
vfs.nfsd.nfs_privport

I think it is ready for others to test now. See
https://people.freebsd.org/~rmacklem/nfsd-vnet-prison-setup.txt

Adding jamie for jail (mgmt) reviews (and more if he likes)

sys/kern/vfs_mount.c
1299 ↗(On Diff #113784)

That sounds spooky to me as uid 15 inside the jail has nothing to do with uid 15 outside a jail and uid 0 inside a jail definitively is not uid 0 outside the jail.

1508 ↗(On Diff #113784)

Doesn't this also need the usermount check? I don't admit I have looked but I don't understand as-to yet why we'll need two cases and two PRIVs.

1510 ↗(On Diff #113784)

stray printf

sys/rpc/svc.c
175

This will hide the sysctls from the vnet jails as opposed to make them read-only but available? Which version do we really need?

sys/kern/vfs_mount.c
1299 ↗(On Diff #113784)

Well, it is exactly what is done in vfs_suser(), except that
vfs_suser() fails automatically if a mount was done outside
of the jail.

This code is only executed if doing exporting while in the
jail.

What do you think is correct here?
Just a check for uid == 0, maybe, since I would expect that
root in the jail would run mountd.

1508 ↗(On Diff #113784)

This is handling the case of setting exports (normally
done by mountd) while in the jail.
I would not require this to expect vfs.usermount to
be set, but if you think that should be set to allow mountd
to run within a jail, I can add that.

The hassle is that mountd uses nmount(2) to do exporting,
although it really has nothing to do with doing a file system
mount, per se. It is kinda an update to the mount.

I figured that PRIV_NFSD_VIMAGE could be used to allow
exporting to be done from within the jail?

If you think some other PRIV_xx is appropriate, please let
me know.

1510 ↗(On Diff #113784)

I find it useful for testing at this time. It will go away before
any commit to main, but I think that is still a ways off.

1529 ↗(On Diff #113784)

By the way, I ifdef'd this because it was never being
executed before (MNT_EXPORTED is never set here in
the code in main, etc).
Also, PRIV_VFS_MOUNT_EXPORTED is defined in sys/priv.h,
but is never used except here. (I honestly don't know if that
means "priv_check(td, PRIV_VFS_MOUNT_EXPORTED)" will
always fail or always succeed. I plan to test to see which it
is, but it is not useful now and it is not obvious to me what
the original author's intent was?

sys/rpc/svc.c
175

Yea, I wasn't sure what to do with these.
They are in the krpc, so really don't apply to nfs
specifically and the minthreads/maxthreads get
set via the nfsd daemon without using them,
so I'm not sure they are useful.

I did it this way because I didn't know how to
make them work.
Note that "pool" is in the vnet, so they are vnet
specific, but I don't think anyone will want to change
them within a vnet. (Even I have to look at the code
for a while to remember what they do.)

This version has the server side nfsstats brought into
the jail, as some suggested that would be preferable.

It also brings nfsrvboottime into the vnets, since they
are specific to a set of nfsd and using a global one
could screw up "grace time" detection when the nfsd
are started.

sys/kern/vfs_mount.c
1509 ↗(On Diff #113797)

I realized this is no different than PRIV_NFS_DAEMON, so I
have replaced PRIV_NFSD_VIMAGE with that and tested
PRIV_NFS_DAEMON inside prison_priv_check() now.

This version of the patch works for NFSv3 over TCP.
NFSv3 over UDP caused a crash in the krpc. It wasn't
obvious why, so I just disabled UDP when jailed, since
NFSv3 over UDP is deprecated anyhow.

rpc.lockd does not run, but I would rather see people
use NFSv4 if they need locking across multiple clients.

I also changed the check in vfs_domount_update() to
just check for cr_uid == 0. That is at least better than
the bogus code that checked for same uid as who did the
mount, as pointed out by bz@.

I also added code that defines the VNET macros as nil
(cribbing them from vnet.h) when VIMAGE is defined,
but NFSD_VIMAGE is now defined.
Although I just set it for now, I plan on creating a separate
patch that makes NFSD_VIMAGE a kernel build option and
without that build option, there will not be any vnet stuff
in the nfsd code.

There are things on my todo list that I think will be left
for separate patches later:

  • Making nfsuserd work within the vnet.
  • Making rpc.tlsservd work within the vnet.
rmacklem added inline comments.
sys/kern/vfs_mount.c
1299 ↗(On Diff #113784)

I have changed this check to simply cr_uid == 0.
If there is a better check, please let me know.

1508 ↗(On Diff #113784)

I have gotten rid of PRIV_NFSD_VIMAGE and now
just use PRIV_NFS_DAEMON for the case of loading
exports here. I don't see the vfs.usermount is relevant?

This version includes checks so that the nfsuserd daemon
will not start up in a prison, since it cannot work correctly
in a prison at this time.

I do plan on making nfsuserd work in a prison, but that
will be a separate patch done in the future.

This rendition of the patch adds a kernel build option called
VNET_NFSD that the changes are #ifdef'd under.
The PR_ALLOW_NFSD flag is also now working and to enable
mountd/nfsd in a vnet prison, you need "allow.nfsd;" in your
/etc/jail.conf.

There are some things still missing, but I think I will do those
as separate patches, since this patch is getting ridiculously long.
My current list of these are:

  • Cleanup when a jail is terminated. Without this, mutexes inside the vnet do not get destroyed when the jail is destroyed. There will also be a bunch of kernel malloc'd space leaked, since it does not get free'd during jail destruction.
  • Make nfsuserd work within the vnet, so that it can run within a vnet prison. Since setting NFSv4 so that is uses numbers in user/group strings works and is the preferred configuration for Linux as well, I do not see this as a high priority.
  • Making rpc.tlsservd and gssd work within a vnet prison, so that non-auth_sys mounts can work.

I am also planning to put the kern_jail.c changes up as a separate
patch within phabricator (same changes as here) to simplify
review of them.

In summary, from here on the only changes I plan on doing to
this patch are bugfixes and anything recommended by reviewers.

A few things are fixed for this version:

  • nfsd_timer() now works correctly when there are multiple jails running nfsd. (The callout structure needed to be in the vnet.)
  • The "-z" nfsstat option now zeros out the correct stuff.
  • There is a sanity check to ensure that a pNFS configuration cannot work in a prison.
  • I have added a VNET_SYSUNINIT() to clean up stuff when a prison is destroyed. I haven't figured out how to make this work, since "jail -r" doesn't cause vnet_destroy() to be called.

This version of the patch supports cleanup when a
jail is removed. VNET_SYSUNINIT() registered the
function, but it was neve called, since vnet_destroy()
was never called when the jail was removed.

As such, an alternate plan (suggested by jamie@)
was used, where a call was added to prison_cleanup().

My to do list for this patch is now empty, but I suspect
there will still be changes needed for bugfixes or reviewer
suggestions.
--> The VNET_NFSD build option is currently only implemented

for a kernel which has "options NFSD". I need to figure out
how to deal with the loadable module case.
*** I think the build option will become a separate review/commit
     once I have one ready.

This version uses the OSD framework to set the
cleanup function called when a jail is removed,
as suggested by jamie@ in the other review.

This avoids any need for a custom nfsd_call_vnetcleanup
function pointer in kern_jail.c.

This version is fixed so that kernels built without
VNET_NFSD function correctly. It also tightened
the #if in kern_jail.c so that "allow.nfsd" only
works when all three build options
NFSD, VIMAGE and VNET_NFSD
have been specified. A nfsd.ko built with VNET_NFSD
will not load, due to the vnet being "out of space".

This version generates a warning console message
w.r.t. running mountd both within and outside of
jails.

If mountd is run both inside and outside of jails,
the exports within the jails can become corrupted
when mountd outside of the jails reloads exports.

This version of the patch supports cleanup when a
jail is removed. VNET_SYSUNINIT() registered the
function, but it was neve called, since vnet_destroy()
was never called when the jail was removed.

WHY? Is there a possible cred leak? Sometimes things can simply take time (e.g. TCP session and others to close properly -- maybe not the best example for the vnet case but still one)

I would really love to understand why "shutdown" wasn't reached before changing the system.

This version generates a warning console message
w.r.t. running mountd both within and outside of
jails.

If mountd is run both inside and outside of jails,
the exports within the jails can become corrupted
when mountd outside of the jails reloads exports.

That sounds like more abstraction and more work in the future once the single-mountd version has landed?

sys/fs/nfs/nfsport.h
231

Is is the below all duplicated here?

sys/fs/nfsserver/nfs_nfsdport.c
3719

Why would we need this @jamie if we ever only run with or w/o vnet but never allow this in a plain-old-jail?

sys/fs/nfsserver/nfs_nfsdport.c
3719

Why would we need this @jamie if we ever only run with or w/o vnet but never allow this in a plain-old-jail?

That's exactly what we're testing for here, no? I'll admit I don't see the logic in prison_check_nfsd() disallowing non-jailed callers (this requiring the explicit jailed() check here). But then I don't know just how separated the vnet-nfsd and regular-nfsd code paths are.

I didn't see it inline, but w.r.t. VNET_SYSUNINIT() { or whatever it is called }
I just know it didn't work. When debugging, I say it was registered, but never
got called, because vnet_destroy() never got called.
I'm not 100% sure, but the PD_KILL flag didn't seem to be set when I thought
it would have been?

Anyhow, the OSD approach works fine, so I'm happy with that.

sys/fs/nfs/nfsport.h
231

Not sure what you are referring to w.r.t. duplicated.

There are macros for the VNET_NFSD case and macros
that basically do nothing for the non-VNET_NFSD case.
(The nfsstatsv1 structure has, correctly or not, both client
and server stats. For the VNET_NFSD case, I defined a
separate structure that is updated per-vnet and then the
contents are copied into nfsstatsv1 for output to userland.)

For the non-VNET_NFSD case, the macros basically just do
what the code did without the macros-->update nfsstatsv1.

sys/fs/nfsserver/nfs_nfsdport.c
3719

Yea, when I first did prison_jail_nfsd() it allowed the
non-jailed case, but that seemed "inconsistent", so
I changed it to fail when not jailed.

I really don't care what the semantics is for the
non-jailed case, since the NFS code will probably
only call it when in a jail.

If you want me to change it, I can.

sys/fs/nfsserver/nfs_nfsdport.c
3719

If you want me to change it, I can.

I think for consistency it should change then. prison_xxx() checks are typically "return true if the prison/cred is allowed to do the thing." Granted, that's not always what you want to know, but this seems like a case where it is.

W.r.t. mountd. At this point I'm not sure that mountd will be runable
both inside and outside of jails safely.

It will run. In fact I had that running "accidentally" during testing.

The problem is that mountd outside of the jails will "stomp" on the
exports done within the jails when loads exports initially.

  • If you can guarantee that mountd outside the jails is running before mountd starts in any jail.

AND

  • The exports outside of the jails does not refer to any file systems that are exported within the jails.

It can work.

The trouble is, I do not know of a way to guarantee both of the
above without doing a lot of work on mountd. (mountd.c is an
ancient and rather messy piece of C code that was originally
written in 1986 and it has only gotten messier over the years.
As such, making any changes to it are scary.)

The startup ordering is problematic too, since the mountd in
the jail (it is isolated from mountd outside the jails, as jails
are supposed to do), has no way of knowing if/when mountd
might be started outside the jails.

The most workable variant might be to delay starting mountd
outside the jails until after the jails have been started and have
the mountd(s) running within the jails mark their file systems
with some new MNT_xxx flag visible to the mountd outside the
jails via statfs().
--> Then mountd outside the jails could avoid "stomping" on

the file system exports done via mountd within any jail.

Someday maybe, but for now it is not going to happen, so
I think the warning makes sense. This way if someone does
run mountd both within and outside the jails, they'll have
some indication that things will probably break in ways they
don't understand.

sys/kern/kern_jail.c
3499 ↗(On Diff #114070)

Btw. If jamie is ok with only enabling "allow.nfsd" when
VIMAGE is defined, we could just drop the first two
tests.

NFS is only going to be called when it is in a jail and
if PR_ALLOW_NFSD is set, VIMAGE must have been defined
so all prisons are vnet prisons, I think?

It is the test for PR_ALLOW_NFSD and pr_root being a file
system mount point that matter to the NFS code.
(Or I can just put the test in the NFS code and throw this
function away. I just thought it was cleaner to have a function
here in kern_jail.c to do the tests.)

sys/kern/kern_jail.c
3499 ↗(On Diff #114070)

Btw. If jamie is ok with only enabling "allow.nfsd" when
VIMAGE is defined, we could just drop the first two
tests.

I'm fine with that, or only when VNET_NFSD is defined.

NFS is only going to be called when it is in a jail and
if PR_ALLOW_NFSD is set, VIMAGE must have been defined
so all prisons are vnet prisons, I think?

No - they are allowed to be, but it's controlled on a per-prison basis.

It is the test for PR_ALLOW_NFSD and pr_root being a file
system mount point that matter to the NFS code.
(Or I can just put the test in the NFS code and throw this
function away. I just thought it was cleaner to have a function
here in kern_jail.c to do the tests.)

Allowing non-jails to pass would then mean that prison0 isn't constrained to the pr_root test. It doesn't really matter if it's never called in the prison0 case, but I like it there for neatness/safety.

sys/fs/nfs/nfsport.h
186

This CTASSERT is bogus and should be deleted.

sys/fs/nfsserver/nfs_nfsdport.c
3719

@jamie I never envisioned this to possibly work without vnets.

And if that is what we want, plain-old-jail support as well, then some if these VNET changes highly likely need to be handled differently as well.

I just think we need to be very clear about this very quickly as a lot of abstractions and security checks will highly depend on one or the other.

sys/fs/nfsserver/nfs_nfsdport.c
3719

No, it will never work in a non-vnet jail.

The first test:

if (!jailed(cred))
     return (false);

was only there to simplify the code that calls it:
without this clause, all calling cases change from:

if (...!prison_check_nfsd())

to

if (... jailed() && !prison_check_nfs())

but I have no problem with making the change.
Do you prefer replacing it with..

if (!jailed())
   return (true);

OR

  • just deleting the check?
sys/fs/nfsserver/nfs_nfsdport.c
3719

@jamie I never envisioned this to possibly work without vnets.

And if that is what we want, plain-old-jail support as well, then some if these VNET changes highly likely need to be handled differently as well.

No, I don't want it to work without vnets - that part of prison_check_nfsd() would remain as it. The only thing I was considering was if it allows prison0.

3719

The first test:

if (!jailed(cred))
     return (false);

was only there to simplify the code that calls it:
without this clause, all calling cases change from:

if (...!prison_check_nfsd())

to

if (... jailed() && !prison_check_nfs())

I'm all for simplifying the code in multiple locations, so I'll just let this rest. It was only ever a minor nit at most.

sys/fs/nfsserver/nfs_nfsdport.c
3719

Okay, the comment has long been moved with updates; if we only do vnet can someone explain to me why we need OSD? Why are vnet abstractions not good enough?

sys/fs/nfsserver/nfs_nfsdport.c
3719

Ok, I am assuming you are referring to the use
of osd_jail_register() down at line#7195.

The reason is that VNET_SYSUNINT() did not work.
As I said, it registered the function, but that function
nfsrv_cleanup() never got called when I did "jail -r <jid>",
although the jail went away.

When I use osd_jail_register(), the function does get called
and works fine.

Is VNET_SYSUNINT() broken? Maybe, but I'm not the guy to
figure that out.

sys/fs/nfsserver/nfs_nfsdport.c
3719

The solution to the jail sticking around was to move the nfsd cleanup from the end of prison_deref() to prison_cleanup(), which happens when the jail starts dying instead of when it finishes.

It would have sufficed to just add ifdef'd code to prison_cleanup itself, but since nfsd is dynamically loadable, OSD was the canonical way to go. It's not necessary to do it that way, given that VNET_NFSD already has its fingers all over the place; and since the dynamically loading part seems not to work, it may no longer be the right paradigm. But it still works.

sys/fs/nfs/nfsdvnet.h
41 ↗(On Diff #114070)

I have a feeling that if you want that VNET_NFSD option (apart from that it'll be a big kernel compile mess in the future) we should not re-define VNET macros for parts of headers. If that leaks into anything else it'll just be weird stuff. Can you call them VNET_NFSD_XXX and either define them to their VNET_XXX counterparts or to nothing (still some duplication) but at least the VNET_XXX macros will not be affected in any way.

XXX-BZ also if we should ever decide to make this a vnfs proper (depending on vnet) (like I once did vproc stuck in other reviews and outdated) then that'll be easier and I have a suspicion that even now that would help us get rid of VNET_NFSD entirely but let's do it step by step.

sys/fs/nfsserver/nfs_nfsdport.c
3719

If the jail went away entirely (and wasn't stuck dying on other things which makes jls by default no longer to show it but it can with an option) the VNET_SYSUNINT function should have been called. If it wasn't there is a problem somewhere which must be solved.

@jamie forgive me for asking as all of the #ifdef options and loadable of the module shouldn't make a difference, was the real problem that the NFS code would hold references somewhere which prevented the jail from actually dying?

You two seem to have an implicit understanding of something I missed (never learnt) in the original patch and I am trying to understand that.

sys/fs/nfsserver/nfs_nfsdport.c
3719

@jamie forgive me for asking as all of the #ifdef options and loadable of the module shouldn't make a difference, was the real problem that the NFS code would hold references somewhere which prevented the jail from actually dying?

You two seem to have an implicit understanding of something I missed (never learnt) in the original patch and I am trying to understand that.

Yes, it all comes to holding references somewhere, and I don't think there's anything you're not getting - I don't know what reference this might be or anything like that.

So part of why I suggested an earlier cleanup was partly practical: it seems to solve the issue. It could be that whatever reference is held open shouldn't be at that point, or it could be that there's something working completely as it should, which includes that reference not going away until the NFS stuff is shut down. Either way, moving the NFS shutdown earlier avoids the problem, even if it doesn't really identify it.

The other half of my reasoning is jail philosophy: I'd like to get rid of this idea of a dying jail being anything other than the last blip while it finishes disposing of itself, and as long as something isn't needed to exist during the orderly shutdown, I'd like it to go away as soon as possible (when it enters dying state). That's what prison_cleanup() is about.

sys/fs/nfsserver/nfs_nfsdport.c
3719

Okay, then my main concern probably is (was) mixing the two techniques to abstract things (I would still hope the vnet bits were enough but that's investigate this once we are done here).
@rmacklem can you add a /* XXX-BZ OSD to VNET? */ comment to the osd_jail_register() call for me?

sys/fs/nfsserver/nfs_nfsdport.c
3719

I have added the comment, as requested.

Here's a script of killing the vnet jails with
mountd/nfsd running in them:
Script started on Fri Dec 16 17:38:05 2022
root@freebsd-mds:~ # jls

JID  IP Address      Hostname                      Path
  1                  foo                           /foo
  2                  foo2                          /foo2

root@freebsd-mds:~ # jail -r 1
2
Stopping sshd.
Waiting for PIDS: 1973.
Stopping cron.
Waiting for PIDS: 1986.
Stopping nfsd.
Waiting for PIDS: 1943 1957.
.
Terminated
foo2: removed
root@freebsd-mds:~ # jls

JID  IP Address      Hostname                      Path
  1                  foo                           /foo

root@freebsd-mds:~ # jail -r 1
Stopping sshd.
Waiting for PIDS: 1969.
Stopping cron.
Waiting for PIDS: 2003.
Stopping nfsd.
Waiting for PIDS: 1916 1925.
.
Terminated
foo: removed
root@freebsd-mds:~ # jls

JID  IP Address      Hostname                      Path

root@freebsd-mds:~ # ^D

Script done on Fri Dec 16 17:38:39 2022

After this, a "ps ax" does not show any processes
marked with "J", so how do I see what is still holding
a reference, since you both seem to think that is why
VNET_SYSUNINT() doesn't work?

sys/fs/nfsserver/nfs_nfsdport.c
3719

After this, a "ps ax" does not show any processes
marked with "J", so how do I see what is still holding
a reference, since you both seem to think that is why
VNET_SYSUNINT() doesn't work?

jls -d

That will list "dying" jails in addition to others. The definition of a dying jail is one that's not yet gone only because there are still references held to it somewhere. The "somewhere" is almost always a cred, but the list of creds that may be sitting around on a system is huge.

When there are no processes left (or child jails, or "persist"), i.e. when there are no user references (pr_uref), that's when the jail starts to die. Or you can force that with jail_remove(2). Then when there are no references at all (pr_ref), it is finished dying.

Mostly cleanup.
Added comment requested by bz@.
Made it compile/work for all variants
of no jail, jail, vnet jail and without VIMAGE.

Removed kern_jail.c and jail.h since they are
now patched in main.

For here on, I only expect to update the patch
to delete files that are already patched in main.

Removed the kernel rpc bits, since they are now
committed to main.

Changed the code to use macros with NFSD_ in the
names instead of #undef'ng the VNET macros, as
suggested by bz@.

I had actually thought of doing this, but was
"on the fence" about it, thinking it might obscure
the code. I now believe bz@ is correct and it makes
the code cleaner and less likely to get broken by future
changes.

This version has the vnet'd arrays and the two large
structures malloc'd (the vnet variable is just a pointer
to them). The vnet storage is now around 300bytes
and nfsd.ko will load ok, at least for mot test system.

I have left VNET_NFSD ifdefs, since bz@ seems to
prefer them at this stage. They can easily be removed
if/when that seems appropriate.

The only thing left on my "to do" list for this patch is
to fix all the lines that exceed 80chars. Exciting work
I will do over the next few days.

This version has no semantics or syntactic changes.
The only changes are to lines that exceeded 80chars.

I hope to commit this to main in a week or so, so
please take a look before then, if you can.

This version has a few fixes related to cleanup when
a jail is destroyed or nfsd.ko is kldunload'ed.

  • A bogus jailed_without_vnet() in nfsrv_cleanup(), which is called from the osd remove method is replaced with a test for the PR_VNET flag.
  • A redundant check for prison_check_nfsd() is removed.
  • nfsrv_cleanup() is now called during MOD_UNLOAD whether or not VNET_NFSD was specified.

Add NFSD_CURVNET_SET_QUIET() macros and use them
in a couple of places where recursion is inevitable.

Found a case where NFSD_CURVNET_SET_QUIET() and
NFSD_CURVNET_RESTORE() needed to be added to
a function that can be upcalled from the krpc.

Three files in sys/rpc were missing from the diff.

This version has the kernel build option VNET_NFSD
removed, as suggested by kib@ and markj@.

It also has the vnet changes for the krpc, kgssapi
and support of nfsuserd put in this patch instead of
trying to maintain them as separate patches.