Page MenuHomeFreeBSD

Define and use functions to test specific VNET states during shutdown
Needs ReviewPublic

Authored by hselasky on Apr 25 2019, 10:59 AM.
Tags
None
Referenced Files
F105783731: D20051.diff
Fri, Dec 20, 3:46 PM
Unknown Object (File)
Sat, Dec 7, 6:55 PM
Unknown Object (File)
Mon, Nov 25, 5:53 PM
Unknown Object (File)
Mon, Nov 25, 2:55 PM
Unknown Object (File)
Mon, Nov 25, 1:11 PM
Unknown Object (File)
Sat, Nov 23, 11:25 PM
Unknown Object (File)
Sat, Nov 23, 8:59 PM
Unknown Object (File)
Sat, Nov 23, 5:24 PM
Subscribers

Details

Reviewers
bz
Summary

Define and use functions to test specific VNET states during shutdown.
This makes the code more readable and easier to maintain.

MFC after: 1 week
Sponsored by: Mellanox Technologies

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 26908

Event Timeline

I kept thinking which states we can really observe and when:

  1. during alloc: vnet_state is 0
  2. during startup: it's anything from SI_SUB_VNET up to VNET_DONE
  3. while up and running in a stable state: it's VNET_DONE
  4. during shutdown: it's down from VNET_DONE towards SI_SUB_VNET
  5. when shutdown: it's the last VNET_SYSUNINT state

States (1) and (5) should not really observable. So Maybe the original states were too precise as we really only are interested "are we in an up/running/stable" state? What if we'd simply call the function "running" or "up" or "stable"? Not sure which one.. ready always makes me think of "READY? " asking for input.

bool
vnet_is_up_running_and_stable(struct vnet *vnet)
{

        return (vnet->vnet_state == SI_SUB_VNET_DONE);
}

I cannot remember why I ended up with the more precise checks initially. Sadly history was in p4 which is gone.

sys/net/vnet.c
236 ↗(On Diff #56637)

No matter that the outcome of the naming/state discussion will be, can this please be moved under vnet_sysinit_done() before the comment for vnet_data_alloc()?

hselasky marked an inline comment as done.

Update patch as per @bz's suggestions.

This revision is now accepted and ready to land.May 10 2019, 9:47 AM

Make sure vnet0 gets its vnet_state set aswell and add a KASSERT() for valid vnet_state.

This revision now requires review to proceed.May 10 2019, 11:09 AM

In general this seems good.

sys/net/vnet.c
333 ↗(On Diff #57257)

I seem to remember you had a NULL panic somewhere a long time ago; however this should not be needed as vnet0_init() calls vnet_alloc() for the default network stack and that in turn calls vnet_sysinit() which will update the vnet_state for every VNET_SYSINIT and with that vnet_sysinit_done should set the state to SI_SUB_VNET_DONE.

Can you remind me how you ended up with a curvnet/vnet0 without state set?

367 ↗(On Diff #57257)

Could be a single-line comment.

369 ↗(On Diff #57257)

I am still considerung if calling it "vnet_is_up()" would be more clear than "running".

375 ↗(On Diff #57257)

During startup and shutdown of a vnet the vnet is not on the vnet list anymore. However it does exist ouside these boundries.

Could you, (a) for documentation purposes, and (b) to make sure no bogus code calls into this (or rather the calling functions) add the vnet_state > SI_SUB_VNET to the KASSERT as well again? Getting VNETs "stable" was hard and tightening checks wherever possible helps to keep it that way.

hselasky added inline comments.
sys/net/vnet.c
333 ↗(On Diff #57257)

The NULL panic was because I wan't checking the VNET state in the other patch (D19622), which eventually leaded up to this patch.

After inspecting the code I believe the chunk above can be removed. You are right.

hselasky added inline comments.
sys/net/vnet.c
333 ↗(On Diff #57257)

After an actual check, it appears the sysinit for setting the done, has not been registered yet. So the vnet_state is actually 0, for the curvnet at this point :-)

hselasky edited the summary of this revision. (Show Details)

Replace vnet_state with simpler vnet_is_running variable.
Compile time assert a valid subsystem for all VNET_SYSINIT's.

bz requested changes to this revision.Sep 30 2019, 10:53 AM
bz added inline comments.
sys/net/vnet.c
333 ↗(On Diff #57257)

That's eaxtly the problem I am trying to describe. So you are using something in the frag code (the other change) which panics because you are no up yet? That's an ordering issue then if I understand that correctly. Please don't fix the symtom, we need to fix the problem or over the next months a lot of similar things will show up again...

sys/net/vnet.h
75

The reason you get the state is that it is massively helpful for debugging dependency panics (say interfaces, firewalls, stack) and having a way to get it from the crashdump can tell you that you tried to use something which you haven't initialized yet.

This revision now requires changes to proceed.Sep 30 2019, 10:53 AM
hselasky added inline comments.
sys/net/vnet.c
333 ↗(On Diff #57257)

Hi,

In the old code both SI_SUB_VNET_DONE and 0 would be treated as running state. So this bug has been hidden for a while!

	shutdown = (ifp->if_vnet->vnet_state > SI_SUB_VNET &&
		 ifp->if_vnet->vnet_state < SI_SUB_VNET_DONE) ? 1 : 0;

The problem is like this:

vnet_state is only updated inside vnet_sysinit() and vnet_sysuninit(). But for vnet0, there are no VNET SYSINIT's registered when vnet_sysinit() is called, so the state never gets set. However when vnet_register_sysinit() is called and run the VNET's done sysinit is executed, but it does not update the vnet_state :-(

Is my approach to this OK? Get rid of vnet_state and set the running variable from the done callback function itself. Add a similar unload callback function aswell.

hselasky added inline comments.
sys/net/vnet.c
333 ↗(On Diff #57257)

The panic happened when I assumed that all VNET's have reached SI_SUB_VNET_DONE. This manifested itself when destroying vnets, with the other patch applied.

@bz: Just add a printf() in the done function to print the state of the vnet and you'll see!

Okay. I see why the vnet0 state is not updated. I'll take that as a different problem as it requires a bit of trickyness as it goes through the code module loading also uses.

For your, this, current problem, what about simply doing something like this: https://people.freebsd.org/~bz/20191004-01.diff ?

I'm fine with your change. If you want you can commander this revision.
Maybe also add a db_printf() for the field in the vnet ?

If you are happy, just go ahead and commit it. I didn't boot it.

hselasky retitled this revision from Factor out vnet ready check to Compile time assert a valid subsystem for all VNET init and uninit functions..
hselasky edited the summary of this revision. (Show Details)
hselasky updated this revision to Diff 63053.
hselasky retitled this revision from Compile time assert a valid subsystem for all VNET init and uninit functions. to Define and use functions to test specific VNET states during shutdown.
hselasky edited the summary of this revision. (Show Details)
hselasky added a reviewer: bz.

Update patch.

I am sorry, was I not clear enough in email that we actually need none of this changed (neither the previous commits nor any of this) in order to solve your original problem? Can we please try not to further touch what was a working system (or complicate it more).

Yes I am sorry I had not fully understood the initial complications of the D19622 panic and that the VNET specific solution could be something local (as done elsewhere as well in the past). It is understood now, and I would greatly love the original state to be restored again.

@bz: The old checks don't respect if we are bringing up or destroying a VNET. For example this one in if_detach:

-	shutdown = (ifp->if_vnet->vnet_state > SI_SUB_VNET &&
-		 ifp->if_vnet->vnet_state < SI_SUB_VNET_DONE) ? 1 : 0;

If during attach we are creating an ifnet and then for some reason need to destroy it, the code will do nothing for that ifnet.
So I think it is good for the case of error paths, that we also check vnet_shutdown here.

The answer to your question: No. These additional checks are not needed for the good case, but there might be bad cases which need them.