Page MenuHomeFreeBSD

Allow ng_bridge to automatically assign [up]link numbers for connect messages.
ClosedPublic

Authored by dave_freedave.net on Apr 3 2024, 2:25 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Jan 14, 6:32 PM
Unknown Object (File)
Sun, Jan 12, 7:00 AM
Unknown Object (File)
Dec 6 2024, 8:05 AM
Unknown Object (File)
Dec 6 2024, 8:05 AM
Unknown Object (File)
Dec 6 2024, 8:05 AM
Unknown Object (File)
Dec 6 2024, 7:59 AM
Unknown Object (File)
Dec 2 2024, 9:19 AM
Unknown Object (File)
Dec 2 2024, 9:19 AM

Details

Summary

ng_bridge hooks auto assign number

Test Plan

The sample tests assume you are already in ngctl(8). This is a better formatted version of the testing.txt in original bug report (278130).

First we need to create a bridge to use. We also want it to not shutdown when all hooks are disconnected since we will disconnect the control socket. Notice we are using the old link0 here. It could just be link now.

mkpeer bridge b link0
name .:b br0
msg br0: setpersistent
rmhook br0: link0

This looks like you need link0, you do not. We could have used rmhook .: b instead. This is why you don't need link numbers, you can remove them from either side. Next show that the bridge currently has no connections (could also use ls -l but moving forward this limits output).

show br0:
  Name: br0             Type: bridge          ID: 0000000f   Num hooks: 0

Now lets make an ng_eiface(4), these can be given to VNET jails although you will need to give it a MAC address for that which I don't show. Also connect the ng_eiface(4) to the ng_bridge(4) using only link.

mkpeer eiface e ether
name .:e jail0
rmhook .: e
connect br0: jail0: link ether

Show the ng_bridge(4) again, it should have picked link0 as that is lowest available number:

show br0:
  Name: br0             Type: bridge          ID: 0000000a   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  link0           jail0           eiface       0000000b        ether

Repeat for jail1 and jail2 so we have 3 ng_eiface(4) attached to our ng_bridge(4), this time specify an unused link1 for the first (remember we are mostly retaining backward compatibility):

mkpeer eiface e ether
name .:e jail1
rmhook jail1: ether
connect jail1: br0: ether link1
mkpeer eiface e ether
name .:e jail2
rmhook jail2: ether
connect jail2: br0: ether link
show br0:
  Name: br0             Type: bridge          ID: 0000000a   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  link2           jail2           eiface       0000000d        ether
  link1           jail1           eiface       0000000c        ether
  link0           jail0           eiface       0000000b        ether

Next let's remove jail1, which we can see has link1 but we don't need that information:

rmhook jail1: ether
shutdown jail1:
show br0:
  Name: br0             Type: bridge          ID: 0000000a   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  link2           jail2           eiface       0000000d        ether
  link0           jail0           eiface       0000000b        ether

Lets make jail3 now and first try connecting to link2 which will fail, then let it auto assign:

mkpeer eiface e ether
name .:e jail3
rmhook jail3: ether
connect jail3: br0: ether link2
ngctl: send msg: File exists  # <-- error returned from ngctl
connect jail3: br0: ether link
show br0:
  Name: br0             Type: bridge          ID: 0000000a   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  link1           jail3           eiface       0000000e        ether
  link2           jail2           eiface       0000000d        ether
  link0           jail0           eiface       0000000b        ether

Hooks are unsorted linked list which is why we have link0, link2, and link1 for local hooks.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

My main objection is that backward compatibility is broken. We can no longer have uplink[N] and link[N] at the same time. This may break existing software.

Also, this makes the code more complicated and additional checks appear.

Therefore, perhaps using separate unr(9) for different types of hooks will be better, despite the additional overhead.

sys/netgraph/ng_bridge.c
424

Please use standard C type uint32_t in the new code.

973

uint32_t

1015

I really don't see how this speed things up except for negative lookups, which normally do not happen. What about not including this arguable ng_bridge_findhook optimization into the changeset and discuss that separately? Seems not necessary for the hooks autonaming.

1143

Should be declared returning a pointer to const data.

1146

Can also be declared const..

1149

This can be coded as

.len = sizeof(hook_prefix[0].prefix)
1167

I'd suggest to make the prefixes public to the rest of the file, this will allow to avoid black magic comparisons like isUplink = (name[0] == 'u'); in the newhook method.

Suggested code that should be located right after struct ng_link_prefix declaration around line 150:

static const struct ng_link_prefix link_pfx = {
       .prefix = NG_BRIDGE_HOOK_LINK_PREFIX,
       .len = sizeof(link_pfx.prefix),
};
static const struct ng_link_prefix uplink_pfx = {
        .prefix = NG_BRIDGE_HOOK_UPLINK_PREFIX,
        .len = sizeof(uplink_pfx.prefix),
};

static const struct ng_link_prefix *
ng_get_link_prefix(const char *name)
{
          static const struct ng_link_prefix *pfxs[] = { &link_pfx, &uplink_pfx, };

          for (u_int i = 0; i < nitems(pfxs); i++)
                if (strncmp(pfxs[i]->prefix, name, pfxs[i]->len) == 0)
                            return (pfxs[i]);

           return (0);
}

Then in the newhook/connect methods, you can compare like isUplink = (pfx == &uplink_pfx) instead of black magic.

I agree with Alexander that impossibility of linkX and uplinkX to coexist is a bummer and we don't really have any real problem to explain that. Adding a second unrhdr is not a problem at all. The bridge node is a node that normally exists in a system in the order of one, we don't really care about it getting heavier by 64 bytes + pointer. Alternatively the autonumbering can be turned off for uplink.

Now more fundamental question, that you probably have answer for, cause you did it the way you did: why can't we alloc_unr() and possible return EEXIST in the (almost) synchronous newhook method? Why do we postpone that to later connect method?

Also the documentation update of share/man/man4/ng_bridge.4 should be done in the same commit.

My main objection is that backward compatibility is broken. We can no longer have uplink[N] and link[N] at the same time. This may break existing software.

I had the same concern and point this out in the bug report. I am still not sure why uplink0 was never allowed, but kept that restriction. uplink is for when you attach a real network interface, an ng_ether(4) and although netgraph makes just about anything possible, I wonder if anybody does have more than one uplink on an ng_bridge(4)?

Also, this makes the code more complicated and additional checks appear.

It does add the check of using the unr(9) in ng_bridge_connect/ng_bridge_disconnect. But otherwise the check in ng_bridge_newhook is more refactored than made more complicated, but it is no longer a place where you can always get the [up]link number. When you can't though you just don't care and let connect handle it.
All that gives you enormous freedom when scripting as you no longer need to keep track of numbering of links. Previously I used a C program to loop over NG_BRIDGE_MAX_LINKS which isn't too bad, but its still way easier to just give ngctl(8) link. I would also say its solving a race when you try to hook to an ng_bridge(4) and is not just adding complication for convenience.

Therefore, perhaps using separate unr(9) for different types of hooks will be better, despite the additional overhead.

The addition of uplink is fantastic, so I could be talked into this. I'll just feel bad given that a lot of ng_bridge(4) will have zero uplink and those that do will often (always?) have exactly one. Do people really connect an ng_ether with link1 and uplink1 now? Originally there was just link and you had to use link0 and link1 and probably when it changed you moved one to uplink. But that doesn't change that you are right and this is the one thing that is not backward compatible. I intentionally still let you specify link numbers and specify peerhook in bhyve because you may have something elaborate like an ng_tee(4) or something instead of simplifying.

Now more fundamental question, that you probably have answer for, cause you did it the way you did: why can't we alloc_unr() and possible return EEXIST in the (almost) synchronous newhook method? Why do we postpone that to later connect method?

Actually there is not and I can definitely refactor to do this. It is this way because my previous attempt at this didn't use unr(9) and just kept hooks sorted. I can definitely take time to refactor to do this way. It also sounds like two unr(9) are acceptable to others so I can look into that as well.

Actually before I go move things to newhook can I ask do we have consensus that I should just have a separate unr(9) for uplink?
Also, the bhyve works without having it "figure out" it has an ng_bridge(4). You can pass "link" to it and it does the right thing. Given that should I pull bhyve change out so the patch is more focused on ng_bridge(4) which is my main concern?

usr.sbin/bhyve/net_backend_netgraph.c
55 ↗(On Diff #136474)

Maybe add "eiface" too? But it’s better to consider the bhyve part in a separate review.

81 ↗(On Diff #136474)

Modern compilers catch this moment. But at the very least I am against mixing different styles for comparison within the same unit.

86 ↗(On Diff #136474)

Ditto.

Actually before I go move things to newhook can I ask do we have consensus that I should just have a separate unr(9) for uplink?

I'd say "go for it". That will add just 64+8 bytes to the node but a simpler code and the compatibility question vanishes.

I have to clean up my patch and figure out how to update (I'm new to this). I will be removing bhyve from this so apologies to all the bhyve folks that got tagged on this.

sys/netgraph/ng_bridge.c
424

will change

973

will change

1015

agreed will remove as normal logic is still fine.

1143

will change

1146

well i'm taking your bigger suggestion below but otherwise would change ;)

1149

This is where you are wrong. that gives you the size of a pointer and led me to many a panic for a while there. I'm using sizeof for compile time determination of a string length.

1167
.len = sizeof(uplink_pfx.prefix)

That unfortunately gives a size of the pointer. What I want is the strlen() but at compile time. So I do need:

.len = sizeof(DEFINED_STRING) -1 /* added the \0 and we don't want */

but otherwise totally taking all advise on this.

usr.sbin/bhyve/net_backend_netgraph.c
55 ↗(On Diff #136474)

I'm going to pull byhve out of this review. It isn't required to be in same patch (or indeed at all to get what I'm after).

dave_freedave.net edited the test plan for this revision. (Show Details)

I have incorporated all the suggestions and uplink has a separate unr(9) now. Note the bhyve change is removed entirely as it is not necessary, you can still use peerhook=link with bhyve though for it to auto assign.

I'm not sure how to clear the comments as they are often on wrong line numbers now. Sorry about that.

sys/netgraph/ng_bridge.c
412

Given that ng_add_hook() always does a check for a hook with the user supplied new name, that shall never happen. Since you do strtoul+snprintf+strcmp check, you make sure that synthesized hook name is equal to user supplied name. If we are past strcmp() we are sure a hook with such a name doesn't exist, that means unr shall not be allocated.

Can you please check that you can't intentionally hit this EINVAL and you always get either EEXIST from ng_add_hook() or EINVAL from the strcmp above? If so, than better to make a KASSERT() that alloc_unr_specific() always succeeds.

1167

Yep, you are right, and looks like there is no way to put a proper value at compile time except using DEFINED_STRING for a second time. If I use sizeof(*uplink_pfx.profix) I'd get 1.

share/man/man4/ng_bridge.4
117

IMHO, this paragraph should be placed earlier, after line 82. It should also be more verbose. IMHO. If you don't mind, I will edit the manual page before committing.

share/man/man4/ng_bridge.4
117

I would appreciate that. Thank you!

sys/netgraph/ng_bridge.c
412

There is exactly one case where this can fail: uplink0 was requested.

That will get by the strcmp() of the default ng_findhook() in ng_base.c because it can't be on the list. It also has a valid prefix.
But look at like 347: the unr(9) is initialized with a start value of 1 so it will reject a 0.

I think the fact you asked about it means it is not obvious and I should go back to this:

if (linkNum == 0 && isUplink)
		return (EINVAL);

Then I would not need to check the return value if that came first.

At minimum this may deserve a comment. Let me know what you prefer and I'll change.

I removed the "black magic" I added of relying upon alloc_unr_specific to return -1 when a user requested uplink0.
What was there before is more obvious and reads almost like english:

if (linkNum == 0 && isUplink)
        return (EINVAL);

That says exactly why we return EINVAL. Also I still get benefit of initializing unr(9) to start at 1 in the code. No longer have to grab a number and make sure its at least 1 for uplink.

dave_freedave.net added inline comments.
sys/netgraph/ng_bridge.c
412

I went ahead and did change back and retested. I was just over eager to use return value of alloc_unr_specific for no reason after I realized I could initialize the uplink unr(9) to start counting at 1. It was objectively a bad change and the original test is better.

sys/netgraph/ng_bridge.c
402
412

Let's assert that alloc_unr* is always successful.

415
This revision is now accepted and ready to land.Apr 4 2024, 6:34 PM
des added inline comments.
sys/netgraph/ng_bridge.c
413

You never tested this with a debugging kernel, did you? alloc_unr_specific() can sleep.

sys/netgraph/ng_bridge.c
413

I tested this patch on CURRENT with debug enabled. Can you provide more information?

Where did the restriction come from that you can’t sleep in the context of creating a hook?

Atleast, I know one more node where allocation occurs when creating a hook with M_WAITOK.

Also, if NGM_MKPEER is used, then in one context, a new node is first created with a call to the constructor in which sleeping is allowed, and then the newhook() is called.

sys/netgraph/ng_bridge.c
413

I must confess I did not test with debugging kernel. I tested on 14/stable and CURRENT but not debug.

I was aware from the man page that it could sleep but was unaware this is an issue for same reasons Aleksandr mentioned.

Originally I was keeping a sorted linked list of hooks but that code was so much more convoluted than using unr(9) and letting ng_base place the hook on an unsorted list. What issue(s) are you seeing with the sleep?

markj added inline comments.
sys/netgraph/ng_bridge.c
413

In some cases this function is called by one of the netgraph threads while in an epoch section:

sys/netgraph/bridge:many_unicasts  ->  panic: malloc(M_WAITOK) with sleeping prohibited
cpuid = 1                                                                      
time = 1713559142                                                              
KDB: stack backtrace:                                                          
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00afc66af0
vpanic() at vpanic+0x135/frame 0xfffffe00afc66c20
panic() at panic+0x43/frame 0xfffffe00afc66c80
malloc_dbg() at malloc_dbg+0xe2/frame 0xfffffe00afc66ca0
malloc() at malloc+0x2c/frame 0xfffffe00afc66ce0
alloc_unr_specific() at alloc_unr_specific+0x42/frame 0xfffffe00afc66d30       
ng_bridge_newhook() at ng_bridge_newhook+0x118/frame 0xfffffe00afc66db0
ng_con_part2() at ng_con_part2+0x9b/frame 0xfffffe00afc66df0
ng_apply_item() at ng_apply_item+0x195/frame 0xfffffe00afc66e80
ngthread() at ngthread+0x26a/frame 0xfffffe00afc66ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00afc66f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00afc66f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

It doesn't happen deterministically.

ngthread() has a workaround to avoid entering an epoch section for certain message types, precisely to avoid this constraint, but it apparently doesn't apply here.

sys/netgraph/ng_bridge.c
413

I think I understand what's going on. When connecting hooks, ng_send_fn2 is used (https://github.com/freebsd/freebsd-src/blob/main/sys/netgraph/ng_base.c#L1516C15-L1516C26) which queues an ng_item with type NGQF_FN2. Therefore, the workaround does not work because it only checks for NGQF_MESG.

This is non-deterministic because the ng_item is placed on the queue unless it is empty or it is explicitly stated that it is necessary. Then ngthread() starts processing items.

Therefore, we can prohibit the processing of all non-NGQF_DATA ng_item's in the context of no epochs. Which I think is right.

diff --git a/sys/netgraph/ng_base.c b/sys/netgraph/ng_base.c
index 5bff0663e03e..5567d875e47f 100644
--- a/sys/netgraph/ng_base.c
+++ b/sys/netgraph/ng_base.c
@@ -3440,9 +3440,9 @@ ngthread(void *arg)
                                NG_QUEUE_UNLOCK(&node->nd_input_queue);
                                NGI_GET_NODE(item, node); /* zaps stored node */
 
-                               if ((item->el_flags & NGQF_TYPE) == NGQF_MESG) {
+                               if ((item->el_flags & NGQF_TYPE) != NGQF_DATA) {
                                        /*
-                                        * NGQF_MESG items should never be processed in
+                                        * NGQF_MESG, NGQF_FN1 and NGQF_FN2 items should never be processed in
                                         * NET_EPOCH context. So, temporary exit from EPOCH.
                                         */
                                        NET_EPOCH_EXIT(et);
sys/netgraph/ng_bridge.c
413

I'm sure that that patch will make the panic go away, but why is it ok to exit the net epoch?

sys/netgraph/ng_bridge.c
413

Looks like you have the fix. Is there anything more I can do to assist since I introduced the panic?

sys/netgraph/ng_bridge.c
413

I'm sure that that patch will make the panic go away, but why is it ok to exit the net epoch?

The main problem is that the node has one queue (node::nd_input_queue) in which ng_item's of different types are added. ngthread() process this queue in a loop and calls ng_apply_item() for all types in the NET_EPOCH context, which is incorrect, because for example, processing an item of the type NGQF_MESG can cause malloc with M_WAITOK.

But, in most cases (hot path), the queue contains items with the NGQF_DATA type, so we call NET_EPOCH_ENTER outside the loop. Therefore, in the previous change (https://github.com/freebsd/freebsd-src/commit/0e6e2c4ef3d1244fa21e7b691e76fdc09f8eacae), I added NET_EPOCH_EXIT/NET_EPOCH_ENTER inside the loop for optimization.

sys/netgraph/ng_bridge.c
413

I understand the problem. Let me ask a different way: why do we enter the net epoch at all?

sys/netgraph/ng_bridge.c
413

I think this is the status quo now. Entrance nodes (and ngthread()) are responsible for entering the epoch, the exit nodes are not.

If we don't enter the epoch, exit nodes may call code from the network stack that requires the epoch.

There is also an idea that eventually netgraph internal synchronization shall be converted to epoch.

Maybe, @glebius will give additional information.

sys/netgraph/ng_bridge.c
413

All correct. And I think that your suggested patch to ng_base.c will make things better. Still very far from perfect, though.

netgraph(4) definitely needs a deep refactor to modern kernel, hopefully GSoC will provide us with at least a PoC. Meanwhile, let's check in your patch.

sys/netgraph/ng_bridge.c
413

@afedorov is there any reason not to commit it? The test suite still occasionally panics due to the M_WAITOK allocation. I ran the netgraph tests in a loop with your patch and haven't observed any panics or failures.

sys/netgraph/ng_bridge.c
413

@markj

The main problem is that I took too long with this. I tried to double check everything ten times. What bothered me was the challenge of calling NGQF_FN1 and NGQF_FN2 in the NET_EPOCH context. But I didn't find any problems in my tests.

So, I think we should to commit the patch, but my committed bit is now under safe keeping.

Of course, if necessary, I can open a separate review for this patch.

sys/netgraph/ng_bridge.c
413

You mean, you can't reproduce the panic? I did so by running kyua -v parallelism=4 sys/netgraph/bridge in a loop. The panic doesn't occur every time.

If you mail me a git patch that I can apply with git am, I'll commit it for you. I have no strong feelings on the patch itself, but these test suite panics need to be addressed one way or another.