ng_bridge hooks auto assign number
Details
- Reviewers
glebius donner afedorov - Group Reviewers
bhyve - Commits
- rG86a6393a7d67: ng_bridge: allow to automatically assign numbers to new hooks
The sample tests assume you are already in ngctl(8). This is a better formatted version of the testing.txt in original bug report (278130).
First we need to create a bridge to use. We also want it to not shutdown when all hooks are disconnected since we will disconnect the control socket. Notice we are using the old link0 here. It could just be link now.
mkpeer bridge b link0 name .:b br0 msg br0: setpersistent rmhook br0: link0
This looks like you need link0, you do not. We could have used rmhook .: b instead. This is why you don't need link numbers, you can remove them from either side. Next show that the bridge currently has no connections (could also use ls -l but moving forward this limits output).
show br0: Name: br0 Type: bridge ID: 0000000f Num hooks: 0
Now lets make an ng_eiface(4), these can be given to VNET jails although you will need to give it a MAC address for that which I don't show. Also connect the ng_eiface(4) to the ng_bridge(4) using only link.
mkpeer eiface e ether name .:e jail0 rmhook .: e connect br0: jail0: link ether
Show the ng_bridge(4) again, it should have picked link0 as that is lowest available number:
show br0: Name: br0 Type: bridge ID: 0000000a Num hooks: 1 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- link0 jail0 eiface 0000000b ether
Repeat for jail1 and jail2 so we have 3 ng_eiface(4) attached to our ng_bridge(4), this time specify an unused link1 for the first (remember we are mostly retaining backward compatibility):
mkpeer eiface e ether name .:e jail1 rmhook jail1: ether connect jail1: br0: ether link1 mkpeer eiface e ether name .:e jail2 rmhook jail2: ether connect jail2: br0: ether link show br0: Name: br0 Type: bridge ID: 0000000a Num hooks: 3 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- link2 jail2 eiface 0000000d ether link1 jail1 eiface 0000000c ether link0 jail0 eiface 0000000b ether
Next let's remove jail1, which we can see has link1 but we don't need that information:
rmhook jail1: ether shutdown jail1: show br0: Name: br0 Type: bridge ID: 0000000a Num hooks: 2 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- link2 jail2 eiface 0000000d ether link0 jail0 eiface 0000000b ether
Lets make jail3 now and first try connecting to link2 which will fail, then let it auto assign:
mkpeer eiface e ether name .:e jail3 rmhook jail3: ether connect jail3: br0: ether link2 ngctl: send msg: File exists # <-- error returned from ngctl connect jail3: br0: ether link show br0: Name: br0 Type: bridge ID: 0000000a Num hooks: 3 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- link1 jail3 eiface 0000000e ether link2 jail2 eiface 0000000d ether link0 jail0 eiface 0000000b ether
Hooks are unsorted linked list which is why we have link0, link2, and link1 for local hooks.
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
My main objection is that backward compatibility is broken. We can no longer have uplink[N] and link[N] at the same time. This may break existing software.
Also, this makes the code more complicated and additional checks appear.
Therefore, perhaps using separate unr(9) for different types of hooks will be better, despite the additional overhead.
sys/netgraph/ng_bridge.c | ||
---|---|---|
424 | Please use standard C type uint32_t in the new code. | |
973 | uint32_t | |
1015 | I really don't see how this speed things up except for negative lookups, which normally do not happen. What about not including this arguable ng_bridge_findhook optimization into the changeset and discuss that separately? Seems not necessary for the hooks autonaming. | |
1143 | Should be declared returning a pointer to const data. | |
1146 | Can also be declared const.. | |
1149 | This can be coded as .len = sizeof(hook_prefix[0].prefix) | |
1167 | I'd suggest to make the prefixes public to the rest of the file, this will allow to avoid black magic comparisons like isUplink = (name[0] == 'u'); in the newhook method. Suggested code that should be located right after struct ng_link_prefix declaration around line 150: static const struct ng_link_prefix link_pfx = { .prefix = NG_BRIDGE_HOOK_LINK_PREFIX, .len = sizeof(link_pfx.prefix), }; static const struct ng_link_prefix uplink_pfx = { .prefix = NG_BRIDGE_HOOK_UPLINK_PREFIX, .len = sizeof(uplink_pfx.prefix), }; static const struct ng_link_prefix * ng_get_link_prefix(const char *name) { static const struct ng_link_prefix *pfxs[] = { &link_pfx, &uplink_pfx, }; for (u_int i = 0; i < nitems(pfxs); i++) if (strncmp(pfxs[i]->prefix, name, pfxs[i]->len) == 0) return (pfxs[i]); return (0); } Then in the newhook/connect methods, you can compare like isUplink = (pfx == &uplink_pfx) instead of black magic. |
I agree with Alexander that impossibility of linkX and uplinkX to coexist is a bummer and we don't really have any real problem to explain that. Adding a second unrhdr is not a problem at all. The bridge node is a node that normally exists in a system in the order of one, we don't really care about it getting heavier by 64 bytes + pointer. Alternatively the autonumbering can be turned off for uplink.
Now more fundamental question, that you probably have answer for, cause you did it the way you did: why can't we alloc_unr() and possible return EEXIST in the (almost) synchronous newhook method? Why do we postpone that to later connect method?
Also the documentation update of share/man/man4/ng_bridge.4 should be done in the same commit.
I had the same concern and point this out in the bug report. I am still not sure why uplink0 was never allowed, but kept that restriction. uplink is for when you attach a real network interface, an ng_ether(4) and although netgraph makes just about anything possible, I wonder if anybody does have more than one uplink on an ng_bridge(4)?
Also, this makes the code more complicated and additional checks appear.
It does add the check of using the unr(9) in ng_bridge_connect/ng_bridge_disconnect. But otherwise the check in ng_bridge_newhook is more refactored than made more complicated, but it is no longer a place where you can always get the [up]link number. When you can't though you just don't care and let connect handle it.
All that gives you enormous freedom when scripting as you no longer need to keep track of numbering of links. Previously I used a C program to loop over NG_BRIDGE_MAX_LINKS which isn't too bad, but its still way easier to just give ngctl(8) link. I would also say its solving a race when you try to hook to an ng_bridge(4) and is not just adding complication for convenience.
Therefore, perhaps using separate unr(9) for different types of hooks will be better, despite the additional overhead.
The addition of uplink is fantastic, so I could be talked into this. I'll just feel bad given that a lot of ng_bridge(4) will have zero uplink and those that do will often (always?) have exactly one. Do people really connect an ng_ether with link1 and uplink1 now? Originally there was just link and you had to use link0 and link1 and probably when it changed you moved one to uplink. But that doesn't change that you are right and this is the one thing that is not backward compatible. I intentionally still let you specify link numbers and specify peerhook in bhyve because you may have something elaborate like an ng_tee(4) or something instead of simplifying.
Actually there is not and I can definitely refactor to do this. It is this way because my previous attempt at this didn't use unr(9) and just kept hooks sorted. I can definitely take time to refactor to do this way. It also sounds like two unr(9) are acceptable to others so I can look into that as well.
Actually before I go move things to newhook can I ask do we have consensus that I should just have a separate unr(9) for uplink?
Also, the bhyve works without having it "figure out" it has an ng_bridge(4). You can pass "link" to it and it does the right thing. Given that should I pull bhyve change out so the patch is more focused on ng_bridge(4) which is my main concern?
usr.sbin/bhyve/net_backend_netgraph.c | ||
---|---|---|
55 ↗ | (On Diff #136474) | Maybe add "eiface" too? But it’s better to consider the bhyve part in a separate review. |
81 ↗ | (On Diff #136474) | Modern compilers catch this moment. But at the very least I am against mixing different styles for comparison within the same unit. |
86 ↗ | (On Diff #136474) | Ditto. |
I'd say "go for it". That will add just 64+8 bytes to the node but a simpler code and the compatibility question vanishes.
I have to clean up my patch and figure out how to update (I'm new to this). I will be removing bhyve from this so apologies to all the bhyve folks that got tagged on this.
sys/netgraph/ng_bridge.c | ||
---|---|---|
424 | will change | |
973 | will change | |
1015 | agreed will remove as normal logic is still fine. | |
1143 | will change | |
1146 | well i'm taking your bigger suggestion below but otherwise would change ;) | |
1149 | This is where you are wrong. that gives you the size of a pointer and led me to many a panic for a while there. I'm using sizeof for compile time determination of a string length. | |
1167 | .len = sizeof(uplink_pfx.prefix) That unfortunately gives a size of the pointer. What I want is the strlen() but at compile time. So I do need: .len = sizeof(DEFINED_STRING) -1 /* added the \0 and we don't want */ but otherwise totally taking all advise on this. | |
usr.sbin/bhyve/net_backend_netgraph.c | ||
55 ↗ | (On Diff #136474) | I'm going to pull byhve out of this review. It isn't required to be in same patch (or indeed at all to get what I'm after). |
I have incorporated all the suggestions and uplink has a separate unr(9) now. Note the bhyve change is removed entirely as it is not necessary, you can still use peerhook=link with bhyve though for it to auto assign.
I'm not sure how to clear the comments as they are often on wrong line numbers now. Sorry about that.
sys/netgraph/ng_bridge.c | ||
---|---|---|
412 | Given that ng_add_hook() always does a check for a hook with the user supplied new name, that shall never happen. Since you do strtoul+snprintf+strcmp check, you make sure that synthesized hook name is equal to user supplied name. If we are past strcmp() we are sure a hook with such a name doesn't exist, that means unr shall not be allocated. Can you please check that you can't intentionally hit this EINVAL and you always get either EEXIST from ng_add_hook() or EINVAL from the strcmp above? If so, than better to make a KASSERT() that alloc_unr_specific() always succeeds. | |
1167 | Yep, you are right, and looks like there is no way to put a proper value at compile time except using DEFINED_STRING for a second time. If I use sizeof(*uplink_pfx.profix) I'd get 1. |
share/man/man4/ng_bridge.4 | ||
---|---|---|
117 | IMHO, this paragraph should be placed earlier, after line 82. It should also be more verbose. IMHO. If you don't mind, I will edit the manual page before committing. |
share/man/man4/ng_bridge.4 | ||
---|---|---|
117 | I would appreciate that. Thank you! | |
sys/netgraph/ng_bridge.c | ||
412 | There is exactly one case where this can fail: uplink0 was requested. That will get by the strcmp() of the default ng_findhook() in ng_base.c because it can't be on the list. It also has a valid prefix. I think the fact you asked about it means it is not obvious and I should go back to this: if (linkNum == 0 && isUplink) return (EINVAL); Then I would not need to check the return value if that came first. At minimum this may deserve a comment. Let me know what you prefer and I'll change. |
I removed the "black magic" I added of relying upon alloc_unr_specific to return -1 when a user requested uplink0.
What was there before is more obvious and reads almost like english:
if (linkNum == 0 && isUplink) return (EINVAL);
That says exactly why we return EINVAL. Also I still get benefit of initializing unr(9) to start at 1 in the code. No longer have to grab a number and make sure its at least 1 for uplink.
sys/netgraph/ng_bridge.c | ||
---|---|---|
412 | I went ahead and did change back and retested. I was just over eager to use return value of alloc_unr_specific for no reason after I realized I could initialize the uplink unr(9) to start counting at 1. It was objectively a bad change and the original test is better. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | You never tested this with a debugging kernel, did you? alloc_unr_specific() can sleep. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | I tested this patch on CURRENT with debug enabled. Can you provide more information? Where did the restriction come from that you can’t sleep in the context of creating a hook? Atleast, I know one more node where allocation occurs when creating a hook with M_WAITOK. Also, if NGM_MKPEER is used, then in one context, a new node is first created with a call to the constructor in which sleeping is allowed, and then the newhook() is called. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | I must confess I did not test with debugging kernel. I tested on 14/stable and CURRENT but not debug. I was aware from the man page that it could sleep but was unaware this is an issue for same reasons Aleksandr mentioned. Originally I was keeping a sorted linked list of hooks but that code was so much more convoluted than using unr(9) and letting ng_base place the hook on an unsorted list. What issue(s) are you seeing with the sleep? |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | In some cases this function is called by one of the netgraph threads while in an epoch section: sys/netgraph/bridge:many_unicasts -> panic: malloc(M_WAITOK) with sleeping prohibited cpuid = 1 time = 1713559142 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00afc66af0 vpanic() at vpanic+0x135/frame 0xfffffe00afc66c20 panic() at panic+0x43/frame 0xfffffe00afc66c80 malloc_dbg() at malloc_dbg+0xe2/frame 0xfffffe00afc66ca0 malloc() at malloc+0x2c/frame 0xfffffe00afc66ce0 alloc_unr_specific() at alloc_unr_specific+0x42/frame 0xfffffe00afc66d30 ng_bridge_newhook() at ng_bridge_newhook+0x118/frame 0xfffffe00afc66db0 ng_con_part2() at ng_con_part2+0x9b/frame 0xfffffe00afc66df0 ng_apply_item() at ng_apply_item+0x195/frame 0xfffffe00afc66e80 ngthread() at ngthread+0x26a/frame 0xfffffe00afc66ef0 fork_exit() at fork_exit+0x82/frame 0xfffffe00afc66f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00afc66f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- It doesn't happen deterministically. ngthread() has a workaround to avoid entering an epoch section for certain message types, precisely to avoid this constraint, but it apparently doesn't apply here. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | I think I understand what's going on. When connecting hooks, ng_send_fn2 is used (https://github.com/freebsd/freebsd-src/blob/main/sys/netgraph/ng_base.c#L1516C15-L1516C26) which queues an ng_item with type NGQF_FN2. Therefore, the workaround does not work because it only checks for NGQF_MESG. This is non-deterministic because the ng_item is placed on the queue unless it is empty or it is explicitly stated that it is necessary. Then ngthread() starts processing items. Therefore, we can prohibit the processing of all non-NGQF_DATA ng_item's in the context of no epochs. Which I think is right. diff --git a/sys/netgraph/ng_base.c b/sys/netgraph/ng_base.c index 5bff0663e03e..5567d875e47f 100644 --- a/sys/netgraph/ng_base.c +++ b/sys/netgraph/ng_base.c @@ -3440,9 +3440,9 @@ ngthread(void *arg) NG_QUEUE_UNLOCK(&node->nd_input_queue); NGI_GET_NODE(item, node); /* zaps stored node */ - if ((item->el_flags & NGQF_TYPE) == NGQF_MESG) { + if ((item->el_flags & NGQF_TYPE) != NGQF_DATA) { /* - * NGQF_MESG items should never be processed in + * NGQF_MESG, NGQF_FN1 and NGQF_FN2 items should never be processed in * NET_EPOCH context. So, temporary exit from EPOCH. */ NET_EPOCH_EXIT(et); |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | I'm sure that that patch will make the panic go away, but why is it ok to exit the net epoch? |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | Looks like you have the fix. Is there anything more I can do to assist since I introduced the panic? |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 |
The main problem is that the node has one queue (node::nd_input_queue) in which ng_item's of different types are added. ngthread() process this queue in a loop and calls ng_apply_item() for all types in the NET_EPOCH context, which is incorrect, because for example, processing an item of the type NGQF_MESG can cause malloc with M_WAITOK. But, in most cases (hot path), the queue contains items with the NGQF_DATA type, so we call NET_EPOCH_ENTER outside the loop. Therefore, in the previous change (https://github.com/freebsd/freebsd-src/commit/0e6e2c4ef3d1244fa21e7b691e76fdc09f8eacae), I added NET_EPOCH_EXIT/NET_EPOCH_ENTER inside the loop for optimization. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | I understand the problem. Let me ask a different way: why do we enter the net epoch at all? |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | I think this is the status quo now. Entrance nodes (and ngthread()) are responsible for entering the epoch, the exit nodes are not. If we don't enter the epoch, exit nodes may call code from the network stack that requires the epoch. There is also an idea that eventually netgraph internal synchronization shall be converted to epoch. Maybe, @glebius will give additional information. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | All correct. And I think that your suggested patch to ng_base.c will make things better. Still very far from perfect, though. netgraph(4) definitely needs a deep refactor to modern kernel, hopefully GSoC will provide us with at least a PoC. Meanwhile, let's check in your patch. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | The main problem is that I took too long with this. I tried to double check everything ten times. What bothered me was the challenge of calling NGQF_FN1 and NGQF_FN2 in the NET_EPOCH context. But I didn't find any problems in my tests. So, I think we should to commit the patch, but my committed bit is now under safe keeping. Of course, if necessary, I can open a separate review for this patch. |
sys/netgraph/ng_bridge.c | ||
---|---|---|
413 | You mean, you can't reproduce the panic? I did so by running kyua -v parallelism=4 sys/netgraph/bridge in a loop. The panic doesn't occur every time. If you mail me a git patch that I can apply with git am, I'll commit it for you. I have no strong feelings on the patch itself, but these test suite panics need to be addressed one way or another. |