Allow ng_nat to be attached to an Ethernet interface directly via ng_ether(4) or the like. Add new control message types, setdlt and getdlt, to switch from the default DLT_RAW (no encapsulation) to DLT_EN10MB (Ethernet).
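As an illustration of the new messages, a minimal libnetgraph sketch is shown below. It is not taken from the patch: it assumes an ng_nat node already exists and has been given the hypothetical name em0_nat, and it only uses the DLT constants from net/bpf.h.

```c
/*
 * Illustrative sketch (not from the patch): switch an existing ng_nat
 * node, here named "em0_nat" (hypothetical), from the default DLT_RAW
 * to DLT_EN10MB using the new "setdlt" control message.
 * Build with: cc nat_dlt.c -lnetgraph
 */
#include <sys/types.h>
#include <sys/time.h>
#include <net/bpf.h>		/* DLT_RAW, DLT_EN10MB */
#include <err.h>
#include <netgraph.h>

int
main(void)
{
	int cs, ds;

	/* Anonymous control/data socket pair for talking to netgraph. */
	if (NgMkSockNode(NULL, &cs, &ds) < 0)
		err(1, "NgMkSockNode");

	/* ASCII form of the message; the node parses "setdlt <number>". */
	if (NgSendAsciiMsg(cs, "em0_nat:", "setdlt %d", DLT_EN10MB) < 0)
		err(1, "NgSendAsciiMsg");

	return (0);
}
```

Interactively, the same thing should be achievable with something like "ngctl msg em0_nat: setdlt 1", with getdlt reporting the current setting.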
Details
- Reviewers: glebius, eugen_grosbein.net
- Group Reviewers: manpages
- Commits:
  - rS359698: MFC r342168,357786: Allow ng_nat to be attached to a ethernet interface
  - rS359697: MFC r342168,357786: Allow ng_nat to be attached to a ethernet interface
  - rS342168: Allow ng_nat to be attached to a ethernet interface directly via ng_ether(4)
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Not Applicable
- Unit Tests: Not Applicable
Event Timeline
Looks good with one exception: additional plain panic(). Can it be replaced with KASSERT?
IDK, there is no way for this option to be set to anything but DLT_RAW or DLT_EN10MB in the course of normal operation of the node, so it would require some form of memory corruption to actually happen. panic(9) seems an appropriate action in that case; there are other panic(9) calls in the code in similar situations.
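For context, the construct being debated is roughly the following. This is a hedged sketch rather than the exact diff, and priv->dlt is an illustrative field name.

```c
/*
 * Hedged sketch of the data-path dispatch under discussion (not the
 * exact diff; priv->dlt is an illustrative field name).  The question
 * is what the "impossible" default case should do.
 */
switch (priv->dlt) {
case DLT_RAW:
	/* No link-layer header: pass the bare IP packet to the NAT engine. */
	break;
case DLT_EN10MB:
	/* Ethernet framing: strip the header, translate, put it back. */
	break;
default:
	/*
	 * Only reachable if kernel memory was corrupted.  The reviewer's
	 * suggestion would be something like:
	 *   KASSERT(0, ("ng_nat: unexpected DLT %u", priv->dlt));
	 * while the patch uses a plain panic(9):
	 */
	panic("ng_nat: unexpected DLT %u", priv->dlt);
}
```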
Yes, it requires corruption of private node memory. As the owner of multiple routers mass-servicing thousands of customers using multiple NETGRAPH nodes, I can assure you that a panic is not an appropriate action. The appropriate action is some form of blocking the traffic flow through the node in question (with logging) while leaving other nodes working.
I would like to see generic code in netgraph that marks hooks with a DLT. That way no special messages would be needed; nodes would autosense what is connected to them. However, this is just a wish, not a blocker for this change.
Well, that is where I respectfully disagree. As the owner of hundreds of FreeBSD systems servicing many millions of customers, I think that rebooting the system immediately after any slight kernel heap/stack memory corruption is detected is not just appropriate but the only sane action available. Shutting down a particular netgraph node and hoping for the best would just leave the service down indefinitely with no hope of any sort of automatic recovery. On top of which, the longer the system stays in this state, the bigger the chance it will self-inflict some long-lasting persistent data corruption.
Moreover, arguably the sooner you panic, the easier it is to narrow down the code that has stomped on your data structures. In many cases it is just the function being called, or one of the functions in the call chain.
Node destruction and re-creation is an easily implemented form of automatic recovery. We obviously have different usage patterns. I use net/mpd5, which can use ng_nat by connecting it to a session that can be destroyed and re-created by either the "client" or the "server", recovering from corruption without interfering with other sessions. Other usage patterns can react to such an event by re-creating nodes, too.
> On top of which, the longer the system stays in this state, the bigger the chance it will self-inflict some long-lasting persistent data corruption.
This heavily depends on the particular case. For example, our netgraph(3) userland library has an unfixed race in the NgSendMsg/NgSendAsciiMsg functions that can corrupt the userland/kernel socket data flow protocol for a threaded application, which can lead to garbage in the data at the userland side due to desync and misinterpretation of data received over the AF_NETGRAPH socket.
I agree there may be usage patterns that benefit from an instant panic, but I disagree that this is a universal rule.
In our case the node sits in the default network path, attached directly to the interface. So if its internal configuration goes bad and it shuts itself down, the box gets cut out of the network permanently. And since it is an EC2 instance, it has no console either. So the only recourse is to reboot the instance manually.
>> On top of which, the longer the system stays in this state, the bigger the chance it will self-inflict some long-lasting persistent data corruption.
> This heavily depends on the particular case. For example, our netgraph(3) userland library has an unfixed race in the NgSendMsg/NgSendAsciiMsg functions that can corrupt the userland/kernel socket data flow protocol for a threaded application, which can lead to garbage in the data at the userland side due to desync and misinterpretation of data received over the AF_NETGRAPH socket.
Well, this does not apply to this particular check. Input coming from userland is validated in a different code path *before* the in-kernel configuration is modified. Thus, the only way for this field to go bad is for kernel code to go haywire, and this is exactly the condition that panic(9) is supposed to "handle".
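To make that concrete, a hedged sketch of the control-message path is shown below; the message constant and field names are approximations, not necessarily the exact identifiers used in the diff.

```c
/*
 * Hedged sketch of the control-message handler (names are approximate).
 * The DLT supplied by userland is validated here, before the node's
 * private state is touched, so the data path can only ever observe
 * DLT_RAW or DLT_EN10MB unless kernel memory has been corrupted.
 */
case NGM_NAT_SET_DLT:
	if (msg->header.arglen != sizeof(uint8_t)) {
		error = EINVAL;
		break;
	}
	switch (*(uint8_t *)msg->data) {
	case DLT_RAW:
	case DLT_EN10MB:
		priv->dlt = *(uint8_t *)msg->data;
		break;
	default:
		/* Reject anything else before it ever reaches priv->dlt. */
		error = EINVAL;
		break;
	}
	break;
```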