Page MenuHomeFreeBSD

mbuf(9): Implement a leaf network interface field in the mbuf packet header.
ClosedPublic

Authored by hselasky on May 27 2022, 7:44 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 13, 11:13 PM
Unknown Object (File)
Mon, Dec 2, 6:03 PM
Unknown Object (File)
Tue, Nov 26, 3:26 AM
Unknown Object (File)
Mon, Nov 25, 6:55 AM
Unknown Object (File)
Sat, Nov 23, 5:03 PM
Unknown Object (File)
Thu, Nov 21, 11:47 AM
Unknown Object (File)
Nov 5 2024, 7:36 AM
Unknown Object (File)
Oct 16 2024, 9:50 PM

Details

Summary

When packets are received they may traverse several network interfaces like
vlan(4) and lagg(9). When doing receive side offloads it is important to
know the first network interface entry point, because that is where all
offloading is taking place. This makes it possible to track receive
side route changes for multiport setups, for example when lagg(9) receives
traffic from more than one port. This avoids having to install multiple
offloading rules for the same connection.

This field works similar to the existing "rcvif" mbuf packet header field.

Submitted by: jhb@
MFC after: 1 week
Sponsored by: NVIDIA Networking
Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

hselasky edited the summary of this revision. (Show Details)

I tested this on a Netflix https serving workload, and I do not see any performance regressions from increasing the pkthdr size. That surprised me.

This revision is now accepted and ready to land.May 27 2022, 7:55 PM

Why is it called "leaf"? Wouldn't "physical" or "original" be a better description?

The problem of rcvif being rewritten as a packet travels up the stack has been affecting not only offloading. I subscribed more people to gather more opinions. That's definitely is a useful feature. Let's get it right from the very beginning.

Why is it called "leaf"? Wouldn't "physical" or "original" be a better description?

From the use of "leaf node" in graphs / trees in programming. That is, if you view the set of interfaces as a kind of graph where packets enter in a leaf node (and the "top" or root of the tree is the protocol layer). That graph perhaps isn't quite as simple as that as you have interfaces like if_wg that sit higher in between protocols, but that's the origin of the name.

gnn added a subscriber: gnn.

LGTM

@glebius To be clear, I'm not opposed to other names, just explaining why 'leaf' is the name I came up with.

Can you elaborate more on what some other use cases would be? Right now the other patches for KTLS RX only set this field in packets in the mlx5 driver as it is the only driver to do NIC TLS RX. However, if there are other use cases it may make sense to set this more centrally, e.g. in ether_input and possibly other if_input routines (vs having to get set it all the drivers). Or we could define a helper macro perhaps for use in device drivers that sets both pointers that drivers would use.

BTW, one other thing that KTLS currently kind of hacks around in the RX patches is that the vlan ID is "lost" on the way up the stack (vlan_input strips M_ETHER_VLAN_TAG or whatever the flag is). Would other use cases prefer to have other information about the packet's "initial" state preserved further up the stack?

Just my humble opinion, "physical" would describe it better than a "leaf". Just thinking about all kinds of tunneling, and there could be leaves under leaves.

Other use cases that I can image are vlan, lagg, netgraph and other who rewrite rcvif, while upper layers potentially might be interested in what was the physical interface.

Just my humble opinion, "physical" would describe it better than a "leaf". Just thinking about all kinds of tunneling, and there could be leaves under leaves.

Other use cases that I can image are vlan, lagg, netgraph and other who rewrite rcvif, while upper layers potentially might be interested in what was the physical interface.

Hmm, by use cases I was more thinking about the "upper layers". KTLS is one such example, but I'm not sure if other use cases want more info (e.g. KTLS would really like the vlan to persist if possible, but I'm not sure if other upper layer use cases would want that as well)

Any more comments? Or are we ready to give this a try?

Any more comments? Or are we ready to give this a try?

No objections from me on the idea as is. We definitely need this information. However, I would ask to declare it open for later modifications.

Apart from the naming matter, that I already raised, I got more thoughts on this. At some point we should probably differentiate between network interface and network device. As we already do in wlan(4) and its underlying devices. Once a Ethernet device becomes a part of lagg(4) it effectively stops to be a network interface but remains a network device. In the scope of this patch we probably want the device to be saved on the received mbuf. So potentially the new pointer shouldn't even be a pointer to struct ifnet.

Again, no objections for the current patch.

Thank you. Will push this later today.

I just observed a build failure for ARM which appears attributable to this.

--- kern_mbuf.o ---
sys/kern/kern_mbuf.c:343:1: error: static_assert failed due to requirement 'sizeof(struct mbuf) <= 256' "size of mbuf exceeds MSIZE"
_Static_assert(sizeof(struct mbuf) <= MSIZE,
^              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Upon reverting the commits associated with D35339 and D32356 the build was successful.

Thank you.

I'll have a look at it.

Exactly which ARM architecture is this?

I tried to build several ARM targets and cannot reproduce the build failure:

make buildkernel TARGET_ARCH=armv6 
make buildkernel TARGET_ARCH=armv7  
make buildkernel TARGET_ARCH=aarch64 
make buildkernel TARGET_ARCH=armv7 KERNCONF=RPI-B

--HPS

Exactly which ARM architecture is this?

Was simply a generic ARM kernel build, no customization simply the latest git had. At this point it looks like D35339/4d88d81c316 was the immediate cause, but that requires reverting cb27627968e, f0fca646180 and fe8c78f0d20 in order to build. I suspect ARM64 and RISC-V builds may also be hit by this, but I haven't yet confirmed.

I've only got so much computing power so kernel builds take a bit. Appears b8394039dc2 did fix the issue. Also confirmed this effected ARM and i386 (ARM64 and RISC-V were unaffected).

I kicked off a universe build.

Looks good so far.