net: translate inbound checksum offloading flags to outbound when forwarding
Needs Revision · Public

Authored by royger on May 27 2016, 4:08 PM.

Details

Summary

According to the mbuf(9) man page, an inbound packet that contains the
CSUM_DATA_VALID and CSUM_PSEUDO_HDR flags and has its checksum field set to
0xffff signals that the packet has been validated by the NIC, but that the
actual checksum has not been returned. Translate these flags correctly when
doing packet forwarding, or else the outbound path will wrongly think the
checksum is present.

Sponsored by: Citrix Systems R&D

Test Plan

FreeBSD DomU PVHVM guests cannot 'route' traffic for other Xen PV guests on the same Dom0 host.
I am linking to the PR that this is supposed to be part of the fix for.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 3992
Build 4035: arc lint + arc unit

Event Timeline

royger retitled this revision to net: translate inbound checksum offloading flags to outbound when forwarding.
royger updated this object.
royger edited the test plan for this revision.

Is there any concern or additional work needed for IPv6?

Likely yes, I think; I haven't tested IPv6 at all, sadly.

This will break some NICs (see comments on D6656 -- basically, if you're setting CSUM_TCP then you need to replace the TCP checksum with the pseudohdr checksum, and store the offset of the TCP header's checksum in csum_data).

The important question is: If the packet has already had its checksum checked, then the checksum is correct. Why not update the checksum normally and send the packet without checksum offload enabled?
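
To make the CSUM_TCP requirement described above concrete, here is a minimal userland sketch, not kernel code. It assumes an IPv4/TCP packet, takes addresses in host byte order, and uses illustrative `EX_`-prefixed names and flag values that are stand-ins for the real definitions in the kernel headers. It stores the folded, uncomplemented pseudo-header sum in the TCP checksum field and records the offset of that field in `csum_data`.

```c
#include <stdint.h>

/* Illustrative stand-ins; these are NOT the kernel's definitions. */
#define EX_CSUM_TCP        0x4u
#define EX_IPPROTO_TCP     6u
#define EX_TCP_CSUM_OFFSET 16u /* byte offset of the checksum in a TCP header */

/* Hypothetical subset of the per-packet offload metadata. */
struct ex_offload {
    uint32_t csum_flags;
    uint32_t csum_data;
};

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
ex_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * One's-complement sum of the IPv4 pseudo-header (source, destination,
 * protocol, L4 length). The result is folded but not complemented.
 */
static uint16_t
ex_pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xffff;
    sum += dst >> 16;
    sum += dst & 0xffff;
    sum += proto;
    sum += l4_len;
    return ex_fold(sum);
}

/* Prepare a TCP header (as raw bytes) for transmit checksum offload. */
static void
ex_prepare_tcp_offload(uint8_t *tcp, uint32_t src, uint32_t dst,
    uint16_t l4_len, struct ex_offload *o)
{
    uint16_t ps = ex_pseudo_sum(src, dst, EX_IPPROTO_TCP, l4_len);

    /* Store the pseudo-header sum in network byte order. */
    tcp[EX_TCP_CSUM_OFFSET] = (uint8_t)(ps >> 8);
    tcp[EX_TCP_CSUM_OFFSET + 1] = (uint8_t)(ps & 0xff);
    o->csum_flags |= EX_CSUM_TCP;
    o->csum_data = EX_TCP_CSUM_OFFSET;
}
```

For 192.168.0.1 to 192.168.0.2 with a 20-byte TCP segment, `ex_pseudo_sum` yields 0x816e. A forwarded packet that arrived without a real checksum would need this step before CSUM_TCP could safely be set on it.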

This revision now requires changes to proceed. Aug 15 2020, 7:46 PM

Right, I think this is (likely) an optimization only used by para-virtual NICs. The problem here is that the guest receives a packet that originated on the same box from a different guest, using a para-virtualized NIC. Most para-virtualized NIC implementations have an option to avoid calculating the checksum, since the actual hardware will calculate the checksum when the packet hits a physical wire.

In the case above, however, the packet never actually hit the wire, and thus no checksum was calculated at all. Hence it would be helpful to have a way to internally forward packets that are known to be correct but whose checksum is not available anywhere. Having to calculate the checksum would unnecessarily slow things down, as we know the data is correct; there's just no checksum anywhere.

Isn't there a way to forward such packets (correct packets that don't have the actual checksum anywhere)?

According to the mbuf(9) man page, using CSUM_DATA_VALID | CSUM_PSEUDO_HDR is the correct way to signal that a packet has been checksummed but that the checksum value is not available in csum_data, which instead contains 0xffff.
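
The inbound convention described above can be expressed as a small predicate. This is only a sketch: `struct ex_pkthdr` is a hypothetical stand-in for the relevant mbuf packet header fields, and the `EX_` flag values are illustrative rather than the values the kernel actually defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; the real flags are defined by the kernel. */
#define EX_CSUM_DATA_VALID 0x1u
#define EX_CSUM_PSEUDO_HDR 0x2u

/* Hypothetical subset of the mbuf packet header used in this example. */
struct ex_pkthdr {
    uint32_t csum_flags;
    uint32_t csum_data;
};

/*
 * True when the receiving interface reported the L4 checksum as valid
 * but did not hand the actual checksum value to the host.
 */
static bool
ex_csum_validated_no_value(const struct ex_pkthdr *ph)
{
    const uint32_t want = EX_CSUM_DATA_VALID | EX_CSUM_PSEUDO_HDR;

    return (ph->csum_flags & want) == want && ph->csum_data == 0xffff;
}
```

A forwarding path could use a check of this shape to decide whether a packet needs special handling before it is queued for output.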

Sorry to start a discussion here, but we have a similar problem with bhyve when packets from a VM with a partial checksum and TSO must be delivered to the host stack (inbound path).
For example, we need to handle the following path:
VM (virtio-net, TSO, partial checksum) -> if_bridge/ng_bridge -> if_tuntap/ng_eiface -> host stack.

Currently this works only with the vale switch: VM (virtio-net, TSO, partial checksum) -> vale switch -> (here the vale switch performs software checksum and GSO) -> real interface or host stack.

It would be good to come up with a universal solution for injecting packets (mbufs) with missing or partially calculated checksums into the host stack.

It is also useful for communication between jails on the same host, where the checksum calculation is not needed and TSO can be enabled. Enabling TSO and TXCSUM increases the throughput between jails from 4-5 Gbit/s up to 25 Gbit/s. For example, consider the configuration Jail_1 (vtnet_1) -> ng_eiface (TSO, TXCSUM enabled) -> ng_bridge -> ng_eiface (TSO, TXCSUM enabled) -> Jail_2 (vtnet_2), or a similar configuration with if_epair and if_bridge.

The problem is: How do you distinguish packets that are flying around inside a machine from those destined to go onto the wire? I suppose you could have a CSUM_SKIP flag on packets like this, which means that the checksum was never calculated, and have the output path look for a flag in if_capenable (IFCAP_CSUMSKIP?) when it encounters that flag. Virtual interfaces would set that flag. When that flag is present, the packet is passed unmolested. If that flag is absent, but IFCAP_TXCSUM* is present, then the checksum is "fixed up" into canonical form. And if the egress interface does not support csum offload at all, the checksum is calculated in software before being sent on the wire.

How does that sound?
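
The output-path decision proposed above could be sketched roughly as follows. Neither CSUM_SKIP nor IFCAP_CSUMSKIP exists today; both are hypothetical names taken from the proposal, and every `EX_` identifier and value here is illustrative.

```c
#include <stdint.h>

/* Hypothetical flags from the proposal; values are illustrative only. */
#define EX_CSUM_SKIP      0x1u /* per-packet: checksum never calculated */
#define EX_IFCAP_CSUMSKIP 0x1u /* interface accepts such packets as-is */
#define EX_IFCAP_TXCSUM   0x2u /* interface offloads transmit checksums */

enum ex_csum_action {
    EX_NOTHING_TO_DO,     /* packet already carries a real checksum */
    EX_PASS_THROUGH,      /* hand the packet over untouched */
    EX_FIXUP_FOR_OFFLOAD, /* put the checksum field into canonical form */
    EX_SOFTWARE_CSUM      /* compute the checksum in software */
};

/* Decide what the output path does for a packet on a given interface. */
static enum ex_csum_action
ex_output_csum_action(uint32_t pkt_flags, uint32_t if_capenable)
{
    if ((pkt_flags & EX_CSUM_SKIP) == 0)
        return EX_NOTHING_TO_DO;
    if (if_capenable & EX_IFCAP_CSUMSKIP)
        return EX_PASS_THROUGH;
    if (if_capenable & EX_IFCAP_TXCSUM)
        return EX_FIXUP_FOR_OFFLOAD;
    return EX_SOFTWARE_CSUM;
}
```

The order of the tests matters: a virtual interface that advertises the skip capability wins over transmit offload, and software checksumming is the last resort.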

I think one question is how can you really identify these packets? You can send a packet into a VM (e.g. via an if_tap interface from a localhost connection) but that VM might itself forward the packet internally to another interface that then heads out onto the wire. You can't really know that before you send the packet into the if_tap, and the VM itself doesn't know which interfaces are internal and which are not.

In D6611#580267, @jhb wrote:

I think one question is how can you really identify these packets? You can send a packet into a VM (e.g. via an if_tap interface from a localhost connection) but that VM might itself forward the packet internally to another interface that then heads out onto the wire. You can't really know that before you send the packet into the if_tap, and the VM itself doesn't know which interfaces are internal and which are not.

I considered & rejected some magic solution where we constrain virtual interface MAC addresses to a range that is unique and which can be easily checked. Then we'd know if a packet came in on a virtual interface & needs checksumming. But that doesn't work if the VM has pass-thru access to a physical NIC and can send a packet w/o going through the host.

I'm likely losing context here (as I know very little about the network subsystem), but for example the Xen paravirtualized interface gets a field in each packet that signals whether the checksum is present or whether the packet has already been checksummed and found to be correct, even if the checksum is no longer present, perhaps because the packet never hit the wire.

I think it's possible to receive a packet on a network interface that has been checksummed by the hardware without the checksum being provided to the driver (OS)?

That would be how the Xen paravirtualized network card behaves. In that case there should be a way to process the packet normally and just translate the incoming checksum offload flags into outgoing ones, so that the checksum is never actually generated until the packet hits the wire. If the packet is instead forwarded to a network interface that doesn't have checksum offload capabilities, the checksum should be calculated by the OS, but I assume that's already the case.
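
For reference, the software fallback mentioned above amounts to the standard Internet checksum (RFC 1071). The function below is a plain userland sketch under a hypothetical name, not the kernel's implementation. It treats the buffer as big-endian 16-bit words and returns the complemented sum as a host-order value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over a byte buffer. An odd trailing byte
 * is padded with a zero byte. Intended for packet-sized buffers, so the
 * 32-bit accumulator cannot overflow before folding.
 */
static uint16_t
ex_in_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)
        sum += (uint32_t)buf[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

With the example bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) the function returns 0x220d. For TCP or UDP the pseudo-header words would have to be included in the sum as well.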