epair: add TXCSUM and TXCSUM6
ClosedPublic

Authored by timo.voelker_fh-muenster.de on Jul 30 2025, 8:04 PM.

Details

Summary

Add capabilities RXCSUM and RXCSUM6 as well as TXCSUM and TXCSUM6 for TCP and UDP to the epair interface and enable them by default.

RXCSUM and RXCSUM6 are enabled because an epair interface may receive a packet with the mbuf flag CSUM_DATA_VALID set, which is expected only if these capabilities are enabled. Since removing this flag does not seem helpful, these capabilities cannot be disabled.

With TXCSUM or TXCSUM6 enabled, the sender does not compute the checksum but sets the mbuf flag CSUM_TCP or CSUM_UDP. The sending epair interface end simply passes the mbuf, with the flag still set, to the other epair interface end. If the packet then leaves the host because, after the other end has received it, a bridge switches it or IP routes it out over a physical interface, that physical interface computes the checksum.

TXCSUM and TXCSUM6 are synchronized between the two epair interface ends. If one of them is enabled or disabled on one end, it is enabled or disabled on the other end as well. If the sending epair interface end has TXCSUM or TXCSUM6 enabled and the receiving end is in a bridge, it is assumed that all interfaces in the bridge have that capability enabled. Otherwise, the bridge would have disabled that capability on the receiving epair interface end, which, due to the synchronization, would have disabled it on the sending epair interface end as well.
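
The capability handling described in the summary can be modelled in a few lines of C. This is only a sketch under stated assumptions, not the code from the patch: `struct epair_end`, `epair_end_set_caps()`, `RX_ALWAYS_ON`, and `TX_SYNCED` are hypothetical names invented for illustration, the requested set is assumed to have been masked against the supported capabilities already, and the capability bit values are those of FreeBSD's net/if.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Capability bits with the values used in FreeBSD's net/if.h. */
#define IFCAP_RXCSUM       (1u << 0)
#define IFCAP_TXCSUM       (1u << 1)
#define IFCAP_RXCSUM_IPV6  (1u << 21)
#define IFCAP_TXCSUM_IPV6  (1u << 22)

/* Hypothetical helper masks, not part of the patch. */
#define RX_ALWAYS_ON  (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)
#define TX_SYNCED     (IFCAP_TXCSUM | IFCAP_TXCSUM_IPV6)

/* Hypothetical model of one epair end; not the kernel's data structure. */
struct epair_end {
	uint32_t capenable;      /* currently enabled capabilities */
	struct epair_end *peer;  /* the other end of the pair */
};

/* Apply a requested capability set to one end of the pair. */
static void
epair_end_set_caps(struct epair_end *end, uint32_t reqcap)
{
	/* RXCSUM and RXCSUM6 stay enabled regardless of the request. */
	end->capenable = reqcap | RX_ALWAYS_ON;

	/* Mirror the TXCSUM/TXCSUM6 state onto the other end. */
	if (end->peer != NULL) {
		end->peer->capenable &= ~TX_SYNCED;
		end->peer->capenable |= reqcap & TX_SYNCED;
	}
}
```

In this model, a request that clears TXCSUM on epair0b also clears it on epair0a, which is the property the bridge argument above relies on.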

Note that if the packet leaves the host due to IP routing, the outgoing interface may not have TXCSUM or TXCSUM6 enabled. Since the code changes in D51475, the checksum is computed in software in that case.
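
For readers unfamiliar with what "computed in software" involves, here is a minimal sketch of the 16-bit one's-complement Internet checksum (RFC 1071) that TCP and UDP use. It is not the kernel's implementation: `in_cksum_sketch()` is a hypothetical name, and a real TCP or UDP checksum additionally covers a pseudo-header, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over a byte buffer, RFC 1071 style.
 * Hypothetical helper for illustration. The 32-bit accumulator is
 * sufficient for buffers up to 64 KiB, which covers a single packet.
 */
static uint16_t
in_cksum_sketch(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	/* Add up the data as 16-bit big-endian words. */
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}

	/* A trailing odd byte is padded with a zero byte. */
	if (len == 1)
		sum += (uint32_t)data[0] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

With offloading, the sender skips this step, which is why tcpdump on the epair shows segments with an incorrect checksum.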

Also note that if the packet is destined for the local host, since the code changes in D51475 the host accepts the packet even if the checksum is incorrect due to offloading.

Test Plan

With this patch applied, create an epair interface pair and set an IP address.

ifconfig epair0 create
ifconfig epair0a inet 192.168.0.1/24 up

Move epair0b into a VNET and set an IP address.

jail -c name=jvnet host.hostname=jvnet persist vnet vnet.interface=epair0b
jexec jvnet ifconfig epair0b inet 192.168.0.2/24 up

Test TCP with nc

jexec jvnet nc -l 1234
nc 192.168.0.2 1234

Result: Data transfer works even though TCP segments are sent with an incorrect checksum (observable with tcpdump).

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

Should epair also provide support for IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6? I guess it will deliver packets with the corresponding mbuf header flags already...

I added them in a previous version of the patch, but I decided to remove these capabilities.

I'm not sure what these capabilities mean.

  • If they mean the interface validates the checksum, then it would be wrong to add this capability.
  • If they just mean that incoming packets may have the mbuf flag CSUM_DATA_VALID set, then it would be OK to add this capability. But then it should be enabled by default without the possibility to disable them because epair does not remove this flag when it transfers a packet between the epair interface ends.

My interpretation is that by setting IFCAP_RXCSUM or IFCAP_RXCSUM_IPV6, I indicate that I would like to receive CSUM_DATA_VALID. So in general I would argue that, given the current behavior, they need to be on by default. Turning them off might make sense if these flags are also synced in a bridge. But that is a different issue.

For SIOCSIFCAP, ifr->ifr_reqcap is assigned directly to ifp->if_capenable instead of masking it with if_capabilities. The man page of ifnet told me that the masking is done earlier in the ioctl call chain.

To address @tuexen's comment, I added the capabilities RXCSUM and RXCSUM6, which do nothing if enabled but may make the user aware that packets with the CSUM_DATA_VALID flag can arrive. Currently these capabilities are off by default. As stated in my last commit, they should be enabled by default without the user being able to disable them. That's the plan for the future after we have gained experience with it.

share/man/man4/epair.4
28
78537728efc53 (Ronald Klop    2025-08-11 15:51:16 +0200  28) .Dd August 12, 2025

There is a conflict here. Don't forget to update this

sys/net/if_epair.c
447

We only update the if_hwassist value, could you please explain why the LOCK is needed here?

628–629

The interface initialization is not done yet. Is there any need for this lock other than for epair_caps_changed?

I wrote the dtrace script below. It should print CSUM_IP_TCP, CSUM_IP_UDP, CSUM_IP6_TCP, and CSUM_IP6_UDP whenever it sees an mbuf with the corresponding flag, but most packets don't contain CSUM flags.

Here is my dtrace script. Make sure you change the if_index values in the if_input filter to match your own epair indexes.

#!/usr/sbin/dtrace -s
# pragma D option quiet

fbt:kernel:if_input:entry
/((struct ifnet *)arg0)->if_index == 3 ||
((struct ifnet *)arg0)->if_index == 4/
{
    IFCAP_TXCSUM = (1 << (1));
    IFCAP_TXCSUM_IPV6 = (1 << (22));

    CSUM_IP_UDP = 0x00000002;
    CSUM_IP_TCP = 0x00000004;
    CSUM_IP6_UDP = 0x00000200;
    CSUM_IP6_TCP = 0x00000400;

    /* checksum is correct */
    CSUM_L4_VALID = 0x08000000;

    IFNAMSIZ = 16;

    printf("Entering if_input...\n");
    this->m = (struct mbuf *)arg1;
    this->ifp = (struct ifnet *)arg0;
    this->m_pkthdr = (struct pkthdr *)&this->m->m_pkthdr;
    printf("if_xname: %s\n", this->ifp->if_xname);
    printf("  if_capabilities: %d\n", this->ifp->if_capabilities);
    printf("  if_capenable: %d\n", this->ifp->if_capenable);
    if (this->ifp->if_capenable & IFCAP_TXCSUM)
        printf("  IFCAP_TXCSUM\n");
    if (this->ifp->if_capenable & IFCAP_TXCSUM_IPV6)
        printf("  IFCAP_TXCSUM_IPV6\n");
    printf("  if_hwassist: %lu\n", this->ifp->if_hwassist);
    printf("  if_vlantrunk: %p\n", this->ifp->if_vlantrunk);
    printf("\n");
    printf("mbuf address %p\n", this->m);
    printf("  m_pkthdr address %p\n", this->m_pkthdr);
    printf("    csum_flags: %u\n", this->m_pkthdr->csum_flags);
    if (this->m_pkthdr->csum_flags & CSUM_L4_VALID)
        printf("    CSUM_DATA_VALID\n");
    if (this->m_pkthdr->csum_flags & CSUM_IP_UDP)
        printf("    CSUM_IP_UDP\n");
    if (this->m_pkthdr->csum_flags & CSUM_IP_TCP)
        printf("    CSUM_IP_TCP\n");
    if (this->m_pkthdr->csum_flags & CSUM_IP6_UDP)
        printf("    CSUM_IP6_UDP\n");
    if (this->m_pkthdr->csum_flags & CSUM_IP6_TCP)
        printf("    CSUM_IP6_TCP\n");
    printf("\n");
}

I also added the CSUM_DATA_VALID flag to help with your tests addressing @tuexen's concern about rxcsum and rxcsum6.

Thanks for the dtrace script. What packets were exchanged? TCP? UDP? SCTP? Others?

share/man/man4/epair.4
28
78537728efc53 (Ronald Klop    2025-08-11 15:51:16 +0200  28) .Dd August 12, 2025

There is a conflict here. Don't forget to update this

The date needs to be updated when committed anyway. That is something the committer has to deal with anyway, even if there is no conflict.

Thanks for the dtrace script. What packets were exchanged? TCP? UDP? SCTP? Others?

Oops! I forgot to print their L4 protocol. (ICMPv6 is chatty)

Every TCP and UDP packet has the correct CSUM flag!

Unfortunately, for SCTP I don't know of any tool to test it in the base system. (nc --sctp doesn't work)

Log:

if_xname: epair0b
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003d12d00  m_pkthdr address fffff80003d12d20
L4 Proto Number: 17
    CSUM_IP6_UDP

if_xname: epair0a
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003b17800  m_pkthdr address fffff80003b17820
L4 Proto Number: 58

if_xname: epair0b
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003d0e900  m_pkthdr address fffff80003d0e920
L4 Proto Number: 58

if_xname: epair0b
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003702c00  m_pkthdr address fffff80003702c20
L4 Proto Number: 6
    CSUM_IP6_TCP

if_xname: epair0a
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003c31200  m_pkthdr address fffff80003c31220
L4 Proto Number: 6
    CSUM_IP6_TCP

if_xname: epair0b
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003c31200  m_pkthdr address fffff80003c31220
L4 Proto Number: 6
    CSUM_IP6_TCP

Update to dtrace.s:

this->data = (uintptr_t)this->m->m_data;
this->ip = (struct ip6_hdr *)(this->data + 14);
printf("L4 Proto Number: %d\n", this->ip->ip6_ctlun.ip6_un1.ip6_un1_nxt);
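
The snippet above assumes that every frame carries an IPv6 header directly after a 14-byte Ethernet header. The same lookup can be sketched in C with an EtherType check, so that IPv4 frames are read at the right offset too. The function name `frame_l4_proto()` is hypothetical, and the sketch ignores VLAN tags and IPv6 extension headers (it reports the first Next Header value, as the dtrace snippet does).

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN   14      /* dst MAC + src MAC + EtherType */
#define ETHERTYPE_IP    0x0800
#define ETHERTYPE_IPV6  0x86dd

/*
 * Return the L4 protocol number of an Ethernet frame (6 = TCP,
 * 17 = UDP, 58 = ICMPv6, 132 = SCTP), or -1 if it cannot be read.
 * Hypothetical helper for illustration only.
 */
static int
frame_l4_proto(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	if (len < ETHER_HDR_LEN)
		return (-1);
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

	/* IPv6: the Next Header field is byte 6 of the 40-byte header. */
	if (ethertype == ETHERTYPE_IPV6 && len >= ETHER_HDR_LEN + 40)
		return (frame[ETHER_HDR_LEN + 6]);

	/* IPv4: the Protocol field is byte 9 of the header. */
	if (ethertype == ETHERTYPE_IP && len >= ETHER_HDR_LEN + 20)
		return (frame[ETHER_HDR_LEN + 9]);

	return (-1);
}
```

Offset 14 + 6 is the field the dtrace snippet reads through ip6_un1_nxt.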

I guess you need to do kldload sctp to get nc working with SCTP.

Thanks for the update. I like the idea of using dtrace for observation.
Here is a script to observe the incoming part (right now at the interface and TCP level) using a dtrace aggregator:

#!/usr/sbin/dtrace -s

fbt:kernel:tcp_do_segment:entry
{
	@tcp[args[1]->m_pkthdr.rcvif->if_xname, args[1]->m_pkthdr.csum_flags] = count();
}

fbt:kernel:if_input:entry
{
	@if[args[0]->if_xname, args[1]->m_pkthdr.csum_flags] = count();
}

END
{
	printf("\nTCP input\n");
	printa("%s: %08x %@u\n", @tcp);
	printf("Interface input\n");
	printa("%s: %08x %@u\n", @if);
}
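
The aggregation script prints csum_flags as a raw hexadecimal value. A small C helper, sketched here under the assumption that the flag values quoted in the dtrace scripts in this thread are the ones in use, turns such a value into names. The names `csum_names` and `csum_flags_decode()` are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct csum_name {
	uint32_t bit;
	const char *name;
};

/* Flag values as quoted in the dtrace scripts in this review. */
static const struct csum_name csum_names[] = {
	{ 0x00000002, "CSUM_IP_UDP" },
	{ 0x00000004, "CSUM_IP_TCP" },
	{ 0x00000008, "CSUM_IP_SCTP" },
	{ 0x00000200, "CSUM_IP6_UDP" },
	{ 0x00000400, "CSUM_IP6_TCP" },
	{ 0x00000800, "CSUM_IP6_SCTP" },
	{ 0x08000000, "CSUM_L4_VALID" },
};

/*
 * Write the space-separated names of all known bits set in flags
 * into buf and return the resulting string length. Names that do
 * not fit into buf are dropped.
 */
static size_t
csum_flags_decode(uint32_t flags, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return (0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(csum_names) / sizeof(csum_names[0]); i++) {
		int n;

		if ((flags & csum_names[i].bit) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used > 0 ? " " : "", csum_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return (used);
}
```

For example, a value of 00000400 printed by the script decodes to CSUM_IP6_TCP.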

I guess you need to do kldload sctp to get nc working with SCTP.

Sorry for the delay, I had to read about SCTP to know what I was doing. (I found "SCTP: What is it, and how to use it?" and it was awesome!)

As expected, the CSUM_IP_SCTP flag doesn't apply to SCTP mbufs.
However, after I added CSUM_IP_SCTP to the if_hwassist flags in epair_caps_changed, it doesn't appear in the csum_flags of SCTP mbufs either.

I have the sctp kernel module loaded and can establish a connection and exchange data between the jvnet and the host via nc.
But, here is the result:

if_xname: epair0a
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003bac500  m_pkthdr address fffff80003bac520
L4 Proto Number: 132

if_xname: epair0b
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
mbuf address fffff80003bac800  m_pkthdr address fffff80003bac820
L4 Proto Number: 132

Maybe I'm doing it wrong, but here is the environment:

Updated dtrace.s:

CSUM_IP_SCTP = 0x00000008;
CSUM_IP6_SCTP = 0x00000800;
if (this->m_pkthdr->csum_flags & CSUM_IP_SCTP)
    printf("    CSUM_IP_SCTP\n");
if (this->m_pkthdr->csum_flags & CSUM_IP6_SCTP)
    printf("    CSUM_IP6_SCTP\n");

setup.sh:

#!/bin/sh
kldload sctp
ifconfig epair0 create
ifconfig epair0a inet6 2a01:e140:cafe::a/64 txcsum txcsum6
jail -c name=jvnet host.hostname=jvnet persist vnet vnet.interface=epair0b
jexec jvnet ifconfig epair0b inet6 2a01:e140:cafe::b/64
jexec jvnet nc --sctp -6l 1234

another terminal:

nc --sctp -6 2a01:e140:cafe::b 1234

As expected, the CSUM_IP_SCTP flag doesn't apply to SCTP mbufs.

I agree, that is expected.

However, after I added CSUM_IP_SCTP to the if_hwassist flags in epair_caps_changed, it doesn't appear in the csum_flags of SCTP mbufs either.

Hmm. That is unexpected. Our plan is to support TCP and UDP checksum offloading first and then focus on SCTP checksum offloading.
In the first step we will identify all relevant places for code changes and generic improvements to the infrastructure. In the second step we have to deal with the fact that currently there is no interface flag which can be used to figure out if the interface does transmit checksum offloading for SCTP or not. There is also a small change in the virtio specification needed, I guess.
But I agree, for epair, I thought your change would be good enough. So I am missing something. SCTP has stats counters for transmit and receive checksum offloading. I just realise that they are not displayed via netstat. Need to fix that first.

Maybe I'm doing it wrong, but here is the environment:

All this looks OK.

I double-checked my tests, and it didn't make sense to me. I performed a clean build and it worked. (my bad, not sure why it was needed, ccache maybe?):

if_xname: epair0b
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
...
L4 Proto Number: 132
    CSUM_IP6_SCTP

if_xname: epair0a
  IFCAP_TXCSUM
  IFCAP_TXCSUM_IPV6
...
L4 Proto Number: 132
    CSUM_IP6_SCTP

My only concern now is the need for the LOCK mechanism itself.

Thanks for double checking and verifying that our expectation was right.

Yes, I agree. @timo.voelker_fh-muenster.de will look into this.

sys/net/if_epair.c
100

I think this is not needed.

447

The EPAIR_LOCK() is used to protect the global variable next_index. Since you don't use it here, you should not assert on it.

487

This lock is not needed, since you don't touch next_index.

502

Not needed...

628–629

You don't touch next_index, so EPAIR_LOCK() is not needed.

Just a note: netstat -s -p sctp does now provide the counters related to receive and transmit checksum offloading: 6d988ec3a761.

timo.voelker_fh-muenster.de edited the summary of this revision. (Show Details)
timo.voelker_fh-muenster.de edited the test plan for this revision. (Show Details)

Addressing comments

  • rebase: Rebased my changes on the FreeBSD head. This solves a conflict @p.mousavizadeh_protonmail.com mentioned and includes the changes @tuexen did in a previous commit.
  • remove lock: I used a global lock to protect changes to if_hwassist. This was probably too restrictive as discussed here by @p.mousavizadeh_protonmail.com and @tuexen.
  • default on: The capabilities RXCSUM, RXCSUM6, TXCSUM, and TXCSUM6 are now enabled by default. Also, RXCSUM and RXCSUM6 cannot be disabled because a received packet may have CSUM_DATA_VALID set, which is expected only if these capabilities are enabled.

My only concern now is the need for the LOCK mechanism itself.

Thanks for testing and for your comments. I removed the lock and did a rebase.

You can check your man page change by running "mandoc -Tlint" to get some feedback about it.

share/man/man4/epair.4
115

New sentence on a new line (after "... interface.")

118

Same here

timo.voelker_fh-muenster.de added inline comments.
share/man/man4/epair.4
115

Thanks for noting!

share/man/man4/epair.4
129

Please add a sentence here that this feature is useful when using an epair in combination with a bridge interface. I guess not all admins will read the code and find the comment.

timo.voelker_fh-muenster.de marked an inline comment as done.

Addressing @tuexen's comment by adding a sentence in the man page that explains the sync of TXCSUM and TXCSUM6.

timo.voelker_fh-muenster.de added inline comments.
share/man/man4/epair.4
129

I thought someone who only reads the man page would not be interested in the details, but that might not be true.

Still looks good, thanks!

This revision is now accepted and ready to land. Aug 29 2025, 6:00 AM
This revision was automatically updated to reflect the committed changes.

I ran the tests in /usr/tests/sys/netpfil/pf/Kyuafile, but couldn't find which one (related to epair) is failing.
@tuexen
Did @kp say which pf test is failing?

Yes, I forwarded his email to you. He sent it to a mailing list.

I guess the problem is not in this patch, but having this patch uncovers a bug somewhere else. Timo is already investigating. We want to figure out what we missed.

This revision is now accepted and ready to land. Sep 4 2025, 12:13 PM
This revision was automatically updated to reflect the committed changes.