Page MenuHomeFreeBSD

ixgbe: Make masks for TCP flag handling during TSO sysctl'able
ClosedPublic

Authored by tuexen on Mar 6 2024, 9:32 PM.
Tags
None
Referenced Files
F103130829: D44258.diff
Thu, Nov 21, 10:05 AM
F103104370: D44258.diff
Thu, Nov 21, 1:26 AM
F103103859: D44258.diff
Thu, Nov 21, 1:15 AM
Unknown Object (File)
Thu, Nov 14, 1:53 AM
Unknown Object (File)
Oct 12 2024, 10:16 AM
Unknown Object (File)
Oct 2 2024, 10:50 AM
Unknown Object (File)
Oct 1 2024, 3:04 AM
Unknown Object (File)
Oct 1 2024, 1:14 AM
Subscribers

Details

Reviewers
rscheff
rrs
kbowling
erj
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Commits
rGeea2e089f8da: ixgbe: sysctl for TCP flag handling during TSO
Summary

Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment, and tso_tcp_flags_mask_last_segment sysctl-variables to control the handling of TCP flags during TSO.
This allows to fix the handling appropriate for classical ECN and to configure a handling appropriate for accurate ECN.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Owners added a reviewer: Restricted Owners Package.Mar 6 2024, 9:32 PM
tuexen requested review of this revision.Mar 6 2024, 9:32 PM

As at least two, maybe three different NIC vendors have similar capabilities, wouldn't it make sense to have global, driver independent sysctls?

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

sys/dev/ixgbe/if_ix.c
4706

The handler will run any time the sysctl tree is queried. I would gate the write to only apply on changes.

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Here is what I'm trying to accomplish:
The TCP header contains 12 flags (see IANA. The Intel NIC uses three 12 bit masks to compute, which flags are copied to the first, the middle, and the last segment when performing TSO. When doing ECN as specified in RFC 3168, the masks should be:

mask.first0xFF6
mask.middle0xF76
mask.last0xF7F

This means that the FIN and PSH flag only appear in the last segment and the CWR flag only appear in the first segment. Several Intel NICs behave like this.
However,

ix0: <Intel(R) X520 82599ES (SFI/SFP+)> mem 0x6004000000000-0x600400007ffff,0x6004000100000-0x6004000103fff irq 1040376 at device 0.0 numa-domain 0 on pci3
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 16 RX queues 16 TX queues
ix0: Using MSI-X interrupts with 17 vectors
ix0: allocated for 16 queues
ix0: allocated for 16 rx queues
ix0: Ethernet address: 90:e2:ba:f7:48:74
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: eTrack 0x000161c1

reports

dev.ix.0.tso_tcp_flags_mask_first_segment: 0x00000ff6
dev.ix.0.tso_tcp_flags_mask_middle_segment: 0x00000ff6
dev.ix.0.tso_tcp_flags_mask_last_segment: 0x00000f7f

This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.

When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:

mask.first0xFF6
mask.middle0xFF6
mask.last0xFFF

The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.

Does this description make the motivation clear?

tuexen added inline comments.
sys/dev/ixgbe/if_ix.c
4706

I think if you just perform a read operation req->newptr == NULL is true, and you would not perform the IXGBE_WRITE_REG(). Are you suggesting to optimize the case of a write operation where the new value is equal to the old value and skipping the IXGBE_WRITE_REG() in this special case?

sys/dev/ixgbe/if_ix.c
4706

I think if you just perform a read operation req->newptr == NULL is true, and you would not perform the IXGBE_WRITE_REG(). Are you suggesting to optimize the case of a write operation where the new value is equal to the old value and skipping the IXGBE_WRITE_REG() in this special case?

Yes kib corrected me in the other review it is fine as is.

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Here is what I'm trying to accomplish:
The TCP header contains 12 flags (see IANA. The Intel NIC uses three 12 bit masks to compute, which flags are copied to the first, the middle, and the last segment when performing TSO. When doing ECN as specified in RFC 3168, the masks should be:

mask.first0xFF6
mask.middle0xF76
mask.last0xF7F

This means that the FIN and PSH flag only appear in the last segment and the CWR flag only appear in the first segment. Several Intel NICs behave like this.
However,

ix0: <Intel(R) X520 82599ES (SFI/SFP+)> mem 0x6004000000000-0x600400007ffff,0x6004000100000-0x6004000103fff irq 1040376 at device 0.0 numa-domain 0 on pci3
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 16 RX queues 16 TX queues
ix0: Using MSI-X interrupts with 17 vectors
ix0: allocated for 16 queues
ix0: allocated for 16 rx queues
ix0: Ethernet address: 90:e2:ba:f7:48:74
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: eTrack 0x000161c1

reports

dev.ix.0.tso_tcp_flags_mask_first_segment: 0x00000ff6
dev.ix.0.tso_tcp_flags_mask_middle_segment: 0x00000ff6
dev.ix.0.tso_tcp_flags_mask_last_segment: 0x00000f7f

This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.

When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:

mask.first0xFF6
mask.middle0xFF6
mask.last0xFFF

The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.

Does this description make the motivation clear?

It does, thank you. There is nothing wrong with the patch as is. Should we program the registers automatically for normal ECN in a followup as the newer NICs do? I am guessing they are the way they are due to the age of the hardware/driver.

This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.

When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:

mask.first0xFF6
mask.middle0xFF6
mask.last0xFFF

The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.

Does this description make the motivation clear?

Your description definitely helps; thanks for asking the question @kbowling. @tuexen have you checked these masks on the other devices that use ixl/ice, or is this what you've observed so far?

I haven't heard anything internally about needing the TSO masks to be changed, but I'm certainly fine with adding sysctls to configure these values since it seems to fix a valid issue.

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.

Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...

No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.

I believe by default whatever is in the pseudoheader is copied to all segments.

What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?

Here is what I'm trying to accomplish:
The TCP header contains 12 flags (see IANA. The Intel NIC uses three 12 bit masks to compute, which flags are copied to the first, the middle, and the last segment when performing TSO. When doing ECN as specified in RFC 3168, the masks should be:

mask.first0xFF6
mask.middle0xF76
mask.last0xF7F

This means that the FIN and PSH flag only appear in the last segment and the CWR flag only appear in the first segment. Several Intel NICs behave like this.
However,

ix0: <Intel(R) X520 82599ES (SFI/SFP+)> mem 0x6004000000000-0x600400007ffff,0x6004000100000-0x6004000103fff irq 1040376 at device 0.0 numa-domain 0 on pci3
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 16 RX queues 16 TX queues
ix0: Using MSI-X interrupts with 17 vectors
ix0: allocated for 16 queues
ix0: allocated for 16 rx queues
ix0: Ethernet address: 90:e2:ba:f7:48:74
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: eTrack 0x000161c1

reports

dev.ix.0.tso_tcp_flags_mask_first_segment: 0x00000ff6
dev.ix.0.tso_tcp_flags_mask_middle_segment: 0x00000ff6
dev.ix.0.tso_tcp_flags_mask_last_segment: 0x00000f7f

This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.

When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:

mask.first0xFF6
mask.middle0xFF6
mask.last0xFFF

The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.

Does this description make the motivation clear?

It does, thank you. There is nothing wrong with the patch as is. Should we program the registers automatically for normal ECN in a followup as the newer NICs do? I am guessing they are the way they are due to the age of the hardware/driver.

Yes, such a follow up commit makes sense. I can propose one. Procedural question: should I wait for someone in the O11: Intel Networking group to approve this review or go ahead and commit it with the two approvals right now?

In D44258#1067037, @erj wrote:

This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.

When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:

mask.first0xFF6
mask.middle0xFF6
mask.last0xFFF

The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.

Does this description make the motivation clear?

Your description definitely helps; thanks for asking the question @kbowling. @tuexen have you checked these masks on the other devices that use ixl/ice, or is this what you've observed so far?

Depends on the meaning of "checking"... Here is what is based on Intel datasheets:

Chipdriverfirst maskmiddle masklast mask
82576EBigb0xFF60xF760xF7F
i210igb0xFF60xF760xF7F
i211igb0xFF60xF760xF7F
i350igb0xFF60xF760xF7F
82599ix0xFF60xFF60xF7F
X710ixl0xFF60xF760xF7F
E810ice0xFF60xF760xF7F

So all cards except the 82599 uses the expected masks for classical ECN. I have tested the NICs except the last two. Give me a couple of days and I'll actually test the X710, since I have these cards in my lab. I can't test the E810, since I don't have these cards in my lab.

I haven't heard anything internally about needing the TSO masks to be changed, but I'm certainly fine with adding sysctls to configure these values since it seems to fix a valid issue.

Changing these masks came up in relation to Accurate ECN, TSO Requirements. This document is in its final phase at the IETF.

In D44258#1067037, @erj wrote:

Your description definitely helps; thanks for asking the question @kbowling. @tuexen have you checked these masks on the other devices that use ixl/ice, or is this what you've observed so far?

I haven't heard anything internally about needing the TSO masks to be changed, but I'm certainly fine with adding sysctls to configure these values since it seems to fix a valid issue.

The incorrect retaining of the CWR header flag for middle segments usually is benign. It may (due to a popular misimplementation, which may exist in hardware) mask a CE mark on that very packet carrying the CWR (fbsd had this, see https://reviews.freebsd.org/rG9c251892c0a8612ee1cc6f53e5a7b89039b0f480 ). This would at least delay, possibly induce additional loss - and in a subsequent round, the CE marks (or loss) would get noted and appropriate action taken.

So, unless one specifically looks for this, it's very hard to notice. And in most cases, the CWR-flagged packet would not get marked with CE by the network. However, for correctness - and since the possibility exists to fix this - I'm in favor of changing these masks even for older hardware.

This revision was not accepted when it landed; it landed in state Needs Review.Thu, Nov 21, 1:26 AM
This revision was automatically updated to reflect the committed changes.

@tuexen just a note that the X552 is also using the same middle value as 82599

@tuexen just a note that the X552 is also using the same middle value as 82599

Interesting. Thanks for letting me know.