Page MenuHomeFreeBSD

Disallow setting of ECN bits with setsockopt()
ClosedPublic

Authored by rscheff on Feb 3 2022, 10:24 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Feb 14, 5:30 PM
Unknown Object (File)
Wed, Feb 14, 6:22 AM
Unknown Object (File)
Wed, Feb 14, 6:22 AM
Unknown Object (File)
Wed, Feb 14, 6:18 AM
Unknown Object (File)
Wed, Feb 14, 6:18 AM
Unknown Object (File)
Wed, Feb 14, 6:18 AM
Unknown Object (File)
Wed, Feb 14, 6:18 AM
Unknown Object (File)
Wed, Feb 14, 6:18 AM

Details

Summary

setsockopt() grants full access to the deprecated
TOS byte. Mask out the ECN codepoint, so that
only the DSCP portion can be adjusted.

Also update the man page examples to deprecate
the use of TOS, and provide DSCP examples instead, and
remove IPTOS_MINCOST to squat the ECT1 codepoint.

Test Plan
// Verifying that TCP will clear out the ECN codepoint
// when setsockopt(IP_TOS) tries to set them

--tolerance_usecs=2000000

// Set sysctl defaults
0.0 `sysctl net.inet.tcp.cc.algorithm=newreno`
+0.1 `sysctl net.inet.tcp.functions_default=freebsd`
+0.1 `sysctl net.inet.tcp.initcwnd_segments=10`
+0.1 `sysctl net.inet.tcp.ecn.enable=0`
+0.1 `sysctl net.inet.tcp.hostcache.purgenow=1`

// Test Server side

// Establish a regular TCP connection.
+0.0 `echo 'server: freebsd tcp dcsp session'`
+0.50 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.01 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.01 setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
+0.01 bind(3, ..., ...) = 0
+0.01 listen(3, 1) = 0
+0.04 <[noecn] S  0:0(0) win 65535 <mss 1000>
+0.00 >[noecn] S.  0:0(0) ack 1 win 65535 <...>
+0.00 <[noecn]  .   1:1(0) ack 1 win 65535
+0.00 accept(3, ..., ...) = 4
+0.01 getsockopt(4, IPPROTO_IP, IP_TOS, [0x0], [4]) = 0
+0.01 setsockopt(4, IPPROTO_IP, IP_TOS, [0xff], 4) = 0
+0.01 getsockopt(4, IPPROTO_IP, IP_TOS, [0xfc], [4]) = 0
+0.01 write(4, ..., 1000) = 1000
+0.00 > (tos 0xfc) P.    1:1001(1000) ack 1 win 65535
+0.00 close(4) = 0
+0.00 > (tos 0xfc) R. 1001:1001(0) ack 1 win 0
+0.00 close(3) = 0

+0.1 `sysctl net.inet.tcp.functions_default=rack`

+0.0 `echo 'server: rack tcp dscp session'`
+0.50 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.01 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.01 setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
+0.01 bind(3, ..., ...) = 0
+0.01 listen(3, 1) = 0
+0.04 <[noecn] S  0:0(0) win 65535 <mss 1000>
+0.00 >[noecn] S.  0:0(0) ack 1 win 65535 <...>
+0.00 <[noecn]  .   1:1(0) ack 1 win 65535
+0.00 accept(3, ..., ...) = 4
+0.01 getsockopt(4, IPPROTO_IP, IP_TOS, [0x0], [4]) = 0
+0.01 setsockopt(4, IPPROTO_IP, IP_TOS, [0xff], 4) = 0
+0.01 getsockopt(4, IPPROTO_IP, IP_TOS, [0xfc], [4]) = 0
+0.01 write(4, ..., 1000) = 1000
+0.00 > (tos 0xfc) P.    1:1001(1000) ack 1 win 65535
+0.00 close(4) = 0
+0.00 > (tos 0xfc) R. 1001:1001(0) ack 1 win 0
+0.00 close(3) = 0


`{
sysctl net.inet.tcp.functions_default=freebsd;
sysctl net.inet.tcp.cc.algorithm=newreno;
sysctl net.inet.tcp.ecn.enable=2;
}`

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Looks good. Did you check sys/ofed for similar cases?

No, I did not. And I don't feel competent enough to commit something there, but I guess this may work:

diff --git a/sys/ofed/drivers/infiniband/core/ib_cma.c b/sys/ofed/drivers/infiniband/core/ib_cma.c
index 586524ac88bb..6d32dadbbef8 100644
--- a/sys/ofed/drivers/infiniband/core/ib_cma.c
+++ b/sys/ofed/drivers/infiniband/core/ib_cma.c
@@ -2497,7 +2497,7 @@ void rdma_set_service_type(struct rdma_cm_id *id, int tos)
        struct rdma_id_private *id_priv;

        id_priv = container_of(id, struct rdma_id_private, id);
-       id_priv->tos = (u8) tos;
+       id_priv->tos = (u8) tos & ~IPTOS_ECN_MASK;
 }
 EXPORT_SYMBOL(rdma_set_service_type);

diff --git a/sys/ofed/drivers/infiniband/core/ib_uverbs_std_types_flow_action.c b/sys/ofed/drivers/infiniband/core/ib_uverbs_std_types_flow_action.c
index 89cd0b313b72..68917c482868 100644
--- a/sys/ofed/drivers/infiniband/core/ib_uverbs_std_types_flow_action.c
+++ b/sys/ofed/drivers/infiniband/core/ib_uverbs_std_types_flow_action.c
@@ -123,7 +123,7 @@ static int parse_esp_ip(enum ib_flow_spec_type proto,
                .src_ip = cpu_to_be32(0xffffffffUL),
                .dst_ip = cpu_to_be32(0xffffffffUL),
                .proto = 0xff,
-               .tos = 0xff,
+               .tos = 0xfc,
                .ttl = 0xff,
                .flags = 0xff,
        };
@@ -132,7 +132,7 @@ static int parse_esp_ip(enum ib_flow_spec_type proto,
                           0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
                .dst_ip = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                           0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
-               .flow_label = cpu_to_be32(0xffffffffUL),
+               .flow_label = cpu_to_be32(~(IPTOS_ECN_MASK << 20)),
                .next_hdr = 0xff,
                .traffic_class = 0xff,
                .hop_limit = 0xff,
debdrup added a subscriber: debdrup.

mdoc changes look good to me

@rscheff : OK, I'll take the ibcore part, if any.

You are actually changing the semantic of the IP_TOS option. If you mask out the ECN bits, how am I supposed to implement a UDP based protocol supporting ECN?

sys/netinet/tcp_stacks/rack.c
20478 ↗(On Diff #102310)

Why is this change needed? If you already ensure that inp_ip_tos is correctly set, this is not necessary. Or am I missing something?

I did not consider UDP, actually.

Then let me revoke the code change in ip_ctloutput, and just go with the clearing of the current pkthdr ecn codepoint from D34148 plus perhaps something similar to the rack_set_sockopt for the base stack, such than regular sessions do not end up getting IP ECN codepoints, as D34148 only clears the ecn codepoints on (TCP-) ECN enabled sessions, but leaves non-ECN sessions unperturbed.

Does that sound reasonable?

sys/netinet/tcp_stacks/rack.c
20478 ↗(On Diff #102310)

I am not sure why this is in both places - other than RACK maybe caching the ip-> header rather than building up a new one from the inp-> struct?

I did not consider UDP, actually.

Then let me revoke the code change in ip_ctloutput, and just go with the clearing of the current pkthdr ecn codepoint from D34148 plus perhaps something similar to the rack_set_sockopt for the base stack, such than regular sessions do not end up getting IP ECN codepoints, as D34148 only clears the ecn codepoints on (TCP-) ECN enabled sessions, but leaves non-ECN sessions unperturbed.

Does that sound reasonable?

Yes. In SCTP we do https://cgit.freebsd.org/src/tree/sys/netinet/sctp_output.c#n4069

  • only mask out ECN codepoint for TCP, leave other protos unperturbed
  • only mask out ECN codepoint for TCP, leave other protos unperturbed
  • no need to parse the optval again
share/man/man4/ip.4
100

Isn't the old text still correct?

Maybe you want to add a sentence that setting the ECN bits on a socket using a transport protocol implementing ECN has no effect.

110

Maybe keeping the variable name 'tos' makes sense?

sys/netinet/tcp_usrreq.c
1764

This change looks good to me.

  • update the ip.4 man page, to discourage the use of a tos field,
share/man/man4/ip.4
100

OK.

110

I think keeping the variable name as tos makes sense. You are providing the TOS byte, not only the DSCP codepoint.

rscheff marked an inline comment as done.
  • restore "tos" variable name
rscheff added inline comments.
share/man/man4/ip.4
110

The examples at least should move ahead and not use historic field names for variable names IMHO.

This revision is now accepted and ready to land.Feb 3 2022, 5:40 PM