Page MenuHomeFreeBSD

tcp: improve TCP fast open with source routing
AbandonedPublic

Authored by tuexen on May 19 2026, 9:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jun 22, 11:32 PM
Unknown Object (File)
Sun, Jun 21, 3:23 PM
Unknown Object (File)
Sun, Jun 21, 5:14 AM
Unknown Object (File)
Sat, Jun 20, 9:33 PM
Unknown Object (File)
Sat, Jun 20, 4:01 PM
Unknown Object (File)
Sat, Jun 20, 1:13 PM
Unknown Object (File)
Thu, Jun 18, 12:54 AM
Unknown Object (File)
Sat, Jun 13, 6:05 AM

Details

Summary

When handling a TCP/IPv4 SYN-segment with an acceptable fast open cookie and source routing IP options, don't free the source routing IPv4 options while the inp still points to them. This avoids in a use after free scenario which can be observed by using a KASAN kernel.
ipopts should only be freed, if the on-stack struct syncache is used and the pointer in this structure still points to the allocated ipopts. If these ipopts are moved from the struct syncache to the struct inpcb in syncache_socket(), which is called by syncache_tfo_expand(), the pointer in the struct syncache is set to NULL.
In a FreeBSD default setup this problem is mitigated by

  1. TCP fast open support on the server side not being enabled (the sysctl-variable net.inet.tcp.fastopen.server_enable is 0).
  2. Incoming IP packet with source routing options are not being processed by the host stack (the sysctl-variable net.inet.ip.accept_sourceroute is 0).

Only if these two sysctl-variables are changed, a FreeBSD system is affected, if a server actually using TCP fast open is running.
This patch will be MFCed to stable/14 and stable/15.
The issue was reported by Yuxiang Yang, Yizhou Zhao, Xuewei Feng, Qi Li, and Ke Xu from Tsinghua University using GLM5.1 from Z.ai.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Proposed an alternate solution at line 1782.

sys/netinet/tcp_syncache.c
1782

I think this one line change is a less disruptive and more clearly-to-the-point solution. The core issue is that syncache_tfo_expand() may result in the transfer of ownership of ipopts, and whether or not that occurred is clearly reflected in sc->sc_ipopts after it returns. There is thus only a need to synchronize this state at this point, and not a need to restructure the ipopts handling logic at the end into something more complicated.

s/net.inet.ip.accept_sourcerouting/net.inet.ip.accept_sourceroute/

Surely the point is moot though and SSRR/LSSR accepted only for feature-completeness in IPv4? E.g. SR-MPLS and SRv6 do not use it.

I "scienced the sh**t" out of this, as Dr. Mark Watney in "The Martian" would say, as background R&D for SPRING above. From my research notebook:

Andre's move in commit-id ef39adf obscures the history slightly.
The original code in question dates back to the 4.4BSD-Lite import by
rgrimes@ himself, so I'd refer to Stevens for further explanation there.

The only real modern FreeBSD change is to cache the incoming source entry
as a PACKET_TAG_IPOPTIONS mbuf tag, rather than relying on other fields.
That gets retrieved by the ip_srcroute() wrapper.
But I can't see this being inspected by ip_forward() anywhere, although
only the TCP and ICMP transports appear to inspect its return value.

However, multicast IPOPT_LSRR is explicitly disallowed; X_ip_mforward().
Me: That is probably reasonable. LSRR would be more difficult to control and
would most likely make uRPF verification/amplification prevention difficult
and/or impossible.
Whether pf will let your options through is another question; the last time
I checked, pf does not have fine grained IPv4/IPv6 option match syntax.

"Source-routing not considered harmful"
https://web.archive.org/web/20240512050628/http://www.gweep.net/~crimson/network/lsrr.html
This article discusses source routing in traditional IPv4 ASN operator terms,
and its application at peering points (IXPs), dating back to 2002:
(Pulled quotes omitted).
"Any network that has settlement-free peering interfaces with other networks
should have LSR enabled on the routers where those interfaces live. This
allows peer to examine routing policy in a concrete, verifiable way. The
subjective manner of testing by contacting the other operator is both subject
to error and consumes unnescessary, expensive human resources. It allows you
to detect peers that are misconfigured, abusing your network, or having their
networks abused."
"SSR is not very useful when poking your fingers into someone else's network,
which is presumable opaque to you, since it requires specific topology
knowledge. Additionally, SSR is limited to a small number of hops due to the
IP packet size; since you have to list all the hops the packet will visit in
the header, there's only room for 8 intervening networks."
"The irony is that with source-routing allowed across your entire network, you
can be abused. But just running it at the edges allows your peers to verify
that you're not abusing them, and it is just a token of good faith to allow
them to check you if you want to check them. So to avoid bad faith, you use a
part of something that can be abused to avoid being abused."

There are all sorts of reasons NOT to do LSR/SSR, as the author wisely
recounts, but it's concerning to me that FreeBSD apparently stopped
supporting it, given its historical utility at peering points. Of course,
for end hosts, LSR/SSR can go pound sand; it isn't needed there at all.

Finally:
Q. Which IPv4 carriers in the current Internet require "Loose Source Routing"
at peering points?

A. None. The parrot goes on to cite an expired I-D castigating source
routing in IPv4 once and for all: draft-reitzel-ipv4-source-routing-is-evil-00
It then goes on to mention that "Traffic engineering and path control are [now]
handled via: BGP policies, MPLS, SRv6/SR-MPLS; ...under the control of network
operators—not end hosts."
So that article is best consigned to history for IPv4 and vanilla IPv6.

In D57104#1308954, @bms wrote:

s/net.inet.ip.accept_sourcerouting/net.inet.ip.accept_sourceroute/

Thanks, fixed.

sys/netinet/tcp_syncache.c
1782

This proposal would also fix the issue.

I thought about this, but came to the conclusion that it makes the semantic of ipopts less clear. Right now my understanding is that it is a pointer to the source routing options, if not NULL. So the condition to freeing them is a description of the options are not stored persistently.

To have a clean solution, I would argue that we should change the semantic of ipopts to the source routing options we need to free on exit. Then your proposed fix here makes sense to me, but we would also have to change the code around line 1675 to:

	sc->sc_ipopts = ipopts;
        if (sc != &scs)
                ipopts = NULL;

and then we can do at the end:

if (ipopts != NULL)
        (void)m_free(ipopts);

Between this and the solution I choose, I think the difference is a change of the semantic of the variable ipopts. Therefore, I selected the one which does not change the semantic of ipopts.

Maybe others want to chime in...

sys/netinet/tcp_syncache.c
1782

I agree with your delineation of the essential difference between your proposal and mine, which I would summarize as:

  • you are proposing to fix the use after free bug AND apply what you feel is an improvement to visibility of the ipopts semantic (I am using 'feel' there just to flag that is the portion that can give rise to subjective debate)
  • I am proposing to only fix the use after free bug with an approach that values minimizing the surface of the change to just the code path in which the bug exists more highly than trying to achieve your second goal at the expense (in terms of complexity of evaluation, consequent enhanced risk, and possibility of subjective debate in the path of a bug fix) of making a change to a point where several intertwined code paths converge (this is why, for example, my proposal does not include the 'decorative' conditional setting of ipopts to NULL after assignment to sc_ipopts)

I'm presently comfortable to let things proceed from there with the differences in the proposals thus understood - I haven't found any objective problem when evaluating your proposed change, and I don't think there is anything else to be done in terms of advocacy for my counteroffer.

sys/netinet/tcp_syncache.c
1782

I agree that my choice is subjective and I just wanted to make clear why I proposed what I proposed. That was not intended as an argument that it has to be done in that way.

I agree that your proposed fix only changes the code path which is relevant for this bug and that might be the reason to go down that path and weaken the semantic of ipopts.

It is hard for me to judge between the two proposals. That is why I tried to get a third opinion...

I like Patrick's change more, but maybe with more verbose comment.

Of course this gets things a bit more sophisticated. I don't like that syncache_socket() will again pull the ipopts from the mbuf as the syncache_add() already just did. Maybe after the fix try to make this less sophisticated?

Of course this gets things a bit more sophisticated. I don't like that syncache_socket() will again pull the ipopts from the mbuf as the syncache_add() already just did. Maybe after the fix try to make this less sophisticated?

syncache_socket() pulls the ipopts from the ACK when called by syncache_expand(). syncache_socket() does not pull them again when called via syncache_add(), since you can't pull them twice.

markj added inline comments.
sys/netinet/tcp_syncache.c
1782

I'll point out that this is not the first time this function has had bugs due to ambiguous ownership of resources that are released in syncache_free(), e.g., commit 44cb1e857f048.

Something like the following would be cleaner IMHO:

  • Add a syncache_release() helper which frees the credential, MAC label, etc., and use that in syncache_free().
  • When assigning cred, maclabel, ipopts to the corresponding syncache entry fields, clear the local pointers, like we do already for cred. This is in the spirit of Patrick's suggestion as I understand it.
  • Try to unify cleanup for sc whether or not it's &scs.

In the meantime, the change as proposed looks ok to me.

I like Patrick's change more, but maybe with more verbose comment.

OK, then let's go down that path. Not sure what additional comments you want. That part of the code is pretty clear to me, it is the weakened semantic of ipopts I was concerned about. But that only affects future changes.

sys/netinet/tcp_syncache.c
1782

At least for ipopts, this would mean to use

	sc->sc_ipopts = ipopts;
        if (sc != &scs)
                ipopts = NULL;

instead of just

sc->sc_ipopts = ipopts;

then add Patrick's suggestion and finally the cleanup can be simplified to be

if (ipopts != NULL)
        (void)m_free(ipopts);

This would be similar to the handling of cred. That has a level of consistency I like.
Is that acceptable to everyone?

sys/netinet/tcp_syncache.c
1782

I would like to minimize the number of sc != &scs cases when handling cleanup. In general though I'm a fan extending the pattern of explicitly transferring ownership by setting local pointers to NULL. The approach seems ok to me but it's hard to say without seeing the patch.

sys/netinet/tcp_syncache.c
1782

Let me update this review with such a patch.

A second proposed solution integrating Patrick's proposal. Now ipopts is handled like cred.

sys/netinet/tcp_syncache.c
1678

Can we not do this unconditionally, and try to clean up scs at the end of the function? Then you don't need to explicitly synchronize ipopts again after calling syncache_tfo_expand().

sys/netinet/tcp_syncache.c
1678

That would require to at least initialize some fields of scs when we not use it. Something we don't do right now.

sys/netinet/tcp_syncache.c
1678

This may be redundant to everyone's understanding at this point, but just in case...

In syncache_tfo_expand() the ownership of ipopts is only sometimes transferred. If the syncache_socket() call made by syncache_tfo_expand() returns via its allocfail path, ipopts needs to be freed in syncache_add(), otherwise it does not. Which one of those cases occurred needs to be identified by syncache_add() so it can behave accordingly.

One way to do that identification is to synchronize ipopts after the syncache_tfo_expand() call, and another way is to expand the predicate of the conditional ipopts cleanup to test for it.

sys/netinet/tcp_syncache.c
1829

In the case where the sc was found by syncache_lookup(), sc->ipopts will also point to ipopts, and so it isn't safe to free here. That case also needs to explicitly transfer ownership by setting ipopts = NULL.

sys/netinet/tcp_syncache.c
1829

When uploading the diff, I focussed on the structure of the handling. I did not go through the whole function to check all occurrences of ipopts. This is now done. Will upload another diff (in a different review) for a prototype using your suggested syncache_release() function.

sys/netinet/tcp_syncache.c
1828–1831

If you write it like this:

if (sc == &scs)
    ipopts = sc->sc_ipopts;
if (ipopts != NULL)
    m_free(ipopts);

Then:

  • you don't need to conditionally clear ipopts on line 1678, it can be done unconditionally,
  • there is no need to resynchronize ipopts after the syncache_tfo_expand() call.

That's simpler IHMO.

tuexen added inline comments.
sys/netinet/tcp_syncache.c
1828–1831

Done. But I don't think this really simplifies things. I would really prefer to use D57374 instead of this. In D57374 the three variables cred, ipopts, and maclabel are handled consistently. It is based on your suggestion to use a syncache_release() function.

sys/netinet/tcp_syncache.c
1828–1831

I like that version better too.

The alternative D57374 was committed.