tcp: LRO epoch issue as well a potential pullup issue.
AbandonedPublic
Actions

Authored by rrs on Jul 15 2021, 11:17 AM.

Details

Reviewers

tuexen
gallatin
• hselasky
glebius

Group Reviewers

transport

Summary

There are some drivers that call LRO without having been in
an epoch. This of course can cause a panic due to the assumption
that we are in an epoch. Also as long as we are here cleaning
up a few things, there is no validation that we don't need
a pullup. In theory drivers should always have the IP+TCP
headers in the first mbuf, but there is no assurance that
this is true, let's not assume it and instead validate it.
Also in investigating this the rack_bbr_common, which as
a pullup in it for one of its paths does a mfree in a pullup
failure, thats plain wrong.

Test Plan

Make sure that LRO still works and also get the reporter of the
epoch issue to test his specific circumstances with the patch
to validate that it fixes the epoch issue.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

rrs created this revision.Jul 15 2021, 11:17 AM

Herald added a reviewer: transport. · View Herald TranscriptJul 15 2021, 11:17 AM

Herald added a subscriber: melifaro. · View Herald Transcript

rrs requested review of this revision.Jul 15 2021, 11:17 AM

Spacing issues

another extra nl

Please use __predict_false() where possible.

tuexen added inline comments.Jul 15 2021, 11:42 AM

sys/netinet/tcp_lro.c
1269	Please use `__predict_false(needs_epoch)` instead of `needs_epoch`. See latest patch I sent...
1275	See above.
1290	See above.
1301	See above.
1347	See above.

• hselasky added inline comments.Jul 15 2021, 11:51 AM

sys/netinet/tcp_lro.c
1627	This code will never trigger, because the tcp_lro_parser() already checks that the full payload, including the TCP header and options, fit into m_len. If it doesn't it returns NULL, and so the packet is not LRO'ed. /* We expect a contiguous header [eh, ip, tcp]. */ pa = tcp_lro_parser(m, &po, &pi, true); if (__predict_false(pa == NULL)) return (TCP_LRO_NOT_SUPPORTED);

Why is this even needed?

The way I'm reading this, almost all drivers mark themselves as network drivers (INTR_TYPE_NET) when registering interrupts. When that happens, ithread_loop() does the NET_EPOCH_ENTER() for them, around calls to their interrupt handler. It is only "special snowflake" drivers that do not use this mechanism and mark themselves as IFF_KNOWSEPOCH. So I'd say this should be replaced by a NET_EPOCH_ASSERT()

There is a similar check in ether_input which Gleb added. I've added him to the review for his comment.

A couple of bits..

Hans great to hear that the pullup is not needed earlier in the parser.. I can pull that code out then.

Michael, yes I can put things at predict false

Drew, the bits on epoch is an actual reported bug (Michael can chime in on the number.. NFS or some such I guess).

And something in this is broken yet so there will be more changes.. if I have LRO going through the latest
rack code things break still debugging

rrs added inline comments.Jul 15 2021, 1:35 PM

sys/netinet/tcp_lro.c
1627	So I have now been looking at the parser, can you explain to me how it is checking this? The first thing I see that tcp_lro_parser() does is call the low level parser, but that only gets m->m_data and walks through the pointer, it does not validate that m_len is long enough to do what its doing. Now I agree that most drivers will not be sending these short mbufs in, but how is just walking the headers validating that a driver did not violate the thing? Maybe I am missing something can you please point out to me how this is validated?? Thanks

• hselasky added inline comments.Jul 15 2021, 1:41 PM

sys/netinet/tcp_lro.c
1627	Sure. When packets enter the tcp_lro_parser() I've factored out the generic parts into the tcp_lro_low_level_parser() so that it can be used for both inner and outer headers. Before the tcp_lro_low_level_parser() function returns non-NULL it updates parser->total_hdr_len, which include all TCP options, if any. Then before tcp_lro_parser() returns, it checks total_hdr_len agains m_len: if (data_ptr == NULL \|\| po->total_hdr_len > m->m_len) return (NULL); If the m_len is not big enough to contain all inner and outer headers, including TCP options, then it fails passing the packet into the LRO engine basically. --HPS

rrs added inline comments.Jul 15 2021, 1:58 PM

sys/netinet/tcp_lro.c
1627	Yes I see that this prevents us from having a packet that fails that particular test... But what happens if I send in (from some errant driver) a m_pkthdr with the E-net and IP header but not the TCP header.. yes a driver should not do that.. but you make no provisions in the parsing code. All you are going to do is romp on through whatever was in the mbuf before and dereference the data.. For example if you see its IPPROT_TCP then you just set parser->tcp = ptr and then dereference that pointer (to get option length as well as the ports). This is not going to go well if a driver sends in a non-contiguous block.. you may catch it, but you may not. We probably need to send in the m_len to the low-level parser and make sure it does not exceed that somehow.

Drew
Here is the bug Michael did the epoch thing over

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254695

• hselasky added inline comments.Jul 15 2021, 2:34 PM

sys/netinet/tcp_lro.c
1627	Hi Randall, The tcp_lro_parser() function returns NULL if there is only ETH+IP headers, if you look carefully. The tcp_lro_parser() function by default, only returns non-NULL, if the inner header is TCP and all of it fits into the first mbuf in the chain.

rrs added inline comments.Jul 15 2021, 3:01 PM

sys/netinet/tcp_lro.c
1627	Hans This is broken if the caller send you ETH+IP+TCP and the TCP_Options in the next mbuf, you will deref the options and read garbage. If the caller passes you ETH+IP and the TCP + Options is in the next mbuf then you will deref whatever happens to be in the mbuf. You thus end up with unknown results and problems.

Michael and I have chatted on the transport sync. I am going to remove
this review, and he is going to separate out the EPOCH review on its own.

Then we need to address the fact that the parser code is possibly
wandering through garbage in a separate review. We either need
to protect against this i.e. using pullup or pass in the m_len and track
how much is left.. or fix it with pullups..

rrs abandoned this revision.Jul 15 2021, 3:03 PM

• hselasky added inline comments.Jul 15 2021, 3:15 PM

sys/netinet/tcp_lro.c
1627	This is broken if the caller send you ETH+IP+TCP and the TCP_Options in the next mbuf, you will deref the options and read garbage. Hi Randall, The current LRO code will just forward such packets AS-IS to the TCP stack, because the tcp_lro_parser() will return NULL then! mbuf chains like you describe are _not_ touched by the current LRO code. --HPS

This is a driver bug. The hyperv driver appears to be using a taskqueue to inject packets into the network stack, rather than a normal interrupt handler. As such, I feel that it is the hyperv driver's responsibility to take the network epoch, and not force 99% of drivers which do the right thing to take the epoch twice.

I'd really like to simplify all input paths, not make them more complex with unneeded tests and atomic operations.