There are some drivers that call LRO without having been in
an epoch. This of course can cause a panic due to the assumption
that we are in an epoch. Also as long as we are here cleaning
up a few things, there is no validation that we don't need
a pullup. In theory drivers should always have the IP+TCP
headers in the first mbuf, but there is no assurance that
this is true, let's not assume it and instead validate it.
Also in investigating this the rack_bbr_common, which as
a pullup in it for one of its paths does a mfree in a pullup
failure, thats plain wrong.
Details
- Reviewers
tuexen gallatin • hselasky glebius - Group Reviewers
transport
Make sure that LRO still works and also get the reporter of the
epoch issue to test his specific circumstances with the patch
to validate that it fixes the epoch issue.
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 | This code will never trigger, because the tcp_lro_parser() already checks that the full payload, including the TCP header and options, fit into m_len. If it doesn't it returns NULL, and so the packet is not LRO'ed. /* We expect a contiguous header [eh, ip, tcp]. */ pa = tcp_lro_parser(m, &po, &pi, true); if (__predict_false(pa == NULL)) return (TCP_LRO_NOT_SUPPORTED); |
Why is this even needed?
The way I'm reading this, almost all drivers mark themselves as network drivers (INTR_TYPE_NET) when registering interrupts. When that happens, ithread_loop() does the NET_EPOCH_ENTER() for them, around calls to their interrupt handler. It is only "special snowflake" drivers that do not use this mechanism and mark themselves as IFF_KNOWSEPOCH. So I'd say this should be replaced by a NET_EPOCH_ASSERT()
There is a similar check in ether_input which Gleb added. I've added him to the review for his comment.
A couple of bits..
Hans great to hear that the pullup is not needed earlier in the parser.. I can pull that code out then.
Michael, yes I can put things at predict false
Drew, the bits on epoch is an actual reported bug (Michael can chime in on the number.. NFS or some such I guess).
And something in this is broken yet so there will be more changes.. if I have LRO going through the latest
rack code things break still debugging
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 | So I have now been looking at the parser, can you explain to me how The first thing I see that tcp_lro_parser() does is call the low level parser, but Now I agree that most drivers will not be sending these short mbufs Maybe I am missing something can you please point out to me how this Thanks |
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 | Sure. When packets enter the tcp_lro_parser() I've factored out the generic parts into the tcp_lro_low_level_parser() so that it can be used for both inner and outer headers. Before the tcp_lro_low_level_parser() function returns non-NULL it updates parser->total_hdr_len, which include all TCP options, if any. Then before tcp_lro_parser() returns, it checks total_hdr_len agains m_len: if (data_ptr == NULL || po->total_hdr_len > m->m_len) return (NULL); If the m_len is not big enough to contain all inner and outer headers, including TCP options, then it fails passing the packet into the LRO engine basically. --HPS |
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 | Yes I see that this prevents us from having a packet that fails that particular test... But what happens if I send in (from some errant driver) a m_pkthdr with the E-net and IP header but not For example if you see its IPPROT_TCP then you just set parser->tcp = ptr and then dereference We probably need to send in the m_len to the low-level parser and make sure it does not |
Drew
Here is the bug Michael did the epoch thing over
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254695
R
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 | Hi Randall, The tcp_lro_parser() function returns NULL if there is only ETH+IP headers, if you look carefully. The tcp_lro_parser() function by default, only returns non-NULL, if the inner header is TCP and all of it fits into the first mbuf in the chain. |
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 | Hans This is broken if the caller send you ETH+IP+TCP and the TCP_Options in the next mbuf, you If the caller passes you ETH+IP and the TCP + Options is in the next mbuf You thus end up with unknown results and problems. |
Michael and I have chatted on the transport sync. I am going to remove
this review, and he is going to separate out the EPOCH review on its own.
Then we need to address the fact that the parser code is possibly
wandering through garbage in a separate review. We either need
to protect against this i.e. using pullup or pass in the m_len and track
how much is left.. or fix it with pullups..
sys/netinet/tcp_lro.c | ||
---|---|---|
1627 |
Hi Randall, The current LRO code will just forward such packets AS-IS to the TCP stack, because the tcp_lro_parser() will return NULL then! mbuf chains like you describe are _not_ touched by the current LRO code. --HPS |
This is a driver bug. The hyperv driver appears to be using a taskqueue to inject packets into the network stack, rather than a normal interrupt handler. As such, I feel that it is the hyperv driver's responsibility to take the network epoch, and not force 99% of drivers which do the right thing to take the epoch twice.
I'd really like to simplify all input paths, not make them more complex with unneeded tests and atomic operations.