Page MenuHomeFreeBSD

Enforce the network epoch contract at entry points in TCP LRO code
ClosedPublic

Authored by tuexen on Aug 23 2021, 5:03 PM.

Details

Summary

Since 69a34e8d0235c0304a28bf8ab076b931aa61835f it is required for the caller of

  • tcp_lro_flush()
  • tcp_lro_flush_all()
  • tcp_lro_queue_mbuf()
  • tcp_lro_flush_inactive()

to enter the network epoch when the kernel is compiled with options TCPHPTS. Instead if asserting for this in the LRO internal function tcp_lro_lookup(), do this at the entry functions of the LRO code which ends up in calling tcp_lro_lookup().
This makes the contract regarding to the network epoch handling clearer for users of TCP LRO code.

If we agree on this, fixing bugs like 254695 will be simpler.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Why is this under #ifdef ? Anything inputting packets via if_input() needs the network epoch.

This comment was removed by gallatin.
This revision now requires changes to proceed.Aug 24 2021, 2:55 AM

Please no.

99% of drivers already have the network epoch via their ithread. For the few that don't, like the hyperv driver in the associated bug, the driver needs to take the network epoch. Let's please not penalize the 99% of drivers that already have the epoch for a few oddball drivers that don't use ithreads.

Now I'm confused. My intention was do enforce what you suggest: the drivers need to enter and leave the network epoch.

Are you saying that 99% of the driver which have the network epoch via their ithread result having it such that a NET_EPOCH_ASSERT() would fail? Is there any other way of testing?

Just be crystal clear: I want to enforce that the drivers are fixed by making clear at the entry points of the TCP LRO code what the LRO code expects. Do you think this is wrong?

Why is this under #ifdef ? Anything inputting packets via if_input() needs the network epoch.

I put it under #ifdef, because the intention of this patch is only to move an NET_EPOCH_ASSERT() from an internal function to the beginning on functions which enter the LRO code and end up in calling the internal function.

If we agree on this plan, we can remove the #ifdef where the caller always should be in the network epoch.

Please no.

99% of drivers already have the network epoch via their ithread. For the few that don't, like the hyperv driver in the associated bug, the driver needs to take the network epoch. Let's please not penalize the 99% of drivers that already have the epoch for a few oddball drivers that don't use ithreads.

Now I'm confused. My intention was do enforce what you suggest: the drivers need to enter and leave the network epoch.

Are you saying that 99% of the driver which have the network epoch via their ithread result having it such that a NET_EPOCH_ASSERT() would fail? Is there any other way of testing?

I'm so sorry, I read the patch incorrectly. I read it as you were entering the epoch, not that you were expanding the assertions. Please disregard my comment.

Just be crystal clear: I want to enforce that the drivers are fixed by making clear at the entry points of the TCP LRO code what the LRO code expects. Do you think this is wrong?

Why is this under #ifdef ? Anything inputting packets via if_input() needs the network epoch.

I put it under #ifdef, because the intention of this patch is only to move an NET_EPOCH_ASSERT() from an internal function to the beginning on functions which enter the LRO code and end up in calling the internal function.

If we agree on this plan, we can remove the #ifdef where the caller always should be in the network epoch.

Yes, removing the ifdefs would be a good idea.

Don't put NET_EPOCH_ASSERT() under #ifdef TCPHPTS as suggested.

This revision is now accepted and ready to land.Aug 25 2021, 3:05 PM
This comment was removed by freebsd_darkain.com.
sys/netinet/tcp_lro.c
601

Not sure what is going on, but this broke the if_oce driver. Commenting out this one line, and then driver comes up and works as expected.

Is there something that is now expected to be set somewhere that the OCE NIC driver isn't properly setting?

Phabricator is being super wonky with my commenting.

TLDR: if_oce is hitting this assert when loading the driver.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260330