Page MenuHomeFreeBSD

Overhaul ip_encap KPI to be lockless on data path
ClosedPublic

Authored by ae on May 30 2018, 3:34 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 8 2024, 9:08 PM
Unknown Object (File)
Jan 18 2024, 1:04 PM
Unknown Object (File)
Dec 20 2023, 4:52 AM
Unknown Object (File)
Oct 20 2023, 8:15 AM
Unknown Object (File)
Oct 14 2023, 9:22 AM
Unknown Object (File)
Oct 4 2023, 6:10 PM
Unknown Object (File)
Aug 1 2023, 8:33 PM
Unknown Object (File)
Aug 1 2023, 8:32 PM
Subscribers

Details

Summary

ip_encap KPI is used by network stack to handle various IP encapsulation types.
Currently it has several disadvantages:

  • it uses single mutex to protect internal structures. It is used by data- and control- path, thus there are no parallelism at all.
  • it uses single list to keep encap handlers for both INET and INET6 families.
  • struct encaptab keeps unneeded information (src, dst, masks, protosw), that isn't used by code in the source tree.
  • matches are prioritized and when many tunneling interfaces are registered, encapcheck handler of each interface is invoked for each packet. the search takes O(n) for n interfaces.

What is contained in this patch:

  • data path converted to be lockless using epoch(9) KPI
  • struct encaptab now linked using CK_LIST
  • struct encaptab now contains new fields: "min_length" and "exact_match". min_length is the minimal value of packet length, that encap handler expects to see. exact_match is value that returned by the encapcheck handler, and it means that handler wants to handle this packet.
  • two lists are created to separate IPv4 and IPv6 handlers;
  • added new "encap_lookup_t" method, that will be used later. It is targeted to speedup lookup of needed interface, when gif(4)/gre(4) have many interfaces.
  • the need to use protosw structure is eliminated. The only pr_input method was used from this structure, so I don't see the need to keep using it.
  • encap_input_t method changed to avoid using mbuf tags to store softc pointer. Now it is passed directly trough encap_input method.
  • all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use encap_attach_func() method, that relies on invoking of encapcheck callback to get the needed handler.
  • struct encap_config introduced, it contains parameters of encap handler that is going to be registered by encap_attach() functions.
  • attach/detach functions are separated by address family to ip_encap_attach() and ip6_encap_attach().
  • all current consumers changed to use new KPI.
  • encap[46]_input() function now stops the search if returned by encapcheck method value matches exact_match value.
  • encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if encapcheck method returns exact_match value, the search will be stopped.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

ae added reviewers: mmacy, network.
ae set the repository for this revision to rS FreeBSD src repository - subversion.
sys/netinet/ip_encap.c
166 ↗(On Diff #43141)

I'm not familiar with how this code is used. How often is encap_detach called? If it's frequent we should be passing it to epoch_call(). This would also mean that you wouldn't have to drop the lock in the middle.

sys/netinet/ip_encap.c
166 ↗(On Diff #43141)

It is called as a result of user actions, e.g. interface reconfiguration or destroying, kernel module unloading. I tried use epoch_call(), it works too. epoch_wait() looks a bit simpler.

I can only really give a cursory thumbs up for the networking changes themselves. I don't even know how to test this code. Nonetheless, the locking change is straightforward and looks fine to me provided the detach path can handle delays in wait.

sys/netinet/ip_encap.c
6 ↗(On Diff #43141)

I think newer copyrights always follow older copyrights.

224 ↗(On Diff #43141)

Is it intentional to remove the "last resort" behavior?

This revision is now accepted and ready to land.Jun 1 2018, 7:16 PM
ae marked an inline comment as done.Jun 2 2018, 12:05 PM

I found small bug in if_gre implementation, and also changed encap_lookup_t prototype to not lost some old behavior. I also modified if_gif(4) to use epoch(9) instead of rmlock, so I post the patch after some testing. Also I'll try to do some benchmarks to show the benefits.

sys/netinet/ip_encap.c
224 ↗(On Diff #43141)

It is done in the AF-specific encap[46]_input() when encap_input() returns zero.

This revision was automatically updated to reflect the committed changes.
ae marked an inline comment as done.