Paths

Table of Contentst

WIP: Add scalable route caching framework
Needs ReviewPublic
Actions

Authored by zlei on Dec 20 2022, 12:38 PM.

Details

Reviewers

melifaro
kp
ae

Group Reviewers

network

Summary

This is inspired by counter(9) PCPU implementation. @ae remove cached route support for tunnel interfaces in 0b9f5f8a, f325335ca . The original implementation does not work with concurrent io as the cached route is not properly synchronized, and it acquires exclusive lock since ip_output() / ip6_output() may update it.

A simple gif(4) over disc(4) test on a N5105 box shows about 3% - 17% increasing of forwarding rate, and 5% - 41% decreasing of CPU use time vary the size of routing table.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

zlei created this revision.Dec 20 2022, 12:38 PM

Herald added subscribers: glebius, imp. · View Herald TranscriptDec 20 2022, 12:38 PM

zlei requested review of this revision.Dec 20 2022, 12:38 PM

you should add a sample consumer to illustrate the api in use and a flamegraph

zlei mentioned this in D37758: WIP: if_gif: Demonstrate usage of route caching framework.Dec 20 2022, 12:48 PM

In D37757#858771, @mjg wrote:

you should add a sample consumer to illustrate the api in use and a flamegraph

Please see D37758 .

In D37757#858771, @mjg wrote:

you should add a sample consumer to illustrate the api in use and a flamegraph

I do not have access to the N5105 box right now. I'll benchmark when it is available.

zlei added a child revision: D37758: WIP: if_gif: Demonstrate usage of route caching framework.Dec 20 2022, 1:20 PM

konrad.kreciwilk_korbank.pl added a subscriber: konrad.kreciwilk_korbank.pl.Dec 20 2022, 1:35 PM

zlei added a child revision: D37760: WIP: if_ovpn: Demonstrate usage of route caching framework.Dec 20 2022, 3:35 PM

zlei added inline comments.Dec 20 2022, 3:40 PM

sys/net/route/route_cache.h
79	This is important as if the lock of PCPU route cache is held by other thread, then we never block and fallback to no route caching.

First of all, I'd like to thank you for working on this. Caching routes/nexthops for the tunnels is a thing we should have in base and I'd really love to have it landed.
I like the part of the diff where you establish a clear KPI to register/unregister/use cached routes.

I'd like to offer a suggestion on the datapath implementation part.
In many setups, you either have many tunnels or a full-view, so typically the amount of route churn is low (< 1 change/sec). Though, even with the full-view and enabled algo you may end up with ~ 20 changes/sec.
I'd consider avoiding taking any locks (even uncontested ones) on the hot path and do it slightly differently:

extend struct route with the custom "update_cache" callback pointer
the callback task is to update the cached data in struct route in the "safe" fashion
struct route_cache can start with struct route and is followed by the mutex, which can be used as the sync mechanism
PCB caching is generalised to this mechanism, so ip[6]_output() and ether_output() should call the callback instead of invalidating cache directly

What do you think?

sys/net/route/route_cache.h
37	Different `struct route` differ only by the tail `struct sockaddr_XX`. I'd consider having a unified structure of everything, sufficient to hold any supported/compiled family. Union inside can be `struct route`, `struct route_in` and `struct route_in6`, with the latter two being conditionally enabled. The former one can be used in most of the internal code, as it doesn't care about the actual sockaddr.

zlei added inline comments.Dec 21 2022, 3:12 PM

sys/net/route/route_cache.h
37	My previous version is exactly struct route_cache_pcpu { struct mtx mtx; union { #ifdef INET struct route ro; #endif #ifdef INET6 struct route_in6 ro6; #endif } } I worried about future change of `struct route_in6` then `route_cache_pcpu` may occupy extra unused cache line then it will punish dual stacked host. And also consumers should include both `opt_inet.h` and `opt_inet6.h` otherwise boom. Anyway that could be solved by exposing pointer `struct route_cache *` rather than the structure but this introduce another hop ( sc->route_cache->pcpu ). For modern CPUs this might be negligible but by I doubt it deserves. Indeed I'm planning to embed `rt_cookie` into `struct route[_in6]` so that route cache can be asynchronously revalidated without waiting on lock ( if trylock fails just ignore and continue, the lock is held by hot data path and will be validated ). Embedding `rt_cookie` into `struct route_in6` then `struct route_cache_pcpu` will need one more cache line on 64bit systems.
60	A second thought that if embed `family` and `fibnum` into `struct route_cache` then the implementation will be more simple (and also the usage of APIs).
67	These per AF method indeed can be turned into internal then consumers will use unified APIs listed above.
sys/net/route/route_cache_in.c
140 ↗	(On Diff #114313)	so that route cache can be asynchronously revalidated without waiting on lock This can be improve as such: CPU_FOREACH(cpu) { pcpu = zpcpu_get_cpu(rc->pcpu, cpu); if (mtx_trylock(&pcpu->mtx) && pcpu->ro.ro_nh != NULL) { NH_VALIDATE(&pcpu->ro, &pcpu->ro.inp_rt_cookie, rc->fibnum); mtx_unlock(&pcpu->mtx); } }

In D37757#859070, @melifaro wrote:

First of all, I'd like to thank you for working on this. Caching routes/nexthops for the tunnels is a thing we should have in base and I'd really love to have it landed.
I like the part of the diff where you establish a clear KPI to register/unregister/use cached routes.

I'd like to offer a suggestion on the datapath implementation part.
In many setups, you either have many tunnels or a full-view, so typically the amount of route churn is low (< 1 change/sec). Though, even with the full-view and enabled algo you may end up with ~ 20 changes/sec.
I'd consider avoiding taking any locks (even uncontested ones) on the hot path and do it slightly differently:

I'm not willing to add a mutex to struct route_cache_pcpu while it is possible that hot path may interrupted by other threads. Since the cached route is per-CPU other thread may also access it and potentially it will be partially modified. A pair of critical_enter and critical_exit around ip_output seems too heavy weighted.

extend struct route with the custom "update_cache" callback pointer

How does this callback work?

the callback task is to update the cached data in struct route in the "safe" fashion

I'm interested in the safe portion.

struct route_cache can start with struct route and is followed by the mutex, which can be used as the sync mechanism

PCB caching is generalised to this mechanism, so ip[6]_output() and ether_output() should call the callback instead of invalidating cache directly

So PCB caching employ struct route_cache rather than struct route[_in6] ?

What do you think?

extend struct route with the custom "update_cache" callback pointer

How does this callback work?

the callback task is to update the cached data in struct route in the "safe" fashion

I'm interested in the safe portion.

struct route_cache can start with struct route and is followed by the mutex, which can be used as the sync mechanism

PCB caching is generalised to this mechanism, so ip[6]_output() and ether_output() should call the callback instead of invalidating cache directly

So PCB caching employ struct route_cache rather than struct route[_in6] ?

What do you think?

This is the example of what I have in mind. Does it help to clarify the points?

struct route {
        struct  nhop_object *ro_nh;
        struct  llentry *ro_lle;
        char            *ro_prepend;
        void            *ro_update_func; /* new func */
        uint16_t        ro_plen;
        uint16_t        ro_flags;
        uint16_t        ro_mtu; /* saved ro_rt mtu */
        uint16_t        spare;
        struct  sockaddr ro_dst;
}

struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

/*
  Generic long-lived route object update func
*/
static bool
cache_update(struct route *ro, int type, const void *old_obj, void *new_obj)
{
  struct route_cache *rc = (struct route_cache *)ro;
  result = false;

  RC_LOCK(rc);
  if (type == TYPE_NHOP) {
   if (ro->ro_nh == old_obj) {
    ro->ro_nh = new_obj;
    result = true;
  } else
    old_obj = NULL;
 } else if (type == TYPE_LLE) {
   ...
 } else if (type == TYPE_CLEAR) {
 // zero both nh & lle
}
 RC_UNLOCK(rc);

 if (type == TYPE_NHOP) {
   if (old_obj != NULL) {
    nhop_free(old_obj);
  }
 } else if (type == TYPE_LLE) {
   if (old_obj != NULL)
    lle_free(old_obj);
 }

  return result;
}

struct route *route_cache_init(fib, dst)
{
  struct route_cache *rc = malloc();
  rc->ro.update_func = &cache_update;
  ...
  return &rc->ro;
}

void
route_cache_clear(struct route *ro)
{
  if (ro->update_func)
   ro->update_func(ro, TYPE_CLEAR, NULL, NULL);
}

void
route_cache_update(struct route *ro, int type, const void *old_obj, void *new_obj)
{
 if (ro->update_func)
  return (ro->update_func(ro, type, old_obj, new_obj))
 return true;
}

/* PCB update_func */
bool
pcb_update_cache(struct route *ro, int type, const void *old_obj, void *new_obj)
{
if (type == TYPE_CLEAR)
  RO_INVALIDATE_CACHE(ro);
}

/* ip_output: */

        if (ro->ro_nh != NULL &&
            ((!NH_IS_VALID(ro->ro_nh)) || dst->sin_family != AF_INET ||
            dst->sin_addr.s_addr != ip->ip_dst.s_addr))
{
OLD:
                RO_INVALIDATE_CACHE(ro);
NEW:
                route_cache_clear(ro);
}

OLD:
                        ro->ro_nh = fib4_lookup(fibnum, dst->sin_addr, 0,
                            NHR_REF, flowid);

NEW:
  nh = fib4_lookup(fibnum, dst->sin_addr, 0, NHR_REF, flowid);
  if (!route_cache_update(ro, TYPE_NHOP, NULL, nh))
    nh_free(nh);


USAGE;

tunnel_sc->route_cache = route_cache_init()
ip[6]_output(..., tunnel_sc->route_cache)

Revert to my previous version, and adopt some @melifaro 's suggestions . The KPIs keep almost the same.

zlei marked an inline comment as done.Dec 23 2022, 3:56 PM

struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.

In D37757#860208, @zlei wrote:
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}
Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.

No, the update fucnction, which uses the lock, is inly called when it is needed to update the cache.
The read path is not affected.

In D37757#860215, @melifaro wrote:
In D37757#860208, @zlei wrote:
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}
Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.
No, the update fucnction, which uses the lock, is inly called when it is needed to update the cache.
The read path is not affected.

Let me clarify it a bit further. The idea is change the parts of the stack that currently update nhop/lle cache in the struct route directly to use the new callback.
The hot/data/read part remains unaffected - no atomic or other ops are added.
When route cache is invalidated (route_cache_invalidate() clearing the pointer) datapath may read old nhop/lle pointer. It is safe, as both nhop and lles use epoch-backed garbage collection.

In D37757#860348, @melifaro wrote:
In D37757#860215, @melifaro wrote:
In D37757#860208, @zlei wrote:
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}
Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.
No, the update fucnction, which uses the lock, is inly called when it is needed to update the cache.
The read path is not affected.
Let me clarify it a bit further. The idea is change the parts of the stack that currently update nhop/lle cache in the struct route directly to use the new callback.
The hot/data/read part remains unaffected - no atomic or other ops are added.
When route cache is invalidated (route_cache_invalidate() clearing the pointer) datapath may read old nhop/lle pointer. It is safe, as both nhop and lles use epoch-backed garbage collection.