Page MenuHomeFreeBSD

WIP: Add scalable route caching framework
Needs ReviewPublic

Authored by zlei on Dec 20 2022, 12:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Oct 4 2024, 8:48 PM
Unknown Object (File)
Oct 4 2024, 6:02 PM
Unknown Object (File)
Oct 4 2024, 5:56 AM
Unknown Object (File)
Oct 1 2024, 11:05 PM
Unknown Object (File)
Oct 1 2024, 9:24 PM
Unknown Object (File)
Sep 21 2024, 2:58 AM
Unknown Object (File)
Sep 16 2024, 2:34 PM
Unknown Object (File)
Sep 11 2024, 2:12 AM

Details

Reviewers
melifaro
kp
ae
Group Reviewers
network
Summary

This is inspired by counter(9) PCPU implementation. @ae remove cached route support for tunnel interfaces in 0b9f5f8a, f325335ca . The original implementation does not work with concurrent io as the cached route is not properly synchronized, and it acquires exclusive lock since ip_output() / ip6_output() may update it.

A simple gif(4) over disc(4) test on a N5105 box shows about 3% - 17% increasing of forwarding rate, and 5% - 41% decreasing of CPU use time vary the size of routing table.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

zlei requested review of this revision.Dec 20 2022, 12:38 PM

you should add a sample consumer to illustrate the api in use and a flamegraph

In D37757#858771, @mjg wrote:

you should add a sample consumer to illustrate the api in use and a flamegraph

Please see D37758 .

In D37757#858771, @mjg wrote:

you should add a sample consumer to illustrate the api in use and a flamegraph

I do not have access to the N5105 box right now. I'll benchmark when it is available.

sys/net/route/route_cache.h
79

This is important as if the lock of PCPU route cache is held by other thread, then we never block and fallback to no route caching.

First of all, I'd like to thank you for working on this. Caching routes/nexthops for the tunnels is a thing we should have in base and I'd really love to have it landed.
I like the part of the diff where you establish a clear KPI to register/unregister/use cached routes.

I'd like to offer a suggestion on the datapath implementation part.
In many setups, you either have many tunnels or a full-view, so typically the amount of route churn is low (< 1 change/sec). Though, even with the full-view and enabled algo you may end up with ~ 20 changes/sec.
I'd consider avoiding taking any locks (even uncontested ones) on the hot path and do it slightly differently:

  • extend struct route with the custom "update_cache" callback pointer
  • the callback task is to update the cached data in struct route in the "safe" fashion
  • struct route_cache can start with struct route and is followed by the mutex, which can be used as the sync mechanism
  • PCB caching is generalised to this mechanism, so ip[6]_output() and ether_output() should call the callback instead of invalidating cache directly

What do you think?

sys/net/route/route_cache.h
37

Different struct route differ only by the tail struct sockaddr_XX.
I'd consider having a unified structure of everything, sufficient to hold any supported/compiled family.
Union inside can be struct route, struct route_in and struct route_in6, with the latter two being conditionally enabled. The former one can be used in most of the internal code, as it doesn't care about the actual sockaddr.

sys/net/route/route_cache.h
37

My previous version is exactly

struct route_cache_pcpu {
	struct mtx mtx;
	union {
#ifdef INET
		struct route ro;
#endif
#ifdef INET6
		struct route_in6 ro6;
#endif
	}
}

I worried about future change of struct route_in6 then route_cache_pcpu may occupy extra unused cache line then it will punish dual stacked host.
And also consumers should include both opt_inet.h and opt_inet6.h otherwise boom.
Anyway that could be solved by exposing pointer struct route_cache * rather than the structure but this introduce another hop ( sc->route_cache->pcpu ). For modern CPUs this might be negligible but by I doubt it deserves.

Indeed I'm planning to embed rt_cookie into struct route[_in6] so that route cache can be asynchronously revalidated without waiting on lock ( if trylock fails just ignore and continue, the lock is held by hot data path and will be validated ).

Embedding rt_cookie into struct route_in6 then struct route_cache_pcpu will need one more cache line on 64bit systems.

60

A second thought that if embed family and fibnum into struct route_cache then the implementation will be more simple (and also the usage of APIs).

67

These per AF method indeed can be turned into internal then consumers will use unified APIs listed above.

sys/net/route/route_cache_in.c
140 ↗(On Diff #114313)

so that route cache can be asynchronously revalidated without waiting on lock

This can be improve as such:

CPU_FOREACH(cpu) {
	pcpu = zpcpu_get_cpu(rc->pcpu, cpu);

	if (mtx_trylock(&pcpu->mtx) && pcpu->ro.ro_nh != NULL) {
		NH_VALIDATE(&pcpu->ro, &pcpu->ro.inp_rt_cookie, rc->fibnum);
		mtx_unlock(&pcpu->mtx);
	}
}

First of all, I'd like to thank you for working on this. Caching routes/nexthops for the tunnels is a thing we should have in base and I'd really love to have it landed.
I like the part of the diff where you establish a clear KPI to register/unregister/use cached routes.

I'd like to offer a suggestion on the datapath implementation part.
In many setups, you either have many tunnels or a full-view, so typically the amount of route churn is low (< 1 change/sec). Though, even with the full-view and enabled algo you may end up with ~ 20 changes/sec.
I'd consider avoiding taking any locks (even uncontested ones) on the hot path and do it slightly differently:

I'm not willing to add a mutex to struct route_cache_pcpu while it is possible that hot path may interrupted by other threads. Since the cached route is per-CPU other thread may also access it and potentially it will be partially modified. A pair of critical_enter and critical_exit around ip_output seems too heavy weighted.

  • extend struct route with the custom "update_cache" callback pointer

How does this callback work?

  • the callback task is to update the cached data in struct route in the "safe" fashion

I'm interested in the safe portion.

  • struct route_cache can start with struct route and is followed by the mutex, which can be used as the sync mechanism
  • PCB caching is generalised to this mechanism, so ip[6]_output() and ether_output() should call the callback instead of invalidating cache directly

So PCB caching employ struct route_cache rather than struct route[_in6] ?

What do you think?

  • extend struct route with the custom "update_cache" callback pointer

How does this callback work?

  • the callback task is to update the cached data in struct route in the "safe" fashion

I'm interested in the safe portion.

  • struct route_cache can start with struct route and is followed by the mutex, which can be used as the sync mechanism
  • PCB caching is generalised to this mechanism, so ip[6]_output() and ether_output() should call the callback instead of invalidating cache directly

So PCB caching employ struct route_cache rather than struct route[_in6] ?

What do you think?

This is the example of what I have in mind. Does it help to clarify the points?

struct route {
        struct  nhop_object *ro_nh;
        struct  llentry *ro_lle;
        char            *ro_prepend;
        void            *ro_update_func; /* new func */
        uint16_t        ro_plen;
        uint16_t        ro_flags;
        uint16_t        ro_mtu; /* saved ro_rt mtu */
        uint16_t        spare;
        struct  sockaddr ro_dst;
}

struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

/*
  Generic long-lived route object update func
*/
static bool
cache_update(struct route *ro, int type, const void *old_obj, void *new_obj)
{
  struct route_cache *rc = (struct route_cache *)ro;
  result = false;

  RC_LOCK(rc);
  if (type == TYPE_NHOP) {
   if (ro->ro_nh == old_obj) {
    ro->ro_nh = new_obj;
    result = true;
  } else
    old_obj = NULL;
 } else if (type == TYPE_LLE) {
   ...
 } else if (type == TYPE_CLEAR) {
 // zero both nh & lle
}
 RC_UNLOCK(rc);

 if (type == TYPE_NHOP) {
   if (old_obj != NULL) {
    nhop_free(old_obj);
  }
 } else if (type == TYPE_LLE) {
   if (old_obj != NULL)
    lle_free(old_obj);
 }

  return result;
}

struct route *route_cache_init(fib, dst)
{
  struct route_cache *rc = malloc();
  rc->ro.update_func = &cache_update;
  ...
  return &rc->ro;
}

void
route_cache_clear(struct route *ro)
{
  if (ro->update_func)
   ro->update_func(ro, TYPE_CLEAR, NULL, NULL);
}

void
route_cache_update(struct route *ro, int type, const void *old_obj, void *new_obj)
{
 if (ro->update_func)
  return (ro->update_func(ro, type, old_obj, new_obj))
 return true;
}

/* PCB update_func */
bool
pcb_update_cache(struct route *ro, int type, const void *old_obj, void *new_obj)
{
if (type == TYPE_CLEAR)
  RO_INVALIDATE_CACHE(ro);
}

/* ip_output: */

        if (ro->ro_nh != NULL &&
            ((!NH_IS_VALID(ro->ro_nh)) || dst->sin_family != AF_INET ||
            dst->sin_addr.s_addr != ip->ip_dst.s_addr))
{
OLD:
                RO_INVALIDATE_CACHE(ro);
NEW:
                route_cache_clear(ro);
}

OLD:
                        ro->ro_nh = fib4_lookup(fibnum, dst->sin_addr, 0,
                            NHR_REF, flowid);

NEW:
  nh = fib4_lookup(fibnum, dst->sin_addr, 0, NHR_REF, flowid);
  if (!route_cache_update(ro, TYPE_NHOP, NULL, nh))
    nh_free(nh);


USAGE;

tunnel_sc->route_cache = route_cache_init()
ip[6]_output(..., tunnel_sc->route_cache)

Revert to my previous version, and adopt some @melifaro 's suggestions . The KPIs keep almost the same.

zlei marked an inline comment as done.Dec 23 2022, 3:56 PM
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.

In D37757#860208, @zlei wrote:
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.

No, the update fucnction, which uses the lock, is inly called when it is needed to update the cache.
The read path is not affected.

In D37757#860208, @zlei wrote:
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.

No, the update fucnction, which uses the lock, is inly called when it is needed to update the cache.
The read path is not affected.

Let me clarify it a bit further. The idea is change the parts of the stack that currently update nhop/lle cache in the struct route directly to use the new callback.
The hot/data/read part remains unaffected - no atomic or other ops are added.
When route cache is invalidated (route_cache_invalidate() clearing the pointer) datapath may read old nhop/lle pointer. It is safe, as both nhop and lles use epoch-backed garbage collection.

In D37757#860208, @zlei wrote:
struct route_cache {
  union {
   struct route ro;
   struct route_in ro_in;
   struct route_in6 ro_in6
  }
 struct mtx lock;
 struct rib_subscription *rs;
}

Does the lock protect the read part ?
If yes then it is apparent a contention when concurrent network IO occurs.

No, the update fucnction, which uses the lock, is inly called when it is needed to update the cache.
The read path is not affected.

Let me clarify it a bit further. The idea is change the parts of the stack that currently update nhop/lle cache in the struct route directly to use the new callback.
The hot/data/read part remains unaffected - no atomic or other ops are added.
When route cache is invalidated (route_cache_invalidate() clearing the pointer) datapath may read old nhop/lle pointer. It is safe, as both nhop and lles use epoch-backed garbage collection.

I see.
I'd like to prototype that when I have bandwidth.
Currently working on jail bugs .