Page MenuHomeFreeBSD

routing: Add support for route metric
ClosedPublic

Authored by pouria on Apr 8 2026, 9:56 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, May 25, 9:33 PM
Unknown Object (File)
Mon, May 25, 9:18 PM
Unknown Object (File)
Mon, May 25, 8:05 PM
Unknown Object (File)
Mon, May 25, 6:46 PM
Unknown Object (File)
Mon, May 25, 5:28 PM
Unknown Object (File)
Mon, May 25, 12:12 AM
Unknown Object (File)
Sun, May 24, 11:52 PM
Unknown Object (File)
Sun, May 24, 11:32 PM
Subscribers

Details

Summary

Apply metric to the actual list of nexthops in nhgrp.
Previously, we only supported nexthop weight for ECMP routes.
The upper 8 bit of weight was reserved for metric or administrative
distance purposes.
Now we filter any nexthops with lower metric in the actual nexthop
group.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 72070
Build 68953: arc lint + arc unit

Event Timeline

pouria requested review of this revision.Apr 8 2026, 9:56 PM

Here is how it works:

[root@ftsr1] [~] # route -n6 add -net 3fff::/64 -gateway 2a01:e140:10:10::6 -weight 33554435
change net 3fff::/64: gateway 2a01:e140:10:10::6 fib 0
[root@ftsr1] [~] # route -n6 add -net 3fff::1/64 -gateway 2a01:e140:10:10::1 -weight 2
add net 3fff::1/64: gateway 2a01:e140:10:10::1 fib 0
[root@ftsr1] [~] # route -n6 get 3fff::/64
   route to: 3fff::
destination: 3fff::
       mask: ffff:ffff:ffff:ffff::
    gateway: 2a01:e140:10:10::6
        fib: 0
  interface: vtnet0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu       metric    weight    expire
       0         0         0         0      1500         2         3         0
[root@ftsr1] [~] # route -on6 get 3fff::/64
   route to: 3fff::
destination: 3fff::
       mask: ffff:ffff:ffff:ffff::
        fib: 0
      flags: <UP,GATEWAY,DONE,STATIC>
      nhops: 2
        via gw 2a01:e140:10:10::1 iface vtnet0 metric 0 weight 2 mtu 1500 table inet6.0
        via gw 2a01:e140:10:10::6 iface vtnet0 metric 2 weight 3 mtu 1500
[root@ftsr1] [~] # netstat -rn6W | grep 3fff
3fff::/64                         2a01:e140:10:10::1            UGS         0   1500           vtnet0
3fff::/64                         2a01:e140:10:10::6            UGS         0   1500           vtnet0
[root@ftsr1] [~] # netstat -On6W
Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
11        ------- ------- ------- --------------------------------------- ---------       2
               9       2       0                      2a01:e140:10:10::1    vtnet0
              1033554435      64                      2a01:e140:10:10::6    vtnet0

Of course, we should add a separate -metric argument to route(8) later.

sys/net/route.h
106

Here is another hint for me to understand what was the original @melifaro design.
See below: get_info_weight

sys/net/route/nhgrp_ctl.c
147–148

Changes here only affect the actual nhgrp size in alloc_nhgrp().
So, basically, we're reducing the size required for mpath if metric sets.

252

This is not necessary.
I can revert it.

sys/net/route/route_ctl.c
199

Why not use a separate value for metric?
Here is the reason.
Both approaches had its own side effects.
I decided to go with original @melifaro design.

Also this revision will NOT change expected routing behavior (POLA) since we never allow users to set weight more than 24-bit.
Now we don't, but we use it for metric.
I intentionally used the term metric instead of administrative distance. You know the difference.

before this patch, adding nexthop never results in change to the main or first nhop of the nexthop group.
Before this patch:

[pouria@cornelia] [/usr/src] [mpath_user|✔] % mdo route -n6 add -host 3fff::1 -gateway ::4 -weight 16777218
add host 3fff::1: gateway ::4
[pouria@cornelia] [/usr/src] [mpath_user|✔] % netstat -On6W
Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
14        ------- ------- ------- --------------------------------------- ---------       2
               7       1       0                                     ::1       lo0
               8       2       0                                     ::2       lo0
              11 16777218      32                                     ::4       lo0
              12 16777216      32                                     ::3       lo0
[pouria@cornelia] [/usr/src] [mpath_user|✔] % echo '2^24' | bc
16777216
[pouria@cornelia] [/usr/src] [mpath_user|✔] % echo '(2^24)*2' | bc
33554432
[pouria@cornelia] [/usr/src] [mpath_user|✔] % mdo route -n6 add -host 3fff::1 -gateway ::5 -weight 33554434
add host 3fff::1: gateway ::5
[pouria@cornelia] [/usr/src] [mpath_user|✔] % route -n6 get 3fff::1
   route to: 3fff::1
destination: 3fff::1
    gateway: ::4
        fib: 0
  interface: lo0
      flags: <UP,GATEWAY,HOST,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0     16384         0         0
[pouria@cornelia] [/usr/src] [mpath_user|✔] % netstat -On6W
Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
15        ------- ------- ------- --------------------------------------- ---------       2
               7       1       0                                     ::1       lo0
               8       2       0                                     ::2       lo0
              11 16777218      16                                     ::4       lo0
              12 16777216      15                                     ::3       lo0
              13 33554434      33                                     ::5       lo0

You may wonder how I set the weight to more than 24-bit before this patch?
The answer is we didn't validate maximum weight value from netlink.
However, we didn't document any of the weight system. So we're safe from breaking the userland.

After this patch:

[root@ftsr1] [~] # route -n6 add -net 3fff::/64 -gateway ::4 -weight 16777218
add net 3fff::/64: gateway ::4 fib 0
[root@ftsr1] [~] # netstat -On6W
Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
14        ------- ------- ------- --------------------------------------- ---------       2
               9       2       0                                     ::1       lo0
              10       2       0                                     ::2       lo0
              1116777218      32                                     ::4       lo0
              1216777216      32                                     ::3       lo0
[root@ftsr1] [~] # route -n6 add -net 3fff::/64 -gateway ::5 -weight 33554433
add net 3fff::/64: gateway ::5 fib 0
[root@ftsr1] [~] # netstat -On6W
Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
15        ------- ------- ------- --------------------------------------- ---------       2
               9       2       0                                     ::1       lo0
              10       2       0                                     ::2       lo0
              1116777218       0                                     ::4       lo0
              1216777216       0                                     ::3       lo0
              1333554433      64                                     ::5       lo0

Compare the nexthop slots of the nexthop group, before and after this patch.

Refactor to use union instead.
This is much more cleaner.

Update structure to make it compile separately.
+ fix minor issues.

sys/net/route/route_helpers.c
297 ↗(On Diff #175200)

With new structure, this initialization must be changed.
This is NOT for styling.

sys/net/route.h
92–95

Is this structure user visible? If so, better reduce rmx_filler[] down to 1 and use that space. The comment on T/TCP should be cleared while modifying this line.

pouria marked an inline comment as done.

@glebius done!
Yes, it's user visible.
Thank you.

sys/net/route.h
91–93

If you keep order of existing members as before, the ABI should be compatible with old programs.

Maintain backward compatibility.
@glebius done.

Refactor metric implementation.

In our routing stack implementation, metric is the attribute
of the nexthop, not the route itself.
Because we can't have two different route with same destination,
But, we can have a route with multiple nexthops.
Therefore, it doesn't make sense to compare weight of multiple
nexthops with different metrics.

By definition, we shouldn't have multiple of same nexthops with
different weight, but we can have multiple nexthops with different
metrics.

Apply metric to the actual list of nexthops in nhgrp.
However, store metric in nhop_object.

Previously, we only supported nexthop weight for ECMP routes.
Remove the upper 8-bit reservation of weight and reuse spare
data of nhop object for metric.

sys/net/route/route_ctl.c
199

This is no longer true.
In last change, I've used a separate value for metric.

Minor styling and adjust comments

Implement support for metric via rtsock.
Add RT_DEFAULT_METRIC.

sys/net/route/nhgrp_ctl.c
138

Higher than what?

165

This comment is a bit confusing, it looks like the check is just filtering out nhops with metrics not equal to the one that was passed in.

271

Again, this comment is not very helpful and kind of confusing.

515

Is it impossible to have num_nhops == 0?

526

It's better to avoid this style of i++; /* increment i */ comment.

This looks good; how would we unit test this behaviour?

pouria marked 4 inline comments as done.

Address @markj comments.

Thank you for your review!

@adrian : D57016

sys/net/route/nhgrp_ctl.c
271

I removed this comment and added a more meaningful description above the function.

515

No

This revision is now accepted and ready to land.Mon, May 18, 5:06 PM

Simplify set_metric logic by specifying 0 as RT_WILDCARD_METRIC macro and overwrite wildcard (0) metric to default metric (1).
This simple change allow users to delete all the routes without specifying metric in both rtsock and metric and also preserve POLA by not allowing users to add/change route metric to wildcard metric (0) by overwriting it to default metric.

This revision now requires review to proceed.Tue, May 19, 1:20 PM
sys/net/route/nhgrp_ctl.c
158

Is it possible to that none of the nhops will match? If so, xmin will be left uninitialized below.

515

"No" meaning it's impossible? I would add an assertion then.

Address @markj comments.
Initialize xmin for future.
Add MPASS in nhgrp_alloc.

Merge D57007.
Address @markj comment.
markj asked: How do you decide whether to add a field to nhop_object vs. the other (nhop_priv)?
I reevaluate my decisions and decided to test whether if worth it to reuse spare member of nhop_object which is used for dataplane or should I place it under nhop_priv?
I had two options, my initial thought was to place it inside the lookup section (NH_PRIV_END_CMP) of nhop_priv.
But since it was cache aligned, I initially decided to place metric inside the nhop_object.
But I didn't think of having to touch the cmp_priv at that time. So its cost of metric comparison by double dereferencing was not visible to me.
Now with D57007, I decided to test moving metric under nhop_priv without placing it inside the lookup section.
I decided to test that in production at NL-IX.
Here was the results:

% # When nh_metric was placed in nhop_object.
% mdo vmstat -m | grep nh
nhops   38 21248 324367067 64,128,256,512,1024,2048,4096,8192
% mdo netstat -rn4 | wc -l
1053225
% mdo netstat -rn6 | wc -l
288723
% mdo netstat -On6 | wc -l
37
% mdo netstat -on6 | wc -l
432
% # After moving it into nhop_priv (FreeBSD 16.0-CURRENT (GENERIC-NODEBUG) #3 mpath-n285993-1a086c816b5a):
% mdo vmstat -m | grep nh
nhops   38 21248 202332 64,128,256,512,1024,2048,4096,8192
% mdo netstat -rn4 | wc -l
1053251
% mdo netstat -rn6 | wc -l
288986
% mdo netstat -On6 | wc -l
40
% mdo netstat -on6 | wc -l
431

Superising to me, memory allocation is not changed.
Well, after seeing the results, I checked the size of nhop_priv again and found out the overall structure of it is not aligned to allocation boundaries.
So it is completely safe to move it under nhop_priv and it's better to do so because metric is a control-plane variable and
not belongs to nhop_object.

Thank you so much @markj for asking me that question.

Fix FIB_NH_LOG format specifier.

IPv6 default route will be added using rib_default_route from route_helpers.
Ensure it sets the RT_DEFAULT_METRIC.

This revision was not accepted when it landed; it landed in state Needs Review.Wed, May 20, 8:58 PM
This revision was automatically updated to reflect the committed changes.