
Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9)
ClosedPublic

Authored by ae on Jun 13 2018, 11:51 AM.

Details

Summary

Using rwlock with multiqueue NICs for IP forwarding at high pps produces high lock contention and is inefficient. Replacing rwlock with rmlock achieves pps results that are several times higher. We have used a similar patch at Yandex since at least FreeBSD 9.x. AFAIK, Netflix has tested it under their workloads and observed no regressions. So I think the patch can be included in FreeBSD 12.0.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ae created this revision. Jun 13 2018, 11:51 AM
olivier accepted this revision as: olivier. Jun 14 2018, 9:40 PM

On a 2-socket, 12-core Xeon E5 2650 with a Mellanox ConnectX-4:

x head r335106: inet4 packets-per-second
+ head r335106 with D15789 : inet4 packets-per-second
+--------------------------------------------------------------------------+
|                                                                         +|
|                                                                         +|
|                                                                         +|
|x                                                                        +|
|x   xxx                                                                  +|
||__AM_|                                                                   |
|                                                                         A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5     2531823.5       3417268     3121023.5     2974341.3     413968.07
+   5      13240135      13257591      13254631      13251573     7260.4177
Difference at 95.0% confidence
        1.02772e+07 +/- 426980
        345.53% +/- 63.9485%
        (Student's t, pooled s = 292765)
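
The "Difference at 95.0% confidence" lines in these ministat reports can be sanity-checked from the N, Avg and Stddev columns alone. The following is a minimal sketch, assuming the standard pooled-variance two-sample Student's t interval and the usual table value of 2.306 for the two-sided 95% critical value at 8 degrees of freedom; the function and variable names are illustrative, not taken from ministat. It covers the absolute difference, its half-width, the pooled s, and the percentage difference, but not the "+/-" figure on the percentage line.

```python
import math

# Summary rows copied from the ministat output above: (N, Avg, Stddev).
base = (5, 2974341.3, 413968.07)       # head r335106
patched = (5, 13251573.0, 7260.4177)   # head r335106 with D15789

# Assumption: two-sided 95% Student's t critical value for 5 + 5 - 2 = 8
# degrees of freedom, taken from a standard t table.
T_95_DF8 = 2.306


def pooled_difference(ref, new, t_crit):
    """Return (difference, half-width, pooled s, difference in percent)."""
    n_ref, avg_ref, sd_ref = ref
    n_new, avg_new, sd_new = new
    dof = n_ref + n_new - 2
    # Pooled standard deviation of the two samples.
    pooled_var = ((n_ref - 1) * sd_ref ** 2 + (n_new - 1) * sd_new ** 2) / dof
    pooled_s = math.sqrt(pooled_var)
    diff = avg_new - avg_ref
    # Half-width of the confidence interval on the difference of the means.
    half_width = t_crit * pooled_s * math.sqrt(1.0 / n_ref + 1.0 / n_new)
    return diff, half_width, pooled_s, 100.0 * diff / avg_ref


diff, half_width, pooled_s, pct = pooled_difference(base, patched, T_95_DF8)
print(f"{diff:.6g} +/- {half_width:.6g}")
print(f"{pct:.5g}% (pooled s = {pooled_s:.6g})")
```

A difference counts as proven when its absolute value exceeds the half-width, which is the case for every patched-versus-unpatched comparison in this thread.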

On a Xeon E5 2650 with Chelsio T540-CR:

x head r335106: inet4 packets-per-second
+ head r335106 with D15789 : inet4 packets-per-second
+--------------------------------------------------------------------------+
|x x                                                                 +     |
|x x                                                                 +  +++|
|MA|                                                                       |
|                                                                    |_AM_||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       5812926       5942477       5827792     5868257.5       66325.8
+   5      11038688      11436516      11282262      11243417      184820.6
Difference at 95.0% confidence
        5.37516e+06 +/- 202502
        91.5972% +/- 3.94168%
        (Student's t, pooled s = 138848)

On an 8-core Atom C2758 with Chelsio T540-CR:

x head r335106: inet4 packets-per-second
+ head r335106 with D15789 : inet4 packets-per-second
+--------------------------------------------------------------------------+
|                                                     +                    |
|x      x  x xx                                       +            + +    +|
|   |____A_M___|                                                           |
|                                                     |_________A__M_____| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       3679811     3829723.5       3789131     3773040.1     59232.903
+   5       4275233     4502475.5     4422916.5     4384184.3     102410.34
Difference at 95.0% confidence
        611144 +/- 122006
        16.1977% +/- 3.37258%
        (Student's t, pooled s = 83655.3)

On a 4-core AMD GX-412TC with Intel i210AT:

x head r335106: inet4 packets-per-second
+ head r335106 with D15789 : inet4 packets-per-second
+--------------------------------------------------------------------------+
|x x                                                                  +    |
|xxx                                                                 ++ + +|
||A|                                                                       |
|                                                                    |MA_| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        674835        678752        676834      676729.2     1573.2049
+   5        784956        793338        787191      788414.8     3448.5979
Difference at 95.0% confidence
        111686 +/- 3909.03
        16.5037% +/- 0.595148%
        (Student's t, pooled s = 2680.28)
This revision is now accepted and ready to land. Jun 14 2018, 9:40 PM
bz added a subscriber: bz. Edited Jun 14 2018, 9:45 PM

I am totally not surprised by these numbers. However (a) did you do the same test for IPv6? (b) is that a forwarding setup or an end node setup? (c) how many route updates per second did you try on a forwarding node?

olivier added a comment. Edited Jun 14 2018, 9:57 PM
In D15789#334385, @bz wrote:

I am totally not surprised by these numbers. However (a) did you do the same test for IPv6? (b) is that a forwarding setup or an end node setup? (c) how many route updates per second did you try on a forwarding node?

It's a forwarding setup, and I'm using only 2 static routes in my benchmarks.

About the inet6 results:

On a 2-socket, 12-core Xeon E5 2650 with a Mellanox ConnectX-4:

x r335106 : inet6 packets-per-second
+ r335106 with D15789: inet6 packets-per-second
+--------------------------------------------------------------------------+
|                                                                         +|
|                                                                         +|
|                                                                         +|
|xx                                                                       +|
|xxx                                                                      +|
||A                                                                        |
|                                                                         A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5     3228845.5       3435569     3275485.5     3293546.5     84303.358
+   5      12396604      12421052      12414368      12410845     10398.101
Difference at 95.0% confidence
        9.1173e+06 +/- 87598.7
        276.823% +/- 9.95235%
        (Student's t, pooled s = 60063.2)

And the inet4 vs inet6 diff on this platform:

x r335106D15789: inet4  packets-per-second
+ r335106D15789: inet6  packets-per-second
+--------------------------------------------------------------------------+
|  +                                                                      x|
|  +                                                                      x|
|+++                                                                    xxx|
|                                                                        AM|
||AM                                                                       |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5      13240135      13257591      13254631      13251573     7260.4177
+   5      12396604      12421052      12414368      12410845     10398.101
Difference at 95.0% confidence
        -840728 +/- 13078.7
        -6.34437% +/- 0.0966876%
        (Student's t, pooled s = 8967.56)

On an 8-core Xeon E5 2650 with Chelsio T540-CR:

x r335106 : inet6 packets-per-second
+ r335106 with D15789: inet6 packets-per-second
+--------------------------------------------------------------------------+
|                                                                       +  |
|x                                                                      +  |
|x xx  x                                                                +++|
||_A__|                                                                    |
|                                                                       MA||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       5833700       6259507     5968707.5     5995114.2     174280.36
+   5      11190415      11364997      11236384      11260193     69952.229
Difference at 95.0% confidence
        5.26508e+06 +/- 193668
        87.8228% +/- 5.75798%
        (Student's t, pooled s = 132791)

And the inet4 vs inet6 diff on this platform (the bottleneck here seems to be the NIC rather than the IP stack?):

x r335106D15789: inet4 packets-per-second
+ r335106D15789: inet6 packets-per-second
+--------------------------------------------------------------------------+
|x   x                       +   +   +        x +            +     x      x|
|    |_________________________________A______M_________________________|  |
|                            |_______M____A___________|                    |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5      11038688      11436516      11282262      11243417      184820.6
+   5      11190415      11364997      11236384      11260193     69952.229
No difference proven at 95.0% confidence

On an 8-core Atom C2758 with Chelsio T540-CR:

x r335106 : inet6 packets-per-second
+ r335106 with D15789: inet6 packets-per-second
+--------------------------------------------------------------------------+
|x    x   x x  x                                              +  ++    +  +|
|  |_____AM___|                                                            |
|                                                              |__M_A____| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       3549377     3644218.5     3610977.5     3601811.4     37260.982
+   5       3962538     4047545.5       3995556     4003879.1     33482.287
Difference at 95.0% confidence
        402068 +/- 51661
        11.1629% +/- 1.52497%
        (Student's t, pooled s = 35422.1)

On a 4-core AMD GX-412TC with Intel i210AT:

x r335106 : inet6 packets-per-second
+ r335106 with D15789: inet6 packets-per-second
+--------------------------------------------------------------------------+
|                                                                      +  +|
|xx x  xx                                                            + +  +|
| |_A__|                                                                   |
|                                                                     |MA_||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        614522        624910        619461      619776.4     4432.2748
+   5        721146        728208        724108      724908.6     2904.8339
Difference at 95.0% confidence
        105132 +/- 5465.09
        16.9629% +/- 0.988798%
        (Student's t, pooled s = 3747.21)

And the inet4 vs inet6 diff on this platform:

x r335106D15789: inet4 packets-per-second
+ r335106D15789: inet6 packets-per-second
+--------------------------------------------------------------------------+
|   +                                                                      |
|+  +  ++                                                         xxx  x  x|
|                                                                 |_MA___| |
| |_MA__|                                                                  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        784956        793338        787191      788414.8     3448.5979
+   5        721146        728208        724108      724908.6     2904.8339
Difference at 95.0% confidence
        -63506.2 +/- 4649.99
        -8.05492% +/- 0.562488%
        (Student's t, pooled s = 3188.33)
bz added a comment. Jun 14 2018, 10:09 PM

It's a forwarding setup, and I'm using only 2 static routes in my benchmarks.

Right; I wonder if you could add about 500k routes for IPv4 and about 50k for IPv6 (no idea, let's think ahead?) and then do about 25 route updates per second at random across the address space. That would be an amazingly interesting test case, especially if you can publish the framework for it somewhere. To seed the initial table, one could get an MRT dump from, say, https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-raw-data or one of the others at https://bgpstream.caida.org/data . I am not saying you have to, nor that this should prevent this change from going in; it's just one of the things I have thought about for years in connection with similar changes, as a "router test bed scenario".
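
As a rough illustration of the workload described here, the sketch below builds a synthetic table of random IPv4 /24 prefixes and a schedule of route updates paced at a fixed rate. It is a stand-in under stated assumptions, not the framework asked for: the prefixes are pseudo-random rather than taken from an MRT dump, the gateways are hypothetical documentation addresses, IPv6 is not covered, and the helper names `random_prefixes` and `update_schedule` are made up for this example. Nothing here touches the routing table; it only produces the data a driver would consume.

```python
import ipaddress
import random


def random_prefixes(count, seed=0):
    """Return `count` distinct pseudo-random IPv4 /24 prefixes, sorted."""
    rng = random.Random(seed)
    prefixes = set()
    while len(prefixes) < count:
        first_octet = rng.randint(1, 223)   # stay below multicast space
        if first_octet == 127:              # skip loopback
            continue
        address = (first_octet << 24) | (rng.getrandbits(16) << 8)
        prefixes.add(ipaddress.ip_network((address, 24)))
    return sorted(prefixes)


def update_schedule(prefixes, gateways, updates_per_second, duration_s, seed=1):
    """Yield (time offset in seconds, prefix, new gateway) tuples."""
    rng = random.Random(seed)
    interval = 1.0 / updates_per_second
    for i in range(int(updates_per_second * duration_s)):
        yield (i * interval, rng.choice(prefixes), rng.choice(gateways))


# Small demo sizes; the comment above suggests about 500k prefixes
# and about 25 updates per second.
table = random_prefixes(1000)
plan = list(update_schedule(table, ["192.0.2.1", "192.0.2.2"], 25, 2))
print(len(table), "prefixes,", len(plan), "updates")
```

Fixed seeds keep the table and the update sequence reproducible between the patched and unpatched runs, so both kernels see the same workload.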

melifaro accepted this revision. Jun 14 2018, 10:12 PM
ae added a comment. Jun 15 2018, 10:05 AM
In D15789#334391, @bz wrote:

Right; I wonder if you could add about 500k routes for IPv4 and about 50k for IPv6 (no idea, let's think ahead?) and then do about 25 route updates per second at random across the address space. That would be an amazingly interesting test case, especially if you can publish the framework for it somewhere. To seed the initial table, one could get an MRT dump from, say, https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-raw-data or one of the others at https://bgpstream.caida.org/data . I am not saying you have to, nor that this should prevent this change from going in; it's just one of the things I have thought about for years in connection with similar changes, as a "router test bed scenario".

I think the number of routes does not matter. The problem is contention, not lookup time. I think you want to see the cost of rm_wlock() compared to rw_wlock()?
The test scenario could be the following: add several routes with static ARP/NDP entries, then change the default route between them in a loop with some delay.
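
The loop described above could be driven by something like the following minimal sketch. It assumes the routes and static ARP/NDP entries for the candidate next hops are already in place, uses hypothetical documentation addresses for the gateways, and relies on route(8)'s `route change default <gateway>` form. It defaults to a dry run that only prints the commands; the helper names are illustrative.

```python
import itertools
import subprocess
import time

# Hypothetical next hops; each is assumed to already have a static
# ARP entry on the router under test.
GATEWAYS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]


def flip_commands(gateways, flips):
    """Build `flips` route(8) commands cycling the default route."""
    cycle = itertools.cycle(gateways)
    return [["route", "change", "default", next(cycle)] for _ in range(flips)]


def run(commands, delay_s=0.1, dry_run=True):
    """Print each command, or execute it when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
        time.sleep(delay_s)


if __name__ == "__main__":
    run(flip_commands(GATEWAYS, 6), delay_s=0.0)
```

Each change takes the write lock on the routing table, so running this loop alongside the forwarding benchmark gives a rough view of how writers affect forwarding throughput under each lock type.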

This revision was automatically updated to reflect the committed changes.