
ip6_output: use new routing KPI when not passed a cached route
ClosedPublic

Authored by bz on Feb 28 2020, 7:32 PM.

Details

Summary

Implement the equivalent of r347375 (IPv4) for the IPv6 output path.
In IPv6, udp6_output() passes us a cached route (and the inp)
depending on whether it acquired a write lock on the inp.
If we neither bind nor connect, the first UDP packet goes out
with a cached route (inp write-locked) but all further packets do not.
If we bind but do not connect, we never write-lock the inp.

When no cached route is passed in, rather than providing local
storage for a route and passing it through the old lookup code
and down the stack, use the new route lookup KPI to acquire all
the details we need to send the packet.

Compared to the IPv4 code, the IPv6 code has a couple of possible
complications: options with a routing header (and route caching
there), the path MTU (ro_pmtu) case, which now equally has to deal
with the possibility of the passed-in route being NULL, and the
fwd_tag case in which a firewall changes the next hop (something to
factor out in the future).

Sponsored by: Netflix

This change requires D23872 and D23873 (the latter currently
also including the former).

Test Plan

I've run the current net, netinet, netinet6, and
netpfil (with pf for the v6 fwd case) test suite
without new fallout.

I've run various UDP pps benchmarks in all 4 combinations:
no bind, no connect; bind, no connect; no bind, connect;
and bind with connect (though in practice this boils down
to "connected or not", apart from the first packet).

I've also observed various "oddities" with both the old and
the new code which are not yet fully debugged or solved.

One thing I have not figured out yet is why the two
unconnected cases do not behave the same, even though
both run into the route lookup KPI case
(ignoring the first packet in the unbound case).

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

I re-ran the UDP6, minimum-size packet, pps tests on a multi-user system for all four cases (the 12.* datasets are the patched kernel using the new KPI where possible; the 34.* datasets are the vanilla kernel runs):

no bind, no connect

x 34.x
+ 12.x
+--------------------------------------------------------------------------+
|x                                                                      +  |
|x                                                                      ++ |
|x x                                                                    ++ |
|x x                                                                    ++ |
|x x                                                                    ++ |
|x x                                                                    ++ |
|x x                                                                    ++ |
|x xx                                                                   +++|
|x xx                                                                   +++|
|x xx                                                                   +++|
||A|                                                                    |A |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  22       1880281       1958102     1919490.5     1919208.8     33644.193
+  22       4002562       4073906       4036741     4035067.9     21266.017
Difference at 95.0% confidence
        2.11586e+06 +/- 17124.2
        110.246% +/- 1.65583%
        (Student's t, pooled s = 28144.1)
bind, no connect

x 34.b
+ 12.b
+--------------------------------------------------------------------------+
|                                                                      +   |
|                                                                     ++   |
|                                                                     ++   |
| xx                                                                  ++   |
| xx            x                                                     +++  |
|xxx          x x x                                                  ++++  |
|xxxx      x xxxxxx                                             +   +++++++|
| |_____MA______|                                                   |_AM|  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  22       3314626       3399183     3346354.5     3352281.8     32945.615
+  22       3621604       3669221       3652504       3651845     9648.7574
Difference at 95.0% confidence
        299563 +/- 14769.9
        8.9361% +/- 0.476971%
        (Student's t, pooled s = 24274.6)

This one is a bit better than it was in single-user mode, but still nowhere near the first case.
There seems to be something else lingering beyond the route lookup in ip6_output().
no bind, connect

x 34.c
+ 12.c
+--------------------------------------------------------------------------+
|        x   x               +       ++                                    |
|x    x  x x *xx+ *x  +x  x* +  * *  *+ + x x + ++x x     x     + + +   + +|
|        |___________M_|_A____________M__A|________________|               |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  22       3420940       3446992       3430067     3431934.3     7438.0559
+  22       3426601       3454099       3437587     3439008.3       8259.39
Difference at 95.0% confidence
        7074.05 +/- 4782.09
        0.206124% +/- 0.13947%
        (Student's t, pooled s = 7859.46)
bind, and connect

x 34.bc
+ 12.bc
+--------------------------------------------------------------------------+
|                                         +       +            +           |
|x        + x      x    x +x  * xx++  x*+x* x x   **+*+x  + x++* *  x     +|
|                      |_________|_______A______A_M_______|____|           |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  22       3411088       3443825     3430899.5     3430531.1     8577.0328
+  22       3415312       3446950       3435337     3434123.1     7446.5046
No difference proven at 95.0% confidence

@olivier do you have a non-forwarding TCP test-bed these days?

In D23886#525161, @bz wrote:

@olivier do you have a non-forwarding TCP test-bed these days?

I have an nginx <=> wrk lab; do you think we could use it in this case? E.g. by serving only very small files, like 16Kb, and asking wrk to use a large number of clients, something like 2500 or more?


Yes, it'd be nice to know, beyond iperf3, that the before/after comparison does not hurt TCP performance either.

This revision is now accepted and ready to land. Mar 2 2020, 9:35 PM