This patch adds ip6_tryforward() - a fast run to completion forwarding implementation for IPv6.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 6057 Build 6322: arc lint + arc unit
Event Timeline
There is a problem somewhere because I didn't see any difference :-(
Ministat output of Packet-per-second forwarded on an small atom-4cores with gigabit Intel NIC:
x 12-head r308878 + 12-head r308878 with D8527 +--------------------------------------------------------------------------+ |+ + xx x x x + + +| | |_____________A_____________| | | |________________________________A______________M________________| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 858189 866458 861860 861749.5 3530.3342 + 5 853023.5 871471 866940 863030.7 8312.3947 No difference proven at 95.0% confidence
Output of "pmcstat -S instructions -T -w1" WITHOUT the patch (with traffic limited to 200Kpps):
PMC: [INSTR_RETIRED_ANY] Samples: 16538 (100.0%) , 0 unresolved %SAMP IMAGE FUNCTION CALLERS 4.9 kernel in6_setscope ip6_forward:3.3 ip6_input:1.6 4.9 kernel bounce_bus_dmamap_lo bus_dmamap_load_mbuf_sg 4.4 kernel ip6_forward ip6_input 4.0 kernel igb_mq_start_locked igb_mq_start 3.7 libc.so.7 bsearch 0x64bc 3.6 kernel igb_rxeof igb_msix_que 3.6 kernel uma_zfree_arg m_freem:2.6 mb_free_ext:1.0 3.1 kernel uma_zalloc_arg m_getjcl:1.2 m_copym:1.0 m_prepend:0.9 3.0 kernel ip6_input netisr_dispatch_src 3.0 kernel rn_match rtalloc1_fib 2.6 kernel ether_output ip6_forward 2.5 kernel bzero m_pkthdr_init:1.6 nd6_resolve:0.5 2.5 kernel m_copym ip6_forward 2.0 kernel __rw_rlock nd6_resolve:1.1 rtalloc1_fib:0.9
Output of "pmcstat -S instructions -T -w1" WITHT D8527 patch (with traffic still limited to 200Kpps):
PMC: [INSTR_RETIRED_ANY] Samples: 10678 (100.0%) , 0 unresolved %SAMP IMAGE FUNCTION CALLERS 6.1 kernel igb_mq_start_locked igb_mq_start 6.0 kernel igb_rxeof igb_msix_que 4.5 kernel rn_match fib6_lookup_nh_basic 4.2 kernel bounce_bus_dmamap_lo bus_dmamap_load_mbuf_sg 4.2 kernel ether_output ip6_tryforward 3.7 libc.so.7 bsearch 0x64bc 3.6 kernel bzero fib6_lookup_nh_basic:1.3 m_pkthdr_init:0.8 nd6_resolve:0.8 ip6_tryforward:0.7 3.2 kernel __rw_rlock fib6_lookup_nh_basic:1.7 nd6_resolve:1.5 2.9 kernel ip6_tryforward ip6_input 2.6 kernel fib6_lookup_nh_basic ip6_findroute 2.5 kernel ip6_input netisr_dispatch_src 2.2 kernel _rw_runlock_cookie fib6_lookup_nh_basic:1.2 nd6_resolve:1.1 2.2 kernel bus_dmamap_load_mbuf igb_mq_start_locked:1.2 igb_refresh_mbufs:1.1 2.2 kernel igb_mq_start ether_output 2.2 kernel netisr_dispatch_src ether_input:1.4 ether_demux:0.8 2.0 kernel ether_nh_input netisr_dispatch_src
I'm working on fixing my bench scripts (was not so easy to add IPv6) and re-run the bench using this last patch revision, on a bigger server (Xeon_E5-2650 with 8Cores, same Chelsio T540 NIC).
Then I will re-run them on the other platforms too.
Meanwhile the results are… impressive: From 2.4Mpps to 4.1Mpps :-)
x head r309424.inet6.pps + head r309424 with D8527.inet6.pps +--------------------------------------------------------------------------+ | x + +| |x xx + + +| | |__AM_| | | |_____M__________A_______________| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 2208236 2497598.5 2472890.5 2427075.3 122943.06 + 5 4043284.5 5424029 4127901 4615409.7 731887.23 Difference at 95.0% confidence 2.18833e+06 +/- 765352 90.1634% +/- 32.6463% (Student's t, pooled s = 524773)
I've re-run the fixed bench script on 8core ATOM.
Same excellent improvement with this patch on this platform:
x r309424 inet6.pps + r309424 with D8527 inet6.pps +--------------------------------------------------------------------------+ | x + +| |x xx x +++| | |_A_| | | |A|| +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 1048371 1178036.5 1118136.5 1123344.5 54959.786 + 5 2969705 3028344.5 3003420 2999605.6 24042.596 Difference at 95.0% confidence 1.87626e+06 +/- 61864.7 167.025% +/- 13.6524%
This is multiple independent changes. The "rcvif" statistics changes are all non-functional noise to my understanding and should be factored out and committed separately. It'll help to review the functional changes a lot easier.
There's an #ifdef in there which clearly will not compile.
sys/netinet6/ip6_fastfwd.c | ||
---|---|---|
143 | That won't compile, will it? | |
sys/netinet6/ip6_input.c | ||
594 | All these "rcvif" changes could be factored out into a separate change and committed as non-functional change irrelevant to this work, please? | |
732 | Put the V_ip6_forwarding check first please? |
I've not read in detail through the ip6_fastfwd.c code but I am fine with the other changes.