Mostly mechanical change: use the CK_STAILQ macros and defer frees to epoch_call().
Depends on D15365.
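For context, the conversion pattern is roughly the following. This is a minimal sketch, not code from this diff: struct foo, foo_list, and foo_mtx are hypothetical names, and it assumes the epoch(9) and ck_queue APIs as they existed in FreeBSD-CURRENT around May 2018.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/malloc.h>
#include <sys/epoch.h>
#include <ck_queue.h>

struct foo {
	int			f_key;
	CK_STAILQ_ENTRY(foo)	f_link;		/* was STAILQ_ENTRY() */
	struct epoch_context	f_epoch_ctx;	/* for the deferred free */
};

static CK_STAILQ_HEAD(, foo) foo_list =
    CK_STAILQ_HEAD_INITIALIZER(foo_list);	/* was STAILQ_HEAD() */
static struct mtx foo_mtx;	/* serializes writers; readers are lock-free */

/* Reader: the epoch section replaces the old read-lock of the rwlock. */
static bool
foo_lookup(int key)
{
	struct foo *f;
	bool found = false;

	epoch_enter(net_epoch_preempt);
	CK_STAILQ_FOREACH(f, &foo_list, f_link) {
		if (f->f_key == key) {
			found = true;
			break;
		}
	}
	epoch_exit(net_epoch_preempt);
	return (found);
}

/* Runs only after every reader present at removal has left the epoch. */
static void
foo_free_cb(epoch_context_t ctx)
{

	free(__containerof(ctx, struct foo, f_epoch_ctx), M_TEMP);
}

/* Writer: the mutex replaces the old write-lock; the free is deferred. */
static void
foo_remove(struct foo *f)
{

	mtx_lock(&foo_mtx);
	CK_STAILQ_REMOVE(&foo_list, f, foo, f_link);
	mtx_unlock(&foo_mtx);
	epoch_call(net_epoch_preempt, &f->f_epoch_ctx, foo_free_cb);
}

The point of the split is that readers never block writers or each other; only the unlinking still takes a (cheap, uncontended) mutex, and the memory is reclaimed once no reader can still hold a pointer to it.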
Differential D15366
Replace if_addr_lock rwlock with epoch + mutex. Authored by mmacy on May 9 2018, 6:50 AM.
Looks awesome. My only issue is that I'd strongly prefer that the STAILQ_HEAD / STAILQ_INIT / STAILQ_ENTRY macros be prefixed with CK_ so that readers realize that these lists are used with epochs, and cannot be safely used with the normal STAILQ macros. I tried to point out most of them, but I may have missed a few. The other issue is that it looks like there is at least one occurrence of a hand-rolled STAILQ_FOREACH that may be missing some epoch safety. This is a huge public service; thanks for doing this!
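To make the second point concrete, here is a hypothetical before/after of the kind of loop being flagged. Neither function is quoted from the diff; ifp, if_addrhead, and ifa_link are the real ifnet/ifaddr names, but the counting functions are illustrative only.

#include <sys/epoch.h>
#include <net/if.h>
#include <net/if_var.h>
#include <ck_queue.h>

/*
 * Hand-rolled traversal with no epoch section: nothing pins the list
 * elements, so a concurrent removal can epoch_call()-free an ifaddr
 * out from under the walk once the grace period expires.
 */
static int
count_addrs_unsafe(struct ifnet *ifp)
{
	struct ifaddr *ifa;
	int n = 0;

	for (ifa = CK_STAILQ_FIRST(&ifp->if_addrhead); ifa != NULL;
	    ifa = CK_STAILQ_NEXT(ifa, ifa_link))
		n++;
	return (n);
}

/* The epoch section pins the elements for the duration of the walk. */
static int
count_addrs_safe(struct ifnet *ifp)
{
	struct ifaddr *ifa;
	int n = 0;

	epoch_enter(net_epoch_preempt);
	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
		n++;
	epoch_exit(net_epoch_preempt);
	return (n);
}

The unsafe version compiles and usually works, which is exactly why a hand-rolled loop is easy to miss in review: the failure is a rare use-after-free under concurrent address churn, not an immediate crash.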
Under a heavy UDP packet flood this helps quite a bit. Using a 14-core, 28-HTT single-socket E5-2697 v3 with a 40GbE MLX5-based ConnectX-4 LX NIC, I see an almost 12% improvement in received packet rate and a larger improvement in bytes delivered all the way to userspace. With the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, nstat -I mce0 1 shows, before the patch:

InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
4.98    0.00   4.42   0.00  4235592  33       83.80  4720653   2149771  1235  247.32
4.73    0.00   4.20   0.00  4025260  33       82.99  4724900   2139833  1204  247.32
4.72    0.00   4.20   0.00  4035252  33       82.14  4719162   2132023  1264  247.32
4.71    0.00   4.21   0.00  4073206  33       83.68  4744973   2123317  1347  247.32
4.72    0.00   4.21   0.00  4061118  33       80.82  4713615   2188091  1490  247.32
4.72    0.00   4.21   0.00  4051675  33       85.29  4727399   2109011  1205  247.32
4.73    0.00   4.21   0.00  4039056  33       84.65  4724735   2102603  1053  247.32

and after the patch:

InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
5.43    0.00   4.20   0.00  3313143  33       84.96  5434214   1900162  2656  245.51
5.43    0.00   4.20   0.00  3308527  33       85.24  5439695   1809382  2521  245.51
5.42    0.00   4.19   0.00  3316778  33       87.54  5416028   1805835  2256  245.51
5.42    0.00   4.19   0.00  3317673  33       90.44  5426044   1763056  2332  245.51
5.42    0.00   4.19   0.00  3314839  33       88.11  5435732   1792218  2499  245.52
5.44    0.00   4.19   0.00  3293228  33       91.84  5426301   1668597  2121  245.52

Similarly, netperf reports 230Mb/s before the patch and 270Mb/s after it.

On my dual 8160 with the equivalent workload, in_broadcast drops from 0.86% of samples to 0.24%; however, in_pcblookup_hash goes from 0.78% to 1.36%. I can't actually generate enough load to see a performance improvement, because FreeBSD UDP tx doesn't actually do any queue hashing.

With the following patch I get uniform packet distribution on the sender:

diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c
index 670182ece8b..1a279382cab 100644
--- a/sys/netinet/udp_usrreq.c
+++ b/sys/netinet/udp_usrreq.c
@@ -1592,6 +1592,8 @@ udp_attach(struct socket *so, int proto, struct thread *td)
 	inp = sotoinpcb(so);
 	inp->inp_vflag |= INP_IPV4;
 	inp->inp_ip_ttl = V_ip_defttl;
+	inp->inp_flowid = arc4random();
+	inp->inp_flowtype = M_HASHTYPE_OPAQUE;
 	error = udp_newudpcb(inp);
 	if (error) {

But the profile is only marginally less embarrassing.

Ongoing progress on fixing the brokenness in UDP transmit; shelving this for the day. I'll probably just move to using pkt-gen on Monday.
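For readers wondering why those two assignments change the transmit distribution: multiqueue drivers pick a tx ring from m_pkthdr.flowid, which the stack inherits from the inpcb, so a UDP socket that never receives a flowid always lands on the same ring. A sketch of that consumer side follows; pick_tx_ring() and txq_count are hypothetical, though M_HASHTYPE_GET(), M_HASHTYPE_NONE, and m_pkthdr.flowid are the real mbuf(9) API.

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Hypothetical driver-side ring selection.  With inp_flowtype left at
 * M_HASHTYPE_NONE, UDP senders never take the first branch and all
 * fall through to ring 0; seeding inp_flowid/inp_flowtype at attach
 * time (as in the patch above) spreads sockets across the rings.
 */
static int
pick_tx_ring(struct mbuf *m, int txq_count)
{

	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
		return (m->m_pkthdr.flowid % txq_count);
	return (0);
}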
It looks like those assignments didn't actually make it into the branch that was tested. Look here: