Mostly mechanical change: use the CK_STAILQ macros and defer frees to epoch_call().
Depends on D15365.
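For context, the conversion pattern is roughly the following. This is a minimal sketch, not code from this diff: struct foo, foo_list, and foo_mtx are hypothetical names, and it assumes the epoch(9) and ck_queue APIs as they existed in FreeBSD-CURRENT around May 2018.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/malloc.h>
#include <sys/epoch.h>
#include <ck_queue.h>

struct foo {
	int			f_key;
	CK_STAILQ_ENTRY(foo)	f_link;		/* was STAILQ_ENTRY() */
	struct epoch_context	f_epoch_ctx;	/* for the deferred free */
};

static CK_STAILQ_HEAD(, foo) foo_list =
    CK_STAILQ_HEAD_INITIALIZER(foo_list);	/* was STAILQ_HEAD() */
static struct mtx foo_mtx;	/* serializes writers; readers are lock-free */

/* Reader: the epoch section replaces the old read-lock of the rwlock. */
static bool
foo_lookup(int key)
{
	struct foo *f;
	bool found = false;

	epoch_enter(net_epoch_preempt);
	CK_STAILQ_FOREACH(f, &foo_list, f_link) {
		if (f->f_key == key) {
			found = true;
			break;
		}
	}
	epoch_exit(net_epoch_preempt);
	return (found);
}

/* Runs only after every reader present at removal has left the epoch. */
static void
foo_free_cb(epoch_context_t ctx)
{

	free(__containerof(ctx, struct foo, f_epoch_ctx), M_TEMP);
}

/* Writer: the mutex replaces the old write-lock; the free is deferred. */
static void
foo_remove(struct foo *f)
{

	mtx_lock(&foo_mtx);
	CK_STAILQ_REMOVE(&foo_list, f, foo, f_link);
	mtx_unlock(&foo_mtx);
	epoch_call(net_epoch_preempt, &f->f_epoch_ctx, foo_free_cb);
}

The point of the split is that readers never block writers or each other; only the unlinking still takes a (cheap, uncontended) mutex, and the memory is reclaimed once no reader can still hold a pointer to it.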
Differential D15366
Replace if_addr_lock rwlock with epoch + mutex. Authored by mmacy on May 9 2018, 6:50 AM.
Looks awesome. My only issue is that I'd strongly prefer that the STAILQ_HEAD / STAILQ_INIT / STAILQ_ENTRY macros be prefixed with CK_ so that readers realize that these lists are used with epochs, and cannot be safely used with the normal STAILQ macros. I tried to point out most of them, but I may have missed a few. The other issue is that it looks like there is at least one occurrence of a hand-rolled STAILQ_FOREACH that may be missing some epoch safety. This is a huge public service; thanks for doing this!
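To make the second point concrete, here is a hypothetical before/after of the kind of loop being flagged. Neither function is quoted from the diff; ifp, if_addrhead, and ifa_link are the real ifnet/ifaddr names, but the counting functions are illustrative only.

#include <sys/epoch.h>
#include <net/if.h>
#include <net/if_var.h>
#include <ck_queue.h>

/*
 * Hand-rolled traversal with no epoch section: nothing pins the list
 * elements, so a concurrent removal can epoch_call()-free an ifaddr
 * out from under the walk once the grace period expires.
 */
static int
count_addrs_unsafe(struct ifnet *ifp)
{
	struct ifaddr *ifa;
	int n = 0;

	for (ifa = CK_STAILQ_FIRST(&ifp->if_addrhead); ifa != NULL;
	    ifa = CK_STAILQ_NEXT(ifa, ifa_link))
		n++;
	return (n);
}

/* The epoch section pins the elements for the duration of the walk. */
static int
count_addrs_safe(struct ifnet *ifp)
{
	struct ifaddr *ifa;
	int n = 0;

	epoch_enter(net_epoch_preempt);
	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
		n++;
	epoch_exit(net_epoch_preempt);
	return (n);
}

The unsafe version compiles and usually works, which is exactly why a hand-rolled loop is easy to miss in review: the failure is a rare use-after-free under concurrent address churn, not an immediate crash.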
Under a heavy UDP packet flood this helps quite a bit. Using a 14-core, 28-HTT single-socket E5-2697 v3 with a 40GbE MLX5-based ConnectX-4 LX NIC, I see an almost 12% improvement in received packet rate and a larger improvement in bytes delivered all the way to userspace. With the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, nstat -I mce0 1 shows, before the patch:

InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
4.98    0.00   4.42   0.00  4235592  33       83.80  4720653   2149771  1235  247.32
4.73    0.00   4.20   0.00  4025260  33       82.99  4724900   2139833  1204  247.32
4.72    0.00   4.20   0.00  4035252  33       82.14  4719162   2132023  1264  247.32
4.71    0.00   4.21   0.00  4073206  33       83.68  4744973   2123317  1347  247.32
4.72    0.00   4.21   0.00  4061118  33       80.82  4713615   2188091  1490  247.32
4.72    0.00   4.21   0.00  4051675  33       85.29  4727399   2109011  1205  247.32
4.73    0.00   4.21   0.00  4039056  33       84.65  4724735   2102603  1053  247.32

and after the patch:

InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
5.43    0.00   4.20   0.00  3313143  33       84.96  5434214   1900162  2656  245.51
5.43    0.00   4.20   0.00  3308527  33       85.24  5439695   1809382  2521  245.51
5.42    0.00   4.19   0.00  3316778  33       87.54  5416028   1805835  2256  245.51
5.42    0.00   4.19   0.00  3317673  33       90.44  5426044   1763056  2332  245.51
5.42    0.00   4.19   0.00  3314839  33       88.11  5435732   1792218  2499  245.52
5.44    0.00   4.19   0.00  3293228  33       91.84  5426301   1668597  2121  245.52

Similarly, netperf reports 230Mb/s before the patch and 270Mb/s after it.

On my dual 8160 with the equivalent workload, in_broadcast drops from 0.86% of samples to 0.24%; however, in_pcblookup_hash goes from 0.78% to 1.36%. I can't actually generate enough load to see a performance improvement, because FreeBSD UDP tx doesn't actually do any queue hashing.

With the following patch I get uniform packet distribution on the sender:

diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c
index 670182ece8b..1a279382cab 100644
--- a/sys/netinet/udp_usrreq.c
+++ b/sys/netinet/udp_usrreq.c
@@ -1592,6 +1592,8 @@ udp_attach(struct socket *so, int proto, struct thread *td)
 	inp = sotoinpcb(so);
 	inp->inp_vflag |= INP_IPV4;
 	inp->inp_ip_ttl = V_ip_defttl;
+	inp->inp_flowid = arc4random();
+	inp->inp_flowtype = M_HASHTYPE_OPAQUE;
 	error = udp_newudpcb(inp);
 	if (error) {

But the profile is only marginally less embarrassing.

Ongoing progress on fixing the brokenness in UDP transmit; shelving this for the day. I'll probably just move to using pkt-gen on Monday.
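For readers wondering why those two assignments change the transmit distribution: multiqueue drivers pick a tx ring from m_pkthdr.flowid, which the stack inherits from the inpcb, so a UDP socket that never receives a flowid always lands on the same ring. A sketch of that consumer side follows; pick_tx_ring() and txq_count are hypothetical, though M_HASHTYPE_GET(), M_HASHTYPE_NONE, and m_pkthdr.flowid are the real mbuf(9) API.

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Hypothetical driver-side ring selection.  With inp_flowtype left at
 * M_HASHTYPE_NONE, UDP senders never take the first branch and all
 * fall through to ring 0; seeding inp_flowid/inp_flowtype at attach
 * time (as in the patch above) spreads sockets across the rings.
 */
static int
pick_tx_ring(struct mbuf *m, int txq_count)
{

	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
		return (m->m_pkthdr.flowid % txq_count);
	return (0);
}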
It looks like those assignments didn't actually make it into the branch that was tested. Look here: