* For a certain range of send sizes (usually between 256 and 512 bytes)
the TSO stack tends to pass iflib_encap mbuf chains that cannot be mapped
to segments with iflib_busdma_load_mbuf_sg. In that case m_collapse is
called, but it also fails. The current implementation drops such mbuf
chains, causing a drastic loss of TX performance. No rejected requests
for mbufs or clusters appear in the netstat output when this happens.
Tests show that calling m_defrag after m_collapse has failed works in
almost 100% of cases and solves the problem. Lowering the maximum number
of segments set in the ifp struct by 1 usually eliminates the need to
call m_collapse at all and gives even better results for streams with
such sends.
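
The fallback order described above (load, then m_collapse, then m_defrag,
and drop only if all of them fail) can be illustrated with a small
userspace model. This is only a sketch and not the iflib code: struct
chain, struct txq_stats, model_collapse, model_defrag, model_encap,
CLUSTER_SIZE and MAX_CHAIN are all hypothetical names and values, and the
two model functions are deliberately simplified stand-ins for the real
m_collapse and m_defrag.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SIZE 2048	/* assumed buffer size in this model */
#define MAX_CHAIN    64		/* assumed upper bound on chain length */

/* Hypothetical stand-in for an mbuf chain: only segment lengths matter. */
struct chain {
	size_t nsegs;
	size_t len[MAX_CHAIN];
};

/* Hypothetical per-queue counters, loosely named after the sysctls. */
struct txq_stats {
	unsigned collapse_failed;
	unsigned defrag;
	unsigned defrag_failed;
};

enum tx_result { TX_OK, TX_DROPPED };

/* Simplified model of m_collapse: merge whole adjacent buffers only. */
static int
model_collapse(struct chain *c, size_t max_segs)
{
	struct chain out = { 0, { 0 } };
	size_t i;

	for (i = 0; i < c->nsegs; i++) {
		if (out.nsegs > 0 &&
		    out.len[out.nsegs - 1] + c->len[i] <= CLUSTER_SIZE)
			out.len[out.nsegs - 1] += c->len[i];
		else
			out.len[out.nsegs++] = c->len[i];
	}
	if (out.nsegs > max_segs)
		return (0);	/* chain is left untouched on failure */
	*c = out;
	return (1);
}

/* Simplified model of m_defrag: repack all bytes into the fewest buffers. */
static int
model_defrag(struct chain *c)
{
	size_t total = 0, need, i;

	for (i = 0; i < c->nsegs; i++)
		total += c->len[i];
	need = (total + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
	if (need > MAX_CHAIN)
		return (0);
	for (i = 0; i < need; i++) {
		c->len[i] = total < CLUSTER_SIZE ? total : CLUSTER_SIZE;
		total -= c->len[i];
	}
	c->nsegs = need;
	return (1);
}

/* Try to load, then collapse, then defrag; drop only if all three fail. */
static enum tx_result
model_encap(struct chain *c, size_t max_segs, struct txq_stats *st)
{
	assert(c->nsegs <= MAX_CHAIN && max_segs > 0);
	if (c->nsegs <= max_segs)
		return (TX_OK);
	if (model_collapse(c, max_segs))
		return (TX_OK);
	st->collapse_failed++;
	st->defrag++;
	if (model_defrag(c) && c->nsegs <= max_segs)
		return (TX_OK);
	st->defrag_failed++;
	return (TX_DROPPED);
}
```

In this model a chain of four 1500-byte segments with a limit of 3
segments cannot be collapsed, because no two adjacent buffers fit into
one 2048-byte buffer, but repacking its 6000 bytes yields 3 buffers, so
the chain is sent instead of dropped.
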
Example results of testing with netperf:
1) Before the patch:
[root@u2002 ~]# netperf -P0 -H u2020 -t TCP_STREAM -l 10 -- -H u2020-2 -m 512 -M 512
87380 1048576 512 10.17 391.77
[root@u2002 ~]# netstat -m
16130/25375/41505 mbufs in use (current/cache/total)
16072/12204/28276/4194304 mbuf clusters in use (current/cache/total/max)
8213/12027 mbuf+clusters out of packet secondary zone in use (current/cache)
0/3/3/2097152 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/1058683 9k jumbo clusters in use (current/cache/total/max)
0/0/0/595509 16k jumbo clusters in use (current/cache/total/max)
36176K/30763K/66940K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
[root@u2002 ~]# sysctl dev.ix.0 | grep mbuf
dev.ix.0.iflib.txq21.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq21.mbuf_defrag: 0
dev.ix.0.iflib.txq20.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq20.mbuf_defrag: 0
dev.ix.0.iflib.txq19.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq19.mbuf_defrag: 0
dev.ix.0.iflib.txq18.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq18.mbuf_defrag: 0
dev.ix.0.iflib.txq17.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq17.mbuf_defrag: 0
dev.ix.0.iflib.txq16.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq16.mbuf_defrag: 0
dev.ix.0.iflib.txq15.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq15.mbuf_defrag: 0
dev.ix.0.iflib.txq14.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq14.mbuf_defrag: 0
dev.ix.0.iflib.txq13.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq13.mbuf_defrag: 0
dev.ix.0.iflib.txq12.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq12.mbuf_defrag: 0
dev.ix.0.iflib.txq11.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq11.mbuf_defrag: 0
dev.ix.0.iflib.txq10.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq10.mbuf_defrag: 0
dev.ix.0.iflib.txq09.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq09.mbuf_defrag: 0
dev.ix.0.iflib.txq08.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq08.mbuf_defrag: 0
dev.ix.0.iflib.txq07.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq07.mbuf_defrag: 0
dev.ix.0.iflib.txq06.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq06.mbuf_defrag: 0
dev.ix.0.iflib.txq05.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq05.mbuf_defrag: 0
dev.ix.0.iflib.txq04.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq04.mbuf_defrag: 0
dev.ix.0.iflib.txq03.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq03.mbuf_defrag: 0
dev.ix.0.iflib.txq02.mbuf_defrag_failed: 36
dev.ix.0.iflib.txq02.mbuf_defrag: 3
dev.ix.0.iflib.txq01.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq01.mbuf_defrag: 0
dev.ix.0.iflib.txq00.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq00.mbuf_defrag: 0
Average TX performance in 20 runs of netperf:
- msg size 512: 388 Mbps
- msg size 256K: 4372 Mbps
2) Patch without "-1" (m_defrag fallback only):
Average TX performance:
- msg size 512: 3119 Mbps
- msg size 256K: 4367 Mbps
sysctl dev.ix.0 | grep mbuf:
dev.ix.0.iflib.txq21.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq21.mbuf_collapse_failed: 11904
dev.ix.0.iflib.txq21.mbuf_defrag: 12607
dev.ix.0.iflib.txq20.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq20.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq20.mbuf_defrag: 0
dev.ix.0.iflib.txq19.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq19.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq19.mbuf_defrag: 0
dev.ix.0.iflib.txq18.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq18.mbuf_collapse_failed: 10609
dev.ix.0.iflib.txq18.mbuf_defrag: 11225
dev.ix.0.iflib.txq17.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq17.mbuf_collapse_failed: 10858
dev.ix.0.iflib.txq17.mbuf_defrag: 11492
dev.ix.0.iflib.txq16.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq16.mbuf_collapse_failed: 15449
dev.ix.0.iflib.txq16.mbuf_defrag: 16250
dev.ix.0.iflib.txq15.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq15.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq15.mbuf_defrag: 0
dev.ix.0.iflib.txq14.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq14.mbuf_collapse_failed: 10457
dev.ix.0.iflib.txq14.mbuf_defrag: 11064
dev.ix.0.iflib.txq13.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq13.mbuf_collapse_failed: 11654
dev.ix.0.iflib.txq13.mbuf_defrag: 12331
dev.ix.0.iflib.txq12.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq12.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq12.mbuf_defrag: 0
dev.ix.0.iflib.txq11.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq11.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq11.mbuf_defrag: 0
dev.ix.0.iflib.txq10.mbuf_defrag_failed: 3
dev.ix.0.iflib.txq10.mbuf_collapse_failed: 23030
dev.ix.0.iflib.txq10.mbuf_defrag: 24298
dev.ix.0.iflib.txq09.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq09.mbuf_collapse_failed: 11100
dev.ix.0.iflib.txq09.mbuf_defrag: 11719
dev.ix.0.iflib.txq08.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq08.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq08.mbuf_defrag: 0
dev.ix.0.iflib.txq07.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq07.mbuf_collapse_failed: 11064
dev.ix.0.iflib.txq07.mbuf_defrag: 11703
dev.ix.0.iflib.txq06.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq06.mbuf_collapse_failed: 7404
dev.ix.0.iflib.txq06.mbuf_defrag: 7802
dev.ix.0.iflib.txq05.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq05.mbuf_collapse_failed: 35825
dev.ix.0.iflib.txq05.mbuf_defrag: 37899
dev.ix.0.iflib.txq04.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq04.mbuf_collapse_failed: 7323
dev.ix.0.iflib.txq04.mbuf_defrag: 7728
dev.ix.0.iflib.txq03.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq03.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq03.mbuf_defrag: 0
dev.ix.0.iflib.txq02.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq02.mbuf_collapse_failed: 17819
dev.ix.0.iflib.txq02.mbuf_defrag: 18846
dev.ix.0.iflib.txq01.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq01.mbuf_collapse_failed: 29036
dev.ix.0.iflib.txq01.mbuf_defrag: 30658
dev.ix.0.iflib.txq00.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq00.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq00.mbuf_defrag: 0
3) Patch with "-1" (m_defrag fallback plus max segments lowered by 1):
Average TX performance:
- msg size 512: 3440 Mbps
- msg size 256K: 4361 Mbps