- For a certain range of send sizes (usually between 256 and 512 bytes),
the TSO stack tends to pass iflib_encap mbuf chains that cannot be mapped
to segments with iflib_busdma_load_mbuf_sg. In that case m_collapse is
called, but it also fails. The current implementation drops such mbuf
chains, causing a drastic loss of TX performance. No denied requests for
mbufs or clusters appear in netstat output when this happens. Tests show
that calling m_defrag after m_collapse has failed works in almost 100% of
cases and solves the problem. Lowering the maximum number of segments set
in the ifp struct by 1 usually eliminates the need to call m_collapse at
all and gives even better results for streams with such sends.
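The fallback order described above can be sketched as a small user-space model. This is only an illustration, not the iflib code: the list-of-lengths chain, the encap function, the TxStats counters, and the two stand-in helpers are hypothetical, and the stand-ins only imitate the outcomes reported here (collapse fails, defrag succeeds).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Toy mbuf chain: one list entry per mbuf, value = bytes held in it.
Chain = List[int]

MCLBYTES = 2048  # size of a standard mbuf cluster


@dataclass
class TxStats:
    # Hypothetical counters for this model only; they are not claimed
    # to match the semantics of the driver's sysctl counters.
    collapse_failed: int = 0
    defrag_used: int = 0
    dropped: int = 0


def encap(chain: Chain, max_segs: int,
          collapse: Callable[[Chain, int], Optional[Chain]],
          defrag: Callable[[Chain], Optional[Chain]],
          stats: TxStats) -> Optional[Chain]:
    """Return a chain that fits in max_segs segments, or None if dropped."""
    if len(chain) <= max_segs:      # stands in for a successful DMA load
        return chain
    collapsed = collapse(chain, max_segs)
    if collapsed is not None and len(collapsed) <= max_segs:
        return collapsed
    stats.collapse_failed += 1
    stats.defrag_used += 1          # second attempt instead of dropping
    defragged = defrag(chain)
    if defragged is not None and len(defragged) <= max_segs:
        return defragged
    stats.dropped += 1              # both attempts failed
    return None


def failing_collapse(chain: Chain, max_segs: int) -> Optional[Chain]:
    """Stand-in that always fails, imitating the reported case."""
    return None


def toy_defrag(chain: Chain) -> Optional[Chain]:
    """Stand-in: repack all bytes into as few clusters as possible."""
    total = sum(chain)
    full, rest = divmod(total, MCLBYTES)
    return [MCLBYTES] * full + ([rest] if rest else [])
```

For example, calling encap([64] * 40, 32, failing_collapse, toy_defrag, TxStats()) takes the second fallback and returns a two-entry chain instead of dropping the packet, which is the behaviour change this patch introduces.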
Example results of testing with netperf:
- Before the patch:
[root@u2002 ~]# netperf -P0 -H u2020 -t TCP_STREAM -l 10 -- -H u2020-2 -m 512 -M 512
 87380 1048576    512    10.17     391.77
[root@u2002 ~]# netstat -m
16130/25375/41505 mbufs in use (current/cache/total)
16072/12204/28276/4194304 mbuf clusters in use (current/cache/total/max)
8213/12027 mbuf+clusters out of packet secondary zone in use (current/cache)
0/3/3/2097152 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/1058683 9k jumbo clusters in use (current/cache/total/max)
0/0/0/595509 16k jumbo clusters in use (current/cache/total/max)
36176K/30763K/66940K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
[root@u2002 ~]# sysctl dev.ix.0 | grep mbuf
dev.ix.0.iflib.txq21.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq21.mbuf_defrag: 0
dev.ix.0.iflib.txq20.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq20.mbuf_defrag: 0
dev.ix.0.iflib.txq19.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq19.mbuf_defrag: 0
dev.ix.0.iflib.txq18.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq18.mbuf_defrag: 0
dev.ix.0.iflib.txq17.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq17.mbuf_defrag: 0
dev.ix.0.iflib.txq16.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq16.mbuf_defrag: 0
dev.ix.0.iflib.txq15.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq15.mbuf_defrag: 0
dev.ix.0.iflib.txq14.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq14.mbuf_defrag: 0
dev.ix.0.iflib.txq13.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq13.mbuf_defrag: 0
dev.ix.0.iflib.txq12.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq12.mbuf_defrag: 0
dev.ix.0.iflib.txq11.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq11.mbuf_defrag: 0
dev.ix.0.iflib.txq10.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq10.mbuf_defrag: 0
dev.ix.0.iflib.txq09.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq09.mbuf_defrag: 0
dev.ix.0.iflib.txq08.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq08.mbuf_defrag: 0
dev.ix.0.iflib.txq07.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq07.mbuf_defrag: 0
dev.ix.0.iflib.txq06.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq06.mbuf_defrag: 0
dev.ix.0.iflib.txq05.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq05.mbuf_defrag: 0
dev.ix.0.iflib.txq04.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq04.mbuf_defrag: 0
dev.ix.0.iflib.txq03.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq03.mbuf_defrag: 0
dev.ix.0.iflib.txq02.mbuf_defrag_failed: 36
dev.ix.0.iflib.txq02.mbuf_defrag: 3
dev.ix.0.iflib.txq01.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq01.mbuf_defrag: 0
dev.ix.0.iflib.txq00.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq00.mbuf_defrag: 0
Average TX performance in 20 runs of netperf:
- msg size 512: 388 Mbps
- msg size 256K: 4372 Mbps
- Patch without "-1":
Average TX performance:
- msg size 512: 3119 Mbps
- msg size 256K: 4367 Mbps
sysctl dev.ix.0 | grep mbuf:
dev.ix.0.iflib.txq21.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq21.mbuf_collapse_failed: 11904
dev.ix.0.iflib.txq21.mbuf_defrag: 12607
dev.ix.0.iflib.txq20.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq20.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq20.mbuf_defrag: 0
dev.ix.0.iflib.txq19.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq19.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq19.mbuf_defrag: 0
dev.ix.0.iflib.txq18.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq18.mbuf_collapse_failed: 10609
dev.ix.0.iflib.txq18.mbuf_defrag: 11225
dev.ix.0.iflib.txq17.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq17.mbuf_collapse_failed: 10858
dev.ix.0.iflib.txq17.mbuf_defrag: 11492
dev.ix.0.iflib.txq16.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq16.mbuf_collapse_failed: 15449
dev.ix.0.iflib.txq16.mbuf_defrag: 16250
dev.ix.0.iflib.txq15.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq15.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq15.mbuf_defrag: 0
dev.ix.0.iflib.txq14.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq14.mbuf_collapse_failed: 10457
dev.ix.0.iflib.txq14.mbuf_defrag: 11064
dev.ix.0.iflib.txq13.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq13.mbuf_collapse_failed: 11654
dev.ix.0.iflib.txq13.mbuf_defrag: 12331
dev.ix.0.iflib.txq12.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq12.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq12.mbuf_defrag: 0
dev.ix.0.iflib.txq11.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq11.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq11.mbuf_defrag: 0
dev.ix.0.iflib.txq10.mbuf_defrag_failed: 3
dev.ix.0.iflib.txq10.mbuf_collapse_failed: 23030
dev.ix.0.iflib.txq10.mbuf_defrag: 24298
dev.ix.0.iflib.txq09.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq09.mbuf_collapse_failed: 11100
dev.ix.0.iflib.txq09.mbuf_defrag: 11719
dev.ix.0.iflib.txq08.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq08.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq08.mbuf_defrag: 0
dev.ix.0.iflib.txq07.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq07.mbuf_collapse_failed: 11064
dev.ix.0.iflib.txq07.mbuf_defrag: 11703
dev.ix.0.iflib.txq06.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq06.mbuf_collapse_failed: 7404
dev.ix.0.iflib.txq06.mbuf_defrag: 7802
dev.ix.0.iflib.txq05.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq05.mbuf_collapse_failed: 35825
dev.ix.0.iflib.txq05.mbuf_defrag: 37899
dev.ix.0.iflib.txq04.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq04.mbuf_collapse_failed: 7323
dev.ix.0.iflib.txq04.mbuf_defrag: 7728
dev.ix.0.iflib.txq03.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq03.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq03.mbuf_defrag: 0
dev.ix.0.iflib.txq02.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq02.mbuf_collapse_failed: 17819
dev.ix.0.iflib.txq02.mbuf_defrag: 18846
dev.ix.0.iflib.txq01.mbuf_defrag_failed: 1
dev.ix.0.iflib.txq01.mbuf_collapse_failed: 29036
dev.ix.0.iflib.txq01.mbuf_defrag: 30658
dev.ix.0.iflib.txq00.mbuf_defrag_failed: 0
dev.ix.0.iflib.txq00.mbuf_collapse_failed: 0
dev.ix.0.iflib.txq00.mbuf_defrag: 0
- Patch with "-1":
Average TX performance:
- msg size 512: 3440 Mbps
- msg size 256K: 4361 Mbps