When mlx4en(4) was converted to use BUSDMA(9), the call to m_defrag was
moved after the part of the tx routine that strips the header from the
mbuf chain. Before calling m_defrag, the routine first trimmed the
now-empty mbufs from the start of the chain. This had the side effect
of also removing the head mbuf, which is the one with M_PKTHDR set.
Since m_defrag will not defragment a chain whose head lacks M_PKTHDR,
the driver was effectively never defragmenting mbuf chains.
As it turns out, trimming the mbufs this way is unnecessary, since
bus_dmamap_load_mbuf_sg does not map empty mbufs anyway, so remove the
trimming step.
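The interaction described above can be illustrated with a small
userland model. Everything here is hypothetical: struct model_mbuf,
MODEL_M_PKTHDR and the model_* helpers are simplified stand-ins, not
the kernel's struct mbuf, m_defrag or busdma code. The segment counter
also assumes one segment per non-empty mbuf, which is a simplification
of what bus_dmamap_load_mbuf_sg actually does.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified model of an mbuf chain for illustration
 * only; this is NOT the kernel's struct mbuf.
 */
#define MODEL_M_PKTHDR 0x0002

struct model_mbuf {
	struct model_mbuf *m_next;
	int m_len;
	int m_flags;
};

/* Models the m_defrag guard: a chain whose head lacks M_PKTHDR is left alone. */
static int
model_defrag_would_run(const struct model_mbuf *m)
{
	return (m != NULL && (m->m_flags & MODEL_M_PKTHDR) != 0);
}

/* Models the removed step: skip leading mbufs emptied by header stripping. */
static struct model_mbuf *
model_trim_empty_head(struct model_mbuf *m)
{
	while (m != NULL && m->m_len == 0)
		m = m->m_next;
	return (m);
}

/* Models the DMA load: only non-empty mbufs produce segments (simplified). */
static int
model_count_segments(const struct model_mbuf *m)
{
	int nsegs = 0;

	for (; m != NULL; m = m->m_next)
		if (m->m_len > 0)
			nsegs++;
	return (nsegs);
}
```

For a chain whose M_PKTHDR head was emptied by header stripping,
trimming returns the second mbuf, so the defrag guard fails. The
untrimmed chain passes the guard and still yields the same number of
segments, which is why the trimming can simply be dropped.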
Additionally, this change introduces a counter for defrag_attempts,
fixes an outstanding issue with zeroing the oversized_packets counter,
and makes the tso_packets counter per-ring to avoid excessive cache
misses in the tx path.
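A per-ring counter can be sketched as follows. This is a hypothetical
layout, not the driver's actual structures: struct
model_tx_ring_stats, the model_* helpers and the 64-byte
MODEL_CACHE_LINE value are assumptions, and the sketch further assumes
each ring's counter is only updated by one CPU at a time (for example
under that ring's transmit lock). Each ring's counter sits in its own
cache line, so updating one ring does not invalidate the line another
ring is using, and the total is only summed when statistics are read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache-line size; 64 bytes is common but assumed here. */
#define MODEL_CACHE_LINE 64

/* Hypothetical per-ring tx statistics, each padded to its own cache line. */
struct model_tx_ring_stats {
	_Alignas(MODEL_CACHE_LINE) uint64_t tso_packets;
};

/* Hot path: one ring bumps only its own counter. */
static void
model_ring_count_tso(struct model_tx_ring_stats *rs)
{
	rs->tso_packets++;
}

/* Slow path (statistics export): sum the per-ring values on demand. */
static uint64_t
model_total_tso_packets(const struct model_tx_ring_stats *rings,
    size_t nrings)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nrings; i++)
		total += rings[i].tso_packets;
	return (total);
}
```

The trade-off is that reading the aggregate becomes a loop over rings,
which is acceptable because statistics are read far less often than
packets are transmitted.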