MFC r351444, r357475, r357479, r357481-r357482, r358859, and r364497.
All of these are rx improvements in the cxgbe(4) driver.
cxgbe(4): Use the same buffer size for TOE rx queues as the NIC rx queues.
This is a minor simplification.
cxgbe(4): Initialize the rx buffer's metadata on first-use and not on
refill_fl doesn't touch any part of a freshly allocated cluster after
cxgbe(4): Avoid ext_arg2 in rxb_free.
ext_arg2 is the only item in the third cacheline in an mbuf and could be
cold by the time rxb_free runs. Put the information needed by rxb_free
in the same line as the refcount, which is very likely to be hot given
that rxb_free runs when the refcount is decremented and reaches 0.
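
A minimal sketch of the layout idea, not the driver's actual structures.
struct rx_buf_meta, its fields, rx_buf_release, and the 64-byte line size
are all assumptions made for illustration. The real driver would update
the refcount atomically; a plain integer keeps the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size */

/*
 * Hypothetical per-cluster metadata.  Everything the free path reads
 * sits next to the refcount, so the decrement that reaches 0 has
 * already pulled the needed fields into cache.
 */
struct rx_buf_meta {
	_Alignas(CACHE_LINE) uint32_t refcount;	/* touched on every free */
	uint16_t zone_idx;	/* which allocator zone the cluster came from */
	uint16_t flags;
	void *cluster;		/* start of the cluster to hand back */
};

/* The whole structure must fit in the line that holds the refcount. */
static_assert(sizeof(struct rx_buf_meta) == CACHE_LINE,
    "metadata must occupy exactly one cache line");
static_assert(offsetof(struct rx_buf_meta, cluster) + sizeof(void *) <=
    CACHE_LINE, "free-path fields must share the refcount's line");

/*
 * Drop one reference.  Returns the cluster to free when the count
 * reaches 0, or NULL while other references remain.
 */
static void *
rx_buf_release(struct rx_buf_meta *m)
{

	assert(m->refcount > 0);
	if (--m->refcount > 0)
		return (NULL);
	return (m->cluster);
}
```

The static assertions fail the build if a later change pushes a field the
free path needs out of the refcount's line.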
cxgbe(4): Retire the allow_mbufs_in_cluster optimization.
This simplifies the driver's rx fast path as well as the bookkeeping
code that tracks various rx buffer sizes and layouts.
cxgbe(4): Treat NIC rx as special and run its handler directly rather
than via the t4_cpl_handler dispatch table.
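
The general pattern can be sketched as follows. This is an illustration
only: OP_RX_PKT and its value, handler_table, handle_nic_rx, handle_other,
and dispatch are hypothetical names and do not reflect the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_OPCODES 256
#define OP_RX_PKT 0x3b	/* hypothetical opcode for NIC rx */

typedef int (*msg_handler_t)(const uint8_t *msg, size_t len);

/* Generic table for every opcode that is not on the hot path. */
static msg_handler_t handler_table[NUM_OPCODES];

static int nic_rx_count;

/* Hot-path handler; called directly so the compiler can inline it. */
static int
handle_nic_rx(const uint8_t *msg, size_t len)
{

	(void)msg;
	(void)len;
	nic_rx_count++;
	return (0);
}

/* Example of a handler reached through the table. */
static int
handle_other(const uint8_t *msg, size_t len)
{

	(void)msg;
	(void)len;
	return (1);
}

/* The first byte of the message is assumed to carry the opcode. */
static int
dispatch(const uint8_t *msg, size_t len)
{
	msg_handler_t h;

	assert(msg != NULL && len >= 1);
	if (msg[0] == OP_RX_PKT)
		return (handle_nic_rx(msg, len));	/* no indirect call */
	h = handler_table[msg[0]];
	return (h != NULL ? h(msg, len) : -1);
}
```

The most frequent message type skips the table load and the indirect
call; everything else still goes through the table.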
cxgbe(4): Do not try to use 0 as an rx buffer address when the driver is
already allocating from the safe zone and the allocation fails.
This bug was introduced in r357481.
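
A sketch of the failure handling described above, under stated
assumptions: struct free_list, refill, alloc_fail, and the fixed ring size
are hypothetical and much simpler than the driver's refill path. The point
shown is only that a failed fallback allocation ends the refill instead of
posting address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FL_SIZE 8	/* hypothetical ring size */

typedef void *(*alloc_fn)(size_t);

/* Stand-in for an exhausted zone. */
static void *
alloc_fail(size_t n)
{

	(void)n;
	return (NULL);
}

struct free_list {
	uint64_t desc[FL_SIZE];	/* buffer addresses posted to hardware */
	unsigned int pidx;	/* next slot to fill */
};

/*
 * Post up to n buffers.  Try the preferred allocator first and fall
 * back to the safe one.  Returns the number of buffers posted.
 */
static unsigned int
refill(struct free_list *fl, unsigned int n, alloc_fn preferred,
    alloc_fn safe, size_t pref_size, size_t safe_size)
{
	unsigned int filled = 0;
	void *buf;

	assert(fl != NULL);
	while (filled < n && fl->pidx < FL_SIZE) {
		buf = preferred(pref_size);
		if (buf == NULL)
			buf = safe(safe_size);
		if (buf == NULL)
			break;	/* both failed: never post address 0 */
		fl->desc[fl->pidx++] = (uint64_t)(uintptr_t)buf;
		filled++;
	}
	return (filled);
}
```

The caller can retry a short refill later; the ring never contains a zero
address.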
cxgbe(4): Use large clusters for TOE rx queues when TOE+TLS is enabled.
Rx is more efficient within the chip when the receive buffer size
matches the TLS PDU size.
Sponsored by: Chelsio Communications