cxgbe: Make the TOE TLS stats per-queue instead of per-port.

This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.
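
A minimal userland sketch of the RX half of the idea, under stated
assumptions: the struct, array, and function names below are hypothetical
stand-ins and are not the actual cxgbe definitions. Each queue owns its
counters and is updated only by the one thread that services it, so the
increments are plain stores; a reader sums the queues to recover a
per-port total. The TX half would use counter(9) in the kernel and is not
shown because it cannot run outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NQUEUES	4	/* hypothetical number of RX queues on the port */

/* Hypothetical per-queue stats; not the real cxgbe structure. */
struct rxq_tls_stats {
	uint64_t records;	/* TLS records received on this queue */
	uint64_t octets;	/* TLS payload bytes received on this queue */
};

static struct rxq_tls_stats rxq_stats[NQUEUES];

/*
 * Assumed to be called only from the single thread that services queue
 * 'qid', which is why plain (non-atomic) increments are sufficient.
 */
static void
rxq_count_record(size_t qid, uint64_t len)
{
	assert(qid < NQUEUES);
	rxq_stats[qid].records++;
	rxq_stats[qid].octets += len;
}

/* Sum the per-queue values to report a per-port total. */
static uint64_t
port_rx_tls_octets(void)
{
	uint64_t total = 0;

	for (size_t i = 0; i < NQUEUES; i++)
		total += rxq_stats[i].octets;
	return (total);
}
```

The cost moves from the update path to the read path: updates touch only
queue-local memory, while a reader walks every queue. A reader running
concurrently with the queue threads may see a slightly stale sum, which is
usually acceptable for statistics.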
Sponsored by: Chelsio Communications

(cherry picked from commit fe496dc02a9a276d940e72bbd155dc256a34076f)