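The mlx5_en send queue contains two mutexes, one used by xmit and one by
the completion interrupt ithreads. Because the two are declared next to
each other, they end up sharing a cache line, so contention on one lock
also bounces the line holding the other (false sharing). Use
mtx_padalign instead, which aligns each lock to its own cache line.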

I considered moving comp_lock to group it with the other fields modified
by the tx completion path, but mlx5_en splits the structure into
"static" and non-static regions for initialization purposes, so
reordering fields across that boundary is a bit hairy.