mlx5en: Optimise away duplicate UAR pointers.
This change also reduces the size of the mlx5e_sq structure so that the last
queue_state element will fit into the previous cacheline and then the mlx5e_sq
structure becomes one cacheline less for amd64.
Sponsored by: Mellanox Technologies
MFC after: 1 week