- Avoid main lock contention: use a trylock in if_start, and if the trylock fails, schedule the TX taskqueue to run if_start instead.
- Don't send directly if the packet to be sent is large, e.g. a TSO packet.
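The two rules above can be illustrated with a minimal user-space sketch. It assumes POSIX `pthread_mutex_trylock` as a stand-in for the driver's kernel lock, and models "schedule the TX taskqueue" as a counter increment. `struct tx_ring`, `schedule_tx_task`, `if_start_sketch`, and `DIRECT_SEND_MAX` are hypothetical names, and the 1514-byte cutoff is an assumed value, not the driver's actual setting. The sketch also assumes the packet is already queued before if_start runs, and it omits the queue itself.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical cutoff: packets larger than one standard Ethernet frame
 * (e.g. TSO packets) are never sent directly. The value is an assumption.
 */
#define DIRECT_SEND_MAX 1514

struct tx_ring {
	pthread_mutex_t lock;   /* stands in for the driver's main TX lock */
	int direct_sent;        /* packets sent in the caller's context; protected by lock */
	atomic_int deferred;    /* times the TX task was scheduled instead */
};

/* Hypothetical stand-in for enqueueing the TX task on a taskqueue. */
static void
schedule_tx_task(struct tx_ring *r)
{
	atomic_fetch_add(&r->deferred, 1);
}

/*
 * Start transmission for a packet assumed to be already queued.
 * Returns true if it was sent directly, false if the work was deferred.
 */
static bool
if_start_sketch(struct tx_ring *r, size_t pktlen)
{
	/* Large packets (e.g. TSO): skip direct sending entirely. */
	if (pktlen > DIRECT_SEND_MAX) {
		schedule_tx_task(r);
		return false;
	}
	/* Never block on the main lock; if it is busy, let the TX task do the work. */
	if (pthread_mutex_trylock(&r->lock) != 0) {
		schedule_tx_task(r);
		return false;
	}
	r->direct_sent++;       /* "transmit" while holding the lock */
	pthread_mutex_unlock(&r->lock);
	return true;
}
```

In this sketch, a caller that finds the lock busy returns immediately rather than spinning or sleeping. The deferred work runs later in the TX task's own context, so senders on other CPUs do not serialize on the main lock.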
This change gives me a stable 9.1Gbps of TCP send throughput with TSO over a directly connected 10GbE network (before this commit, throughput fluctuated between 4Gbps and 9Gbps). It also considerably improves non-TSO TCP send throughput.