MFC r318382
- Move Rx processing to fp_taskqueue(). With this change, CPU utilization for interrupt processing drops to around 1% for 100G and under 1% for other speeds.
- Use sysctls for TRACE_LRO_CNT and TRACE_TSO_PKT_LEN
- Remove unused mtx tx_lock
- Bind taskqueue kernel thread to the appropriate CPU core
- When tx_ring is full, stop further transmits until at least 1/16th of the Tx ring is empty (1K entries in our case). Also, if there are rx_pkts to process, put the taskqueue thread to sleep for 100ms before enabling interrupts.
- Use rx_pkt_threshold of 128.
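
The Tx ring back-pressure rule above can be sketched roughly as follows. This is an illustration only, not the driver's code: the struct, the function names, and the 16K ring size (inferred from "1/16th is 1K entries") are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: a 16K-entry Tx ring, so 1/16th of it is 1K entries. */
#define TX_RING_SIZE       16384u
#define TX_WAKE_THRESHOLD  (TX_RING_SIZE / 16u)

/* Hypothetical per-ring state; the real driver keeps its own bookkeeping. */
struct tx_ring_state {
	uint32_t free_entries;	/* descriptors currently available */
	bool     stopped;	/* true while transmits are held off */
};

/*
 * Returns true if a packet needing `needed` descriptors was accepted.
 * Once the ring fills, transmits stay stopped until at least
 * TX_WAKE_THRESHOLD entries are free again.
 */
static bool
tx_ring_may_transmit(struct tx_ring_state *r, uint32_t needed)
{
	if (r->stopped) {
		if (r->free_entries < TX_WAKE_THRESHOLD)
			return (false);
		r->stopped = false;
	}
	if (r->free_entries < needed) {
		r->stopped = true;
		return (false);
	}
	r->free_entries -= needed;
	return (true);
}

/* Called when `done` descriptors have completed and can be reused. */
static void
tx_ring_complete(struct tx_ring_state *r, uint32_t done)
{
	r->free_entries += done;
}
```

Waiting for 1/16th of the ring to drain, rather than resuming on the first free entry, avoids repeatedly stopping and restarting the queue while the ring hovers near full.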