if_arge: fix up TX workaround; add TX/RX requirements for busdma; add stats
The early ethernet MACs (I think AR71xx and AR913x) require that both
TX and RX require 4-byte alignment for all packets.
The later MACs have started relaxing the requirements.
For now, the 1-byte TX and 1-byte RX alignment requirements are only for
the QCA955x SoCs. I'll add in the relaxed requirements as I review the
datasheets and do testing.
- Add a hardware flags field and 1-byte / 4-byte TX/RX alignment.
- .. defaulting to 4-byte TX and 4-byte RX alignment.
- Only enforce the TX alignment fixup if the hardware requires a 4-byte TX alignment. This avoids a call to m_defrag().
- Add counters for various situations for further debugging.
- Set the 1-byte and 4-byte busdma alignment requirement when the tag is created.
This improves the straight bridging performance from 130mbit/sec
to 180mbit/sec, purely by removing the need for TX path bounce buffers.
The main performance issue is the RX alignment requirement and any RX
bounce buffering that's occuring. (In a local test, removing the RX
fixup path and just aligning buffers raises the performance to above
400mbit/sec.
In theory it's a no-op for SoCs before the QCA955x.
Tested:
- QCA9558 SoC in AP135 board, using software bridging between arge0/arge1.