[ath] add explicit bus barriers.
The ath hal and driver code all assume the world is an x86 or the
bus layer does an explicit bus flush after each operation (eg netbsd.)
However, we don't do that.
So, to be "correct" on platforms like sparc64, mips and ppc (and maybe
ARM, I am not sure), just do explicit barriers after each operation.
Now, this does slow things down a tad on embedded platforms but I'd
rather things be "correct" versus "fast." At some later point if someone
wishes it to be fast then we should add the barrier calls to the HAL and
driver.
Tested:
- carambola 2 (AR9331.)