When allocating memory, we should try to allocate from the NUMA node
closest to the device to reduce cross-domain memory traffic. Teach the
arm64 bus_dma code to do this.
While here, use mallocarray to guard against an unlikely integer
overflow when computing the allocation size.
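For illustration only, the kind of overflow an array-style allocator
guards against can be sketched in userland C. The helper name
checked_array_alloc is hypothetical and is not part of this change, and
returning NULL on overflow is an assumption of the sketch; the kernel
mallocarray(9) may handle the overflow case differently.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userland helper, not the kernel mallocarray(9): allocate
 * room for nmemb objects of size bytes each, failing rather than
 * wrapping when nmemb * size does not fit in a size_t.
 */
static void *
checked_array_alloc(size_t nmemb, size_t size)
{
	/* nmemb * size overflows exactly when nmemb > SIZE_MAX / size. */
	if (size != 0 && nmemb > SIZE_MAX / size)
		return (NULL);
	return (malloc(nmemb * size));
}
```

A plain malloc(nmemb * size) would instead wrap silently and return a
buffer that is too small for the requested number of elements.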
Sponsored by: Arm Ltd