MFC r352544: Improve ioat(4) NUMA-awareness.
Allocate ioat->ring memory from the device domain.
Schedule ioat->poll_timer to the first CPU of the device domain.
According to pcm-numa tool from intel-pcm port, this reduces number of
remote DRAM accesses while copying data by 75%. And unless it is a noise,
I've noticed some speed improvement when copying data to other domain.