Micro-optimize the new arm inline bus_space implementation by grouping all
the data the inline functions access together at the start of the bus_space
struct. The start-of part isn't so important, it's the grouping-together
that's the point: now all the most-accessed data should be in one cache line.
Suggested by: cognet