Bring some of the recent locore-v4.S improvements into locore-v6.S:
- Map all 4GB as VA=PA so that args passed in from a bootloader can be accessed regardless of where they are.
- Figure out the kernel load address by directly masking the PC rather than by doing PC-relative math on the _start symbol.
- For EARLY_PRINTF support, map device memory as uncacheable (no-op for ARM_NEW_PMAP because all TEX types resolve to uncacheable).
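
For readers unfamiliar with the idea, the first two items can be sketched in C roughly as follows. This is only an illustration, not the code in locore-v6.S (which is assembly). It assumes 1 MiB ARM short-descriptor sections and that the entry code lies within the first section of the loaded kernel image. The names L1_S_FRAME, L1_TYPE_S, load_addr_from_pc, and map_identity_4g are hypothetical, and the attribute bits are left to the caller.

```c
#include <stdint.h>

/* Assumed for illustration: 1 MiB sections, 4096 L1 entries cover 4 GiB. */
#define L1_S_SHIFT   20u
#define L1_S_FRAME   0xfff00000u   /* section base lives in bits [31:20] */
#define L1_ENTRIES   4096u
#define L1_TYPE_S    0x2u          /* descriptor bits [1:0] = 0b10: section */

/*
 * Derive the load address by clearing the offset bits of the PC.
 * Only valid if the code doing this sits in the first section of the image.
 */
static uint32_t load_addr_from_pc(uint32_t pc)
{
    return pc & L1_S_FRAME;
}

/*
 * Fill an L1 table so that every virtual section maps to the physical
 * section at the same address (VA == PA) across the whole 4 GiB space.
 * 'attrs' stands in for whatever memory-attribute bits are wanted.
 */
static void map_identity_4g(uint32_t *l1, uint32_t attrs)
{
    for (uint32_t i = 0; i < L1_ENTRIES; i++)
        l1[i] = (i << L1_S_SHIFT) | attrs | L1_TYPE_S;
}
```

Masking the PC avoids depending on where _start was linked, and the identity map means a pointer handed over by the bootloader can be dereferenced as-is once the MMU is on.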