Also, the existing assembler implementation uses ld/sd instructions,
which are only available on MIPS64. Use lw/sw instructions instead.
Note that the new implementation keeps disabling interrupts around the
load and store sequence, but this does not make the operations atomic.
Only the use of the llx instruction from MIPS32 v6 would allow this.
The functions are only used by ddb with somewhat doubtful reasoning,
so atomicity is in fact not too critical.
I kept the nops after the accesses; I suppose they are there because
of some MMIO requirement.
I believe that the code is endian-neutral.