Attempt to silence the mjg assembly linter warnings about excessive branches.
clang generates really ugly code when the loop is part of the fast path, so I moved the loop out. The resulting code looks nice enough that I don't object to it. I was unable to measure a real performance difference, but the change makes it clearer to the reader what the full fast path is, and the generated assembly looks substantially nicer.
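
For illustration only, here is a rough sketch of the general pattern of moving a loop out of a fast path. This is not the code from this change: the message does not say what the loop does, so a compare-and-swap retry loop is assumed, and `demo_ref_get`, `demo_ref_get_slow`, and the "take a reference unless the count is zero" semantics are all hypothetical. The `noinline` attribute is a GCC/clang extension.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical slow path: the retry loop lives here, out of line, so the
 * compiler does not mix it into the caller's fast path.
 * `seen` is the counter value the caller last observed.
 */
__attribute__((noinline))
static bool demo_ref_get_slow(atomic_uint *cnt, unsigned seen)
{
    while (seen != 0) {
        /* On failure, `seen` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(cnt, &seen, seen + 1))
            return true;
    }
    return false; /* count dropped to zero; no reference taken */
}

/*
 * Hypothetical fast path: one load and one compare-and-swap attempt,
 * no loop. Anything else is handed to the slow path.
 */
static inline bool demo_ref_get(atomic_uint *cnt)
{
    unsigned seen = atomic_load(cnt);

    if (seen != 0 && atomic_compare_exchange_strong(cnt, &seen, seen + 1))
        return true;
    return demo_ref_get_slow(cnt, seen);
}
```

With this shape the whole fast path is visible in `demo_ref_get`, and the branches that belong to the retry loop are confined to the out-of-line function.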