Replace hand-crafted naive byte-by-byte zero block detection routine
with macro based around memcmp(). The latter is expected to be some
8 times faster on a modern 64-bit architectures.
In practice, throughput of doing conv=sparse from /dev/zero to /dev/null
went up some 5-fold here from 1.9GB/sec to 9.7GB/sec with this change
MFC after: 2 weeks