amd64: reimplement libc memcmp and bcmp with kernel memcmp
The existing libc versions of both are significantly slower than
hand-coded loops. See r338963 for the kernel commit.
bcmp differs from memcmp in that it always returns 1 when a difference
is found, whereas memcmp returns a value less than or greater than 0
depending on how the first differing bytes compare. This means bcmp can
do less work. For now the code is duplicated and modified. It will be
deduplicated after another round of optimization, once memcmp takes its
longer-term form.
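To illustrate the difference in return values, here is a minimal C
sketch. It is not the assembly from this commit: ref_memcmp, ref_bcmp
and ref_demo are hypothetical names used only for illustration, and the
byte-at-a-time loops are chosen for clarity, not speed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference memcmp: the sign of the result is significant. */
static int
ref_memcmp(const void *s1, const void *s2, size_t len)
{
	const unsigned char *p1 = s1, *p2 = s2;

	for (size_t i = 0; i < len; i++) {
		if (p1[i] != p2[i])
			return (p1[i] - p2[i]);	/* first mismatch decides the sign */
	}
	return (0);
}

/* Hypothetical reference bcmp as described above: only 0 or 1. */
static int
ref_bcmp(const void *s1, const void *s2, size_t len)
{
	const unsigned char *p1 = s1, *p2 = s2;

	for (size_t i = 0; i < len; i++) {
		if (p1[i] != p2[i])
			return (1);	/* no subtraction needed */
	}
	return (0);
}

/* Usage example: same inputs, different result conventions. */
static void
ref_demo(void)
{
	assert(ref_memcmp("abc", "abd", 3) < 0);
	assert(ref_memcmp("abd", "abc", 3) > 0);
	assert(ref_bcmp("abc", "abd", 3) == 1);
	assert(ref_bcmp("abd", "abc", 3) == 1);
	assert(ref_bcmp("abc", "abc", 3) == 0);
}
```

Once a mismatch is located, the bcmp variant can return a constant
instead of working out which byte is larger.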
Both were tested with the glibc test suite. Since the suite has no test
for bcmp, I wrote a wrapper routine which verifies that the results of
the two routines match (0 vs 0, 1 vs non-zero).
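A wrapper of that kind could look roughly like the sketch below. This
is an assumption about its shape, not the routine actually used for
testing: results_match, sample_bcmp and check_pair are hypothetical
names. The routines are passed as function pointers so that the strict
"returns 1" check can be exercised with a stand-in on systems where
bcmp is only guaranteed to return non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *, size_t);

/*
 * Hypothetical check: returns 1 if the bcmp-style result agrees with the
 * memcmp-style result (0 vs 0, 1 vs non-zero), 0 otherwise.
 */
static int
results_match(cmp_fn mc, cmp_fn bc, const void *s1, const void *s2,
    size_t len)
{
	int mres = mc(s1, s2, len);
	int bres = bc(s1, s2, len);

	if (mres == 0)
		return (bres == 0);
	return (bres == 1);
}

/* Stand-in with the 0/1 convention; replace with bcmp to test libc. */
static int
sample_bcmp(const void *s1, const void *s2, size_t len)
{
	return (memcmp(s1, s2, len) != 0);
}

/* Usage example: abort if the two routines disagree on a buffer pair. */
static void
check_pair(const void *s1, const void *s2, size_t len)
{
	assert(results_match(memcmp, sample_bcmp, s1, s2, len));
}
```

Passing bcmp from <strings.h> in place of sample_bcmp would check the
libc routine itself.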
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17336