libmd: Fix amd64 AVX2 SHA-1 transcription errors
This source was manually transcribed from Go's assembly syntax into
FreeBSD's. Some differences are intentional (e.g. around stack frame
allocation, and some upstream LEAL instructions were replaced with ADDL
here, since naming the 64-bit super-register of a 32-bit register is not
as easy as it is in Go), but a few errors also crept in. Fix these,
found by comparing post-processed disassembly[1] of a built copy of the
corresponding Go source against ours. The post-processing accounts for
the ADDL difference above, and for Go's assembler not optimising the
VP[X]OR encoding by commuting operands when doing so would give rise to
a 2-byte VEX prefix.
[1] In Vim:

    %g/\<vpx\?or\>/s/\(%ymm\([89]\|1[0-5]\)\), %ymm\([0-7]\), %ymm/%ymm\3, \1, %ymm/g
    (to commute the VP[X]OR operands as LLVM does)

    %s/\<leal\>\([[:space:]]\+\)(%r\(..\),%r\(..\)), %e\2/addl\1%e\3, %e\2/
    (to convert LEAL to ADDL in the cases we do)

    %s/%e12\>/%r12d/g
    (as the previous conversion turns %r12 into %e12 not %r12d)
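For anyone without Vim to hand, the three substitutions above can be
approximated in Python. This is only a sketch: the function name
normalise_disassembly is made up for illustration, and it assumes
AT&T-syntax disassembly processed one line at a time, with Vim's \< and
\> word anchors mapped to Python's \b.

```python
import re

# Hypothetical helper (not part of the tree): applies the three Vim
# substitutions from the commit message to each line of a disassembly.
def normalise_disassembly(text: str) -> str:
    out = []
    for line in text.splitlines():
        # 1. On VPOR/VPXOR lines only, commute the two source operands
        #    when the first is %ymm8-%ymm15 and the second is %ymm0-%ymm7.
        if re.search(r"\bvpx?or\b", line):
            line = re.sub(
                r"(%ymm(?:[89]|1[0-5])), %ymm([0-7]), %ymm",
                r"%ymm\2, \1, %ymm",
                line,
            )
        # 2. leal (%rXX,%rYY), %eXX  ->  addl %eYY, %eXX
        #    (first match per line, as the Vim command has no /g flag).
        line = re.sub(
            r"\bleal\b(\s+)\(%r(..),%r(..)\), %e\2",
            r"addl\g<1>%e\3, %e\2",
            line,
            count=1,
        )
        # 3. Step 2 turns %r12 into %e12, which is not a register name.
        line = re.sub(r"%e12\b", "%r12d", line)
        out.append(line)
    return "\n".join(out)
```

Running both disassemblies through the same function before diffing
should leave only the unintended differences.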
Fixes: 8b4684afcde3 ("lib/libmd: add optimised SHA1 implementations for amd64")