Changes the SKEIN_LOOP setting for the C version from the default of 001 to 995.
This unrolls the loop the maximum number of times for skein256 and skein512, and half the maximum for skein1024.
It does not appear possible to unroll skein1024 the maximum number of times (10), because the SKEIN_LOOP setting is read % 10, so a single digit can be at most 9.
A setting of 0 is supposed to unroll the maximum number of times, but it uses a different code path and ends up being slower:
SKEIN_LOOP=0 skein512: 208 MiB/second
SKEIN_LOOP=995 skein512: 508 MiB/second
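
The way a three-digit SKEIN_LOOP value maps to per-variant unroll counts can be sketched as follows. This is only an illustration: the struct and function names are hypothetical, and the digit order (hundreds for skein256, tens for skein512, ones for skein1024) is an assumption inferred from the 995 setting described above, not copied from the source.

```c
#include <assert.h>

/* Hypothetical container for the three per-variant unroll counts. */
struct skein_unroll {
    int u256;   /* hundreds digit: assumed to apply to skein256  */
    int u512;   /* tens digit:     assumed to apply to skein512  */
    int u1024;  /* ones digit:     assumed to apply to skein1024 */
};

/*
 * Split a SKEIN_LOOP-style setting into decimal digits.
 * Each digit is taken with % 10, so no single digit can exceed 9.
 * That is why a count of 10 for skein1024 cannot be expressed.
 */
struct skein_unroll decode_skein_loop(int setting)
{
    struct skein_unroll u;

    assert(setting >= 0);
    u.u256  = (setting / 100) % 10;
    u.u512  = (setting / 10) % 10;
    u.u1024 = setting % 10;
    return u;
}
```

Under these assumptions, decode_skein_loop(995) yields 9, 9 and 5, and a setting of 0 yields 0 for all three digits, which is the special case that selects the separate fully unrolled code.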
For the assembly version, the setting changes from the default of 002 to 0.
This fully unrolls the skein1024 loop, resulting in a performance increase:
SKEIN_LOOP=002 skein1024: 469 MiB/second
SKEIN_LOOP=0 skein1024: 530 MiB/second
Code sizes:
-rwxr-xr-x 1 allan wheel 114640 Sep 7 20:54 /tmp/libmd_asm_000.so*
-rwxr-xr-x 1 allan wheel 105856 Sep 7 20:55 /tmp/libmd_asm_001.so*
-rwxr-xr-x 1 allan wheel 106272 Sep 7 20:55 /tmp/libmd_asm_002.so*
-rwxr-xr-x 1 allan wheel 109952 Sep 7 20:56 /tmp/libmd_asm_005.so*
-rwxr-xr-x 1 allan wheel 99696 Sep 7 21:01 /tmp/libmd_asm_111.so*
-rwxr-xr-x 1 allan wheel 102672 Sep 7 21:09 /tmp/libmd_asm_332.so*
-rwxr-xr-x 1 allan wheel 111600 Sep 7 21:10 /tmp/libmd_asm_995.so*
-rwxr-xr-x 1 allan wheel 156064 Sep 7 21:47 /tmp/libmd_c_000.so*
-rwxr-xr-x 1 allan wheel 101616 Sep 7 21:29 /tmp/libmd_c_111.so*
-rwxr-xr-x 1 allan wheel 103232 Sep 7 21:54 /tmp/libmd_c_112.so*
-rwxr-xr-x 1 allan wheel 107584 Sep 7 21:27 /tmp/libmd_c_332.so*
-rwxr-xr-x 1 allan wheel 113904 Sep 7 21:13 /tmp/libmd_c_995.so*