
kcrypto_aes: Use separate sessions for AES and SHA1
Closed, Public

Authored by cem on Jan 30 2016, 10:59 PM.

Details

Summary

Some hardware supports AES acceleration but not SHA1, e.g., AES-NI
extensions. It is useful to have accelerated AES even if SHA1 must be
software.

Suggested by: asomers
Sponsored by: EMC / Isilon Storage Division
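
The rationale in the summary can be illustrated with a toy user-space model. This is only a sketch, not the kernel code: the `toy_*` names, the `ALG_*` bits, and the driver table are hypothetical and do not correspond to the real opencrypto API. Only the member names `as_session_aes` and `as_session_sha1` come from this review. The model assumes that a session is granted by the first driver that supports every requested algorithm, so a combined AES+SHA1 request cannot land on an AES-only accelerator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm bits; not the real opencrypto constants. */
enum { ALG_AES = 1 << 0, ALG_SHA1 = 1 << 1 };

/* A toy "driver": a name plus the set of algorithms it can handle. */
struct toy_driver {
	const char *name;
	unsigned algs;
	int is_hardware;
};

/* Ordered by preference: accelerator first, software fallback last. */
static const struct toy_driver toy_drivers[] = {
	{ "aesni-like", ALG_AES, 1 },
	{ "software", ALG_AES | ALG_SHA1, 0 },
};

/* Return the first driver that supports every requested algorithm. */
static const struct toy_driver *
toy_newsession(unsigned wanted)
{
	size_t i;

	for (i = 0; i < sizeof(toy_drivers) / sizeof(toy_drivers[0]); i++)
		if ((toy_drivers[i].algs & wanted) == wanted)
			return (&toy_drivers[i]);
	return (NULL);
}

/* Per-key state holding one session per algorithm. */
struct toy_aes_state {
	const struct toy_driver *as_session_aes;
	const struct toy_driver *as_session_sha1;
};

/* Open the two sessions independently so each gets its best driver. */
static void
toy_aes_state_init(struct toy_aes_state *as)
{
	assert(as != NULL);
	as->as_session_aes = toy_newsession(ALG_AES);
	as->as_session_sha1 = toy_newsession(ALG_SHA1);
}
```

In this model, `toy_newsession(ALG_AES | ALG_SHA1)` falls through to the software driver, while `toy_aes_state_init()` gives the AES session the accelerator and leaves only SHA1 in software.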

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem updated this revision to Diff 12895. (Jan 30 2016, 10:59 PM)
cem retitled this revision to "kcrypto_aes: Use separate sessions for AES and SHA1".
cem updated this object.
cem edited the test plan for this revision.
cem added reviewers: asomers, dfr, rmacklem.
dfr added inline comments. (Jan 31 2016, 4:51 PM)
sys/kgssapi/krb5/kcrypto_aes.c, line 47 (on Diff #12895)

I would prefer as_session_aes and as_session_sha1 here. There is no need to abbreviate.

cem added inline comments. (Jan 31 2016, 5:56 PM)
sys/kgssapi/krb5/kcrypto_aes.c, line 47 (on Diff #12895)

"as," "aes," and "sha1" are all abbreviations too ;-).

cem updated this revision to Diff 12910. (Jan 31 2016, 5:59 PM)

Keep full word of "session" in internal struct member names.

cem marked an inline comment as done. (Jan 31 2016, 5:59 PM)
dfr accepted this revision. (Feb 1 2016, 11:12 AM)
dfr edited edge metadata.

Looks good to me.

sys/kgssapi/krb5/kcrypto_aes.c, line 47 (on Diff #12910)

Those are acronyms :)

This revision is now accepted and ready to land. (Feb 1 2016, 11:12 AM)
asomers accepted this revision. (Feb 1 2016, 9:46 PM)
asomers edited edge metadata.

I tested this with a dual Xeon-E5-2620 system as my server and a dual Xeon-E5-2643 as my client. Both systems support AESNI. My test was a simple dd of a 3.6 GB file full of zeros. I set vfs.nfsd.async=1 and used a 5-disk RAIDZ1 pool of 7200 RPM SAS drives.

patched?  sec    write       read
no        krb5p   40.8 MBps   38.6 MBps
yes       sys    116.8 MBps  117.0 MBps
yes       krb5   116.8 MBps  116.9 MBps
yes       krb5i   98.8 MBps   62.5 MBps
yes       krb5p   88.6 MBps   56.6 MBps

In conclusion, the patch gave a serious speed improvement for krb5p, the only security level that uses AES on the whole payload. However, krb5i, which does a SHA1 of the entire RPC payload, is still slow. "openssl speed -evp sha1" gives 363 MBps on the server, suggesting that there's still room for improvement. But that's a problem for another day.
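
To put a number on the krb5p improvement, the measured throughputs can be turned into ratios. The helper below is a minimal sketch; the function name `speedup` is made up for illustration and the inputs are the krb5p figures from the table above.

```c
#include <assert.h>

/* Ratio of patched to unpatched throughput; both in the same unit. */
static double
speedup(double before_mbps, double after_mbps)
{
	assert(before_mbps > 0.0);
	return (after_mbps / before_mbps);
}
```

With the krb5p rows, `speedup(40.8, 88.6)` is about 2.17 for writes and `speedup(38.6, 56.6)` is about 1.47 for reads.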

This revision was automatically updated to reflect the committed changes.
cem added a comment. (Feb 2 2016, 12:28 AM)

> I tested this with a dual Xeon-E5-2620 system as my server and a dual Xeon-E5-2643 as my client. Both systems support AESNI.
> ...
>
> patched?  sec    write       read
> yes       krb5   116.8 MBps  116.9 MBps
> yes       krb5i   98.8 MBps   62.5 MBps
> yes       krb5p   88.6 MBps   56.6 MBps
>
> ... However, krb5i, which does a SHA1 of the entire RPC payload, is still slow. "openssl speed -evp sha1" gives 363 MBps on the server, suggesting that there's still room for improvement. But that's a problem for another day.

I get widely varying openssl SHA1 results on my Ivybridge laptop depending on block size. Any idea what read/write block size your NFS mount is using?

So, krb5i isn't going to be any better than krb5. In your results, writes are a little worse than vanilla krb5, but reads are a *lot* worse. I wonder why that is.

We seem to use opencrypto/cryptosoft.c -> opencrypto/xform_sha1.c -> crypto/sha1.c. I wonder if openssl has a better sha1 implementation. It's also possible openssl (in userspace) is free to use e.g. SSE instructions that mutate floating point state, while the kernel must make trade-offs around using the floating point registers (and errs on the side of not using them).

In D5146#109829, @cem wrote:

> I get widely varying openssl SHA1 results on my Ivybridge laptop depending on block size. Any idea what read/write block size your NFS mount is using?

Dunno. I'm using the defaults. The man page for nfs_mount suggests that rsize and wsize only matter for UDP mounts, which mine is not. But I don't think it will make a big difference. Even at 1kB blocks, my server gets 321 MBps with openssl.

> So, krb5i isn't going to be any better than krb5. In your results, writes are a little worse than vanilla krb5, but reads are a *lot* worse. I wonder why that is.
>
> We seem to use opencrypto/cryptosoft.c -> opencrypto/xform_sha1.c -> crypto/sha1.c. I wonder if openssl has a better sha1 implementation. It's also possible openssl (in userspace) is free to use e.g. SSE instructions that mutate floating point state, while the kernel must make trade-offs around using the floating point registers (and errs on the side of not using them).

SHA1 has two phases, the message scheduling phase and the compressor phase. The scheduling phase can be accelerated with SIMD, and openssl does. The compressor phase, however, cannot be vectorized, and it takes longer than the scheduling phase. Openssl implements it in assembly, but I doubt their implementation for the compressor phase is much faster than a C implementation. I know from experience that Openssl's assembly MD5 function is the same speed as a decent C version, and MD5's compressor is similar to SHA1's. So we could probably get some speedup by using SIMD for our SHA1 message scheduler, but probably no more than a 10% boost.
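
The two phases can be seen in a plain C sketch of one SHA-1 block. This is an illustrative user-space version, not the code in crypto/sha1.c or OpenSSL; the function names are made up, and `sha1_short` only handles messages that fit in a single padded block. Note that the schedule reads only the message block, while every compression round consumes the output of the previous round.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t
rol32(uint32_t x, int n)
{
	return ((x << n) | (x >> (32 - n)));
}

/* Phase 1: message schedule. Expand 16 big-endian words to 80. */
static void
sha1_schedule(const uint8_t block[64], uint32_t w[80])
{
	int t;

	for (t = 0; t < 16; t++)
		w[t] = (uint32_t)block[4 * t] << 24 |
		    (uint32_t)block[4 * t + 1] << 16 |
		    (uint32_t)block[4 * t + 2] << 8 |
		    (uint32_t)block[4 * t + 3];
	for (t = 16; t < 80; t++)
		w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}

/* Phase 2: compression. Each round depends on the previous round. */
static void
sha1_compress(uint32_t h[5], const uint32_t w[80])
{
	uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
	uint32_t f, k, tmp;
	int t;

	for (t = 0; t < 80; t++) {
		if (t < 20) {
			f = (b & c) | (~b & d); k = 0x5a827999;
		} else if (t < 40) {
			f = b ^ c ^ d; k = 0x6ed9eba1;
		} else if (t < 60) {
			f = (b & c) | (b & d) | (c & d); k = 0x8f1bbcdc;
		} else {
			f = b ^ c ^ d; k = 0xca62c1d6;
		}
		tmp = rol32(a, 5) + f + e + k + w[t];
		e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Hash a message shorter than 56 bytes (one padded block only). */
static void
sha1_short(const uint8_t *msg, size_t len, uint32_t h[5])
{
	uint8_t block[64] = { 0 };
	uint32_t w[80];

	assert(len < 56);
	memcpy(block, msg, len);
	block[len] = 0x80;			/* padding marker bit */
	block[62] = (uint8_t)((len * 8) >> 8);	/* bit length, big-endian */
	block[63] = (uint8_t)(len * 8);
	h[0] = 0x67452301; h[1] = 0xefcdab89; h[2] = 0x98badcfe;
	h[3] = 0x10325476; h[4] = 0xc3d2e1f0;
	sha1_schedule(block, w);
	sha1_compress(h, w);
}
```

Calling `sha1_short` on the three bytes of "abc" yields the standard test vector a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d.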

Skylake has instruction extensions for SHA1, but I don't have any Skylake hardware to play with.

cem added a comment. (Feb 3 2016, 5:09 PM)

> SHA1 has two phases, the message scheduling phase and the compressor phase. The scheduling phase can be accelerated with SIMD, and openssl does. The compressor phase, however, cannot be vectorized, and it takes longer than the scheduling phase. Openssl implements it in assembly, but I doubt their implementation for the compressor phase is much faster than a C implementation. I know from experience that Openssl's assembly MD5 function is the same speed as a decent C version, and MD5's compressor is similar to SHA1's. So we could probably get some speedup by using SIMD for our SHA1 message scheduler, but probably no more than a 10% boost.

I'll leave that work for another day. :)

> Skylake has instruction extensions for SHA1, but I don't have any Skylake hardware to play with.

Yeah, me either.