Differential D5146
kcrypto_aes: Use separate sessions for AES and SHA1
Authored by cem on Jan 30 2016, 10:59 PM.

Summary:
Some hardware supports AES acceleration but not SHA1, e.g., AES-NI
extensions. It is useful to have accelerated AES even if SHA1 must be
done in software.

Suggested by: asomers
Sponsored by: EMC / Isilon Storage Division
Event Timeline
Comment: Looks good to me.
Comment: I tested this with a dual Xeon E5-2620 system as my server and a dual Xeon E5-2643 as my client. Both systems support AES-NI. My test was a simple dd of a 3.6 GB file full of zeros. I set vfs.nfsd.async=1 and used a 5-disk RAIDZ1 pool of 7200 RPM SAS drives.
In conclusion, the patch gave a serious speed improvement for krb5p, the only security level that uses AES on the whole payload. However, krb5i, which does a SHA1 of the entire RPC payload, is still slow. "openssl speed -evp sha1" gives 363 MBps on the server, suggesting that there's still room for improvement. But that's a problem for another day.

Comment: I get widely varying openssl SHA1 results on my Ivy Bridge laptop depending on block size. Any idea what read/write block size your NFS mount is using? So, krb5i isn't going to be any better than krb5. In your results, writes are a little worse than vanilla krb5, but reads are a *lot* worse. I wonder why that is. We seem to use opencrypto/cryptosoft.c -> opencrypto/xform_sha1.c -> crypto/sha1.c. I wonder if openssl has a better SHA1 implementation. It's also possible that openssl (in userspace) is free to use e.g. SSE instructions that mutate floating-point state, while the kernel must make trade-offs around using the floating-point registers (and errs on the side of not using them).

Comment: Dunno, I'm using the defaults. The man page for mount_nfs suggests that rsize and wsize only matter for UDP mounts, which mine is not. But I don't think it will make a big difference. Even at 1 kB blocks, my server gets 321 MBps with openssl.
Comment: SHA1 has two phases: the message scheduling phase and the compressor phase. The scheduling phase can be accelerated with SIMD, and openssl does so. The compressor phase, however, cannot be vectorized, and it takes longer than the scheduling phase. Openssl implements it in assembly, but I doubt their implementation of the compressor phase is much faster than a C implementation. I know from experience that openssl's assembly MD5 function is the same speed as a decent C version, and MD5's compressor is similar to SHA1's. So we could probably get some speedup by using SIMD for our SHA1 message scheduler, but probably no more than a 10% boost. Skylake has instruction extensions for SHA1, but I don't have any Skylake hardware to play with.

Comment: I'll leave that work for another day. :)
Comment: Yeah, me either.