Page MenuHomeFreeBSD

Per-session locking for cryptosoft
ClosedPublic

Authored by sef on Sep 24 2018, 11:37 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 21, 1:42 PM
Unknown Object (File)
Wed, Nov 13, 11:57 PM
Unknown Object (File)
Thu, Oct 31, 2:52 AM
Unknown Object (File)
Oct 22 2024, 9:36 AM
Unknown Object (File)
Oct 9 2024, 11:51 AM
Unknown Object (File)
Oct 1 2024, 10:48 AM
Unknown Object (File)
Sep 30 2024, 11:30 AM
Unknown Object (File)
Sep 29 2024, 10:23 PM
Subscribers

Details

Summary

After cem's changes to OCF, I found the zfs crypto code was unstable without aesni. After advice from mmacy and jhb, I added locking, and that fixed it up. After that, I then made it finer-grained by putting the locking code in cryptosoft.

NB: This is admittedly not as fine-grained as it should be. I note that aesni uses per-CPU locks, and that's an option, but it adds complexity and risk.

Test Plan

sysctl kern.cryptodevallowsoft=1
cryptocheck -v -z -a aes-gcm256 -d soft

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

This diff seems to be missing the part where the lock is taken.

In D17307#368980, @cem wrote:

This diff seems to be missing the part where the lock is taken.

WTH. You're right. I ran the diff in the wrong directory. Let me update it.

Ran the diff in the correct directory this time. Sorry.

The change looks ok to me. Soft crypto is going to be slow anyway; I don't think the locking will be particularly expensive in comparison.

I don't understand how my OCF changes (or which ones, I guess) introduced this, however. Can you explain the theory? Thanks.

This revision is now accepted and ready to land.Sep 25 2018, 12:27 AM

I didn't profile it too much; I just walked through the code and saw that, with Reinit, the key schedule would get changed, and without locking, that could cause some problems.

I mostly think it was due to a performance change -- looking up the session ID took time, while just using the pointer doesn't.

sys/opencrypto/cryptosoft.c
1222

I would move this mtx_unlock() one line higher, before crypto_done(). Without real reason why it should be covered by the lock, I think it is bad to call any callbacks while holding random unexpected locks. It is bad for both reenterability and performance.

sef marked an inline comment as done.Sep 25 2018, 4:57 PM
sef added inline comments.
sys/opencrypto/cryptosoft.c
1222

I'm not sure it matters much either way, but my thinking was that the callback function was far more likely to manipulate the session than it was to chain-call any crypto functions. Now, it's a mutex, so perhaps I should change it to an sx lock, or move the unlock above crypto_done(). cem, any thoughts?

sys/opencrypto/cryptosoft.c
1222

+1 mav's suggestion. I don't think there is any reason to worry about the callback reusing the session -- it's free to be reused now. The crp state is independent of the session at this point. I don't see a reason to change it to an sxlock.

Per feedback from mav and cem, move the unlock to before the callback..

This revision now requires review to proceed.Sep 25 2018, 5:03 PM

Upload the _right_ diff file.

But without the cryptocheck changes.

This revision is now accepted and ready to land.Sep 25 2018, 5:10 PM
cem added inline comments.
sys/opencrypto/cryptosoft.c
769–771

the whitespace here looks a little weird. it may just be phabricator, but maybe double check that the indent is just tabs and not space-tab.

sef marked an inline comment as done.

Turn a blank line into a real blank line.

This revision now requires review to proceed.Sep 25 2018, 6:10 PM

On a system with GELI swap, moving the lock until after the call to crypto_done() resulted in a panic (page fault while lock held). That doesn't make any sense to me. Changing it back to what I had initially resulted in a successful boot. So I'm going to be investigating this a bit more.

sys/opencrypto/cryptosoft.c
769–771

Emacs putting in a tab via autoformatting. Changed it to a blank line.

I have been unable to reproduce that panic, even on the same machine.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 26 2018, 8:23 PM
This revision was automatically updated to reflect the committed changes.