Page MenuHomeFreeBSD

powerpc: Implement fpu_kern_enter/fpu_kern_leave
ClosedPublic

Authored by sanastasio_raptorengineering.com on Aug 21 2023, 11:01 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, May 6, 3:40 AM
Unknown Object (File)
Mon, May 6, 12:42 AM
Unknown Object (File)
Sat, May 4, 12:02 PM
Unknown Object (File)
Fri, Apr 26, 1:30 AM
Unknown Object (File)
Tue, Apr 23, 5:04 AM
Unknown Object (File)
Sat, Apr 20, 7:25 PM
Unknown Object (File)
Sat, Apr 20, 7:24 PM
Unknown Object (File)
Sat, Apr 20, 7:20 PM

Details

Summary

Provide an implementation of fpu_kern_enter/fpu_kern_leave for PPC to
enable FPU, VSX, and Altivec usage in-kernel. The functions currently
only support FPU_KERN_NOCTX, but this is sufficient for ossl(1) and many
other users of the API.

Test Plan

This patchset has been tested on powerpc64le using a modified version of the in-tree tools/tools/crypto/cryptocheck.c tool to check for FPU/Vec register clobbering along with a follow-up patch to enable ossl(4) on powerpc64*.

The (work-in-progress) cryptocheck patch that implements FPU clobber testing is available here: https://gitlab.raptorengineering.com/sanastasio/freebsd-patches/-/blob/master/0003-WIP-Add-FPU-clobber-check-for-ppc64-to-cryptocheck.patch

The follow-up (work-in-progress) ossl(4) patch for powerpc64* support is available here: https://gitlab.raptorengineering.com/sanastasio/freebsd-patches/-/blob/master/0002-ossl-Add-support-for-powerpc64-powerpc64le.patch

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Awesome, thanks for this patch! This might pave the way to fixing GPU drivers as well. Could you bump the version in https://cgit.freebsd.org/src/tree/sys/sys/param.h#n78? This will make it possible to add version checks in https://github.com/freebsd/drm-kmod/blob/5.15-lts/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c#L97 to support FreeBSD/powerpc64* as well.

As for the contents of the current patch, I will leave it to people who are able to correctly review it. But I think you should be able to switch https://cgit.freebsd.org/src/tree/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_powerpc.h#n53 to 1 for testing - if you can load ZFS module with FPU and do some basic operations, this patch should be ok.

Also, regarding your commit description for ossl: "The required assembly files for ppc were already present, but never used before now.". That's actually wrong. I added those files to be used by userspace OpenSSL libraries and they in fact allow up to 20x speedup over the previous portable C code. Back then, I tried adding ossl(4) support as well but didn't have enough knowledge to add kfpu support.

One tip for future uploads - please generate patches with full context (e.g. git show -U999999 <hash>) so that Phabricator will be able to expand the rest of the file. See https://wiki.freebsd.org/Phabricator for details.

Testing with ZFS would be a plus. It also only uses FPU_KERN_NOCTX on aarch64 and x86. There is a fpu_kern.9 manual page you will also want to update.

Thank you all for the comments. I'll coalesce my replies into a single message.

@pkubaj - Would it be most appropriate to perform that version bump in this patch or in a separate follow-up patch? Also thank you for pointing out the issue with the ossl patch description -- I'll correct that before submission here.

@emaste - Noted -- I'll fix that going forward.

@jhb - Thanks for pointing out the missing man page change, I'll update it and upload a new patch here. I'll also look into doing some basic tests with ZFS.

If possible please!make sure this doesn't break the boot loader. I don't see how it could, but ZFS and float has broken the loader before

It would probably be fine to bump the version in one commit.

I've just done some preliminary tests with ZFS kernel FPU enabled on power and it seems to be working, though I'm not sure if I've fully exercised the relevant code paths.

For reference, the patch enabling ZFS kfpu support I wrote is here: https://gitlab.raptorengineering.com/sanastasio/freebsd-patches/-/blob/master/0004-zfs-powerpc-enable-kernel-floating-point.patch

The commands I ran to test basic functionality were:

  1. dd if=/dev/zero of=/zfs.img bs=1K count=1 seek=1M
  2. zpool create zvol0 /zfs.img
  3. dd if=/dev/urandom of=/zvol0/junk bs=1K count=200K
  4. zpool scrub zvol0

I also added a printf inside kfpu_begin() to ensure that ZFS was indeed using the kernel FPU facilities.

  • Update fpu_kern.9 manual page
  • Bump __FreeBSD_version
  • Rebase

Overall this looks good, just the one comment on SPE. Can you also test with powerpc64 BE to avoid surprises?

sys/powerpc/powerpc/fpu.c
332

I think this whole block is not correct for SPE. SPE uses PSL_VEC and enable_vec(). This file, fpu.c, is used only for FP emulation, which would be a waste for fpu_kern with SPE, quite likely a severe performance penalty.

Since I'm pretty sure there are very few users of SPE (quite possibly just me), I recommend just blocking this whole thing from SPE, and if someone is adventurous enough they can try making it work.

sys/powerpc/powerpc/fpu.c
332

Sounds good -- I'll wrap the entire function bodies in an ifndef __SPE__.

  • Guard out entire implementation on SPE

Additionally, I have tested the patch along with my aforementioned ossl patches on powerpc64 big endian and confirmed that the fpu_kern* facilities work as expected.

Alarmingly, though, independent of any of my patches, I discovered that the cryptocheck tool doesn't fully pass on big endian with the cryptosoft backend. The ossl backend enabled by my patches works fine though.

Can you share which tests fail on big-endian? Is it specific to certain ciphers or hash algorithms?

It seems like aes-xts and all its variants are the only ones failing. See below.

# ./cryptocheck -a all -d soft -v
ripemd160 (16) matched (cryptodev device cryptosoft0)
sha1 (16) matched (cryptodev device ossl0)
sha224 (16) matched (cryptodev device ossl0)
sha256 (16) matched (cryptodev device ossl0)
sha384 (16) matched (cryptodev device ossl0)
sha512 (16) matched (cryptodev device ossl0)
blake2b (16) matched (cryptodev device cryptosoft0)
blake2s (16) matched (cryptodev device cryptosoft0)
ripemd160hmac (16) matched (cryptodev device cryptosoft0)
sha1hmac (16) matched (cryptodev device ossl0)
sha224hmac (16) matched (cryptodev device ossl0)
sha256hmac (16) matched (cryptodev device ossl0)
sha384hmac (16) matched (cryptodev device ossl0)
sha512hmac (16) matched (cryptodev device ossl0)
gmac128 (16) matched (cryptodev device cryptosoft0)
gmac192 (16) matched (cryptodev device cryptosoft0)
gmac256 (16) matched (cryptodev device cryptosoft0)
poly1305 (16) matched (cryptodev device ossl0)
aes-cbc128 (16) matched (cryptodev device ossl0)
aes-cbc192 (16) matched (cryptodev device ossl0)
aes-cbc256 (16) matched (cryptodev device ossl0)
aes-ctr128 (16) matched (cryptodev device cryptosoft0)
aes-ctr192 (16) matched (cryptodev device cryptosoft0)
aes-ctr256 (16) matched (cryptodev device cryptosoft0)
aes-xts128 (16) encryption mismatch:
control:
0000   f5 dc 96 14 60 43 95 99 e0 cf fc 7f d8 89 08 38  |....`C.........8|
test (cryptodev device cryptosoft0):
0000   7d 28 37 e0 95 f7 4d 97 34 5b 02 ad 1a ed 51 e6  |}(7...M.4[....Q.|
aes-xts256 (16) encryption mismatch:
control:
0000   d1 c2 99 6d 13 f7 dc 12 49 e2 43 49 ae d4 2b 61  |...m....I.CI..+a|
test (cryptodev device cryptosoft0):
0000   27 cc f5 72 13 6a d6 05 8a 49 cd 47 98 ae bc c2  |'..r.j...I.G....|
camellia-cbc128 (16) matched (cryptodev device cryptosoft0)
camellia-cbc192 (16) matched (cryptodev device cryptosoft0)
camellia-cbc256 (16) matched (cryptodev device cryptosoft0)
chacha20 (16) matched (cryptodev device ossl0)
aes-cbc128+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-cbc128+sha1hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc128+sha224hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc128+sha256hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc128+sha384hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc128+sha512hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc192+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-cbc192+sha1hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc192+sha224hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc192+sha256hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc192+sha384hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc192+sha512hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc256+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-cbc256+sha1hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc256+sha224hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc256+sha256hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc256+sha384hmac (0, 16) matched (cryptodev device ossl0)
aes-cbc256+sha512hmac (0, 16) matched (cryptodev device ossl0)
aes-ctr128+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr128+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr128+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr128+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr128+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr128+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr192+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr192+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr192+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr192+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr192+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr192+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr256+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr256+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr256+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr256+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr256+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-ctr256+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-xts128+ripemd160hmac (0, 16) encryption mismatch:
control:
0000   61 08 43 05 51 e1 7e 7c ce b6 25 7d 97 ff 74 f4  |a.C.Q.~|..%}..t.|
test (cryptodev device cryptosoft0):
0000   c8 91 66 30 83 ef 3f 98 8f e4 a0 f7 52 58 cf 7e  |..f0..?.....RX.~|
aes-xts128+sha1hmac (0, 16) encryption mismatch:
control:
0000   f1 a4 90 dd 88 61 00 1c 97 ae ac f4 e8 7e 5d 73  |.....a.......~]s|
test (cryptodev device cryptosoft0):
0000   4b 40 61 77 2a f4 96 6b 59 a7 92 4c 45 1a 38 c8  |K@aw*..kY..LE.8.|
aes-xts128+sha224hmac (0, 16) encryption mismatch:
control:
0000   48 8c 9a cb 5d 82 59 ae 6a db 13 b9 76 29 67 fb  |H...].Y.j...v)g.|
test (cryptodev device cryptosoft0):
0000   5c 50 f8 99 c4 5d 7b 5c 00 34 51 27 23 89 87 5f  |\P...]{\.4Q'#.._|
aes-xts128+sha256hmac (0, 16) encryption mismatch:
control:
0000   fb 6f 7e 08 09 4a 44 71 3a 9a fe 91 a2 c8 f0 2d  |.o~..JDq:......-|
test (cryptodev device cryptosoft0):
0000   ba 65 d9 1e 14 f0 bb 6f b4 24 16 8a e6 95 24 0d  |.e.....o.$....$.|
aes-xts128+sha384hmac (0, 16) encryption mismatch:
control:
0000   ec c9 51 67 10 44 7d 2f 3c 97 34 53 62 f0 62 89  |..Qg.D}/<.4Sb.b.|
test (cryptodev device cryptosoft0):
0000   4d f5 e9 91 c3 27 cd cb ba 2f 5e c8 4e 0d af 7a  |M....'.../^.N..z|
aes-xts128+sha512hmac (0, 16) encryption mismatch:
control:
0000   64 5a 65 bb f9 a7 9d 22 49 44 e9 dc b9 fa 75 fc  |dZe...."ID....u.|
test (cryptodev device cryptosoft0):
0000   1c fc 16 fa 6c 24 11 e8 39 ac 63 b6 73 a0 1b f0  |....l$..9.c.s...|
aes-xts256+ripemd160hmac (0, 16) encryption mismatch:
control:
0000   86 b3 7d 97 4f ad 72 41 e7 73 2a d0 75 92 7a c9  |..}.O.rA.s*.u.z.|
test (cryptodev device cryptosoft0):
0000   5e 74 e1 59 79 93 0c 63 57 7a 6f 87 ae 6a 4c 66  |^t.Yy..cWzo..jLf|
aes-xts256+sha1hmac (0, 16) encryption mismatch:
control:
0000   4a 8d 97 6c 6c 96 d7 9c c0 8e e1 57 14 4a 79 be  |J..ll......W.Jy.|
test (cryptodev device cryptosoft0):
0000   a4 ca f7 29 cc 9b c2 07 4a a6 36 7f c6 88 f2 68  |...)....J.6....h|
aes-xts256+sha224hmac (0, 16) encryption mismatch:
control:
0000   8e c7 c1 f0 91 aa cb 58 3c 59 98 e5 11 a3 3b 7c  |.......X<Y....;||
test (cryptodev device cryptosoft0):
0000   7b 1e 06 5e 63 b2 33 2b 59 e2 51 e0 7e 49 50 cf  |{..^c.3+Y.Q.~IP.|
aes-xts256+sha256hmac (0, 16) encryption mismatch:
control:
0000   6c 26 fa 56 ec 86 74 c9 7b c9 2b 11 6a b4 eb 60  |l&.V..t.{.+.j..`|
test (cryptodev device cryptosoft0):
0000   23 56 50 1b 35 48 d2 b7 81 07 dc aa a6 b4 f9 ae  |#VP.5H..........|
aes-xts256+sha384hmac (0, 16) encryption mismatch:
control:
0000   7b 88 d0 ec 4b e5 37 3c d3 76 3d 42 ee 29 1a 82  |{...K.7<.v=B.)..|
test (cryptodev device cryptosoft0):
0000   6c 7c b7 55 a8 8c fc ab f9 e2 cb 6a 40 0c 61 b4  |l|.U.......j@.a.|
aes-xts256+sha512hmac (0, 16) encryption mismatch:
control:
0000   1c e3 85 e3 d0 a7 8f 0d 1a 00 aa 9c 55 8e 3f 96  |............U.?.|
test (cryptodev device cryptosoft0):
0000   2d 65 e0 82 c6 02 9b 04 76 3b 8f ce 7f 7e 9f c5  |-e......v;...~..|
camellia-cbc128+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc128+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc128+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc128+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc128+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc128+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc192+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc192+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc192+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc192+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc192+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc192+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc256+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc256+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc256+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc256+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc256+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
camellia-cbc256+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
chacha20+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0)
chacha20+sha1hmac (0, 16) matched (cryptodev device cryptosoft0)
chacha20+sha224hmac (0, 16) matched (cryptodev device cryptosoft0)
chacha20+sha256hmac (0, 16) matched (cryptodev device cryptosoft0)
chacha20+sha384hmac (0, 16) matched (cryptodev device cryptosoft0)
chacha20+sha512hmac (0, 16) matched (cryptodev device cryptosoft0)
aes-gcm128/12 (0, 16) matched (cryptodev device ossl0)
aes-gcm192/12 (0, 16) matched (cryptodev device ossl0)
aes-gcm256/12 (0, 16) matched (cryptodev device ossl0)
aes-ccm128/12 (0, 16) matched (cryptodev device cryptosoft0)
aes-ccm192/12 (0, 16) matched (cryptodev device cryptosoft0)
aes-ccm256/12 (0, 16) matched (cryptodev device cryptosoft0)
chacha20-poly1305/12 (0, 16) matched (cryptodev device ossl0)

Hi all,

Just a quick ping to see what the status was on this patch and whether or not it's in a state that it could be committed.

As long as it doesn't require VSX (able to run on PowerMac G5), I think it's good.

This revision is now accepted and ready to land.Sep 12 2023, 4:54 PM

As long as it doesn't require VSX (able to run on PowerMac G5), I think it's good.

Though I haven't tested it on that hardware, I did take care to check for VSX support before setting the corresponding MSR bit in enable_fpu_kern, so it should be good.

Also, thank you @jhibbits for approving the patch. I'm not super familiar with the FreeBSD contribution work flow yet, so excuse the question, but is there anything that I need to do at this point to have this patch committed? I presume the patch must now be merged by someone other than me with commit privileges.

Also, thank you @jhibbits for approving the patch. I'm not super familiar with the FreeBSD contribution work flow yet, so excuse the question, but is there anything that I need to do at this point to have this patch committed? I presume the patch must now be merged by someone other than me with commit privileges.

@sanastasio_raptorengineering.com Yes, a committer (probably me) needs to push this change. I'll try to find the time in the next few days to merge it. Thanks for your work on this!

Assuming everything works fine after a MFC period, could this change be also MFS'd to releng/14.0?