
PowerPC: Implement new memcpy using vectors
Abandoned (Public)

Authored by pdk_semihalf.com on Feb 27 2018, 7:38 PM.

Details

Summary

Note that this implementation does not handle the case where the source and destination buffers are unaligned relative to each other.
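To make "relatively unaligned" concrete, here is a minimal sketch in C. It assumes a 16-byte vector width, and the helper names `relatively_aligned` and `bytes_to_vector_boundary` are hypothetical, not taken from the patch. Two buffers are relatively aligned when they share the same offset within a vector, so the same number of leading scalar bytes brings both onto a vector boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u	/* assumed vector width in bytes */

/*
 * Hypothetical helper: returns 1 when dst and src have the same offset
 * within a 16-byte vector, 0 otherwise.
 */
static int
relatively_aligned(const void *dst, const void *src)
{
	return ((((uintptr_t)dst ^ (uintptr_t)src) & (VEC_BYTES - 1)) == 0);
}

/*
 * Hypothetical helper: number of leading bytes to copy one at a time
 * before p reaches the next 16-byte boundary (0 if already aligned).
 */
static size_t
bytes_to_vector_boundary(const void *p)
{
	return ((VEC_BYTES - ((uintptr_t)p & (VEC_BYTES - 1))) & (VEC_BYTES - 1));
}
```

For a buffer `buf`, the pair `buf + 1` and `buf + 17` is relatively aligned, while `buf + 1` and `buf + 18` is not. In the second case no amount of leading scalar copying aligns both pointers at once, which is the case the summary says is not handled.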

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

I have a few main worries about this, mostly also in the inline comments:

  1. I think this trashes userland vector registers, which is very bad.
  2. It breaks booting on Apple hardware.
  3. Is there really a performance benefit to this? The kernel tries pretty vigorously to avoid memcpy and no other architectures have vectorized in-kernel memcpy(), substantially to avoid the serious issues associated with concern 1, but also because it doesn't help much. I would imagine you would get much more benefit, with much less hassle, from optimizing the copy in libc, which would be 90% of the same code but with all the complex cases deleted.
sys/powerpc/aim/locore64.S
165

There have to be better ways to check for this than relying on magic SPRG0 values. This also breaks Apple hardware, which needs to preserve the firmware's SPRG0.

sys/powerpc/powerpc/bcopy.c
84

Why is this PowerNV-specific? It should work on all Altivec-supporting machines and, at the very least, should work inside hypervisors.

sys/powerpc/powerpc/cpu_subr64.S
111

Doesn't this trash userland vector state?

This revision now requires changes to proceed. Feb 27 2018, 7:46 PM

I have a question about your first worry. Is the scenario one in which we call memcpy in, e.g., interrupt filter code? If not, could you explain what you are worried about?

Suppose we have an altivec- (or VSX-) using userland program. Vector registers on PPC, as on all other architectures, are not saved and restored by interrupt handler code. Instead, they are saved and restored only by cpu_switch(), and only if PCB_VEC is set. If a userland thread makes a syscall or otherwise ends up in the kernel, I do not see how this avoids clobbering the vector register state of the userland process. Is there something I am missing?
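The concern can be illustrated with a toy model in plain C. This is only a sketch: `struct cpu_model`, `vec_copy`, `vec_copy_preserving`, and `user_v0_survives` are hypothetical names, a byte array stands in for one vector register, and nothing here is real kernel or AltiVec code. It shows that a copy routine using a vector register as scratch overwrites whatever userland left there unless the routine saves and restores it itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/* Toy model of one live vector register (think v0); not real kernel state. */
struct cpu_model {
	uint8_t v0[VEC_BYTES];
};

/* Copy loop that uses the modeled v0 as scratch, like a load/store pair. */
static void
vec_copy(struct cpu_model *cpu, void *dst, const void *src, size_t len)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	for (; len >= VEC_BYTES; len -= VEC_BYTES, d += VEC_BYTES, s += VEC_BYTES) {
		memcpy(cpu->v0, s, VEC_BYTES);	/* load 16 bytes into v0 */
		memcpy(d, cpu->v0, VEC_BYTES);	/* store 16 bytes from v0 */
	}
	memcpy(d, s, len);			/* scalar tail */
}

/* Same copy, but the caller's v0 is saved first and restored afterwards. */
static void
vec_copy_preserving(struct cpu_model *cpu, void *dst, const void *src,
    size_t len)
{
	uint8_t saved[VEC_BYTES];

	memcpy(saved, cpu->v0, VEC_BYTES);
	vec_copy(cpu, dst, src, len);
	memcpy(cpu->v0, saved, VEC_BYTES);
}

/* Returns 1 if the userland value in v0 survives a "syscall" that copies. */
static int
user_v0_survives(int preserve)
{
	struct cpu_model cpu;
	uint8_t user_value[VEC_BYTES], src[40], dst[40];

	memset(user_value, 0xAA, sizeof(user_value));
	memset(src, 0x55, sizeof(src));
	memcpy(cpu.v0, user_value, VEC_BYTES);	/* userland owns v0 */

	if (preserve)
		vec_copy_preserving(&cpu, dst, src, sizeof(src));
	else
		vec_copy(&cpu, dst, src, sizeof(src));

	return (memcmp(cpu.v0, user_value, VEC_BYTES) == 0);
}
```

In this model `user_v0_survives(0)` reports that the userland value was lost, while `user_v0_survives(1)` reports that it survived, at the cost of an extra save and restore around every copy.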

Overall, I think altivec memcpy doesn't belong in the kernel, but does belong in user space. As Nathan mentioned, the kernel very rarely copies such large blocks that altivec would be worth it.

Did you benchmark any of this work, or benchmark anything that would point to this being a net win?

sys/powerpc/aim/locore64.S
165

We stopped preserving the firmware's SPRG0 on powerpc64 last year, due to firmware issues once the MMU was up.

sys/powerpc/powerpc/cpu_subr64.S
99

You need to preserve all the vector registers used. After that point, is there any benefit in using altivec registers in the kernel? We don't do large memory copies in the kernel so I think the setup cost would likely outweigh the benefits. That said, altivec memcpy would be a nice bonus in userspace, taking advantage of IFUNCs, etc, for runtime selection.

And, to eke out every last ounce of performance, stvxl would be better, wouldn't it?
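As a rough illustration of the runtime selection suggested above for a userspace memcpy, here is a portable sketch in C. It does not use a real IFUNC: a lazily initialized function pointer stands in for the resolver that the dynamic linker would run. All names (`memcpy_scalar`, `memcpy_vector_placeholder`, `cpu_has_altivec`, `resolve_memcpy`, `memcpy_dispatch`) are hypothetical, the CPU probe is a stub that always reports no vector support, and the "vector" variant is a placeholder that simply defers to the scalar copy.

```c
#include <stddef.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Baseline byte-at-a-time copy that works on every CPU. */
static void *
memcpy_scalar(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len-- > 0)
		*d++ = *s++;
	return (dst);
}

/* Placeholder for a vectorized variant; it only defers to the scalar copy. */
static void *
memcpy_vector_placeholder(void *dst, const void *src, size_t len)
{
	return (memcpy_scalar(dst, src, len));
}

/*
 * Hypothetical CPU probe. A real implementation would query the platform
 * for vector support; this stub always answers "no".
 */
static int
cpu_has_altivec(void)
{
	return (0);
}

/* Plays the role of an IFUNC resolver: choose an implementation once. */
static memcpy_fn
resolve_memcpy(void)
{
	return (cpu_has_altivec() ? memcpy_vector_placeholder : memcpy_scalar);
}

/* Entry point: resolve on first use, then call through the cached pointer. */
static void *
memcpy_dispatch(void *dst, const void *src, size_t len)
{
	static memcpy_fn impl;

	if (impl == NULL)
		impl = resolve_memcpy();
	return (impl(dst, src, len));
}
```

With a real IFUNC the selection happens when the symbol is bound, so callers pay no per-call check. The cached pointer here only approximates that, and its lazy initialization is not synchronized between threads.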

You are right: the userland vector state would be clobbered. I would therefore need to save and restore 16 vector registers, which is a waste of time.

sys/powerpc/powerpc/cpu_subr64.S
99

Yes, stvxl would be better.

You both convinced me that implementing this in the kernel is a bad idea.