Page MenuHomeFreeBSD

XSAVE/AVX support for i386.
ClosedPublic

Authored by jhb on Oct 31 2014, 11:03 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 11, 9:07 AM
Unknown Object (File)
Sat, Jan 11, 7:00 AM
Unknown Object (File)
Fri, Jan 10, 5:50 AM
Unknown Object (File)
Tue, Jan 7, 6:25 PM
Unknown Object (File)
Dec 9 2024, 8:13 PM
Unknown Object (File)
Nov 21 2024, 9:33 PM
Unknown Object (File)
Nov 13 2024, 11:33 AM
Unknown Object (File)
Nov 7 2024, 11:34 AM
Subscribers

Details

Summary

I will probably break this up into multiple commits into SVN, but
here are the changes in total. I was able to run i386 binaries
using AVX in an i386 bhyve VM on my Sandy Bridge laptop. (Specifically,
I wrote a test of getcontextx() that would dump out xmm/ymm regs and
dumped them after some simple tests that manipulated xmm/ymm regs.)

After this the further cleanups I'd like to do is remove the
CPU_DISABLE_SSE option (would clear out some #ifdef's) and make
'device npx' mandatory. We can then move it to sys/i386/i386.
One odd thing to maybe fix next is to move the SSE-enabling code
out of initializecpu() and into npxinit/fpuinit() (either that or move
the xsave bootstrap into initializecpu(); it is really odd to have the
FXSAVE / XSAVE setup split across differnet files, etc. I think having
it all in the FPU code probably makes the most sense. If we unify the
APIs then we can probably have much of these bits shared (e.g. the
xsave probing is all identical). A 'fpstate_t' type could possibly allow
us to share most of fpu.c/npx.c.

Also, I almost created a separate 'bootstack' like amd64 for locore to
use to call init386 but instead decided to just assume that KSTACK_PAGES
was > 1 and use the bottom-most page for the boot stack. This avoids
losing a page of RAM and KVA to hold the bootstack that would only be
used in locore.

Test Plan
  • Ran a test program that registered a signal handler and raise'd itself to trigger the handler after doing an AVX instruction.
  • Ran a test program that printed xmm/ymm regs from the context returned by getcontextx() and verified they looked correct after performing some SSE/AVX operations.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

jhb retitled this revision from to XSAVE/AVX support for i386..
jhb updated this object.
jhb edited the test plan for this revision. (Show Details)
jhb added a reviewer: kib.

Looks good.

sys/i386/i386/machdep.c
184 ↗(On Diff #2194)

Use register_t as return type (and arg too) ?

sys/i386/include/pcb.h
117 ↗(On Diff #2194)

I doubt that this comment is relevant to the i386. For amd64, the culprit is PCB_FULL_IRET flag, which may be set from the interrupt handler. I am not aware of similar situation for i386.

I do not object against adding set_pcb_flags() and clear_pcb_flags() to i386, but comment is wrong IMO, and assembler implementation is not strictly necessary.

sys/i386/isa/npx.c
975 ↗(On Diff #2194)

I wanted to do this for long time.

sys/i386/i386/machdep.c
184 ↗(On Diff #2194)

hammer_time() hardcodes its return value and args as uintXX_t. 'int' seems to be somewhat assumed for first (it's passed to getmemsize(), etc.) so changing that would be a bit invasive. I do think register_t for the return value makes sense since the purpose is to return a direct register value (%esp).

sys/i386/include/pcb.h
117 ↗(On Diff #2194)

Ah, I was thrown off by the cpu_switch() comment. For some reason I thought npxsave() was also clearing a flag during cpu_switch(), but it is not.

I think for now I will revert these changes entirely. Perhaps if we end up with shared code in sys/x86/x86 that needs it I will re-add it so there is one less #ifdef for i386 code, but I will wait until then.

sys/i386/isa/npx.c
447 ↗(On Diff #2194)

I was surprised to find this omission btw.

975 ↗(On Diff #2194)

It only somewhat made sense when it wanted to allocate IRQ 13 for external co-processors, but those days are long gone.

kib edited edge metadata.
kib added inline comments.
sys/i386/i386/machdep.c
184 ↗(On Diff #2194)

Ok.

sys/i386/isa/npx.c
975 ↗(On Diff #2194)

It caused actual problems, e.g. reported by pho to me. You may remember it.

When early initialization called into bios and bios coders where not careful enough to allow fpu exception to be thrown, the things break badly. The kernel #fp handler is installed much later, when devices are attached.

So removal of this code and moving the initialization before any bios (or EFI) calls are made is good thing.

This revision is now accepted and ready to land.Nov 2 2014, 9:37 AM
jhb edited edge metadata.
  • Change return type of init386 to register_t.
  • Revert clear_pcb_flags/set_pcb_flags as they aren't needed on i386.
  • Fixups.
  • Fix xstate return in sigreturn() by not trashing mc_flags in sendsig().
sys/i386/i386/machdep.c
702 ↗(On Diff #2194)

Thanks for your tests. avx1.c exposed this bug.

  • Merge branch 'master' into i386_xsave
jhb updated this revision to Diff 2230.

Closed by commit rS273995 (authored by @jhb).