I will probably break this up into multiple SVN commits, but
here are the changes in total. I was able to run i386 binaries
using AVX in an i386 bhyve VM on my Sandy Bridge laptop. (Specifically,
I wrote a test of getcontextx() that would dump out xmm/ymm regs and
dumped them after some simple tests that manipulated xmm/ymm regs.)
After this, the further cleanups I'd like to do are to remove the
CPU_DISABLE_SSE option (which would clear out some #ifdef's) and make
'device npx' mandatory. We can then move it to sys/i386/i386.
One odd thing to maybe fix next is to move the SSE-enabling code
out of initializecpu() and into npxinit()/fpuinit() (either that, or
move the xsave bootstrap into initializecpu()). It is really odd to have
the FXSAVE / XSAVE setup split across different files, etc. I think
having it all in the FPU code probably makes the most sense. If we unify
the APIs then we can probably have many of these bits shared (e.g. the
xsave probing is all identical). A 'fpstate_t' type could possibly allow
us to share most of fpu.c/npx.c.
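
To make the 'fpstate_t' idea a bit more concrete, here is a rough sketch
of what a shared save-area type could look like. None of this is in the
patch: the struct layout, the member names and fpstate_save_size() are
all hypothetical. The only fixed facts it leans on are the hardware
layout of the standard XSAVE format (512-byte FXSAVE image, 64-byte
XSAVE header, then the 256-byte AVX area at offset 576).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical XSAVE header: state bitmap plus reserved space (64 bytes). */
struct xsave_hdr {
	uint64_t xstate_bv;	/* which state components are valid */
	uint64_t reserved[7];
};

/*
 * Hypothetical shared FPU save area.  The first 512 bytes are the
 * legacy FXSAVE image, so an FXSAVE-only CPU just uses that prefix.
 * XSAVE requires the area to be 64-byte aligned.
 */
typedef struct fpstate {
	_Alignas(64) uint8_t legacy[512];	/* x87 + xmm (FXSAVE image) */
	struct xsave_hdr hdr;			/* at offset 512 */
	uint8_t ymm_hi[256];			/* upper halves of ymm regs */
} fpstate_t;

_Static_assert(offsetof(fpstate_t, hdr) == 512, "XSAVE header offset");
_Static_assert(offsetof(fpstate_t, ymm_hi) == 576, "AVX area offset");

/* Bytes that must be saved per thread, depending on what was probed. */
static size_t
fpstate_save_size(int use_xsave)
{
	return (use_xsave ? sizeof(fpstate_t) : sizeof(((fpstate_t *)0)->legacy));
}
```

With something like this, fpu.c and npx.c would only differ in how they
get the state into and out of the area, not in how they size or lay it
out.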
Also, I almost created a separate 'bootstack' like amd64 for locore to
use to call init386 but instead decided to just assume that KSTACK_PAGES
was > 1 and use the bottom-most page for the boot stack. This avoids
losing a page of RAM and KVA to hold the bootstack that would only be
used in locore.
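
The boot stack arithmetic is roughly the following. This is only an
illustration in C, not the locore code itself: boot_stack_top() and
kstack_top() are made-up helper names, the KSTACK_PAGES value of 2 is
just for the example, and I am assuming the stack grows down so that
the initial stack pointer for the bottom-most page is one page above
the base of the kstack.

```c
#include <assert.h>
#include <stdint.h>

#define	PAGE_SIZE	4096u	/* i386 page size */
#define	KSTACK_PAGES	2	/* example value; must be > 1 */

/* The scheme only works if there is more than one kstack page. */
_Static_assert(KSTACK_PAGES > 1, "boot stack needs KSTACK_PAGES > 1");

/*
 * Hypothetical helper: initial stack pointer for the temporary boot
 * stack, i.e. the top of the lowest page of the kstack.
 */
static uintptr_t
boot_stack_top(uintptr_t kstack_base)
{
	return (kstack_base + PAGE_SIZE);
}

/* Hypothetical helper: top of the whole kstack region. */
static uintptr_t
kstack_top(uintptr_t kstack_base)
{
	return (kstack_base + KSTACK_PAGES * PAGE_SIZE);
}
```

The pages above the boot stack are left untouched while locore runs, and
no extra page has to be reserved.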