Page MenuHomeFreeBSD

x86: Use XSAVEOPT for fpusave(), when available
ClosedPublic

Authored by cem on Mar 20 2019, 9:53 PM.

Details

Summary

Suggested by: Anton Rang

Test Plan

It seems to boot and run fine on my bhyve VM, which I verified passes through
the cpuid bit. I don't know any particular tests to stress the path.

i386 kernel builds ok in tinderbox; I did not test it.

Prompted by Anton; some earlier discussion from 2017: https://lists.freebsd.org/pipermail/freebsd-amd64/2017-August/019026.html

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem created this revision.Mar 20 2019, 9:53 PM
kib added inline comments.Mar 21 2019, 2:54 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

I am not sure that this use of fpusave() is safe for XSAVEOPT. It might be occasionally in your tests due to pre-existing state, but I think we want unconditional save of all FPU register file components regardless of chance that OPT part would hit us.

cem edited the test plan for this revision. (Show Details)Mar 21 2019, 5:38 PM
cem added inline comments.Mar 21 2019, 5:41 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

Perhaps this particular fpusave() invocation should be replaced with use_xsave ? fpusave_xsave(...) : fpusave_fxsave(...)?

I'm hoping @rang_acm.org understands XSAVEOPT better than I do and can chime in here.

kib added inline comments.Mar 21 2019, 6:24 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

Yes, I think you are right.

rang_acm.org added inline comments.Mar 22 2019, 6:51 AM
sys/amd64/amd64/fpu.c
370–371 ↗(On Diff #55300)

It bothers me a little that this relies on the FPU not being used between the early call to fpuinit() and this SYSINIT function running, but I don’t think it’s worth changing.

381 ↗(On Diff #55300)

Yes, XSAVEOPT is probably not what we want when setting the initial state. There’s a slight chance that the “modified” optimization would do the wrong thing (perhaps with virtualization, or if we were particularly unlucky in what the BIOS/EFI had done).

XSAVE is not necessarily a perfect choice either, but unless the BIOS/EFI is doing something odd like using AVX registers and not clearing them before transferring control to us, it should be OK.

Using FXSAVE unconditionally is actually probably the best choice here. It will save the “legacy” state, while leaving the “extended” state zeroed. This will ensure that restoring with XRSTOR is as efficient as possible, no matter what the CPU state is as we’re initializing.

Another alternative would be to just set the non-zero words (FCW & FTW, per 13.6 of Intel volume 1, and MXCSR), rather than using a save instruction at all. (This is what Linux does.) However, as far as I can see, the only way to determine what MXCSR should be set to is by executing an FXSAVE anyway (or hard-coding the value, since x87 isn’t changing), so FXSAVE (after FNINIT) seems fine to me.

If we add support XSAVES/XRSTORS, we’ll need to also set the compacted flag (XCOMP_BV[63]) on the save area, but otherwise nothing here changes.

kib added inline comments.Mar 22 2019, 11:29 AM
sys/amd64/amd64/fpu.c
370–371 ↗(On Diff #55300)

You mean, why not fpuinitstate() done from fpuinit() ? We do not have malloc operational at the time of fpuinit().

kib added inline comments.Mar 22 2019, 1:18 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

I tend to agree with the proposal (FNINIT/FXSAVE, set everything else, including the XSTATE_BV mask, to zero).

I had an impression that XSTATE_BV[0] and XSTATE_BV[1] better be 1, but cannot find the confirmation in the current edition of SDM.

Also I cannot find a statement that all future xsave-managed states initial conditions will be all-zero. SDM only lists enumerates already defined states (vol.1 13.6) and for each lists zeroing individually.

Anyway, regardless of this FUD, I think it is reasonable to go with your proposal. I find it more worrying that some BIOS corrupts the AVX* environment on BSP than to expect that non-zero initialization would be required.

rang_acm.org added inline comments.Mar 22 2019, 1:27 PM
sys/amd64/amd64/fpu.c
370–371 ↗(On Diff #55300)

That makes sense. I was thinking more “why fpuinit() not called from fpuinitstate()”; it just seems a little odd to me that we clear the FPU state early, but then save the cleared state later. Not a big deal.

381 ↗(On Diff #55300)

Yeah, I don’t think the SDM states that initial conditions will be all-zero. However, the documentation (in volume 1) for XRSTOR states:

If none of these conditions cause a fault, the processor updates each state component i for which RFBM[i] = 1. XRSTOR updates state component i based on the value of bit i in the XSTATE_BV field of the XSAVE header:
• If XSTATE_BV[i] = 0, the state component is set to its initial configuration. Section 13.6 specifies the initial configuration of each state component.
The initial configuration of state component 1 pertains only to the XMM registers and not to MXCSR. See below for the treatment of MXCSR
• If XSTATE_BV[i] = 1, the state component is loaded with data from the XSAVE area. See Section 13.5 for specifics for each state component and for details regarding mode-specific operation and operation determined by instruction prefixes. See Section 13.13 for details regarding faults caused by memory accesses.

My reading is that so long as XSTATE_BV[i] is zeroed, it doesn’t matter whether we zero the state area for that component or not; any restore will ignore the state & set initial conditions.

kib added inline comments.Mar 22 2019, 2:00 PM
sys/amd64/amd64/fpu.c
370–371 ↗(On Diff #55300)

Calling fpuinit() from fpuinitstate() is too late. We need to know the size of the fpu save area because it is located right after pcb, before we switch to the final thread0 kstack. So fpuinit() reconfigures FPU to know how much to reserve.

The later part is run when we can malloc(), actually it should be run before first use of FPU in kernel or first execution of usermode instruction, whatever comes first.

381 ↗(On Diff #55300)

Sometimes we copy out the initial state blob to usermode. E.g. if userspace did not touched FPU at all, and a signal delivered to the thread, then we do sendsig()->get_fpcontext()->fpugetregs()->direct copy from initial state into kstack fpu user save area, then sendsig copyouts from the user save area. If there should be something non-zero for initial state, we must provide it.

cem added inline comments.Mar 22 2019, 5:23 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

Ok, what is the concrete proposal?

  1. use fxsave() unconditionally here
  2. We invoke fninit() in fpuinit(); should it be invoked again before fxsave?
  3. Finally, are ifuncs resolved at link time, or on-the-fly? (Can we rely on init_xsave() being invoked by ifunc resolver, or no?) If not, we need init_xsave() at least prior to the if (use_xsave) condition below.

Or something else?

kib added inline comments.Mar 22 2019, 5:32 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

Simplest is to use fxsave() there unconditionally.
No need to repeat fninit().
ifuncs() are resolved very early during boot, not at the call time. init_xsave() was already called when even fpuinit_bsp1() is executing.

rang_acm.org added inline comments.Mar 22 2019, 9:14 PM
sys/amd64/amd64/fpu.c
381 ↗(On Diff #55300)

Agreed, use fxsave() unconditionally.

kernel ifuncs are resolved by the link_elf_ireloc() in hammer_time(), I believe. That’s after CPU identification, so init_xsave() should be fine.

cem marked 8 inline comments as done.Mar 22 2019, 10:18 PM
cem updated this revision to Diff 55370.

Use fxsave unconditionally to save initial FPU state (amd64)

rang_acm.org accepted this revision.Mar 23 2019, 1:44 AM
This revision is now accepted and ready to land.Mar 23 2019, 1:44 AM
kib added inline comments.Mar 23 2019, 9:04 AM
sys/i386/i386/npx.c
498 ↗(On Diff #55370)

Don't you need a similar change there ? Take cpu_fxsr into account.

cem marked an inline comment as done.Mar 24 2019, 2:11 PM
cem updated this revision to Diff 55404.

Also use a full save for i386 initialstate.

This revision now requires review to proceed.Mar 24 2019, 2:11 PM
cem marked an inline comment as not done.Mar 24 2019, 2:12 PM
cem added inline comments.
sys/i386/i386/npx.c
355–377 ↗(On Diff #55370)

These seem redundant now?

kib accepted this revision.Mar 24 2019, 3:46 PM
kib added inline comments.
sys/i386/i386/npx.c
355–377 ↗(On Diff #55370)

Why ? What do you mean by redundant ?

This revision is now accepted and ready to land.Mar 24 2019, 3:46 PM
cem added inline comments.Mar 25 2019, 12:05 AM
sys/i386/i386/npx.c
355–377 ↗(On Diff #55370)

It seems like fpusave and npxsave_core have identical ifunc dispatch, no? Why have both?

kib added inline comments.Mar 25 2019, 11:15 AM
sys/i386/i386/npx.c
355–377 ↗(On Diff #55370)

I see. Yes, it is redundant.

This revision was automatically updated to reflect the committed changes.