Page MenuHomeFreeBSD

in PTI case account for stack alignment adjustment while copying frames during nested fault
ClosedPublic

Authored by tychon on Apr 24 2018, 3:19 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 29, 7:00 PM
Unknown Object (File)
Nov 26 2024, 10:29 AM
Unknown Object (File)
Nov 19 2024, 2:40 PM
Unknown Object (File)
Nov 15 2024, 6:50 PM
Unknown Object (File)
Oct 16 2024, 6:53 AM
Unknown Object (File)
Sep 30 2024, 10:28 AM
Unknown Object (File)
Sep 30 2024, 10:28 AM
Unknown Object (File)
Sep 30 2024, 10:28 AM
Subscribers

Details

Summary

[While incorporating feedback I have received in D15100, I discovered this existing issue.]

If a trap is encountered upon executing iretq from within doreti() the hardware will align the stack pointer to a 16-byte boundary before saving the fault state on the stack.

In the PTI case, handle this potential alignment adjustment by copying both frames independently while unwinding the stack in between.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

So the existing code failed to copy the old %ss from the trampoline stack, to the regular stack, right ? And then it worked to return since old %ss was kept intact on the top of the trampoline.

This revision is now accepted and ready to land.Apr 25 2018, 12:47 PM

The current code works because the offset of pc_pti_stack in struct pcpu is such that:

        movq    PCPU(PRVSPACE),%rdx
        addq    $PC_PTI_STACK+PC_PTI_STACK_SZ*8-PTI_SIZE,%rdx
	...
        movq    %rdx,%rsp
        popq    %rdx
        popq    %rax
        addq    $8,%rsp

leaves the stack 16-byte aligned and hence the two trap frames are sequential in memory. However, if struct pcpu is modified in a way that shifts the offset of pc_pti_stack by 8 bytes then the stack pointer will no longer be 16-byte aligned after executing the snippet above. Should iretq fail in this case the stack pointer will get a hardware adjustment and an 8-byte gap occurs between the two trap frames. Not only will MOVE_STACK truncate the frame during the copy but everything will be offset by 8-bytes. Then trap() will subsequently misinterpret the faulting frame.

I discovered this because in D15100, I'm introducing a top-of-PTI-stack field into pcpu which I had aligned to 16-bytes. Changing the code to use the new field:

	-  movq	   PCPU(PRVSPACE),%rdx
	-  addq	   $PC_PTI_STACK+PC_PTI_STACK_SZ*8-PTI_SIZE,%rdx
	+  movq	   PCPU(PTI_RSP0),%rdx
	+  subq	   $PTI_SIZE,%rdx

was causing a kernel panic rather than delivering a SIGBUS to the offending program.

Thanks also for sharing your test program! I ended up reling on a tweaking to the ucontext_t too:

#include <ucontext.h>

int
main(void)
{
	ucontext_t context;
	getcontext(&context);
	context.uc_mcontext.mc_rip = 0x8000000000000000; /* non-canonical */
	setcontext(&context);
	return (0);
}
This revision was automatically updated to reflect the committed changes.