
Fix handling of the segment registers on i386.
Closed, Public

Authored by kib on Sep 17 2017, 9:14 PM.

Details

Summary

Suppose that userspace is executing with non-standard segment descriptors. Then, until the exception or interrupt handler has executed SET_KERNEL_SEGS, the kernel is still running with the user %ds, %es and %fs. If an interrupt occurs in this window, the interrupt handler runs with unsafe segment registers. If the interrupt results in a context switch, the contamination of the kernel state spreads to the newly switched-to thread. As a result, kernel data accesses might fault or, worse, if only the segment base was changed, silently go to the wrong addresses.
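To illustrate why a changed segment base is worse than a fault, here is a minimal sketch in C of how a linear address is formed from a segment base and an offset. It is a simplified model, not kernel code: limit and access-rights checks are ignored, and the function names and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of i386 segmentation (assumption: limit and
 * access-rights checks are ignored): the linear address is the
 * segment base plus the offset, truncated to 32 bits.
 */
static uint32_t
linear_address(uint32_t seg_base, uint32_t offset)
{
	return (seg_base + offset);	/* unsigned arithmetic wraps at 2^32 */
}

/*
 * Nonzero if an access through a segment with the given base reaches
 * the address the kernel intended, i.e. the one a flat (base 0)
 * segment would reach.
 */
static int
access_hits_target(uint32_t seg_base, uint32_t offset)
{
	return (linear_address(seg_base, offset) == linear_address(0, offset));
}

/* Hypothetical addresses, chosen only for illustration. */
static void
check_flat_vs_shifted(void)
{
	uint32_t kva = 0xC0100000u;

	assert(access_hits_target(0, kva));		/* flat kernel segment */
	assert(!access_hits_target(0x1000u, kva));	/* user-chosen base */
}
```

With a non-zero base left over from userspace, every kernel data access is shifted by that base. No fault is raised as long as the shifted address is mapped and within the limit, so the kernel reads and writes the wrong memory without noticing.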

Moreover, if the user segment was allocated in the LDT, another thread might mark the descriptor as invalid before the doreti code tries to reload the segment registers. In this case the kernel panics.

The issue exists for all exception entry points which use a trap gate, and thus do not automatically disable interrupts on entry, and for lcall_handler.

The fix is twofold. First, we need to disable interrupts for all kernel entries, by changing the IDT descriptor types from trap gate to interrupt gate. Interrupts are re-enabled no earlier than the point where the kernel segments are loaded into the segment registers. Second, we only load the segment registers from the trap frame when returning to usermode. For the latter, all interrupt return paths must go through the common doreti code.
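The difference between the two gate kinds is a single bit in the descriptor's type field. The sketch below, in C, builds a 32-bit IDT gate descriptor in the layout documented for the i386 architecture, where type 0xE is an interrupt gate and type 0xF is a trap gate. It is an illustration only: the helper names, the handler address and the code selector value are hypothetical and are not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural type values for 32-bit IDT gates. */
enum {
	GATE_TYPE_INTR32 = 0xE,	/* CPU clears EFLAGS.IF on entry */
	GATE_TYPE_TRAP32 = 0xF	/* CPU leaves EFLAGS.IF unchanged */
};

/*
 * Build an 8-byte gate descriptor (hypothetical helper).
 * Low dword:  selector in bits 31..16, handler offset bits 15..0.
 * High dword: handler offset bits 31..16, present bit 15,
 *             DPL in bits 14..13, type in bits 11..8.
 */
static uint64_t
make_gate(uint32_t handler, uint16_t selector, unsigned type, unsigned dpl)
{
	uint32_t lo, hi;

	assert(type <= 0xF && dpl <= 3);
	lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
	hi = (handler & 0xFFFF0000u) | (1u << 15) |
	    ((uint32_t)dpl << 13) | ((uint32_t)type << 8);
	return (((uint64_t)hi << 32) | lo);
}

/* Extract the 4-bit type field (bits 11..8 of the high dword). */
static unsigned
gate_type(uint64_t gate)
{
	return ((unsigned)((gate >> 40) & 0xF));
}

/* Nonzero if entry through this gate starts with interrupts disabled. */
static int
gate_clears_if(uint64_t gate)
{
	return (gate_type(gate) == GATE_TYPE_INTR32);
}
```

For the same handler and selector, a trap gate and an interrupt gate differ only in bit 40 of the descriptor. That bit decides whether an interrupt can arrive before the handler has loaded the kernel segment registers.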

There is no way to disable interrupts on a call gate, which is the intended mode of operation for lcall $7,$0 syscalls. Change LDT entry 0 into a code segment descriptor and point it to a userspace trampoline which redirects the syscall to int $0x80.
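As a reminder of why lcall $7,$0 refers to LDT entry 0, the following C sketch decodes an x86 segment selector into its architectural fields: the requested privilege level in bits 1..0, the table indicator in bit 2, and the descriptor index in bits 15..3. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector (names are illustrative). */
struct selector_fields {
	unsigned index;		/* descriptor index within the table */
	unsigned uses_ldt;	/* table indicator: 0 = GDT, 1 = LDT */
	unsigned rpl;		/* requested privilege level, 0..3 */
};

static struct selector_fields
decode_selector(uint16_t sel)
{
	struct selector_fields f;

	f.rpl = sel & 0x3;
	f.uses_ldt = (sel >> 2) & 0x1;
	f.index = sel >> 3;
	return (f);
}

static uint16_t
make_selector(unsigned index, unsigned uses_ldt, unsigned rpl)
{
	assert(index < 8192 && uses_ldt <= 1 && rpl <= 3);
	return ((uint16_t)((index << 3) | (uses_ldt << 2) | rpl));
}
```

Decoding the selector 7 gives index 0, table indicator LDT, and RPL 3, which is why changing LDT entry 0 changes what lcall $7,$0 does.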

Together, these measures make the segment register handling similar to that of amd64. We do not apply the amd64 optimization of not reloading the segment registers on return from a syscall.

Reported by: Maxime Villard <max@m00nbsd.net>
Tested by: pho

Test Plan

Test program to illustrate the race with LDT: https://gist.github.com/kostikbel/6353128c10c8344ea4292bd44716b3b7

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib edited the summary of this revision.
kib edited the summary of this revision.

Tested OK on i386 with all stress2 tests run.

kib edited the test plan for this revision.
kib added a reviewer: jhb.
sys/i386/i386/locore.s
341 ↗(On Diff #33171)

Do you need the special handling for SYS_vfork? The lcall in KERNCALL will still push the EIP value of the instruction after 'lcall' so the normal path would just return to the 'jb'? __sys_vfork would then still jump to *%ecx?

sys/i386/i386/locore.s
341 ↗(On Diff #33171)

Yes, I do need it. The return frame is destroyed by the child, so if the child path after vfork() uses the stack, the parent's return frame contains garbage.

For the same reason libc provides an asm stub for vfork.

jhb added inline comments.
sys/i386/i386/locore.s
341 ↗(On Diff #33171)

Ok. It might be worth adding a comment here as to why vfork is special in this regard.

This revision is now accepted and ready to land. Sep 18 2017, 7:20 PM
This revision was automatically updated to reflect the committed changes.