Page MenuHomeFreeBSD

Implement enforcing write XOR execute mapping policy.
ClosedPublic

Authored by kib on Fri, Jan 8, 10:47 PM.

Details

Summary

It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE | PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/disable globally.

For vm_map_protect(), only max_prot is checked against policy. It is harmless and actually might be useful to not punish apps that apply disallowed permissions on holes and guards.

This is basically a rewrite of D24933 in the way I think it should be done.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib requested review of this revision.Fri, Jan 8, 10:47 PM
kib created this revision.
sys/kern/kern_exec.c
1077–1078

ASLR and W^X state maybe?

sys/vm/vm_map.h
230

I wonder if MAP_NO_WX or MAP_WXORX is more descriptive? allow_wx above uses wx in the sense of W and X

kib marked 2 inline comments as done.

Rename MAP_WX to MAP_WXORX.
Stop doing set_max check in vm_map_protect(), it does not work.
Update comment.

Hm, ctfmerge hangs in cond_wait_common_thr_umtx_timedwait_uint_umtx_op_err
(I think this didn't happen with the previous W^X variant enabled but I'm not 100% sure)

Right in between opening files it tries to RWX MAP_STACK:

47903: mmap(0x0,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34395492352 (0x802219000)
47903: mmap(0x7fffdfdfd000,2101248,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_STACK,-1,0x0) ERR#13 'Permission denied'
47903: mmap(0x7fffdfbfc000,2101248,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_STACK,-1,0x0) ERR#13 'Permission denied'
47903: mmap(0x7fffdf9fb000,2101248,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_STACK,-1,0x0) ERR#13 'Permission denied'

I even have https://github.com/emaste/freebsd/commit/6b916fe1a1ef0ebd59196c35080c75d8f5c974aa applied.

Searching for MAP_STACK in base finds:

lib/libthr/thread/thr_stack.c
277:                 _rtld_get_stack_prot(), MAP_STACK,

Which would do RWX:

libexec/rtld-elf/amd64/rtld_machdep.h
72:#define      RTLD_DEFAULT_STACK_EXEC         PROT_EXEC

libexec/rtld-elf/rtld.c
263:static int stack_prot = PROT_READ | PROT_WRITE | RTLD_DEFAULT_STACK_EXEC;

(plus there's the dlfcn.c libc version of _rtld_get_stack_prot)

Even though there's no setstack anything in cddl/contrib, changing both rtld and libc to not return EXEC here does seem to fix it \o/
(after cleaning /usr/obj because the kernel build uses a statically-linked ctfmerge, not the system one)

Soooo I guess this patch might need to allow WX on MAP_STACK for now?
(if non-exec stacks won't be merged before W^X)

In other words, new version works as advertised.

No, kernel should not do any exceptions, rather the binary should be marked with the flag.

We also might consider disabling WX for a binary which stack requires X, but I am not sure about this, might be this should be an additional option. But definitely there should be no runtime knob to allow the binary to manipulate its own WX status.

The code of the binary itself doesn't require X stack, but the attempt to map stack as X somehow comes from libthr anyway for no good reason.
Anyway I just made these stacks non-executable too in my tree, so this is just notes for upstream about what things conflict with W^X.

The code of the binary itself doesn't require X stack, but the attempt to map stack as X somehow comes from libthr anyway for no good reason.
Anyway I just made these stacks non-executable too in my tree, so this is just notes for upstream about what things conflict with W^X.

There cannot be 'no good reason'. The stack protection is OR of PT_GNU_STACK p_flags for all loaded objects. There could be bugs of course, but it is not 'out of thin air'. In other words, first thing to investigate is what shared objects are loaded and what is their PT_GNU_STACK flags.

Looking at program headers I see all binaries and shared libs on my system have a PT_GNU_STACK with RW permissions.

(plus there's the dlfcn.c libc version of _rtld_get_stack_prot)

Ah, so there is.

#pragma weak _rtld_get_stack_prot
int
_rtld_get_stack_prot(void)
{

        return (PROT_EXEC | PROT_READ | PROT_WRITE);
}

To fix this properly we need to iterate over the phdr and set PROT_EXEC appropriately.
I've submitted PR252549 for this. (But I'm just going to drop PROT_EXEC locally for now.)

LGTM, but not yet tested

sys/kern/imgact_elf.c
195

Do you think "Allow pages to be mapped simultaneously writable and executable" (adding simultaneously) makes it more clear?

This revision is now accepted and ready to land.Sun, Jan 10, 3:54 AM
kib marked an inline comment as done.Sun, Jan 10, 3:59 AM
kib added inline comments.
sys/kern/imgact_elf.c
195

I updated the string in my branch, will update phab if more substantial changes come out.

my hacky W+X thread stack demo (which tries to execute an INT3 on the stack) confirmed D28075 works, cc -Wall -g -Os -static -lpthread -o exec-stack exec-stack.c previously demonstrated that the thread stack was incorrectly RWX. Wtih D28075 the stack is RW and the test case segfaulted as expected. With D28075 building the demo with -Wl,-zexecstack gives RWX stack as desired.

for the non-executable RW stack case lldb reports:

* thread #2, name = 'exec-stack', stop reason = signal SIGSEGV: address access protected (fault address: 0x7fffdfffdfb7)
    frame #0: 0x00007fffdfffdfb7
->  0x7fffdfffdfb7: int3   
    0x7fffdfffdfb8: addb   %al, 0x80061(%rbp)
    0x7fffdfffdfbe: addb   %al, (%rax)
    0x7fffdfffdfc0: lock

With this D28050 change applied and allow_wx=1 both demo cases (with and without -Wl,-zexecstack) behave as expected.

With allow_wx=0 the RW stack case still segfaults as expected and now the RWX case aborts at start:

$ cc -Wall -g -Os -Wl,-zexecstack -static -lpthread -o exec-stack exec-stack.c
$ ./exec-stack
Abort trap

Attempting to debug is unsuccessful:

$ lldb ./exec-stack
(lldb) target create "./exec-stack"
Current executable set to '/home/emaste/src/freebsd-git/main/exec-stack' (x86_64).
(lldb) run
Assertion failed: (WIFSTOPPED(status) && wpid == (::pid_t)pid && "Could not sync with inferior process."), function Launch, file /usr/home/emaste/src/freebsd-git/head/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessMonitor.cpp, line 930.
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace.
#0 0x0000000004444ade PrintStackTrace /usr/home/emaste/src/freebsd-git/head/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:13
#1 0x0000000004442e11 RunSignalHandlers /usr/home/emaste/src/freebsd-git/head/contrib/llvm-project/llvm/lib/Support/Signals.cpp:69:18
#2 0x0000000004445335 SignalHandler /usr/home/emaste/src/freebsd-git/head/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:3
#3 0x0000000805299b20 handle_signal /usr/home/emaste/src/freebsd-git/head/lib/libthr/thread/thr_sig.c:0:3
Abort trap (core dumped)

$ gdb ./exec-stack
GNU gdb (GDB) 9.2 [GDB v9.2 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./exec-stack...
(gdb) run
Starting program: /usr/home/emaste/src/freebsd-git/head/exec-stack 
/wrkdirs/usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/target.c:2187: internal-error: void target_mourn_inferior(ptid_t): Assertion `ptid == inferior_ptid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) 
kib marked an inline comment as done.Sun, Jan 10, 10:23 PM

Abort trap is expected, it is reported ('faked') by kernel when image activator generates some error. In this case, it cannot create initial process stack because stack is rwx and rwx mappings are disabled.

Debugger is also appears confused when debugee disappears without notice.

In D28050#627596, @kib wrote:

Abort trap is expected, it is reported ('faked') by kernel when image activator generates some error. In this case, it cannot create initial process stack because stack is rwx and rwx mappings are disabled.

Debugger is also appears confused when debugee disappears without notice.

Having just debugged a local bug in a branch in exec_new_vmspace() that gives similar behavior (Abort), it might be nice for this case if we did a uprintf during exec of the actual error, something like "Failed to create initial stack during exec" when the routines to map the stack in exec_new_vmspace() fail might be sufficient to give slightly nicer user-facing behavior for this? The bare Abort is pretty hard for a user to reason about/debug.

sys/vm/vm_map.c
2757

I do think it would be fairly simple to only do this for !set_max to exempt changes to max_prot from this. As it is, the current code allows PROT_MAX(RWX) for mmap() but not for mprotect() which I think is a bit odd. IIRC, mprotect() changes the non-max prot first, so a W^X mprotect() failure would not result in a partial update where only max_prot was changed but not prot.

Add uprintf on stack mapping failure (this is separate commit).

This revision now requires review to proceed.Mon, Jan 11, 6:56 PM
sys/vm/vm_map.c
2757

kern_mprotect() first does set_max call for max_prot, and then !set_max for active prot. But I do not believe that vm_map_protect() allows to change map entry protection to larger than was initially created. Look below for lines 2789-2791.

Otherwise e.g. you could map readonly file, and then change protection on the mapping.

In D28050#627877, @jhb wrote:

it might be nice for this case if we did a uprintf during exec of the actual error

Yeah, I had similar trouble when looking at D10758, I think it's a useful endeavor independent of this work.

sys/vm/vm_map.c
2757

Yes, agreed

sys/vm/vm_map.c
2757

And more, I believe that kern_mprotect() should be rewritten, that is vm_map_protect() should take whole prot arg and handle both max_prot and current prot in one locking pass, and in single check to avoid inconsistencies.

sys/vm/vm_map.c
2757

But we should be able to set_max with RWX even with allow_wx=0

emaste added inline comments.
sys/vm/vm_map.c
2757

vm_map_protect() should take whole prot arg and handle both max_prot and current prot in one locking pass

I see - I suspect @brooks wanted to avoid changing the vm_map_protect interface in D18880.
Should we change vm_map_protect or introduce a new interface (and leave vm_map_protect as wrapper around it)?

sys/vm/vm_map.c
2757

I think there is no reason not to change vm_map_protect(). I will look at it shortly, but I do not believe this is in scope of this review.

This looks promising. I would like to test it out, but I think it looks good.

sys/vm/vm_map.c
2757

I think there is no reason not to change vm_map_protect(). I will look at it shortly, but I do not believe this is in scope of this review.

I agree that it would be nicer to fix vm_map_protect(). I am fine with having that in a separate review. I do think we want the eventual state to be that prot_max isn't subject to the MAP_WXORX, but I'm ok with fixing that up later as part of fixing vm_map_protect().

This revision is now accepted and ready to land.Mon, Jan 11, 10:17 PM
sys/kern/kern_exec.c
1082

The flag is also not preserved by vmspace_fork(). Is it intentional?

sys/vm/vm_map.c
2757

In regards to prot_max, I agree that it should not be subject to MAP_WXORX. Imagine a JIT compilation system that enables write access (but not execute access) during compilation, and then when compilation is finished removes write access and enables execute access. To support this, prot_max has to allow both write and execute access even if we don't allow the prot field to have write and execute access simultaneously enabled.