Protect calls to explicit_bzero() via by explicitly disabling the link-time and other optimizations that can cause code elimination.
Needs ReviewPublic

Authored by jim_netgate.com on Jan 17 2016, 4:07 PM.

Details

Reviewers
theraven
imp
gnn
rwatson
Group Reviewers
manpages
Summary

There is an issue where calls to bzero (memset(), etc) can be eliminated due to an optimizing compiler eliminating the call to bzero() (or memset(), etc) because the arguments to the call are not subsequently used by the function. The compiler can interpret this as "no side effects", and eliminate the call.

The origin source of issue to being brought to light with a 'security focus' is here: http://cwe.mitre.org/data/definitions/14.html

OpenBSD implemented explicit_bzero() as a response (over a decade after the report) in OpenBSD 5.5 (released May 1, 2014).
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/bzero.3?query=explicit%5fbzero&arch=i386

The implementation in OpenBSD is here:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/explicit_bzero.c?rev=1.3&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/bzero.c?rev=1.9&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/memset.c?rev=1.7&content-type=text/x-cvsweb-markup

FreeBSD subsequently copied this implementation.
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/crypto/openssh/openbsd-compat/explicit_bzero.c
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/sys/libkern/explicit_bzero.c

I believe both implementations are flawed.

Link Time Optimization (LTO) is a problem for several implementations of explicit_bzero(). LTO enabled today with clang, it also works with several versions of GCC, and only the absence of WITH_LLD_IS_LLD=yes in src.conf is keeping LTO from being enabled in the linker with:

CFLAGS = -O <any level) -flto
LDFLAGS += -fuse-ld=lld

http://llvm.org/docs/LinkTimeOptimization.html
http://llvm.org/docs/GoldPlugin.html
https://wiki.freebsd.org/LinkTimeOptimisations <--- page is out of date

This style of implementation in FreeBSD (and by extension OpenBSD) today is flawed, because the compiler/linker is free to look into the
call tree and still eliminate the call. The issue shows up on LTO (today), but is not limited to systems where LTO is in-use.

This exact scenario happened, on FreeBSD, (and OpenBSD): https://github.com/libressl-portable/openbsd/issues/5
(note that someone tested on OpenBSD 5.5 with GCC 4.8.2 from ports, and developed a "patch" that depends on mfence (compiler side-effects):
https://github.com/libressl-portable/openbsd/issues/5#issuecomment-50775260)

OpenBSD has not implemented this patch.

If you look at OPENSSL_cleanse() in boringssl, there is a fix for the problem which again, leverages a mfence:
https://boringssl.googlesource.com/boringssl/+/ad1907fe73334d6c696c8539646c21b11178f20f/crypto/mem.c

I have prepared and attached a patch for FreeBSD that implements the same memory barrier, solving the problem in a more portable and safe manner.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
jim_netgate.com retitled this revision from to Protect calls to explicit_bzero() via by explicitly disabling the link-time and other optimizations that can cause code elimination..
jim_netgate.com updated this object.
jim_netgate.com edited the test plan for this revision. (Show Details)
jim_netgate.com added reviewers: gnn, rwatson, theraven.
jim_netgate.com edited edge metadata.
jim_netgate.com removed rS FreeBSD src repository as the repository for this revision.

eliminate the changes to release/amd64/make-memstick.sh

imp requested changes to this revision.Jan 17 2016, 6:48 PM
imp added a reviewer: imp.

I'd prefer we don't ditch support for the in-tree gcc compiler to get this. See my suggestions on how to accomplish that.
Sadly, there doesn't appear to be a #define for when the LTO linker is used, I guess because that's link time and not
run time.

sys/libkern/explicit_bzero.c
16–21

I'd move this to after the clang / gcc block that you aded below. I'd condition it on #ifndef NOOPT.
I'd also include the explicit_bzero definition from below.
This will support old compilers that don't do LTO as well as new that can / do.

25–26

I'd ditch the #error stuff here and below.

44–53

I'd be tempted to do:

#ifdef NOOPT
NOOPT NOLINE void
explicit_bzero(void *buf, size_t len)
{
<body>
}
#endif

This revision now requires changes to proceed.Jan 17 2016, 6:48 PM
jim_netgate.com edited edge metadata.

I've attempted to do what I think Warner asked for here.

Fallback (via conditional compilation) is now the old code, though I think this is still risky code.

gnn accepted this revision.Jan 17 2016, 8:00 PM
gnn edited edge metadata.

I'm down with this but I will also push the other reviewers to take a look before committing.

theraven edited edge metadata.Jan 18 2016, 8:56 AM

Hi Jim,

I don't think that this approach will work, because noinline does not do what you might expect. The compiler will not inline the function, but it can still look inside it. In particular, it will determine that it does not read or write any memory other than the arguments and propagate these attributes to anything that calls it. This will happen even with the ThinLTO model that Google is slowly upstreaming to LLVM.

In particular, if this function is called immediately before a free(), then the compiler is free to elide it and leave the sensitive data in the free list.

Note that the version using memory barriers can be made portable by using the relevant memory barrier function (memory order: sequentially consistent) from <stdatomic.h>.

The simplest way to prevent the compiler eliding the stores would be to have an extern volatile void* variable (defined in an assembly file, so that the compiler can never see it) and so assign the buf variable to it before doing the bzero. The compiler is not permitted to elide volatile stores to variables that it can not guarantee that it can see the entire lifetime of, because they may be device memory. This will guarantee that the compiler believes that the pointer has escaped and so it then may not elide any stores to it. This does not establish a happens-before relationship with any other threads, so the compiler should still be able to emit efficient code for everything.

rwatson edited edge metadata.Jan 18 2016, 9:36 AM

Hi Jim,

I don't think that this approach will work, because noinline does not do what you might expect. The compiler will not inline the function, but it can still look inside it. In particular, it will determine that it does not read or write any memory other than the arguments and propagate these attributes to anything that calls it. This will happen even with the ThinLTO model that Google is slowly upstreaming to LLVM.

In particular, if this function is called immediately before a free(), then the compiler is free to elide it and leave the sensitive data in the free list.

Note that the version using memory barriers can be made portable by using the relevant memory barrier function (memory order: sequentially consistent) from <stdatomic.h>.

The simplest way to prevent the compiler eliding the stores would be to have an extern volatile void* variable (defined in an assembly file, so that the compiler can never see it) and so assign the buf variable to it before doing the bzero. The compiler is not permitted to elide volatile stores to variables that it can not guarantee that it can see the entire lifetime of, because they may be device memory. This will guarantee that the compiler believes that the pointer has escaped and so it then may not elide any stores to it. This does not establish a happens-before relationship with any other threads, so the compiler should still be able to emit efficient code for everything.

A key design goal throughout this code is avoiding the use of globals that will introduce cache misses / contention. Is there a way to accomplish this goal without introducing an extraneous global write? (I don't think we can mark local variables as extern? Or can we?)

Not really. The really correct solution is to introduce a __builtin_secure_bzero() that the compiler knows it is now allowed to elide. Anything else is trying to trick the compiler into thinking that it doesn't know what is happening.

Another good solution would be to provide a free-and-erase function as part of jemalloc (and make sure that it calls a function that free calls, not free itself.

Hi Jim,

I don't think that this approach will work, because noinline does not do what you might expect. The compiler will not inline the function, but it can still look inside it. In particular, it will determine that it does not read or write any memory other than the arguments and propagate these attributes to anything that calls it. This will happen even with the ThinLTO model that Google is slowly upstreaming to LLVM.

Hi David,

The real work is done by "optnone".

http://clang.llvm.org/docs/AttributeReference.html#optnone-clang-optnone

The "noinline" is there to turn off inlining, because inlining ("always_inline") is incompatible with optnone.

clang's VerifyFunctionAttrs() (in ./contrib/llvm/lib/IR/Verifier.cpp) requires that noinline be 'on' when optnone is used.

Please let me know if you think 'optnone' isn't workable.

Note that the version using memory barriers can be made portable by using the relevant memory barrier function (memory order: sequentially consistent) from <stdatomic.h>.

If memory barriers is your preferred approach, then I will rework the patch.

The simplest way to prevent the compiler eliding the stores would be to have an extern volatile void* variable (defined in an assembly file, so that the compiler can never see it) and so assign the buf variable to it before doing the bzero. The compiler is not permitted to elide volatile stores to variables that it can not guarantee that it can see the entire lifetime of, because they may be device memory. This will guarantee that the compiler believes that the pointer has escaped and so it then may not elide any stores to it. This does not establish a happens-before relationship with any other threads, so the compiler should still be able to emit efficient code for everything.

This also seems workable.

emaste added a subscriber: emaste.Jun 10 2017, 5:03 PM
This revision now requires changes to proceed.Jun 10 2017, 5:03 PM
wblock added a subscriber: wblock.Jul 14 2017, 4:16 PM
wblock added inline comments.
lib/libc/string/bzero.3
65 ↗(On Diff #12392)
variant is the same, but has flags to force the call to occur even
if the compiler thinks it is pointless.
This makes
.Fn explicit_bzero
useful for clearing memory with sensitive contents like a password.
theraven added inline comments.Jul 14 2017, 6:20 PM
sys/libkern/explicit_bzero.c
27

Does this do anything? Can we compile the kernel with a compiler that doesn't support GNU C?

39

I'm not sure that NOOPT is needed here. Even turning this into a memset / bzero call should be fine for the compiler to do, if this function is not inlined. It should also be safe for LLVM to replace a call to explicit_bzero with an LLVM memset intrinsic with the volatile flag set, and so generate an optimised memset sequence that can't be removed.

jim_netgate.com edited the summary of this revision. (Show Details)

We have LTO in base now. The day of reckoning has come.

tl;dr:

jim@x540:~/explict_bzero_tests % sysctl kern.version
kern.version: FreeBSD 12.0-CURRENT #0 r328212M: Sun Jan 21 05:20:18 CST 2018

jim@x540:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG

jim@x540:~/explict_bzero_tests % make clean;make
rm explicit_bzero_test.o explicit_bzero_test
cc -c -O -DELF_HOOK_IMPL=1 -flto explicit_bzero_test.c
cc -fuse-ld=lld -o explicit_bzero_test explicit_bzero_test.o
jim@x540:~/explict_bzero_tests % ./explicit_bzero_test
Assertion failed: ((count_secrets(buf)) == (0)), function do_test_with_bzero, file explicit_bzero_test.c, line 234.
Abort (core dumped)
jim@x540:~/explict_bzero_tests %

No optimization (just -g instead of -O) or no LTO and things still work.

But, save for WITH_LLD_IS_LLD=yes, we don’t compile the kernel or that way.

emaste added a subscriber: dim.Mar 29 2018, 5:39 PM
dim added a comment.Mar 30 2018, 12:26 AM

@jim_netgate.com, does this trick work for both clang and gcc (both base and newer)? If so, I think this would be fine to commit. explicit_bzero could even be a macro! (And there might be a corresponding explicit_memset too, but I digress.)