Page MenuHomeFreeBSD

Protect calls to explicit_bzero() via by explicitly disabling the link-time and other optimizations that can cause code elimination.
Needs ReviewPublic

Authored by jim_netgate.com on Jan 17 2016, 4:07 PM.
Tags
None
Referenced Files
F81548069: D4964.diff
Wed, Apr 17, 9:04 PM
Unknown Object (File)
Mon, Apr 8, 7:57 PM
Unknown Object (File)
Sun, Apr 7, 2:45 AM
Unknown Object (File)
Sat, Apr 6, 12:13 PM
Unknown Object (File)
Sun, Mar 24, 10:32 PM
Unknown Object (File)
Sun, Mar 24, 8:31 PM
Unknown Object (File)
Mar 18 2024, 11:19 AM
Unknown Object (File)
Mar 16 2024, 8:31 AM

Details

Reviewers
theraven
imp
gnn
rwatson
gordon
Group Reviewers
secteam
Summary

There is an issue where calls to bzero (memset(), etc) can be eliminated due to an optimizing compiler eliminating the call to bzero() (or memset(), etc) because the arguments to the call are not subsequently used by the function. The compiler can interpret this as "no side effects", and eliminate the call.

The origin source of issue to being brought to light with a 'security focus' is here: http://cwe.mitre.org/data/definitions/14.html

OpenBSD implemented explicit_bzero() as a response (over a decade after the report) in OpenBSD 5.5 (released May 1, 2014).
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/bzero.3?query=explicit%5fbzero&arch=i386

The implementation in OpenBSD is here:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/explicit_bzero.c?rev=1.3&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/bzero.c?rev=1.9&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/memset.c?rev=1.7&content-type=text/x-cvsweb-markup

FreeBSD subsequently copied this implementation.
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/crypto/openssh/openbsd-compat/explicit_bzero.c
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/sys/libkern/explicit_bzero.c

I believe both implementations are flawed.

Link Time Optimization (LTO) is a problem for several implementations of explicit_bzero(). LTO enabled today with clang, it also works with several versions of GCC, and only the absence of WITH_LLD_IS_LLD=yes in src.conf is keeping LTO from being enabled in the linker with:

CFLAGS = -O <any level) -flto
LDFLAGS += -fuse-ld=lld

http://llvm.org/docs/LinkTimeOptimization.html
http://llvm.org/docs/GoldPlugin.html
https://wiki.freebsd.org/LinkTimeOptimisations <--- page is out of date

This style of implementation in FreeBSD (and by extension OpenBSD) today is flawed, because the compiler/linker is free to look into the
call tree and still eliminate the call. The issue shows up on LTO (today), but is not limited to systems where LTO is in-use.

This exact scenario happened, on FreeBSD, (and OpenBSD): https://github.com/libressl-portable/openbsd/issues/5
(note that someone tested on OpenBSD 5.5 with GCC 4.8.2 from ports, and developed a "patch" that depends on mfence (compiler side-effects):
https://github.com/libressl-portable/openbsd/issues/5#issuecomment-50775260)

OpenBSD has not implemented this patch.

If you look at OPENSSL_cleanse() in boringssl, there is a fix for the problem which again, leverages a mfence:
https://boringssl.googlesource.com/boringssl/+/ad1907fe73334d6c696c8539646c21b11178f20f/crypto/mem.c

I have prepared and attached a patch for FreeBSD that implements the same memory barrier, solving the problem in a more portable and safe manner.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

jim_netgate.com retitled this revision from to Protect calls to explicit_bzero() via by explicitly disabling the link-time and other optimizations that can cause code elimination..
jim_netgate.com updated this object.
jim_netgate.com edited the test plan for this revision. (Show Details)
jim_netgate.com added reviewers: gnn, rwatson, theraven.
jim_netgate.com edited edge metadata.
jim_netgate.com removed rS FreeBSD src repository - subversion as the repository for this revision.

eliminate the changes to release/amd64/make-memstick.sh

imp requested changes to this revision.Jan 17 2016, 6:48 PM
imp added a reviewer: imp.

I'd prefer we don't ditch support for the in-tree gcc compiler to get this. See my suggestions on how to accomplish that.
Sadly, there doesn't appear to be a #define for when the LTO linker is used, I guess because that's link time and not
run time.

sys/libkern/explicit_bzero.c
16–21

I'd move this to after the clang / gcc block that you aded below. I'd condition it on #ifndef NOOPT.
I'd also include the explicit_bzero definition from below.
This will support old compilers that don't do LTO as well as new that can / do.

25–26

I'd ditch the #error stuff here and below.

44–53

I'd be tempted to do:

#ifdef NOOPT
NOOPT NOLINE void
explicit_bzero(void *buf, size_t len)
{
<body>
}
#endif

This revision now requires changes to proceed.Jan 17 2016, 6:48 PM
jim_netgate.com edited edge metadata.

I've attempted to do what I think Warner asked for here.

Fallback (via conditional compilation) is now the old code, though I think this is still risky code.

gnn edited edge metadata.

I'm down with this but I will also push the other reviewers to take a look before committing.

Hi Jim,

I don't think that this approach will work, because noinline does not do what you might expect. The compiler will not inline the function, but it can still look inside it. In particular, it will determine that it does not read or write any memory other than the arguments and propagate these attributes to anything that calls it. This will happen even with the ThinLTO model that Google is slowly upstreaming to LLVM.

In particular, if this function is called immediately before a free(), then the compiler is free to elide it and leave the sensitive data in the free list.

Note that the version using memory barriers can be made portable by using the relevant memory barrier function (memory order: sequentially consistent) from <stdatomic.h>.

The simplest way to prevent the compiler eliding the stores would be to have an extern volatile void* variable (defined in an assembly file, so that the compiler can never see it) and so assign the buf variable to it before doing the bzero. The compiler is not permitted to elide volatile stores to variables that it can not guarantee that it can see the entire lifetime of, because they may be device memory. This will guarantee that the compiler believes that the pointer has escaped and so it then may not elide any stores to it. This does not establish a happens-before relationship with any other threads, so the compiler should still be able to emit efficient code for everything.

Hi Jim,

I don't think that this approach will work, because noinline does not do what you might expect. The compiler will not inline the function, but it can still look inside it. In particular, it will determine that it does not read or write any memory other than the arguments and propagate these attributes to anything that calls it. This will happen even with the ThinLTO model that Google is slowly upstreaming to LLVM.

In particular, if this function is called immediately before a free(), then the compiler is free to elide it and leave the sensitive data in the free list.

Note that the version using memory barriers can be made portable by using the relevant memory barrier function (memory order: sequentially consistent) from <stdatomic.h>.

The simplest way to prevent the compiler eliding the stores would be to have an extern volatile void* variable (defined in an assembly file, so that the compiler can never see it) and so assign the buf variable to it before doing the bzero. The compiler is not permitted to elide volatile stores to variables that it can not guarantee that it can see the entire lifetime of, because they may be device memory. This will guarantee that the compiler believes that the pointer has escaped and so it then may not elide any stores to it. This does not establish a happens-before relationship with any other threads, so the compiler should still be able to emit efficient code for everything.

A key design goal throughout this code is avoiding the use of globals that will introduce cache misses / contention. Is there a way to accomplish this goal without introducing an extraneous global write? (I don't think we can mark local variables as extern? Or can we?)

Not really. The really correct solution is to introduce a __builtin_secure_bzero() that the compiler knows it is now allowed to elide. Anything else is trying to trick the compiler into thinking that it doesn't know what is happening.

Another good solution would be to provide a free-and-erase function as part of jemalloc (and make sure that it calls a function that free calls, not free itself.

Hi Jim,

I don't think that this approach will work, because noinline does not do what you might expect. The compiler will not inline the function, but it can still look inside it. In particular, it will determine that it does not read or write any memory other than the arguments and propagate these attributes to anything that calls it. This will happen even with the ThinLTO model that Google is slowly upstreaming to LLVM.

Hi David,

The real work is done by "optnone".

http://clang.llvm.org/docs/AttributeReference.html#optnone-clang-optnone

The "noinline" is there to turn off inlining, because inlining ("always_inline") is incompatible with optnone.

clang's VerifyFunctionAttrs() (in ./contrib/llvm/lib/IR/Verifier.cpp) requires that noinline be 'on' when optnone is used.

Please let me know if you think 'optnone' isn't workable.

Note that the version using memory barriers can be made portable by using the relevant memory barrier function (memory order: sequentially consistent) from <stdatomic.h>.

If memory barriers is your preferred approach, then I will rework the patch.

The simplest way to prevent the compiler eliding the stores would be to have an extern volatile void* variable (defined in an assembly file, so that the compiler can never see it) and so assign the buf variable to it before doing the bzero. The compiler is not permitted to elide volatile stores to variables that it can not guarantee that it can see the entire lifetime of, because they may be device memory. This will guarantee that the compiler believes that the pointer has escaped and so it then may not elide any stores to it. This does not establish a happens-before relationship with any other threads, so the compiler should still be able to emit efficient code for everything.

This also seems workable.

This revision now requires changes to proceed.Jun 10 2017, 5:03 PM
wblock added inline comments.
lib/libc/string/bzero.3
65 ↗(On Diff #12392)
variant is the same, but has flags to force the call to occur even
if the compiler thinks it is pointless.
This makes
.Fn explicit_bzero
useful for clearing memory with sensitive contents like a password.
sys/libkern/explicit_bzero.c
27

Does this do anything? Can we compile the kernel with a compiler that doesn't support GNU C?

39

I'm not sure that NOOPT is needed here. Even turning this into a memset / bzero call should be fine for the compiler to do, if this function is not inlined. It should also be safe for LLVM to replace a call to explicit_bzero with an LLVM memset intrinsic with the volatile flag set, and so generate an optimised memset sequence that can't be removed.

jim_netgate.com edited the summary of this revision. (Show Details)

We have LTO in base now. The day of reckoning has come.

tl;dr:

jim@x540:~/explict_bzero_tests % sysctl kern.version
kern.version: FreeBSD 12.0-CURRENT #0 r328212M: Sun Jan 21 05:20:18 CST 2018

jim@x540:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG

jim@x540:~/explict_bzero_tests % make clean;make
rm explicit_bzero_test.o explicit_bzero_test
cc -c -O -DELF_HOOK_IMPL=1 -flto explicit_bzero_test.c
cc -fuse-ld=lld -o explicit_bzero_test explicit_bzero_test.o
jim@x540:~/explict_bzero_tests % ./explicit_bzero_test
Assertion failed: ((count_secrets(buf)) == (0)), function do_test_with_bzero, file explicit_bzero_test.c, line 234.
Abort (core dumped)
jim@x540:~/explict_bzero_tests %

No optimization (just -g instead of -O) or no LTO and things still work.

But, save for WITH_LLD_IS_LLD=yes, we don’t compile the kernel or that way.

@jim_netgate.com, does this trick work for both clang and gcc (both base and newer)? If so, I think this would be fine to commit. explicit_bzero could even be a macro! (And there might be a corresponding explicit_memset too, but I digress.)

Now that memset_s is part of the standard, should we just make explicit_bzero a wrapper for memset_s?

I don't know enough to review this request. Maybe delphij, emaste or one of the other secteam members has more domain experience here.

So, spent a few minutes looking into this, my testing was done on:

FreeBSD gate4 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n255198-1907e1c07c3: Thu May  5 07:52:56 UTC 2022     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

It is likely that my program may be too simple to reproduce, but I was unable to get either explicit_bzero or memset_s calls to be eliminated. It is entirely possible I am calling/compiling things incorrectly.

The program:

#define USE_EXPLICIT 1

#include <stdint.h>
#include <stdio.h>
#if USE_EXPLICIT
#include <strings.h>
#else
#include <string.h>
#endif
#include <unistd.h>

int
main(void)
{
        uint8_t buf[32];
        int i;

        getentropy(buf, sizeof buf);

        for (i = 0; i < sizeof buf; i++)
                printf("%02hhx", buf[i]);

        printf("\n");

#if USE_EXPLICIT
        explicit_bzero(buf, sizeof buf);
#else
        memset_s(buf, sizeof buf, 0, sizeof buf);
#endif
}

The compile command:

cc -static -O3 -flto -fuse-ld=lld -Wall -Werror -o t t.c

I also tried w/o -static, but no difference, I was hoping/thinking that w/ a static binary, where all the code was present, that LTO would be more likely to eliminate calls.

I verified that the calls were not removed by gdb, using disass main and related functions.

Note that -flto doesn't actually LTO system libraries, because we do not yet provide LLVM IR libraries. If you put an explicit_bzero implementation in your test code though you can get it to be eliminated. As it stands today for normal use (linking against libc as usual) it will not be eliminated.