An optimizing compiler can eliminate a call to bzero() (or memset(), etc.) when the buffer being cleared
is not subsequently used by the function. The compiler can interpret this as "no side effects", and eliminate the call.
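
A minimal sketch of the pattern at risk. The function name demo_checksum and the buffer size are hypothetical, chosen only for illustration. Whether a given compiler actually removes the final memset() depends on the compiler and optimization level, so inspect the generated assembly rather than trusting the source.

```c
#include <string.h>

/* Hypothetical helper: copies a secret into a stack buffer, uses it,
 * then tries to scrub the buffer before returning. */
unsigned demo_checksum(const char *secret)
{
    char key[32];
    unsigned sum = 0;
    size_t i;

    strncpy(key, secret, sizeof(key) - 1);
    key[sizeof(key) - 1] = '\0';

    for (i = 0; key[i] != '\0'; i++)
        sum += (unsigned char)key[i];

    /* key is never read after this point, so an optimizer may treat
     * this call as a dead store and delete it. */
    memset(key, 0, sizeof(key));
    return sum;
}
```

The return value is unaffected either way, which is exactly why the compiler considers the removal legal.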
The original source that brought this issue to light with a 'security focus' is here: http://cwe.mitre.org/data/definitions/14.html
OpenBSD implemented explicit_bzero() as a response (over a decade after the report) in OpenBSD 5.5 (released May 1, 2014).
The implementation in OpenBSD is here:
FreeBSD subsequently copied this implementation.
I believe both implementations are flawed.
Link Time Optimization (LTO) is a problem for several implementations of explicit_bzero(). While LTO isn't implemented today with clang, it does work with
several versions of GCC.
When LTO is supported with clang, the issue will propagate to systems compiled with clang, as well. LTO is coming to clang:
The style of implementation used today in FreeBSD (and, by extension, OpenBSD) is flawed because the compiler/linker is free to look into the
call tree and still eliminate the call. The issue shows up with LTO today, but it is not limited to systems where LTO is in use.
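
As a rough illustration of this style, assume the implementation is a plain wrapper that lives in its own source file so that callers cannot see its body at compile time. The actual OpenBSD/FreeBSD source is not reproduced here, and the names sketch_wrapper_bzero and sketch_use_secret are hypothetical.

```c
#include <string.h>

/* Imagine this function alone in its own translation unit. Without LTO
 * the caller's compiler sees only a prototype and must keep the call. */
void sketch_wrapper_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Imagine this caller in a different translation unit. With LTO the
 * optimizer sees both bodies at link time, may inline the wrapper, and
 * may then drop the memset() because secret is dead afterwards. */
int sketch_use_secret(void)
{
    char secret[64];
    int first;

    memset(secret, 'A', sizeof(secret));
    first = secret[0];

    sketch_wrapper_bzero(secret, sizeof(secret));
    return first;
}
```

The protection here comes only from the optimizer's limited view, which is why anything that widens that view (LTO, whole-program builds) can undo it.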
This exact scenario happened on FreeBSD (and OpenBSD): https://github.com/libressl-portable/openbsd/issues/5
Note that someone tested on OpenBSD 5.5 with GCC 4.8.2 from ports, and developed a "patch" that still depends on an mfence (compiler side effects):
OpenBSD has not implemented this patch.
If you look at OPENSSL_cleanse() in boringssl, there is a fix for the problem which, again, leverages an mfence:
The first problem with the approach of using an mfence is that it is limited to supporting the i386 and amd64 architectures. Other architectures will support
similar mechanisms, but discovering and understanding these for each architecture will be tiresome.
The second problem is that the compiler barrier does not prevent the bzero/memset from being optimized out unless the address of the buffer being memset has been leaked
to code the compiler cannot see. As long as the compiler 'sees' that the asm has no way of observing the output of the memset, it can optimize out the memset.
Simply making the memset buffer visible to the asm by passing its address (or, better yet, passing the buffer itself as a memory object) in an asm constraint would probably fix this,
but I'd like to have someone from the GCC side confirm this.
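
A minimal sketch of that idea, assuming GCC/clang extended asm syntax. The function name sketch_barrier_bzero is hypothetical, and this is not the patch referenced above. The asm statement is empty and emits no instructions; it only hands the buffer's address to the asm as an input and declares a "memory" clobber, so the compiler has to assume the asm may read the zeroed bytes.

```c
#include <string.h>

void sketch_barrier_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);

    /* Empty asm body. The "r"(buf) input leaks the buffer address to
     * the asm, and the "memory" clobber says the asm may access memory,
     * so the stores above can no longer be treated as unobserved. */
    __asm__ __volatile__("" : : "r"(buf) : "memory");
}
```

Because the asm body is empty, nothing here is specific to i386 or amd64, but it does rely on a compiler extension rather than on standard C.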
In any case, I believe there is a better way.
Microsoft addresses the problem by turning off LTO for the function in question. See: https://msdn.microsoft.com/en-us/library/aa366877.aspx
The comment at the bottom of the page is apt.
"Security explanation -- The only way that this differs from normal ZeroMemory is that it has correct flags/declaration differences/garden gnomes to
force the call to happen no matter if the compiler thinks that it is pointless or not."
Both gcc and clang support attributes that turn off the optimization that causes the issue on a per-function basis. Based on the code found here:
I have prepared and attached a patch for FreeBSD that leverages these attributes, solving the problem in a more portable and safe manner.
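
The attached patch is not reproduced here. As a rough sketch of the general idea only, the following assumes GCC's optimize("O0") function attribute and clang's optnone attribute, each combined with noinline. The macro SKETCH_NOOPT and the function sketch_noopt_bzero are hypothetical names.

```c
#include <string.h>

/* Pick a per-function "do not optimize" attribute for the compiler in
 * use. __clang__ is tested first because clang also defines __GNUC__. */
#if defined(__clang__)
#define SKETCH_NOOPT __attribute__((optnone, noinline))
#elif defined(__GNUC__)
#define SKETCH_NOOPT __attribute__((optimize("O0"), noinline))
#else
#define SKETCH_NOOPT /* unknown compiler: no protection in this sketch */
#endif

/* The body is compiled without optimization and is not inlined into
 * callers, so the memset() inside it is not subject to dead-store
 * elimination based on what the caller does afterwards. */
SKETCH_NOOPT void sketch_noopt_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```

Whether these attributes hold up under every optimization mode of a given toolchain should still be checked by inspecting the generated code.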