Change Details

Calls to bzero() (or memset(), etc.) can be eliminated by an optimizing compiler when the arguments to the call are not subsequently used by the calling function. The compiler can interpret this as "no side effects" and eliminate the call.
The issue was originally brought to light with a 'security focus' here:
http://cwe.mitre.org/data/definitions/14.html
OpenBSD implemented explicit_bzero() as a response (over a decade after the report) in OpenBSD 5.5 (released May 1, 2014):
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/bzero.3?query=explicit%5fbzero&arch=i386
The implementation in OpenBSD is here:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/explicit_bzero.c?rev=1.3&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/bzero.c?rev=1.9&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/memset.c?rev=1.7&content-type=text/x-cvsweb-markup
FreeBSD subsequently copied this implementation:
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/crypto/openssh/openbsd-compat/explicit_bzero.c
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/sys/libkern/explicit_bzero.c
I believe both implementations are flawed.
Link Time Optimization (LTO) is a problem for several implementations of explicit_bzero(). While LTO isn't implemented today with clang, it does work with several versions of GCC. When LTO is supported with clang, the issue will propagate to systems compiled with clang as well. LTO is coming to clang:
http://llvm.org/docs/LinkTimeOptimization.html
http://llvm.org/docs/GoldPlugin.html
This style of implementation in FreeBSD (and by extension OpenBSD) is flawed today because the compiler/linker is free to look into the call tree and still eliminate the call.
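The elimination described at the start of this report can be sketched as follows. This is only an illustration: checksum(), use_secret() and naive_explicit_bzero() are hypothetical names, and the wrapper is a simplified stand-in for the "plain function that forwards to memset()" style, not the actual OpenBSD or FreeBSD source. Whether a given compiler really removes either store depends on its version and flags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stands in for any code that uses a secret. */
static unsigned checksum(const char *buf, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)buf[i];
    return sum;
}

/*
 * Simplified wrapper in the style discussed above. Compiled in its own
 * translation unit without LTO, the caller cannot see into it. With LTO
 * (or if it is inlined) the optimizer may see a plain memset() on a
 * buffer that is never read again and drop it.
 */
void naive_explicit_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

unsigned use_secret(const char *input)
{
    char secret[64];
    size_t n = strlen(input);

    if (n >= sizeof(secret))
        n = sizeof(secret) - 1;
    memcpy(secret, input, n);
    secret[n] = '\0';

    unsigned result = checksum(secret, n);

    /*
     * 'secret' is never read after this point, so an optimizing
     * compiler may treat this store as dead and remove it.
     */
    memset(secret, 0, sizeof(secret));
    return result;
}
```

Inspecting the generated assembly at different optimization levels is the only reliable way to tell whether the final memset() survived.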
The issue shows up with LTO (today), but is not limited to systems where LTO is in use. This exact scenario happened on FreeBSD (and OpenBSD):
https://github.com/libressl-portable/openbsd/issues/5
(Note that someone tested on OpenBSD 5.5 with GCC 4.8.2 from ports and developed a "patch" that still depends on mfence (compiler side effects):
https://github.com/libressl-portable/openbsd/issues/5#issuecomment-50775260)
OpenBSD has not implemented this patch.
If you look at OPENSSL_cleanse() in boringssl, there is a fix for the problem which, again, leverages an mfence:
https://boringssl.googlesource.com/boringssl/+/ad1907fe73334d6c696c8539646c21b11178f20f/crypto/mem.c
The first problem with the mfence approach is that it is limited to the i386 and amd64 architectures. Other architectures will support similar mechanisms, but discovering and understanding these for each architecture will be tiresome.
The second problem is that the compiler barrier does not prevent the bzero/memset from being optimized out unless the address of the buffer being memset has been leaked to code the compiler cannot see. As long as the compiler 'sees' that the asm has no way of observing the output of the memset, it can optimize out the memset. Simply making the memset buffer visible to the asm by passing its address (or better yet, the buffer itself as a memory object) in an asm constraint would probably fix this, but I'd like to have someone from the GCC side confirm this.
In any case, I believe there is a better way. Microsoft addresses the problem by turning off LTO for the function in question. See:
https://msdn.microsoft.com/en-us/library/aa366877.aspx
The comment at the bottom of the page is apt:
"Security explanation -- The only way that this differs from normal ZeroMemory is that it has correct flags/declaration differences/garden gnomes to force the call to happen no matter if the compiler thinks that it is pointless or not."
Both gcc and clang support attributes that turn off the optimization that causes the issue on a per-function basis. Based on the code found here:
https://github.com/mikejsavage/lua-symmetric/blob/master/compat/safebfuns.c
I have prepared and attached a patch for FreeBSD that leverages these attributes, solving the problem in a more portable and safe manner.
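A minimal sketch of the attribute-based approach, for illustration only. It is not the attached patch and not a copy of safebfuns.c; the names SKETCH_NOOPT and sketch_explicit_bzero() are hypothetical. It assumes clang's optnone attribute and GCC's optimize("O0") attribute, each combined with noinline; whether that is sufficient on a particular compiler version should be checked against the generated code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pick a per-function "do not optimize, do not inline" attribute.
 * Assumption: clang provides optnone (checked with __has_attribute),
 * GCC provides optimize("O0"). Other compilers get no attribute here,
 * so this sketch offers no protection on them.
 */
#if defined(__clang__)
#  if __has_attribute(optnone)
#    define SKETCH_NOOPT __attribute__((optnone, noinline))
#  else
#    define SKETCH_NOOPT __attribute__((noinline))
#  endif
#elif defined(__GNUC__)
#  define SKETCH_NOOPT __attribute__((optimize("O0"), noinline))
#else
#  define SKETCH_NOOPT
#endif

/* Zero 'len' bytes at 'buf' inside a function the optimizer is told to leave alone. */
SKETCH_NOOPT void sketch_explicit_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```

A caller would use it exactly like bzero(), for example sketch_explicit_bzero(key, sizeof(key)) just before a key buffer goes out of scope.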

Calls to bzero() (or memset(), etc.) can be eliminated by an optimizing compiler when the arguments to the call are not subsequently used by the calling function. The compiler can interpret this as "no side effects" and eliminate the call.
The issue was originally brought to light with a 'security focus' here:
http://cwe.mitre.org/data/definitions/14.html
OpenBSD implemented explicit_bzero() as a response (over a decade after the report) in OpenBSD 5.5 (released May 1, 2014):
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/bzero.3?query=explicit%5fbzero&arch=i386
The implementation in OpenBSD is here:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/explicit_bzero.c?rev=1.3&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/bzero.c?rev=1.9&content-type=text/x-cvsweb-markup
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/lib/libkern/memset.c?rev=1.7&content-type=text/x-cvsweb-markup
FreeBSD subsequently copied this implementation:
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/crypto/openssh/openbsd-compat/explicit_bzero.c
https://github.com/freebsd/freebsd/blob/e79c62ff68fc74d88cb6f479859f6fae9baa5101/sys/libkern/explicit_bzero.c
I believe both implementations are flawed.
Link Time Optimization (LTO) is a problem for several implementations of explicit_bzero(). While LTO isn't enabled today with clang, it also works with several versions of GCC, and only the absence of WITH_LLD_IS_LLD=yes in src.conf is keeping LTO from being enabled in the linker with:
CFLAGS = -O <any level> -flto
LDFLAGS += -fuse-ld=lld
http://llvm.org/docs/LinkTimeOptimization.html
http://llvm.org/docs/GoldPlugin.html
https://wiki.freebsd.org/LinkTimeOptimisations <--- page is out of date
This style of implementation in FreeBSD (and by extension OpenBSD) is flawed today because the compiler/linker is free to look into the call tree and still eliminate the call. The issue shows up with LTO (today), but is not limited to systems where LTO is in use. This exact scenario happened on FreeBSD (and OpenBSD):
https://github.com/libressl-portable/openbsd/issues/5
(Note that someone tested on OpenBSD 5.5 with GCC 4.8.2 from ports and developed a "patch" that still depends on mfence (compiler side effects):
https://github.com/libressl-portable/openbsd/issues/5#issuecomment-50775260)
OpenBSD has not implemented this patch.
If you look at OPENSSL_cleanse() in boringssl, there is a fix for the problem which, again, leverages an mfence:
https://boringssl.googlesource.com/boringssl/+/ad1907fe73334d6c696c8539646c21b11178f20f/crypto/mem.c
The first problem with the mfence approach is that it is limited to the i386 and amd64 architectures. Other architectures will support similar mechanisms, but discovering and understanding these for each architecture will be tiresome.
The second problem is that the compiler barrier does not prevent the bzero/memset from being optimized out unless the address of the buffer being memset has been leaked to code the compiler cannot see. As long as the compiler 'sees' that the asm has no way of observing the output of the memset, it can optimize out the memset. Simply making the memset buffer visible to the asm by passing its address (or better yet, the buffer itself as a memory object) in an asm constraint would probably fix this, but I'd like to have someone from the GCC side confirm this.
In any case, I believe there is a better way.
Microsoft addresses the problem by turning off LTO for the function in question. See:
https://msdn.microsoft.com/en-us/library/aa366877.aspx
The comment at the bottom of the page is apt:
"Security explanation -- The only way that this differs from normal ZeroMemory is that it has correct flags/declaration differences/garden gnomes to force the call to happen no matter if the compiler thinks that it is pointless or not."
Both gcc and clang support attributes that turn off the optimization that causes the issue on a per-function basis. Based on the code found here:
https://github.com/mikejsavage/lua-symmetric/blob/master/compat/safebfuns.c
I have prepared and attached a patch for FreeBSD that implements the same memory barrier, solving the problem in a more portable and safe manner.
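The idea of making the buffer visible to the asm through a constraint, as discussed in this report, might look roughly like the sketch below. It is not the attached patch; barrier_explicit_bzero() is a hypothetical name, and the sketch assumes GCC-style extended inline asm as supported by gcc and clang. The asm body is empty, so no mfence or other architecture-specific instruction is emitted.

```c
#include <stddef.h>
#include <string.h>

/* Zero 'len' bytes at 'buf', then force the compiler to treat the memory as observed. */
void barrier_explicit_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
#if defined(__GNUC__) || defined(__clang__)
    /*
     * Empty asm statement: generates no instructions. Passing 'buf' as
     * an input operand together with the "memory" clobber tells the
     * compiler that the asm may read the memory 'buf' points to, so the
     * stores from the memset() above cannot be treated as dead.
     */
    __asm__ __volatile__("" : : "r"(buf) : "memory");
#endif
    /* Assumption: on other compilers this sketch provides no protection. */
}
```

Because the barrier is only a constraint on the compiler, the same source should build on any architecture that gcc or clang targets.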