See https://sourceware.org/bugzilla/show_bug.cgi?id=24606 for the test case.
See https://reviews.llvm.org/D64930 for the background and more discussion.
This also fixes another bug in malloc_aligned(), where the total size of the allocated memory might not be enough to fit the requested aligned block after the initial pointer is incremented by the pointer size.
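
For illustration only, here is a minimal sketch of the sizing issue. It is not the patched malloc_aligned(); the names sketch_malloc_aligned() and sketch_free_aligned() are hypothetical, and the sketch assumes the alignment is a power of two. The point is that the request passed to malloc() must cover the pointer-size slot, the worst-case alignment padding, and the block itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption, not the real malloc_aligned()):
 * returns `size` bytes aligned to `align`, which must be a power of two.
 * The pointer returned by malloc() is saved just below the aligned block.
 */
static void *sketch_malloc_aligned(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align < sizeof(void *))
        align = sizeof(void *);

    /* Refuse requests whose total size would overflow size_t. */
    if (size > SIZE_MAX - sizeof(void *) - (align - 1))
        return NULL;

    /*
     * Total = saved pointer + worst-case padding + requested block.
     * Leaving out the sizeof(void *) term is the kind of shortfall
     * described above: after skipping the pointer slot and rounding up,
     * the block could run past the end of the allocation.
     */
    void *mem = malloc(size + sizeof(void *) + (align - 1));
    if (mem == NULL)
        return NULL;

    /* Skip the pointer slot first, then round up to the alignment. */
    uintptr_t start = (uintptr_t)mem + sizeof(void *);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);

    /* Remember the original pointer so the block can be freed later. */
    ((void **)aligned)[-1] = mem;
    return (void *)aligned;
}

/* Hypothetical counterpart that releases a block from the sketch above. */
static void sketch_free_aligned(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

With this sizing, the last byte of the block lies at most at mem + sizeof(void *) + (align - 1) + size - 1, which is still inside the allocation.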