If the DF_STATIC_TLS dynamic flag is set, rtld tries to allocate TLS in static space. If there is no space left, dlopen(3) fails. If space is allocated, the initial content from the PT_TLS segment is distributed to all threads' pcbs.
See https://www.redhat.com/archives/phil-list/2003-February/msg00077.html for a plain-language explanation.
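The behaviour described above can be sketched roughly as follows. This is not the rtld code from the patch, only a minimal illustration: `struct dyn_entry`, `struct static_tls_space`, `wants_static_tls()` and `reserve_static_tls()` are hypothetical names made up for the example. The DT_FLAGS and DF_STATIC_TLS values are the ones from the ELF gABI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF gABI. */
#define DT_NULL        0
#define DT_FLAGS       30
#define DF_STATIC_TLS  0x10

/* Simplified stand-in for Elf64_Dyn (hypothetical, for illustration). */
struct dyn_entry {
	int64_t  tag;
	uint64_t val;
};

/* Return true if DT_FLAGS in the dynamic section has DF_STATIC_TLS set. */
static bool
wants_static_tls(const struct dyn_entry *dyn)
{
	for (; dyn->tag != DT_NULL; dyn++) {
		if (dyn->tag == DT_FLAGS)
			return ((dyn->val & DF_STATIC_TLS) != 0);
	}
	return (false);
}

/* Hypothetical bookkeeping for the static TLS space. */
struct static_tls_space {
	size_t used;	/* bytes already handed out */
	size_t size;	/* total bytes available */
};

/*
 * Try to reserve `size` bytes aligned to `align` (a power of two).
 * On success, store the offset in *off and return true.  Returning
 * false corresponds to the case where dlopen(3) has to fail.
 */
static bool
reserve_static_tls(struct static_tls_space *sp, size_t size, size_t align,
    size_t *off)
{
	size_t start = (sp->used + align - 1) & ~(align - 1);

	if (start > sp->size || size > sp->size - start)
		return (false);
	sp->used = start + size;
	*off = start;
	return (true);
}
```

In this sketch, copying the PT_TLS initial image into each existing thread would happen only after `reserve_static_tls()` succeeds. That step is omitted here.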
I am unsure about https://reviews.llvm.org/D33041. I did observe that lld 6.0.1 does not set DF_STATIC_TLS. IMO this is a bug.
The test case is https://github.com/dumbbell/test-tls-initial-exec. I have only compiled and somewhat tested the patch on amd64; the patch also has untested bits for i386.