When this code was first written we did not even have a struct tcb,
so to keep it MI, a pointer to the DTV pointer in the TCB was passed
around.
Now that we have a struct tcb we can simplify the code by instead
passing around a pointer to that, and the MI code can access the tcb_dtv
member wherever it happens to be in the layout. This reduces boilerplate
in all the various callers of tls_get_addr_common/slow and makes it
clearer that tls_get_addr_common/slow are operating on the TCB, rather
than obfuscating it slightly through the double pointer.
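
As a rough illustration of the change in calling convention, here is a
minimal sketch rather than the actual rtld code. The struct layout, the
sketch_* names and the DTV slot layout (generation, count, then one entry
per module) are assumptions made for the example, and the generation
check and slow path are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TCB; the real layout is machine-dependent. */
struct tcb {
	void		*tcb_other;	/* placeholder: tcb_dtv need not be first */
	uintptr_t	*tcb_dtv;	/* pointer to the DTV */
};

/* Before: callers find the DTV pointer themselves and pass its address. */
static void *
sketch_get_addr_old(uintptr_t **dtvp, int index, size_t offset)
{
	uintptr_t *dtv = *dtvp;

	return ((void *)(dtv[index + 1] + offset));
}

/* After: callers pass the TCB and the MI code reads tcb_dtv itself. */
static void *
sketch_get_addr_new(struct tcb *tcb, int index, size_t offset)
{
	uintptr_t *dtv = tcb->tcb_dtv;

	return ((void *)(dtv[index + 1] + offset));
}

/* Check that both styles resolve the same address for a one-module DTV. */
static int
sketch_check(void)
{
	static char block[16];
	/* Assumed slots: generation, module count, module 1's TLS block. */
	uintptr_t dtv[3] = { 1, 1, (uintptr_t)block };
	struct tcb tcb = { NULL, dtv };
	void *a, *b;

	a = sketch_get_addr_old(&tcb.tcb_dtv, 1, 4);
	b = sketch_get_addr_new(&tcb, 1, 4);
	assert(a == b);
	return (a == b && a == (void *)(block + 4));
}
```

In the old style every caller has to know where the DTV pointer lives in
its TCB; in the new style only the struct definition does.
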
Whilst here, clarify the comments in aarch64's TLSDESC dynamic resolver,
which used tp without saying what it points to. Previously that was a
pointer to the DTV pointer; now it is a pointer to the TCB. The two
happen to be the same thing for Variant I TLS, and on AArch64 that is
what TPIDR_EL0 points to directly, with no offset/bias.