With PERTHREAD_SSP configured, SSP uses a per-thread canary value
instead of a global value.  The value is stored in td->td_md.md_canary;
the sp_el0 register always contains a pointer to that value, and certain
functions selected by the compiler will store the canary value on the
stack as a part of the function prologue (and will verify the copy as
part of the epilogue).  In particular, each such function loads the
canary through sp_el0 and therefore accesses the thread structure.

This happens to occur in data_abort(), which leads to the same problem
addressed by commit 2c10be9e06d4 ("arm64: Handle translation faults for
thread structures").  This patch fixes that directly by using a function
attribute to disable SSP in data_abort() and a couple of related
functions.  It also moves the update of sp_el0 out of C code in case
the compiler decides to start checking the canary in pmap_switch()
someday.

A different solution might be to move the canary value to the PCB, which
currently lives on the kernel stack and isn't subject to the same
problem as thread structures.  However, there isn't any particular
reason the PCB has to live on the stack today; on amd64 it is embedded
in struct thread, and doing the same on arm64 would reintroduce the
problem.  Keeping the reference
canary value at the top of the stack also seems a bit dubious since it
could be clobbered by a sufficiently large stack overflow.

A third solution could be to go back to the approach of commit
5aa5420ff2e8, and modify UMA to use the direct map for thread structures
even if KASAN is enabled.  But transient promotions and demotions in the
direct map are possible too.

Since all of these options have shortcomings, I went with the most
straightforward one.
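
As a rough illustration of the function-attribute approach (not the
actual patch), the user-space sketch below marks one function so that
the compiler omits the canary store and check for it.  The macro name
EXAMPLE_NO_SSP and the function example_handler() are hypothetical and
exist only for this example; the no_stack_protector attribute is the one
provided by recent GCC and Clang, and the macro expands to nothing on
compilers that lack it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical macro for this sketch only.  It expands to the
 * GCC/Clang no_stack_protector attribute when the compiler supports
 * it, and to nothing otherwise.
 */
#if defined(__has_attribute)
#if __has_attribute(no_stack_protector)
#define EXAMPLE_NO_SSP __attribute__((no_stack_protector))
#endif
#endif
#ifndef EXAMPLE_NO_SSP
#define EXAMPLE_NO_SSP
#endif

/*
 * Stand-in for a handler that must not load the canary.  The local
 * array would normally make this function a candidate for stack
 * protector instrumentation; the attribute suppresses that.
 */
EXAMPLE_NO_SSP size_t
example_handler(const char *src, size_t len)
{
	char buf[64];

	assert(src != NULL);
	/* Truncate so the copy plus terminator always fits in buf. */
	if (len >= sizeof(buf))
		len = sizeof(buf) - 1;
	memcpy(buf, src, len);
	buf[len] = '\0';
	return (strlen(buf));
}
```

Building this with -fstack-protector-strong and comparing the
disassembly of example_handler() with and without the attribute should
show whether the canary load has been dropped.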