It causes clang to generate single-byte accesses to the target memory with a byte mask.
Example of the asm currently generated for this fragment of fork_exit():

	sched_fork_exit(td);
	if ((dtd = PCPU_GET(deadthread))) {
		PCPU_SET(deadthread, NULL);
		thread_stash(dtd);
	}

The code generated for PCPU_SET:
XXX
The same fragment after the patch:
	0xffffffff80370bb6 <+118>:	mov    %rbx,%rdi
	0xffffffff80370bb9 <+121>:	callq  0xffffffff803dc100 <sched_fork_exit>
	0xffffffff80370bbe <+126>:	mov    %gs:0x18,%rdi
	0xffffffff80370bc7 <+135>:	test   %rdi,%rdi
	0xffffffff80370bca <+138>:	je     0xffffffff80370bdc <fork_exit+156>
	0xffffffff80370bcc <+140>:	xor    %eax,%eax
	0xffffffff80370bce <+142>:	mov    %rax,%gs:0x18
	0xffffffff80370bd7 <+151>:	callq  0xffffffff803bf5a0 <thread_stash>
	0xffffffff80370bdc <+156>:	mov    (%rbx),%rdi