libc/amd64: fix stpncpy.S again
The previous fix introduced a regression on machines without the BMI1
instruction set extension. The TZCNT instruction used in this function
behaves different on old machines when the source operand is zero, but
the code was originally designed to never trigger this case. The bug
fix caused this case to be possible, leading to a regression on
sufficiently old hardware.
Fix the code by messing with things such that the source operand is
never zero.
PR: 291720
Fixes: 66eb78377bf109af1d9e25626bf254b4369436ec
Tested by: cy
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D54303