Update:
Since PowerPC handles unaligned memory accesses in hardware, the initial section was removed. Also, the byte-by-byte copy section (Lstd_byte) was replaced by faster code.
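The note does not show the new assembly. As a rough illustration of why dropping a per-byte copy loop helps, the C sketch below contrasts a byte-at-a-time strncpy with one that locates the terminator once and hands both the copy and the NUL padding to `memcpy`/`memset`, which typical libc implementations perform in word-sized or larger chunks. The names `strncpy_bytewise` and `strncpy_bulk` are hypothetical, and this is a sketch of the general idea, not the PowerPC code from strncpy.S.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-at-a-time reference: one load/store per byte. */
char *strncpy_bytewise(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    for (; i < n && src[i] != '\0'; i++)   /* copy until NUL or n bytes */
        dst[i] = src[i];
    for (; i < n; i++)                     /* pad the remainder with NULs */
        dst[i] = '\0';
    return dst;
}

/* Hypothetical bulk variant: find the terminator once, then let
 * memcpy/memset move the data in larger chunks. */
char *strncpy_bulk(char *dst, const char *src, size_t n)
{
    /* Since C11, memchr behaves as if it reads sequentially and stops at
     * the first match, so it does not read past the terminator even when
     * src is shorter than n. */
    const char *nul = memchr(src, '\0', n);
    size_t len = nul ? (size_t)(nul - src) : n;

    memcpy(dst, src, len);          /* string bytes, without terminator */
    memset(dst + len, 0, n - len);  /* strncpy's NUL padding up to n */
    return dst;
}
```

Both functions return `dst` and follow strncpy semantics: at most `n` bytes are written, no terminator is added when `src` has `n` or more characters, and any bytes after the copied string are filled with NULs up to `n`.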
Performance gain compared to the former strncpy.S:

| **String size (bytes)** | **size <= 8** | **16 <= size <= 32** | **64 <= size <= 128** | **256 <= size <= 512** | **1024 <= size <= 2048**
| ----- | ----- | ----- | ----- | ----- | -----
| Gain rate | 0.17% | 0.31% | 0.62% | 0.37% | 0.22%

Performance gain compared to the original strncpy (linked in libc):

| **String size (bytes)** | **size <= 8** | **16 <= size <= 32** | **64 <= size <= 128** | **256 <= size <= 512** | **1024 <= size <= 2048**
| ----- | ----- | ----- | ----- | ----- | -----
| Gain rate | 0.23% | 2.48% | 8.30% | 25.81% | 55.12%