Assembly optimization of strncpy for PowerPC64

Update:
Since PowerPC has hardware support for unaligned memory access, the initial alignment section was removed. In addition, the copy-by-byte section (Lstd_bytes, used to copy strings) was replaced by faster code that copies doublewords.
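The idea of copying doublewords instead of single bytes can be sketched in C. This is only an illustration of the technique, not the code from strncpy.S: the names `strncpy_dw` and `has_zero_byte` are hypothetical, and the zero-byte test is the well-known bit trick rather than whatever the assembly actually uses. The sketch also assumes that at least `n` bytes are readable at `src`, since it loads whole doublewords before checking for the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if any byte of w is 0x00. */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Hypothetical strncpy variant that copies 8 bytes per iteration.
 * Assumption: at least n bytes are readable at src. */
char *strncpy_dw(char *dst, const char *src, size_t n)
{
    char *d = dst;

    /* Doubleword loop: runs while 8 bytes remain and none is the terminator. */
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src, sizeof w);   /* load that does not require alignment */
        if (has_zero_byte(w))
            break;                   /* terminator is inside this doubleword */
        memcpy(d, &w, sizeof w);
        src += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }

    /* Byte tail: copy the remaining characters up to the terminator or n. */
    while (n > 0 && *src != '\0') {
        *d++ = *src++;
        n--;
    }

    /* strncpy semantics: pad the rest of the destination with zeros. */
    memset(d, 0, n);
    return dst;
}
```

Calling `strncpy_dw(dst, "0123456789abcdef", 20)` on a source buffer of at least 20 bytes would copy two doublewords in the main loop and then write four padding zeros. The `memcpy` calls into a local `uint64_t` are there so the sketch does not depend on pointer alignment; a compiler will typically turn them into plain loads and stores on targets that allow unaligned access.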
Performance gain for different string sizes, compared to the former strncpy.S:
| **String size (bytes)** | **size <= 8** | **16 <= size <= 32** | **64 <= size <= 128** | **256 <= size <= 512** | **1024 <= size <= 2048**
| ----- | ----- | ----- | ----- | ----- | -----
| Gain rate | 0.17% | 0.31% | 0.62% | 0.37% | 0.22%
Performance gain compared to the original strncpy (linked in libc):
| **String size (bytes)** | **size <= 8** | **16 <= size <= 32** | **64 <= size <= 128** | **256 <= size <= 512** | **1024 <= size <= 2048**
| ----- | ----- | ----- | ----- | ----- | -----
| Gain rate | -0,120.23% | 2,43.48% | 9,458.30% | 29,135.81% | 59,93%5.12%