amd64: implement strlen in assembly

Description

The C variant in libkern performs excessive branching to find the
non-zero byte instead of using the bsfq instruction. The same code
patched to use it is still slower than the routine implemented here
as the compiler keeps neglecting to perform certain optimizations
(like using leaq).
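
For illustration only, the word-at-a-time idea can be sketched in C as below. This is not the committed assembly and not the libkern source: the name strlen_words, the byte-wise alignment prologue, and the use of the GCC/Clang builtin __builtin_ctzll (which compilers typically lower to bsfq or tzcnt on amd64) are assumptions made for the sketch. It also assumes a little-endian machine and, like such routines in general, relies on aligned 8-byte loads never crossing a page boundary even though they may read a few bytes past the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL	/* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL	/* 0x80 in every byte */

/* Hypothetical helper, named for this sketch only. */
static size_t
strlen_words(const char *str)
{
	const char *p = str;
	uint64_t word, mask;

	/* Walk byte by byte until p is 8-byte aligned. */
	while (((uintptr_t)p & 7) != 0) {
		if (*p == '\0')
			return ((size_t)(p - str));
		p++;
	}

	for (;;) {
		/* Aligned 8-byte load; may read past the terminator. */
		memcpy(&word, p, sizeof(word));

		/*
		 * The lowest set bit of mask is the 0x80 bit of the first
		 * zero byte in word; mask is 0 if word has no zero byte.
		 */
		mask = (word - ONES) & ~word & HIGHS;
		if (mask != 0) {
			/*
			 * One bit scan replaces the chain of per-byte
			 * branches; bit index / 8 is the byte index on a
			 * little-endian machine.
			 */
			return ((size_t)(p - str) +
			    ((size_t)__builtin_ctzll(mask) >> 3));
		}
		p += sizeof(word);
	}
}
```

The point of the sketch is the last step: once a word is known to contain a zero byte, a single bit scan locates it instead of testing each of the eight bytes in turn.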

On top of that, the routine can serve as a starting point for a copyinstr
which operates on words instead of bytes.

Tested with the glibc test suite.

Sample results (calls/s):

Haswell:
$(perl -e "print 'A' x 3"):
stock: 211198039
patched: 338626619
asm: 465609618

$(perl -e "print 'A' x 100"):
stock: 83151997
patched: 98285919
asm: 120719888

AMD EPYC 7R32:
$(perl -e "print 'A' x 3"):
stock: 282523617
asm: 491498172

$(perl -e "print 'A' x 100"):
stock: 114857172
asm: 112082057

Details

Provenance
mjg authored on Feb 8 2021, 5:01 PM
Parents
rG3acea07c1873: Restore the augmented strlen commentary
Reverted By
rGb49a0db6628e: Revert "amd64: implement strlen in assembly"