amd64: implement strlen in assembly, take 2

Description

Tested with glibc test suite.

The C variant in libkern performs excessive branching to find the zero
byte instead of using the bsfq instruction. The same code patched to use
it is still slower than the routine implemented here as the compiler
keeps neglecting to perform certain optimizations (like using leaq).
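
As an illustration of the word-at-a-time idea described above, here is a minimal C sketch. It is not the routine from this commit: the name strlen_words and both constants are hypothetical, and it assumes a little-endian target and a GCC/Clang compiler, where __builtin_ctzll typically compiles to a bit-scan such as bsfq. Like the assembly, it reads whole aligned words and may therefore touch bytes past the terminator. That is safe with respect to page boundaries but is formally out of bounds in portable C, so treat it as a sketch only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL	/* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL	/* 0x80 in every byte */

/* Hypothetical helper; assumes little-endian and GCC/Clang builtins. */
static size_t
strlen_words(const char *s)
{
	const char *p = s;
	uint64_t w, mask;

	/* Walk bytes until p is 8-byte aligned. */
	while (((uintptr_t)p & 7) != 0) {
		if (*p == '\0')
			return ((size_t)(p - s));
		p++;
	}

	for (;;) {
		/* Load one aligned 8-byte word. */
		memcpy(&w, p, sizeof(w));
		/*
		 * Non-zero iff the word contains a zero byte. Bits above
		 * the first zero byte may be spurious, but the lowest set
		 * bit always marks the first zero byte.
		 */
		mask = (w - ONES) & ~w & HIGHS;
		if (mask != 0) {
			/* One bit scan replaces the per-byte branching. */
			return ((size_t)(p - s) +
			    ((size_t)__builtin_ctzll(mask) >> 3));
		}
		p += sizeof(w);
	}
}
```

The bit index returned by the scan is divided by 8 to turn it into a byte offset within the word.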

On top of that, the routine can be used as a starting point for copyinstr
which operates on words instead of bytes.

The previous attempt had an instance of swapped operands to andq when
dealing with the fully aligned case, which had the side effect of breaking
the code for certain corner cases. Noted by jrtc27.

Sample results:

$(perl -e "print 'A' x 3"):
stock: 211198039
patched: 338626619
asm: 465609618

$(perl -e "print 'A' x 100"):
stock: 83151997
patched: 98285919
asm: 120719888

Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D28779

(cherry picked from commit 5fa12fe0cd203efcbb2ac21e7c3e3fb9b2f801ae)

Details

Provenance
mjg Authored on Apr 10 2021, 1:52 PM
Reviewer
jhb
Differential Revision
D28779: amd64: implement strlen in assembly, take 2
Parents
rG7755e8ae322b: ifconfig: fix UBSan signed shift error