Page MenuHomeFreeBSD

amd64: implement strlen in assembly, take 2

Authored by mjg on Feb 18 2021, 7:44 PM.



amd64: implement strlen in assembly

Tested with glibc test suite and a custom test which can be found in the

The C variant in libkern performs excessive branching to find the
zero byte instead of using the bsfq instruction. The same code
patched to use it is still slower than the routine implemented here
as the compiler keeps neglecting to perform certain optimizations
(like using leaq).

On top of that the routine can is a starting point for copyinstr
which operates on words instead of bytes.

The previous attempt had an instance of swapped operands to
andq when dealing with fully aligned case, which had a side effect
of breaking the code for certain corner cases. Noted by jrtc27.

Sample results:

$(perl -e "print 'A' x 3"):
stock: 211198039
asm: 465609618

$(perl -e "print 'A' x 100"):
stock: 83151997
patched: 98285919
asm: 120719888

Test Plan

the glibc test suite works fine for testing misaligned uses, but notoriously fails to properly test when the buffer is aligned to 8 bytes. Thus the program below was written. It finds the reported problem of kyua test lib/libc/gen/setdomainname_test failing and any intentional breakage I tried to throw at it.

#include <sys/random.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define ALIGN_LOOPS     10

size_t my_strlen(const char *s);

dump_buf(char *buf, size_t size)
        size_t i;

        printf("buf is: ");
        for (i = 0; i < size; i++) {
                printf("%x ", buf[i]);

main(int argc, char **argv)
        int i, j, len, size, slen;
        char *buf, *orig_buf;
        char saved;

        size = atoi(argv[1]);
        orig_buf = buf = aligned_alloc(8, size + ALIGN_LOOPS + 1);
        memset(buf, 0, size + ALIGN_LOOPS + 1);

        getrandom(buf, size, GRND_NONBLOCK);
        for (i = 0; i < size + ALIGN_LOOPS; i++) {
                while (buf[i] == '\0') {
                        getrandom(&buf[i], 1, GRND_NONBLOCK);
        buf[size + ALIGN_LOOPS] = '\0';

        for (j = 0; j < ALIGN_LOOPS; j++) {
                buf += j;
                for (i = 0; i < size; i++) {
                        memset(orig_buf, 0, buf - orig_buf);
                        saved = buf[i];
                        buf[i] = '\0';
                        len = my_strlen(buf);
                        slen = strlen(buf);
                        if (len != slen) {
                                printf("diff at %d (%d != %d) buf %p\n", i, len, slen, buf);
                //      printf("tested len %d off %x\n", len, (uintptr_t)buf & 0x7);
                        if (len != i) {
                                printf("i diff at %d (got %d slen %d) buf %p\n", i, len, slen, buf);
                                dump_buf(buf, size);
                        buf[i] = saved;
                buf -= j;

Diff Detail

R10 FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mjg requested review of this revision.Feb 18 2021, 7:44 PM
mjg created this revision.
mjg edited the summary of this revision. (Show Details)

Minor nit: s/can is a starting point/can be a starting point/ in commit log.

If I understand the implications of what Jess found before, the swapped operand meant the main loop was losing the ~x part and just storing ((x - 0x01....01) & 0x80....80) in %rcx in the main body of the loop, so it wasn't just aligned strings that would have been broken but a string longer than 8 bytes as well? I guess it would treat the first byte in a word from '0x81 -> 0xff' as well as '0x00' as a the first NUL?

This revision is now accepted and ready to land.Feb 20 2021, 1:11 AM
This revision was automatically updated to reflect the committed changes.