Page MenuHomeFreeBSD

amd64: implement strlen in assembly, take 2
ClosedPublic

Authored by mjg on Feb 18 2021, 7:44 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 18, 10:48 PM
Unknown Object (File)
Mon, Nov 18, 8:57 PM
Unknown Object (File)
Sun, Nov 17, 1:05 AM
Unknown Object (File)
Thu, Nov 14, 8:50 PM
Unknown Object (File)
Mon, Nov 11, 1:23 PM
Unknown Object (File)
Sun, Nov 10, 1:47 PM
Unknown Object (File)
Sun, Nov 10, 11:22 AM
Unknown Object (File)
Sun, Nov 10, 2:21 AM
Subscribers
None

Details

Summary

amd64: implement strlen in assembly

Tested with glibc test suite and a custom test which can be found in the
review.

The C variant in libkern performs excessive branching to find the
zero byte instead of using the bsfq instruction. The same code
patched to use it is still slower than the routine implemented here
as the compiler keeps neglecting to perform certain optimizations
(like using leaq).

On top of that the routine can is a starting point for copyinstr
which operates on words instead of bytes.

The previous attempt had an instance of swapped operands to
andq when dealing with fully aligned case, which had a side effect
of breaking the code for certain corner cases. Noted by jrtc27.

Sample results:

$(perl -e "print 'A' x 3"):
stock: 211198039
patched:338626619
asm: 465609618

$(perl -e "print 'A' x 100"):
stock: 83151997
patched: 98285919
asm: 120719888

Test Plan

the glibc test suite works fine for testing misaligned uses, but notoriously fails to properly test when the buffer is aligned to 8 bytes. Thus the program below was written. It finds the reported problem of kyua test lib/libc/gen/setdomainname_test failing and any intentional breakage I tried to throw at it.

#include <sys/random.h>
  
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define ALIGN_LOOPS     10

size_t my_strlen(const char *s);

void
dump_buf(char *buf, size_t size)
{
        size_t i;

        printf("buf is: ");
        for (i = 0; i < size; i++) {
                printf("%x ", buf[i]);
        }
        printf("\n");
}

int
main(int argc, char **argv)
{
        int i, j, len, size, slen;
        char *buf, *orig_buf;
        char saved;

        size = atoi(argv[1]);
        orig_buf = buf = aligned_alloc(8, size + ALIGN_LOOPS + 1);
        memset(buf, 0, size + ALIGN_LOOPS + 1);

        getrandom(buf, size, GRND_NONBLOCK);
        for (i = 0; i < size + ALIGN_LOOPS; i++) {
                while (buf[i] == '\0') {
                        getrandom(&buf[i], 1, GRND_NONBLOCK);
                }
        }
        buf[size + ALIGN_LOOPS] = '\0';

        for (j = 0; j < ALIGN_LOOPS; j++) {
                buf += j;
                for (i = 0; i < size; i++) {
                        memset(orig_buf, 0, buf - orig_buf);
                        saved = buf[i];
                        buf[i] = '\0';
                        len = my_strlen(buf);
                        slen = strlen(buf);
                        if (len != slen) {
                                printf("diff at %d (%d != %d) buf %p\n", i, len, slen, buf);
                        }
                //      printf("tested len %d off %x\n", len, (uintptr_t)buf & 0x7);
                        if (len != i) {
                                printf("i diff at %d (got %d slen %d) buf %p\n", i, len, slen, buf);
                                dump_buf(buf, size);
                        }
                        buf[i] = saved;
                }
                buf -= j;
        }
}

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

mjg requested review of this revision.Feb 18 2021, 7:44 PM
mjg created this revision.
mjg edited the summary of this revision. (Show Details)

Minor nit: s/can is a starting point/can be a starting point/ in commit log.

If I understand the implications of what Jess found before, the swapped operand meant the main loop was losing the ~x part and just storing ((x - 0x01....01) & 0x80....80) in %rcx in the main body of the loop, so it wasn't just aligned strings that would have been broken but a string longer than 8 bytes as well? I guess it would treat the first byte in a word from '0x81 -> 0xff' as well as '0x00' as a the first NUL?

This revision is now accepted and ready to land.Feb 20 2021, 1:11 AM
This revision was automatically updated to reflect the committed changes.