User Details
- User Since
- May 24 2024, 9:38 PM (15 w, 2 d)
Tue, Aug 27
Included one commit too many last time :)
bcopy and bzero are now also SIMDified
Mon, Aug 26
- Revert to previous strlen as the other version failed a test when running the entire test suite
I don't think this one is gonna be worth including as I'm unable to comfortably beat the perf of the existing implementation.
Remove manpage update; let's handle that in a separate commit
- fix Makefile
Sun, Aug 25
- Update based on review
- Use unsigned comparisons for main loop
Fri, Aug 23
strcat in C instead of assembly
alright, that makes sense. I'll replace it.
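For context on the "strcat in C instead of assembly" entry: a C strcat can simply delegate to the other string routines, so it picks up whatever optimized strlen and copy routines the platform already has. The following is a minimal sketch under that assumption, not the code from the review; the name strcat_sketch is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical sketch: append src to dst by composing two library calls.
 * strlen() finds the terminating NUL of dst, and strcpy() then copies src
 * (including its NUL) starting at that position.
 */
char *
strcat_sketch(char *restrict dst, const char *restrict src)
{
	strcpy(dst + strlen(dst), src);
	return (dst);
}
```

The appeal of this shape is that no separate assembly needs to be maintained for strcat itself.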
- revert accidental edit of the makefile
- Update based on review
Thu, Aug 22
- Update based on review
- amd64 -> aarch64 :)
Wed, Aug 21
Wed, Aug 14
Follow __$FUNC convention
Use ifdefs to simulate bcmp
follow __$FUNC convention
follow __$FUNC convention
Use unsigned comparisons for limit
follow __$FUNC convention
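On the bcmp entry above: one common way to get bcmp out of a memcmp source is to compile the same file twice, with a preprocessor macro selecting the function name and the cheaper return value (bcmp only has to report equal or not equal). This is a plain C sketch of that idea, not the assembly from the review; the macro BUILD_AS_BCMP and the names memcmp_sketch and bcmp_sketch are made up for illustration.

```c
#include <stddef.h>

#ifdef BUILD_AS_BCMP		/* hypothetical macro name */
#define	FUNC	bcmp_sketch
#else
#define	FUNC	memcmp_sketch
#endif

int
FUNC(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a, *pb = b;

	for (size_t i = 0; i < n; i++) {
		if (pa[i] != pb[i]) {
#ifdef BUILD_AS_BCMP
			/* bcmp: any nonzero value means "different". */
			return (1);
#else
			/* memcmp: the sign must reflect the byte ordering. */
			return (pa[i] - pb[i]);
#endif
		}
	}
	return (0);
}
```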
Tue, Aug 13
Update based on review.
Mon, Aug 12
- unsigned comparison for limit (b.mi -> b.lo)
- label function using __$FUNC convention
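Why the b.mi -> b.lo change matters, as I understand it: a length limit is an unsigned quantity, so a caller may pass a huge value such as SIZE_MAX. A check based on the sign of limit minus chunk size (what a "minus" branch tests) misfires for such limits, while an unsigned "lower than" check does not. The sketch below models both checks in C with 64-bit values; the function names are hypothetical and the actual assembly is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Models a sign-based check: subtract, then look at the top bit of the
 * result (the N flag). Wrong for limits of 2^63 or more.
 */
static bool
limit_hit_signed(uint64_t limit, uint64_t chunk)
{
	uint64_t diff = limit - chunk;	/* wraps modulo 2^64 */

	return ((diff >> 63) != 0);
}

/* Models an unsigned "lower than" check. Correct for every limit. */
static bool
limit_hit_unsigned(uint64_t limit, uint64_t chunk)
{
	return (limit < chunk);
}
```

With limit = UINT64_MAX and a 16-byte chunk, the signed model reports the limit as reached even though it is nowhere near; the unsigned model does not.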
Aug 9 2024
Aug 8 2024
Aug 1 2024
Use correct style for SPDX identifier
Pad .rodata with -1 and use unsigned comparisons
Use unsigned comparisons everywhere
Use unsigned comparisons everywhere
Jul 31 2024
New method for handling short strings based on D46052
Jul 29 2024
will rebase on D46052 to improve performance for the short case
Looks good, gonna revise the aarch64 port of this to handle it the same way.
Jul 27 2024
Jul 15 2024
Introduces a null "match" when limit is reached for short strings near a page boundary
Jul 14 2024
- Revert to complicated handling of short string near a page boundary
I extended the unit tests and it was failing for short unaligned strings near a page boundary.
Jul 12 2024
- Revert last commit, it wasn't unnecessary :)
Jul 11 2024
- Remove unnecessary mov, from strncmp review
Updated based on review
- Use correct offset for the loop in the normal case
- Updated based on review
Thanks for the review! Changes appear to increase performance by a percent or two.
Jul 10 2024
Fixes the page crossing for tails.
I'll write a proper new test instead, testing more cases for the bounds
Jul 9 2024
- Revert to branched handling of strings near the end of a page.
Hit the submit button too early; removing the branches for the tbl maneuver didn't work as I had hoped.
Remove branching for tbl maneuver.
Applies suggested microoptimizations for arithmetic.
Jul 5 2024
- Remove unnecessary mov
Jul 2 2024
Jun 24 2024
Jun 21 2024
- Avoid page crossing into unmapped pages
Passes all tests; still need to extend the memcmp tests to verify correct
behaviour for small buffers at page boundaries.
Jun 19 2024
- Address comments
Interleaves independent instructions for parallelism.
Finds the offset in a branch-free manner after a match.
Simplifies the code for avoiding overreads in the main loop.
Jun 18 2024
- Avoid overreads in main loop
We re-compare some of the bytes we already compared to avoid crossing
over into an unmapped page.
I'm working on solving that right now, I mentioned it in the description.
The problem is that a buffer shorter than 32 bytes can be placed at the end of a page, causing an overread.
I'll update it later today. I was mostly looking for some feedback at first.
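The two ideas in the comments above (detecting that a wide load would run off the end of a page, and re-comparing a few bytes so the last load ends exactly at the end of the buffer) can be sketched in portable C. This is only an illustration: the 4096-byte page size, the 16-byte chunk, and the names load_crosses_page and memcmp_chunked are assumptions, and memcmp stands in for the vector compare.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	PAGE_SIZE_ASSUMED	4096u	/* assumption: 4 KiB pages */
#define	CHUNK			16u	/* assumption: one 16-byte vector */

/* Would a load of `width` bytes at address `addr` extend past its page? */
static bool
load_crosses_page(uintptr_t addr, size_t width)
{
	uintptr_t off = addr & (PAGE_SIZE_ASSUMED - 1);

	return (off + width > PAGE_SIZE_ASSUMED);
}

/*
 * Compare in CHUNK-sized blocks without ever touching bytes past n.
 * The final block is placed at n - CHUNK, so it may overlap bytes that
 * were already compared instead of reading beyond the buffer.
 */
static int
memcmp_chunked(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a, *pb = b;
	size_t i = 0;

	if (n < CHUNK)
		return (memcmp(pa, pb, n));	/* short case handled separately */

	while (n - i > CHUNK) {
		int r = memcmp(pa + i, pb + i, CHUNK);

		if (r != 0)
			return (r);
		i += CHUNK;
	}
	/* Overlapping tail: ends exactly at byte n. */
	return (memcmp(pa + n - CHUNK, pb + n - CHUNK, CHUNK));
}
```

Because everything before the tail is already known to be equal, the first difference found in the overlapping tail is also the first difference overall, so the result keeps memcmp's ordering. Buffers shorter than one chunk still need their own handling, which is the case the page-boundary discussion is about.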
Jun 17 2024
Jun 7 2024
- simd(7): also mention (NEON)
- simd(7): ASIMD instead of Neon