libc: scalar memchr() in RISC-V assembly
ClosedPublic

Authored by strajabot on Jul 18 2024, 6:04 PM.

Details

Summary

Added an optimized memchr() implementation in RISC-V assembly and updated
the relevant manpage.

        │ memchr_baseline │            memchr_scalar            │
        │     sec/op      │   sec/op     vs base                │
Short         636.6µ ± 1%   495.9µ ± 1%  -22.10% (p=0.000 n=20)
Mid           279.7µ ± 1%   224.1µ ± 1%  -19.87% (p=0.000 n=20)
Long          138.8µ ± 0%   124.9µ ± 0%  -10.00% (p=0.000 n=20)
geomean       291.3µ        240.3µ       -17.48%

        │ memchr_baseline │            memchr_scalar             │
        │       B/s       │     B/s       vs base                │
Short        187.3Mi ± 1%   240.4Mi ± 1%  +28.37% (p=0.000 n=20)
Mid          426.2Mi ± 1%   531.9Mi ± 1%  +24.79% (p=0.000 n=20)
Long         859.0Mi ± 0%   954.4Mi ± 0%  +11.11% (p=0.000 n=20)
geomean      409.3Mi        496.0Mi       +21.19%
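
The two tables above describe the same measurements from two angles. Assuming each benchmark processes a fixed number of bytes per operation (an assumption, the summary does not say so explicitly), throughput is inversely proportional to time per operation, so a relative time change δ maps to a relative throughput change of 1/(1+δ) - 1. A quick check on the Short and Long rows, using only the figures reported above:

```latex
\[
\frac{B_{\text{scalar}}}{B_{\text{base}}} - 1 = \frac{1}{1+\delta} - 1,
\qquad
\delta_{\text{Short}} = -0.2210 \;\Rightarrow\; \frac{1}{0.7790} - 1 \approx +28.37\%,
\qquad
\delta_{\text{Long}} = -0.1000 \;\Rightarrow\; \frac{1}{0.9000} - 1 \approx +11.11\%.
\]
```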
Test Plan

Tested using the in-tree test suite

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

Looks reasonable. Make sure the code works with phony buffer lengths such as SIZE_MAX.
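
To illustrate the concern about phony lengths: if an implementation computes an end pointer as buf + len, a length such as SIZE_MAX makes that sum wrap around the address space. The following is a minimal C sketch of one way to guard against this, by saturating the end address. The function name memchr_saturating is hypothetical, and this is not the assembly under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference routine (not the code under review): a byte-wise
 * memchr() that stays correct when the caller passes a length such as
 * SIZE_MAX, for which buf + len would wrap around the address space.
 */
void *
memchr_saturating(const void *buf, int c, size_t len)
{
	const unsigned char *p = buf;
	unsigned char ch = (unsigned char)c;
	uintptr_t start = (uintptr_t)p;
	uintptr_t end = start + len;	/* unsigned arithmetic, may wrap */

	assert(buf != NULL || len == 0);

	/* If the sum wrapped, clamp the end to the top of the address space. */
	if (end < start)
		end = UINTPTR_MAX;

	/* Bytes are read strictly in order; the scan stops at the first match. */
	for (; (uintptr_t)p != end; p++)
		if (*p == ch)
			return ((void *)p);
	return (NULL);
}
```

With this guard, a call such as memchr_saturating("hello", 'l', SIZE_MAX) returns a pointer to the first 'l' instead of terminating immediately because the wrapped end pointer compares below the start.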

lib/libc/riscv/string/memchr.S
35–36

Could you somehow avoid having to deal with two counters here? Each addition is 20% of the full loop, so if you get rid of one, that's a 20% speedup.

Also consider completely peeling the loop or using Duff's device.
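
The two suggestions above can be sketched in C. This is only an illustration of the ideas, not the assembly under review, and the function names are hypothetical. The first routine keeps a single loop-carried variable (the pointer) by comparing against a precomputed end pointer, which assumes buf + len does not wrap. The second uses Duff's device to unroll the loop four times, so the round counter is updated once per four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* One counter: only p changes per iteration; end is computed once. */
void *
memchr_one_counter(const void *buf, int c, size_t len)
{
	const unsigned char *p = buf;
	const unsigned char *end = p + len;	/* assumes no wraparound */
	unsigned char ch = (unsigned char)c;

	assert(buf != NULL || len == 0);

	for (; p != end; p++)
		if (*p == ch)
			return ((void *)p);
	return (NULL);
}

/* Duff's device: the switch jumps into the middle of a 4x unrolled loop. */
void *
memchr_duff(const void *buf, int c, size_t len)
{
	const unsigned char *p = buf;
	unsigned char ch = (unsigned char)c;
	size_t rounds;

	assert(buf != NULL || len == 0);

	if (len == 0)
		return (NULL);

	/* Number of passes through the loop body, counting a partial first pass. */
	rounds = len / 4 + (len % 4 != 0);
	switch (len % 4) {
	case 0:
		do {
			if (*p == ch)
				return ((void *)p);
			p++;
			/* FALLTHROUGH */
	case 3:
			if (*p == ch)
				return ((void *)p);
			p++;
			/* FALLTHROUGH */
	case 2:
			if (*p == ch)
				return ((void *)p);
			p++;
			/* FALLTHROUGH */
	case 1:
			if (*p == ch)
				return ((void *)p);
			p++;
		} while (--rounds > 0);
	}
	return (NULL);
}
```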

120

Avoid depending on the previous instruction.

libc: unroll loop in RISC-V memchr()

Refactor memchr() to remove the special case for len < 8 and to unroll the main loop. This removes many data dependencies between instructions, with the aim of improving performance on short strings.

        │ memchr_baseline │            memchr_scalar            │              memchr_uroll              │
        │     sec/op      │   sec/op     vs base                │   sec/op     vs base                   │
Short         636.6µ ± 1%   495.9µ ± 1%  -22.10% (p=0.000 n=20)   497.5µ ± 2%  -21.86% (p=0.000 n=20+22)
Mid           279.7µ ± 1%   224.1µ ± 1%  -19.87% (p=0.000 n=20)   220.0µ ± 2%  -21.35% (p=0.000 n=20+22)
Long          138.8µ ± 0%   124.9µ ± 0%  -10.00% (p=0.000 n=20)   117.3µ ± 2%  -15.45% (p=0.000 n=20+22)
geomean       291.3µ        240.3µ       -17.48%                  234.2µ       -19.60%

        │ memchr_baseline │            memchr_scalar             │
        │       B/s       │     B/s       vs base                │
Short        187.3Mi ± 1%   240.4Mi ± 1%  +28.37% (p=0.000 n=20)
Mid          426.2Mi ± 1%   531.9Mi ± 1%  +24.79% (p=0.000 n=20)
Long         859.0Mi ± 0%   954.4Mi ± 0%  +11.11% (p=0.000 n=20)
geomean      409.3Mi        496.0Mi       +21.19%

        │ memchr_uroll │
        │    MiB/s     │
Short       251.3 ± 1%
Mid         568.2 ± 2%
Long         1065 ± 2%
geomean     533.8
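
As a rough illustration of what unrolling can buy in terms of dependencies, here is a C sketch; the name memchr_unrolled4 is hypothetical and this is not the assembly in the revision. Inside the unrolled body every byte is addressed as a constant offset from the same base index, so no address computation has to wait for an increment made by the preceding step, and the index is advanced once per four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

void *
memchr_unrolled4(const void *buf, int c, size_t len)
{
	const unsigned char *p = buf;
	unsigned char ch = (unsigned char)c;
	size_t i = 0;

	assert(buf != NULL || len == 0);

	/*
	 * Main loop: four checks at fixed offsets from i. Since i <= len
	 * always holds, len - i cannot underflow, even for len == SIZE_MAX.
	 * Bytes are still examined in order, so nothing past the first
	 * match is read.
	 */
	for (; len - i >= 4; i += 4) {
		if (p[i] == ch)
			return ((void *)(p + i));
		if (p[i + 1] == ch)
			return ((void *)(p + i + 1));
		if (p[i + 2] == ch)
			return ((void *)(p + i + 2));
		if (p[i + 3] == ch)
			return ((void *)(p + i + 3));
	}

	/* Tail: at most three remaining bytes. */
	for (; i < len; i++)
		if (p[i] == ch)
			return ((void *)(p + i));
	return (NULL);
}
```

Whether a compiler or a given RISC-V core actually benefits from this shape depends on the microarchitecture; the sketch only shows the structure.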
mhorne added a subscriber: mhorne.
mhorne added inline comments.
lib/libc/riscv/string/memchr.S
82–86

Please improve the formatting of this comment.

This revision is now accepted and ready to land. Aug 19 2025, 6:47 PM
strajabot marked 2 inline comments as done.
  • fixup: update comments in memchr.S
This revision now requires review to proceed. Sep 22 2025, 8:51 PM
This revision is now accepted and ready to land. Sep 22 2025, 9:28 PM

The commit is planned once @strajabot completes a final test, which we have been working on for the last week (the box is slow, sorry). The same applies to the other riscv64 libc commits, so I won't repeat this comment on them.