Page MenuHomeFreeBSD

libc: scalar memchr() in RISC-V assembly
AcceptedPublic

Authored by strajabot on Jul 18 2024, 6:04 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Oct 10, 3:19 PM
Unknown Object (File)
Fri, Oct 10, 8:56 AM
Unknown Object (File)
Fri, Oct 10, 8:55 AM
Unknown Object (File)
Fri, Oct 10, 8:55 AM
Unknown Object (File)
Fri, Oct 10, 8:55 AM
Unknown Object (File)
Tue, Oct 7, 11:48 AM
Unknown Object (File)
Sun, Sep 28, 10:54 PM
Unknown Object (File)
Wed, Sep 24, 12:08 PM
Subscribers

Details

Reviewers
fuz
emaste
mhorne
Group Reviewers
riscv
Summary

Added an optimized memchr() implementation in RISC-V assembly and updated
the relevant manpage.

        │ memchr_baseline │            memchr_scalar            │
        │     sec/op      │   sec/op     vs base                │
Short         636.6µ ± 1%   495.9µ ± 1%  -22.10% (p=0.000 n=20)
Mid           279.7µ ± 1%   224.1µ ± 1%  -19.87% (p=0.000 n=20)
Long          138.8µ ± 0%   124.9µ ± 0%  -10.00% (p=0.000 n=20)
geomean       291.3µ        240.3µ       -17.48%

        │ memchr_baseline │            memchr_scalar             │
        │       B/s       │     B/s       vs base                │
Short        187.3Mi ± 1%   240.4Mi ± 1%  +28.37% (p=0.000 n=20)
Mid          426.2Mi ± 1%   531.9Mi ± 1%  +24.79% (p=0.000 n=20)
Long         859.0Mi ± 0%   954.4Mi ± 0%  +11.11% (p=0.000 n=20)
geomean      409.3Mi        496.0Mi       +21.19%
Test Plan

Tested using the in-tree test suite

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 67244
Build 64127: arc lint + arc unit

Event Timeline

Looks reasonable. Make sure the code works with phony buffer lengths such as SIZE_MAX.

lib/libc/riscv/string/memchr.S
35–36

Could you somehow avoid having to deal with two counters here? Each addition is 20% of the full loop, so if you get rid of one, that's a 20% speedup.

Also consider completely peeling the loop or using Duff's device.

120

Avoid depending on the previous instruction.

libc: unroll loop in RISC-V memchr()

Refactor of memchr to remove the case where len < 8 and to unroll the main loop. Removes many data dependencies between the instructions to improve performance on small strings.

        │ memchr_baseline │            memchr_scalar            │              memchr_uroll              │
        │     sec/op      │   sec/op     vs base                │   sec/op     vs base                   │
Short         636.6µ ± 1%   495.9µ ± 1%  -22.10% (p=0.000 n=20)   497.5µ ± 2%  -21.86% (p=0.000 n=20+22)
Mid           279.7µ ± 1%   224.1µ ± 1%  -19.87% (p=0.000 n=20)   220.0µ ± 2%  -21.35% (p=0.000 n=20+22)
Long          138.8µ ± 0%   124.9µ ± 0%  -10.00% (p=0.000 n=20)   117.3µ ± 2%  -15.45% (p=0.000 n=20+22)
geomean       291.3µ        240.3µ       -17.48%                  234.2µ       -19.60%

        │ memchr_baseline │            memchr_scalar             │
        │       B/s       │     B/s       vs base                │
Short        187.3Mi ± 1%   240.4Mi ± 1%  +28.37% (p=0.000 n=20)
Mid          426.2Mi ± 1%   531.9Mi ± 1%  +24.79% (p=0.000 n=20)
Long         859.0Mi ± 0%   954.4Mi ± 0%  +11.11% (p=0.000 n=20)
geomean      409.3Mi        496.0Mi       +21.19%

        │ memchr_uroll │
        │    MiB/s     │
Short       251.3 ± 1%
Mid         568.2 ± 2%
Long       1.065k ± 2%
geomean     533.8
mhorne added a subscriber: mhorne.
mhorne added inline comments.
lib/libc/riscv/string/memchr.S
82–86

Please improve the formatting of this comment.

This revision is now accepted and ready to land.Aug 19 2025, 6:47 PM
strajabot marked 2 inline comments as done.
  • fixup: update comments in memchr.S
This revision now requires review to proceed.Mon, Sep 22, 8:51 PM
This revision is now accepted and ready to land.Mon, Sep 22, 9:28 PM