
libc/aarch64: fix strlen() when flush-to-zero is set
ClosedPublic

Authored by fuz on Mon, Jan 13, 1:47 PM.

Details

Summary

Our SIMD-enhanced strlen() implementation for AArch64 uses
a floating-point comparison to compare a bit mask to zero.
This works fine under normal circumstances, but fails if
the FZ (flush-to-zero) flag is set in FPCR (the floating-point
control register) as then the CPU no longer distinguishes
denormals from zero.

This was not caught during testing; this flag is rarely set
and programs that do so rarely perform string manipulation.

Avoid this problem by using an integer comparison instead.
The performance impact seems to be small (about 0.5 %).
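The failure mode can be sketched in portable C. This is only an illustration under stated assumptions, not the code from strlen.S: the helper names and the example mask values are hypothetical, and the effect of FZ is modelled in software by treating subnormal inputs as zero instead of actually setting the flag in FPCR.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 64 bits of a match mask as an IEEE 754 double. */
static double
mask_as_double(uint64_t mask)
{
	double d;

	memcpy(&d, &mask, sizeof(d));
	return (d);
}

/* Floating-point test against zero with FZ clear. */
static int
fp_nonzero(uint64_t mask)
{
	return (mask_as_double(mask) != 0.0);
}

/*
 * Software model (an assumption, not a real FPCR write) of the same
 * test with FZ set: subnormal inputs are flushed to zero first.
 */
static int
fp_nonzero_fz(uint64_t mask)
{
	double d = mask_as_double(mask);

	if (fpclassify(d) == FP_SUBNORMAL)
		return (0);
	return (d != 0.0);
}

/* Integer test against zero: does not depend on FPCR at all. */
static int
int_nonzero(uint64_t mask)
{
	return (mask != 0);
}
```

A mask such as 0x0f has all exponent bits clear, so it reads as a denormal: fp_nonzero() and int_nonzero() report a match, while fp_nonzero_fz() reports none. That is the case in which the floating-point comparison misses a match.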

Test Plan

passes the test suite

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fuz requested review of this revision. Mon, Jan 13, 1:47 PM

Looks good with regard to the function, but the performance depends on the microarchitecture: on the Pi 5 we get a 19% decrease in performance on long strings (> 1 << 30 bytes).
I think this is a reasonable trade-off for increased performance on short and medium-length strings, which are the common case.

This revision is now accepted and ready to land. Wed, Jan 15, 3:36 PM
markmi_dsl-only.net added inline comments.
lib/libc/aarch64/string/strlen.S
36

Is the possible change in the conditions under which fmov x1, d0 executes significant in any way? x1 could now end up with a different value (0) in cases where the original code avoided assigning x1 before .Lloop was used.

lib/libc/aarch64/string/strlen.S
36

We avoid a move to a GPR for each iteration. In the original code, the move was still carried out once a match had been found, as we need the index of the first match.