amd64: finish the tail in memset with an overlapping store

Description

Instead of working out the exact store size for the tail, we can just shift
the target by -8 + tail and finish with a single 8-byte store that overlaps
the area already filled. Doing a blind write to a previously rep stosq'ed
area comes with a penalty, so do it conditionally.
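
The idea can be sketched in C. This is only an illustration under stated
assumptions, not the committed assembly: fill_overlap_tail is a hypothetical
name, the word loop stands in for rep stosq, the buffer is assumed to be at
least 8 bytes long, and "conditionally" is read here as "only when the tail
is non-zero".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernel's memset: fill len bytes (len >= 8)
 * with byte c, using 8-byte stores for the bulk and at most one
 * overlapping 8-byte store for the tail.
 */
static void *
fill_overlap_tail(void *dst, int c, size_t len)
{
	unsigned char *p = dst;
	uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;
	size_t words = len / 8;	/* bulk part; stands in for rep stosq */
	size_t tail = len & 7;	/* 0..7 leftover bytes */

	assert(len >= 8);	/* the overlapping store needs 8 bytes behind it */

	for (size_t i = 0; i < words; i++) {
		memcpy(p, &pattern, sizeof(pattern));
		p += 8;
	}

	/*
	 * p now points just past the bulk. Shifting it by -8 + tail places
	 * an 8-byte store that ends exactly at dst + len and rewrites
	 * 8 - tail bytes that already hold the pattern. Skip the store
	 * when there is no tail (assumed meaning of "conditionally").
	 */
	if (tail != 0)
		memcpy(p - 8 + tail, &pattern, sizeof(pattern));

	return (dst);
}
```

For the 257-byte case from the benchmark, the bulk loop covers offsets 0
through 255, tail is 1, and the final store lands on offsets 249 through 256,
so only the last byte is new.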

Sample win on EPYC when zeroing a 257-byte buffer (tail = 1) aligned to
16 bytes:
before: 44782846 ops/s
after: 46118614 ops/s

Idea stolen from NetBSD.

Sponsored by: The FreeBSD Foundation

Details

Provenance
Authored by mjg
Parents
rS339578: pfctl: Fix line numbers when \ is used inside ""