Page MenuHomeFreeBSD

amd64: handle small memset buffers with overlapping stores
ClosedPublic

Authored by mjg on Oct 23 2018, 3:53 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 20, 7:57 PM
Unknown Object (File)
Thu, Jan 9, 1:51 PM
Unknown Object (File)
Nov 24 2024, 11:36 PM
Unknown Object (File)
Nov 23 2024, 1:31 AM
Unknown Object (File)
Nov 21 2024, 2:22 AM
Unknown Object (File)
Nov 18 2024, 3:11 PM
Unknown Object (File)
Oct 2 2024, 9:23 PM
Unknown Object (File)
Sep 29 2024, 11:00 PM
Subscribers

Details

Summary

Instead of jumping to locations which store the exact number of bytes, use displacement to move the destination.

In particular the following clears an area between 8-16 (inclusive) branch-free:

movq    %r10,(%rdi)
movq    %r10,-8(%rdi,%rcx)

For instance for rcx of 10 the second line is rdi + 10 - 8 = rdi + 2. Writing 8 bytes starting at that offset overlaps with 6 bytes written previously and writes 2 new.

Provides a nice win for smaller stores. Other ones are erratic depending on the microarchitecture.

General idea taken from NetBSD (restricted use of the trick) and bionic string functions (use for various ranges like in this patch).

skylake-memset2.png (816×634 px, 50 KB)

Test Plan

glibc test suite

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

mjg retitled this revision from amd64: handle short copies with overlapping stores to amd64: handle small memset buffers with overlapping stores.

Could this cause store-to-load stalls on the users of the memory ?

sys/amd64/amd64/support.S
467 ↗(On Diff #49481)

Please use normal numeration for temporal labels, and mention the tail length in the comment. It is too cryptic for my taste.

mjg marked an inline comment as done.Oct 23 2018, 8:26 PM

I believe the problem is roughly the same as before. Passed buffers are often already heavily misaligned, so movs here trip over the same words anyway.

fwiw doing a "survey" reveals the method is used by bionic libc, musl and others.

sys/amd64/amd64/support.S
467 ↗(On Diff #49481)

They get quickly out of hand if changes have to be made. I can add a comment explaining the convention. It is roughly as follows:

Each section zeroes a range of <n-m> encoded in the label: 101632 zeroes between <16-32>. 103200 zeroes 32 or higher without a specific upper bound.

sys/amd64/amd64/support.S
467 ↗(On Diff #49481)

Use local labels instead of the temporary. They are named with the .L prefix. Sure, add the comment explaining conventions.

mjg marked an inline comment as done.Oct 23 2018, 9:43 PM
mjg added inline comments.
sys/amd64/amd64/support.S
467 ↗(On Diff #49481)

I already tried that, it fails claiming symbol redefinition. Hence the label encoding.

sys/amd64/amd64/support.S
467 ↗(On Diff #49481)

I do not understand what you mean. Show an example of the failure.

mjg marked an inline comment as done.Oct 24 2018, 10:16 PM
mjg added inline comments.
sys/amd64/amd64/support.S
467 ↗(On Diff #49481)
diff --git a/sys/amd64/amd64/support.S b/sys/amd64/amd64/support.S
index bbcbd54937f9..76fb5b694ee0 100644
--- a/sys/amd64/amd64/support.S
+++ b/sys/amd64/amd64/support.S
@@ -464,6 +464,8 @@ END(memcpy_erms)
        cmpq    $256,%rcx
        ja      1256f
 
+.Ltestlabel:
+
 1032:
        movq    %r10,(%rdi)
        movq    %r10,8(%rdi)

<instantiation>:14:1: error: invalid symbol redefinition
.Ltestlabel:

kib added inline comments.
sys/amd64/amd64/support.S
467 ↗(On Diff #49481)

Ok, lets split it. Go ahead with the current patch, and lets sort out labels + comments later.

So the <instantiation> diagnostic gives the big hint that the problem is due to the use of the non-temporal label in the macro, which is instantiated more than once in the same file, right ? Gas manual mentions \@ which counts the number of the macro instantiations. You can append it to the local label to get unique name. No, it is not too ugly.

This revision is now accepted and ready to land.Oct 25 2018, 2:42 AM
This revision was automatically updated to reflect the committed changes.