
amd64: align memset buffers to 16 bytes before using rep stos
Closed, Public

Authored by mjg on Oct 23 2018, 4:55 AM.

Details

Summary

Here are basic results. Both the Intel manual and Agner Fog's docs suggest aligning to 16 bytes.

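Benchmarking code courtesy of jtl. It performs stores with various misalignments and sizes. The change is a win on EPYC and Haswell, most visibly in the larger size buckets (for example, 2048-4095 on EPYC: memset_current 21.99 s, memset_align16 15.02 s). Skylake did not see a change.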
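
For readers who want the idea in C rather than assembly, here is a minimal sketch of "align to 16 bytes before the bulk fill". It is not the code from this diff: the function name memset_align16_sketch is made up for illustration, and plain byte loops stand in for the head stores and for rep stos. The only assumption is that the buffer is at least 16 bytes long, so one unaligned 16-byte fill can cover the head.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C model of aligning the destination to 16 bytes before
 * the bulk fill. Not the kernel routine: the byte loops below stand in
 * for the unaligned head stores and for rep stos.
 */
static void *
memset_align16_sketch(void *dst, int c, size_t len)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;
    size_t mis = (uintptr_t)p & 15;    /* bytes past a 16-byte boundary */

    /* Only realign when the 16-byte head fill fits inside the buffer. */
    if (mis != 0 && len >= 16) {
        size_t head = 16 - mis;        /* bytes up to the next boundary */

        /* Cover the unaligned head with one 16-byte fill. */
        for (size_t i = 0; i < 16; i++)
            p[i] = byte;

        /* Skip to the boundary; the overlap is simply written twice. */
        p += head;
        len -= head;
    }

    /* Bulk fill, now starting at a 16-byte aligned address. */
    for (size_t i = 0; i < len; i++)
        p[i] = byte;

    return dst;
}
```

Writing the overlapping bytes twice trades a few redundant stores for not having to branch on the exact head length.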

AMD EPYC 7281 16-Core Processor:

(256-511)... memset_erms: 2.300338642 seconds.
(512-1023)... memset_erms: 5.197097775 seconds.
(1024-2047)... memset_erms: 12.781057584 seconds.
(2048-4095)... memset_erms: 17.578486362 seconds.
(256-511)... memset_current: 1.140207436 seconds.
(512-1023)... memset_current: 4.588418390 seconds.
(1024-2047)... memset_current: 12.907714444 seconds.
(2048-4095)... memset_current: 21.990217317 seconds.
(256-511)... memset_current_erms: 2.233002987 seconds.
(512-1023)... memset_current_erms: 5.075316164 seconds.
(1024-2047)... memset_current_erms: 12.537853357 seconds.
(2048-4095)... memset_current_erms: 17.335795972 seconds.
(256-511)... memset_align16: 1.110913214 seconds.
(512-1023)... memset_align16: 3.903778784 seconds.
(1024-2047)... memset_align16: 10.226433854 seconds.
(2048-4095)... memset_align16: 15.021172741 seconds.
(256-511)... memset_align16_erms: 1.796111017 seconds.
(512-1023)... memset_align16_erms: 4.197904826 seconds.
(1024-2047)... memset_align16_erms: 10.784860989 seconds.
(2048-4095)... memset_align16_erms: 15.579892270 seconds.
(256-511)... memset_align32: 1.102154891 seconds.
(512-1023)... memset_align32: 3.869436677 seconds.
(1024-2047)... memset_align32: 10.178790865 seconds.
(2048-4095)... memset_align32: 14.973767848 seconds.
(256-511)... memset_align32_erms: 1.800761732 seconds.
(512-1023)... memset_align32_erms: 4.205545185 seconds.
(1024-2047)... memset_align32_erms: 10.800691952 seconds.
(2048-4095)... memset_align32_erms: 15.595877200 seconds.

Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz:

(256-511)... memset_erms: 0.927782495 seconds.
(512-1023)... memset_erms: 2.395261299 seconds.
(1024-2047)... memset_erms: 7.915244753 seconds.
(2048-4095)... memset_erms: 14.676952348 seconds.
(256-511)... memset_current: 1.317894401 seconds.
(512-1023)... memset_current: 3.188063859 seconds.
(1024-2047)... memset_current: 9.032718976 seconds.
(2048-4095)... memset_current: 15.773163394 seconds.
(256-511)... memset_current_erms: 0.883882284 seconds.
(512-1023)... memset_current_erms: 2.376637886 seconds.
(1024-2047)... memset_current_erms: 7.913666031 seconds.
(2048-4095)... memset_current_erms: 14.652330491 seconds.
(256-511)... memset_align16: 1.319897053 seconds.
(512-1023)... memset_align16: 3.057998031 seconds.
(1024-2047)... memset_align16: 7.904105772 seconds.
(2048-4095)... memset_align16: 13.013629611 seconds.
(256-511)... memset_align16_erms: 0.928722912 seconds.
(512-1023)... memset_align16_erms: 2.272421882 seconds.
(1024-2047)... memset_align16_erms: 6.898529762 seconds.
(2048-4095)... memset_align16_erms: 11.982380596 seconds.
(256-511)... memset_align32: 1.314689455 seconds.
(512-1023)... memset_align32: 3.033606647 seconds.
(1024-2047)... memset_align32: 7.147985430 seconds.
(2048-4095)... memset_align32: 10.484084497 seconds.
(256-511)... memset_align32_erms: 0.913452673 seconds.
(512-1023)... memset_align32_erms: 2.138181087 seconds.
(1024-2047)... memset_align32_erms: 5.699912660 seconds.
(2048-4095)... memset_align32_erms: 9.063921729 seconds.

Test Plan

glibc test suite

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib added inline comments.
sys/amd64/amd64/support.S
558 ↗(On Diff #49484)

subq $16,%r8/addq %r8,%rcx/subq %r8,%rdi ?
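
To make the arithmetic in the suggested sequence easier to check, here is a small C model of the three instructions. It rests on assumptions that the comment itself does not state: that %r8 holds the misalignment (destination & 15) on entry, %rdi the destination, and %rcx the remaining byte count. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two registers the sequence updates. */
struct fill_regs {
    uint64_t rdi;    /* destination pointer */
    uint64_t rcx;    /* remaining byte count */
};

static struct fill_regs
adjust_suggested(uint64_t rdi, uint64_t rcx)
{
    uint64_t r8 = rdi & 15;    /* assumed input: misalignment of rdi */

    r8 -= 16;     /* subq $16,%r8 : r8 = -(16 - misalignment), mod 2^64 */
    rcx += r8;    /* addq %r8,%rcx : count shrinks by 16 - misalignment */
    rdi -= r8;    /* subq %r8,%rdi : pointer moves up to the boundary */

    return (struct fill_regs){ rdi, rcx };
}
```

Under those assumptions, the destination ends on the next 16-byte boundary and the count drops by the same number of bytes, which is safe only if the first 16 bytes were already stored.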

This revision is now accepted and ready to land. Oct 23 2018, 4:09 PM
This revision was automatically updated to reflect the committed changes.