
Retune SHA2 code for improved performance on CPUs with more ILP and

Description

Retune SHA2 code for improved performance on CPUs with more ILP and
a preference for memory load instructions over large code footprints
with embedded immediate variables.
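
As an illustration only (this is not the committed code), the following
sketch shows what a table-driven SHA-256 block transform can look like:
the 64 round constants live in a K[] array and are fetched with ordinary
loads inside a compact loop, instead of each round being unrolled with
its constant embedded as an immediate operand. The function name
sha256_transform_sketch and the helper rotr are hypothetical; the
constants and round function follow FIPS 180-4.

```c
#include <stddef.h>
#include <stdint.h>

/* SHA-256 round constants (FIPS 180-4), read from memory at run time. */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
    0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
    0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
    0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
    0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
    0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
    0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
    0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* Rotate right; n is always in 1..31 here, so the shifts are well defined. */
static uint32_t rotr(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical name: process one 64-byte block, updating state in place. */
void sha256_transform_sketch(uint32_t state[8], const uint8_t block[64])
{
    uint32_t W[64], S[8], t1, t2;
    int i;

    /* Load the 16 big-endian message words. */
    for (i = 0; i < 16; i++)
        W[i] = (uint32_t)block[4 * i] << 24 |
               (uint32_t)block[4 * i + 1] << 16 |
               (uint32_t)block[4 * i + 2] << 8 |
               (uint32_t)block[4 * i + 3];

    /* Expand the message schedule to 64 words. */
    for (i = 16; i < 64; i++) {
        uint32_t s0 = rotr(W[i - 15], 7) ^ rotr(W[i - 15], 18) ^ (W[i - 15] >> 3);
        uint32_t s1 = rotr(W[i - 2], 17) ^ rotr(W[i - 2], 19) ^ (W[i - 2] >> 10);
        W[i] = W[i - 16] + s0 + W[i - 7] + s1;
    }

    for (i = 0; i < 8; i++)
        S[i] = state[i];

    /* 64 rounds; K[i] is a memory load, not an immediate in the code stream. */
    for (i = 0; i < 64; i++) {
        uint32_t a = S[0], e = S[4];
        t1 = S[7] + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) +
             ((e & S[5]) ^ (~e & S[6])) + K[i] + W[i];
        t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) +
             ((a & S[1]) ^ (a & S[2]) ^ (S[1] & S[2]));
        S[7] = S[6]; S[6] = S[5]; S[5] = S[4]; S[4] = S[3] + t1;
        S[3] = S[2]; S[2] = S[1]; S[1] = S[0]; S[0] = t1 + t2;
    }

    /* Fold the working variables back into the chaining state. */
    for (i = 0; i < 8; i++)
        state[i] += S[i];
}
```

The sketch favors clarity over speed; a tuned version would typically
unroll rounds partially and interleave message-schedule work with the
rounds so that independent operations can execute in parallel.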

On amd64 CPUs from 2007-2008 there is not a significant change, but
amd64 CPUs from 2009-2010 get roughly 10% more throughput with this
code; amd64 CPUs from 2011-2012 get roughly 15% more throughput; and
amd64 CPUs from 2013-2015 get 20-25% more throughput. The Raspberry
Pi 2 increases its throughput by 6-8%.

Sponsored by: Tarsnap Backup Inc.
Performance tested by: allanjude
MFC after: 3 weeks

Details

Provenance
cperciva Authored on
Parents
rS300965: Micro optimize: C standard guarantees that right shift for unsigned value