
math/openblas: update to 0.3.10, add POWER8 option

Description

Changelog:

common:

Improved thread locking behaviour in blas_server and parallel getrf
Imported bugfix 394 from LAPACK (spurious reference to "XERBL"
due to overlong lines)
Imported bugfix 403 from LAPACK (compile option "recursive" required
for correctness with Intel and PGI)
Imported bugfix 408 from LAPACK (wrong scaling in ZHEEQUB)
Imported bugfix 411 from LAPACK (infinite loop in LARGV/LARTG/LARTGP)
Fixed mismatches between BUFFERSIZE and GEMM_UNROLL parameters that
could lead to crashes at large matrix sizes
Restored internal soname in dynamic libraries on FreeBSD and Dragonfly
Added API (openblas_setaffinity) to set thread affinity
programmatically on Linux
Added initial infrastructure for half-precision floating point
(bfloat16) support with a generic implementation of SHGEMM
Added CMAKE build system support for building the cblas_Xgemm3m
functions
Fixed CMAKE support for building in a path with embedded spaces
Fixed CMAKE (non)handling of NO_EXPRECISION and MAX_STACK_ALLOC
Fixed GCC version detection in the Makefiles
Allowed overriding the names of AR, AS and LD in Makefile builds
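
The bfloat16 entry above can be illustrated with a small sketch. This is not
OpenBLAS code: the function names are hypothetical, and it assumes that
bfloat16 is the upper 16 bits of an IEEE 754 single-precision value and that
an SHGEMM-style routine takes bfloat16 inputs and produces floating-point
output. Accumulation here uses Python floats (double precision), which is
wider than a real kernel would likely use.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 and return the 16-bit pattern (hypothetical helper).

    Raises OverflowError if x is finite but too large for single precision.
    """
    if x != x:
        # NaN: return a canonical quiet NaN pattern.
        return 0x7FC0
    # Reinterpret the value as IEEE 754 binary32 bits.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a float (exact conversion)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def bf16_gemm_reference(m, n, k, a, b):
    """Naive C = A * B with bfloat16 inputs (hypothetical reference routine).

    a holds an m x k matrix and b a k x n matrix, both as row-major lists of
    bfloat16 bit patterns. Returns C as a row-major list of Python floats.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += bf16_bits_to_float(a[i * k + p]) * bf16_bits_to_float(b[p * n + j])
            c[i * n + j] = acc
    return c


if __name__ == "__main__":
    a = [float_to_bf16_bits(v) for v in (1.0, 2.0, 3.0, 4.0)]
    identity = [float_to_bf16_bits(v) for v in (1.0, 0.0, 0.0, 1.0)]
    print(bf16_gemm_reference(2, 2, 2, a, identity))  # [1.0, 2.0, 3.0, 4.0]
```

A generic implementation in this sense trades speed for portability: it
relies only on bit manipulation and scalar arithmetic, so it runs on any
target before architecture-specific kernels exist.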

POWER:

Fixed big-endian POWER8 ELFv2 builds on FreeBSD
Fixed GCC version checks and DYNAMIC_ARCH builds on POWER9
Fixed CMAKE build support for POWER9
Fixed a potential race condition in the thread buffer allocation
Worked around LAPACK test failures on PPC G4

MIPS:

Fixed a potential race condition in the thread buffer allocation
Added support for MIPS 24K/24KE family based on P5600 kernels

MIPS64:

Fixed a potential race condition in the thread buffer allocation
Added TARGET=GENERIC

ARMV7:

Fixed a race condition in the thread buffer allocation

ARMV8:

Fixed a race condition in the thread buffer allocation
Fixed zero initialisation in the assembly for SGEMM and DGEMM BETA
Improved performance of the ThunderX2 DAXPY kernel
Added an optimized SGEMM kernel for Cortex A53
Fixed Makefile support for INTERFACE64 (8-byte integer)

x86_64:

Fixed a syntax error in the CMAKE setup for SkylakeX
Improved performance of STRSM on Haswell, SkylakeX and Ryzen
Improved SGEMM performance for workloads with ldc a
multiple of 1024
Improved DGEMM performance on SkylakeX
Fixed unwanted AVX512-dependency of SGEMM in DYNAMIC_ARCH
builds created on SkylakeX
Removed data alignment requirement in the SSE2 copy kernels
that could cause spurious crashes
Added a workaround for an optimizer bug in AppleClang 11.0.3
Fixed LAPACK-TEST failures with Intel Fortran
Fixed compilation and LAPACK test results with recent Flang
and AMD AOCC
Fixed DYNAMIC_ARCH builds with CMAKE on OS X
Fixed missing exports of cblas_i?amin, cblas_i?min, cblas_i?max,
cblas_?sum, cblas_?gemm3m in the shared library on OS X
Fixed reporting of cpu name in DYNAMIC_ARCH builds (would sometimes
show the name of an older generation chip supported by the same kernels)
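
For readers unfamiliar with the "ldc a multiple of 1024" case in the SGEMM
entry above, the following sketch shows what the leading dimension means for
memory layout. The helper names are hypothetical. The changelog does not say
why this case was slow; the sketch only shows that with 4-byte elements and
ldc = 1024, consecutive columns of C start exactly 4096 bytes apart, a
power-of-two stride that is commonly associated with cache conflicts.

```python
def column_major_offset(i: int, j: int, ld: int) -> int:
    """Element offset of entry (i, j) in a column-major matrix with leading dimension ld."""
    return i + j * ld


def column_start_bytes(n_cols: int, ldc: int, elem_size: int = 4) -> list:
    """Byte offset of the first element of each column (elem_size=4 for single precision)."""
    return [column_major_offset(0, j, ldc) * elem_size for j in range(n_cols)]


if __name__ == "__main__":
    # ldc = 1024: every column starts at the same offset modulo 4096 bytes.
    print([off % 4096 for off in column_start_bytes(4, 1024)])  # [0, 0, 0, 0]
    # A slightly padded ldc spreads the column starts out.
    print([off % 4096 for off in column_start_bytes(4, 1032)])  # [0, 32, 64, 96]
```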

IBM Z:

Improved performance of SGEMM/STRMM and DGEMM/DTRMM on Z14

PR: 249120
Approved by: phd_kimberlite@yahoo.co.jp (maintainer)

Details

Provenance
Authored by: pkubaj
Parents
rP547858: www/py-pywikibot: Add python's 3.5+ flag