# pveclib

## Power Vector Library

Header files that contain useful functions leveraging the PowerISA
Vector Facilities: Vector Multimedia Extension (VMX AKA Altivec) and
Vector Scalar Extension (VSX). Larger functions like quadword multiply
and multiple quadword multiply and madd are large enough to justify
CPU specific and tuned run-time libraries. The user can choose to bind
to platform specific static archives or to dynamic shared object libraries
that automatically select (via dynamic linking with IFUNC resolvers) the
correct implementation for the CPU they are running on.

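The run-time selection described above can be pictured as a resolver that maps the detected CPU to one of several builds of the same function. The following is only a minimal portable sketch of that idea, not how pveclib or the dynamic linker implements it: the `cpu_level_t` enum, the `mul_power*` stand-ins, and `resolve_mul` are hypothetical names, and a real IFUNC resolver would query the platform instead of taking the CPU level as an argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU levels, for illustration only. */
typedef enum { CPU_POWER7, CPU_POWER8, CPU_POWER9 } cpu_level_t;

/* Common signature shared by every tuned variant. */
typedef uint64_t (*mul_fn_t)(uint64_t, uint64_t);

/* Stand-ins for CPU specific builds of the same source code. */
static uint64_t mul_power7(uint64_t a, uint64_t b) { return a * b; }
static uint64_t mul_power8(uint64_t a, uint64_t b) { return a * b; }
static uint64_t mul_power9(uint64_t a, uint64_t b) { return a * b; }

/* Resolver: maps the detected CPU level to the matching variant.
   Unknown or older levels fall back to the baseline build. */
static mul_fn_t resolve_mul(cpu_level_t level)
{
    switch (level) {
    case CPU_POWER9: return mul_power9;
    case CPU_POWER8: return mul_power8;
    default:         return mul_power7;
    }
}
```

A caller would resolve once and then call through the returned pointer. With IFUNC the dynamic linker performs that step at symbol binding time, so the application code does not change.
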
The goal of this project is to provide well crafted implementations
of useful vector and large number operations:

- Provide equivalent functions across versions of the PowerISA.
  For example, the Vector Multiply-by-10 Unsigned Quadword
  operations introduced in PowerISA 3.0 (POWER9) can be implemented in a
  few vector instructions on earlier PowerISA versions.
- Provide equivalent functions across versions of the compiler.
  For example, builtins provided in later versions of the compiler
  can be implemented as inline functions with inline asm in earlier
  compiler versions.
- Provide higher order functions not provided directly by the PowerISA.
  For example, vector SIMD implementations for ASCII `__isalpha`, etc.
  Another example is full `__int128` implementations of Count Leading Zeros,
  Population Count, and Multiply.
- Provide optimized run-time libraries for quadword integer multiply
  and multi-quadword integer multiply and add.

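To illustrate why a multiply-by-10 on a quadword needs only a few simple operations, here is a scalar sketch using shifts and adds (10x = 8x + 2x). It assumes a compiler that provides `unsigned __int128` (GCC or Clang on a 64-bit target). The names `mul10_u128` and `mul10_carry_u128` are hypothetical, and this is not the instruction sequence pveclib uses.

```c
#include <assert.h>

typedef unsigned __int128 u128;

/* Multiply by 10 modulo 2**128 using only shifts and an add:
   10*x = 8*x + 2*x. */
static u128 mul10_u128(u128 x)
{
    return (x << 3) + (x << 1);
}

/* Carry out of the multiply: the value (0-9) that overflows bit 127. */
static unsigned mul10_carry_u128(u128 x)
{
    u128 x8 = x << 3;
    u128 x2 = x << 1;
    /* Bits shifted out of 8*x (top 3 bits) and of 2*x (top bit). */
    unsigned carry = (unsigned)(x >> 125) + (unsigned)(x >> 127);
    /* Plus one if the 128-bit add of the two partial products wrapped. */
    carry += (unsigned)((u128)(x8 + x2) < x8);
    return carry;
}
```

Chaining the carry of one quadword into the next is what allows a multi-quadword decimal conversion to be built from this single step.
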
## Build

PVECLIB now supports CPU tuned run-time libraries, both static archives
and dynamic (IFUNC selected) shared objects. This complicates the build
process as it now has to build the same source code, multiple times,
with different compile targets (-mcpu=). Another complication comes
from compiling for big endian systems where the compiler default target
may not include the vector facilities (VMX and VSX).

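Whether the current compile target includes the vector facilities can be observed from the preprocessor. The sketch below relies on the `__ALTIVEC__` and `__VSX__` macros that GCC predefines for PowerPC targets when those facilities are enabled. The function and flag names are hypothetical and are not part of PVECLIB.

```c
#include <assert.h>

/* Bit flags reported by compiled_vector_facilities() (hypothetical names). */
enum { HAVE_VMX = 1, HAVE_VSX = 2 };

/* Report which vector facilities the compile target enabled.
   Returns 0 when neither macro is defined, for example with a
   default target that lacks VMX and VSX, or on a non-POWER host. */
static int compiled_vector_facilities(void)
{
    int flags = 0;
#if defined(__ALTIVEC__)
    flags |= HAVE_VMX;
#endif
#if defined(__VSX__)
    flags |= HAVE_VSX;
#endif
    return flags;
}
```

Compiling this once with the default target and once with an explicit `-mcpu=` option shows the difference the option makes.
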
## Configure and option flags

The project can use configure tests to define options like AM_CPPFLAGS
and AM_CFLAGS, but the user command line options (CPPFLAGS and CFLAGS)
are always applied last and take precedence.
See: Automake "Flag Variables Ordering"
https://www.gnu.org/software/automake/manual/html_node/Flag-Variables-Ordering.html

So a configure flag like CFLAGS='-O3 -mcpu=power7' would be OK for
functional verification tests of the POWER7 specific implementations
of PVECLIB operations. But this would interfere with building the POWER8
and POWER9 specific objects for the production version of libpvec.so.
So builds for production level PVECLIB should never specify -mcpu= in
CFLAGS.

On the other hand if the user does not specify any CFLAGS, autoconf will
fill in a default value of '-O2 -g'. This is bad! PVECLIB needs the
global common subexpression, loop, and vector cost model optimizations
enabled by '-O3'. Also '-g' will generate huge debug tables for the vector
int512 run-time and slow down the build. If you need to profile or
debug with basic back-trace information, use '-g1'.

So unless you are involved in the functional testing of new PVECLIB
operations, the safe options are:

    CFLAGS='-m64 -g1 -O3'

The PVECLIB Makefile.am files include special macros for CPU specific
run-time compiles. These macros exclude the user CFLAGS from those
compile commands.

On the other hand, if the compiler default target does not support
PowerISA vector facilities and an appropriate '-mcpu=' option is not
supplied, the compile will fail. So the PVECLIB configure.ac includes a
number of configure tests that detect this and provide appropriate
compile targets.

The current PVECLIB implementation assumes the target supports both VMX
(Altivec) and VSX facilities. So the minimum targets are set internally
(PVECLIB_DEFAULT_CFLAG) to '-mcpu=power7' for BE and '-mcpu=power8' for LE.

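The minimum-target rule above (POWER7 for BE, POWER8 for LE) can be expressed as a compile-time check. This is a sketch under assumptions, not what configure.ac does: it uses the `_ARCH_PWR7` and `_ARCH_PWR8` macros that GCC predefines for the selected `-mcpu=` level together with GCC's `__BYTE_ORDER__`, and `min_target_status` is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Describe whether the compile target meets the documented minimum:
   POWER7 for big endian, POWER8 for little endian. */
static const char *min_target_status(void)
{
#if !defined(__powerpc__)
    return "not a POWER target";
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
# if defined(_ARCH_PWR8)
    return "ok: LE target is POWER8 or later";
# else
    return "below minimum: LE needs -mcpu=power8 or later";
# endif
#else
# if defined(_ARCH_PWR7)
    return "ok: BE target is POWER7 or later";
# else
    return "below minimum: BE needs -mcpu=power7 or later";
# endif
#endif
}
```
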
The PVECLIB configure.ac also includes configure tests for related
PowerISA facilities that can be leveraged for PVECLIB operations but
are not core functions. This includes decimal floating-point and IEEE
128-bit binary floating-point. These are both target and compiler
support checks. The compiler checks are especially important for the
Clang compiler as it is currently missing Decimalxx and Float128
support. Some PVECLIB operations will be disabled in this case.

The default compiler is 'gcc'. The project can be configured to use
the Clang / LLVM compiler using the CC=clang flag.

Run './configure' to verify the build tools and environment:

    $ ./configure CFLAGS='-O3 -g1'

On big endian / biarch systems it is wise to explicitly specify 64-bit:

    $ ./configure CFLAGS='-m64 -O3 -g1' LDFLAGS='-m64'

To use the Advance Toolchain:

    $ ./configure CC=/opt/at13.0/bin/powerpc64le-linux-gnu-gcc \
        AR=/opt/at13.0/bin/powerpc64le-linux-gnu-ar \
        RANLIB=/opt/at13.0/bin/powerpc64le-linux-gnu-ranlib \
        CFLAGS='-m64 -O3 -g1' LDFLAGS='-m64'

Then run 'make' to perform the basic compile tests and build the
run-time libraries:

    $ make

and, optionally, run the functional verification tests:

    $ make check

and install the headers and libraries so your programs can use them:

    $ make install

If the included autotools don't match the version installed on your
system, perform these steps:

    $ aclocal
    $ autoconf
    $ automake

## Usage

Once pveclib is installed on the POWER or OpenPOWER system,
simply include the appropriate header. For example:

    #include <pveclib/vec_int128_ppc.h>

The headers are organized by element type:

- vec_common_ppc.h; Typedefs and helper macros
- vec_f128_ppc.h; Operations on vector _Float128 values
- vec_f64_ppc.h; Operations on vector double values
- vec_f32_ppc.h; Operations on vector float values
- vec_int512_ppc.h; Operations on Multi-quadword integer values
- vec_int128_ppc.h; Operations on vector __int128 values
- vec_int64_ppc.h; Operations on vector long int (64-bit) values
- vec_int32_ppc.h; Operations on vector int (32-bit) values
- vec_int16_ppc.h; Operations on vector short int (16-bit) values
- vec_char_ppc.h; Operations on vectors of char (8-bit) values
- vec_bcd_ppc.h; Operations on vectors of Binary Coded Decimal and Zoned Decimal values

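As a small usage sketch, the helper below multiplies a 128-bit value by 10 through `vec_mul10uq` from `vec_int128_ppc.h` when the header is installed and the target enables VSX, and otherwise falls back to plain `unsigned __int128` arithmetic so the code still builds elsewhere. It assumes GCC, which allows casts between a vector and a scalar of the same size. `times_ten` and the `USE_PVECLIB` guard are hypothetical names, not part of pveclib.

```c
#include <assert.h>

/* Use pveclib only when targeting VSX and the header is installed. */
#if defined(__VSX__) && defined(__has_include)
# if __has_include(<pveclib/vec_int128_ppc.h>)
#  include <pveclib/vec_int128_ppc.h>
#  define USE_PVECLIB 1
# endif
#endif

typedef unsigned __int128 u128;

/* Hypothetical helper: multiply a 128-bit value by 10 (modulo 2**128). */
static u128 times_ten(u128 x)
{
#ifdef USE_PVECLIB
    /* Reinterpret the scalar as a one-element quadword vector. */
    vui128_t v = (vui128_t) x;
    return (u128) vec_mul10uq(v);
#else
    return x * 10;   /* portable fallback without pveclib */
#endif
}
```

Because the pveclib operations are inline functions in the headers, a call like this should need no extra link step. Only the larger run-time functions, such as the multi-quadword multiplies, require linking against the library.
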
Full documentation is linked off of:

https://github.com/open-power-sdk/pveclib/wiki