# pveclib

## Power Vector Library

Header files that contain useful functions leveraging the PowerISA
Vector Facilities: Vector Multimedia Extension (VMX, AKA Altivec) and
Vector Scalar Extension (VSX). Larger functions, like quadword multiply
and multiple-quadword multiply and multiply-add, are large enough to
justify CPU-specific, tuned run-time libraries. The user can choose to
bind to platform-specific static archives or to dynamic shared object
libraries, which automatically select (via IFUNC resolution at dynamic
link time) the correct implementation for the CPU they are running on.

The goal of this project is to provide well-crafted implementations
of useful vector and large-number operations:

- Provide equivalent functions across versions of the PowerISA.
  For example, the Vector Multiply-by-10 Unsigned Quadword
  operations introduced in PowerISA 3.0 (POWER9) can be implemented in a
  few vector instructions on earlier PowerISA versions.
- Provide equivalent functions across versions of the compiler.
  For example, builtins provided in later versions of the compiler
  can be implemented as inline functions with inline asm in earlier
  compiler versions.
- Provide higher-order functions not provided directly by the PowerISA.
  For example, a vector SIMD implementation of ASCII `__isalpha`, etc.
  Another example is full `__int128` implementations of Count Leading
  Zeros, Population Count, and Multiply.
- Provide optimized run-time libraries for quadword integer multiply
  and multi-quadword integer multiply and add.
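To illustrate the `__int128` goals above, here is a portable C model of what quadword Count Leading Zeros, Population Count, and Multiply-by-10 compute. It composes 64-bit operations, much as an older ISA composes narrower vector instructions. This is a scalar reference sketch, not PVECLIB's vector implementation. The type `u128_model` and all function names are hypothetical, and `__builtin_clzll` / `__builtin_popcountll` assume GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct {
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Count leading zeros of a quadword, built from two doubleword counts. */
static unsigned clz_u128 (u128_model x)
{
  if (x.hi != 0)
    return (unsigned) __builtin_clzll (x.hi);
  if (x.lo != 0)
    return 64u + (unsigned) __builtin_clzll (x.lo);
  return 128u; /* __builtin_clzll (0) is undefined, so handle zero here */
}

/* Population count of a quadword: sum of the two doubleword counts. */
static unsigned popcnt_u128 (u128_model x)
{
  return (unsigned) (__builtin_popcountll (x.hi)
                     + __builtin_popcountll (x.lo));
}

/* Shift left by n bits (0 < n < 64), moving the top bits of lo into hi. */
static u128_model shl_u128 (u128_model x, unsigned n)
{
  u128_model r;
  r.hi = (x.hi << n) | (x.lo >> (64 - n));
  r.lo = x.lo << n;
  return r;
}

/* Add modulo 2^128, propagating the carry out of the low half. */
static u128_model add_u128 (u128_model a, u128_model b)
{
  u128_model r;
  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi + (r.lo < a.lo);
  return r;
}

/* Low 128 bits of x * 10, computed as (x << 3) + (x << 1). */
static u128_model mul10_u128 (u128_model x)
{
  return add_u128 (shl_u128 (x, 3), shl_u128 (x, 1));
}
```

The sketch keeps only the low 128 bits of the product, which matches a modulo-2^128 multiply. PowerISA 3.0 also defines carry-producing forms of the multiply-by-10 operation, and those forms would additionally return the bits shifted out of `hi`.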
## Build

PVECLIB now supports CPU-tuned run-time libraries, both static archives
and dynamic (IFUNC-selected) shared objects. This complicates the build
process, because the same source code must be built multiple times
with different compile targets (-mcpu=). Another complication comes
from compiling for big-endian systems, where the compiler's default
target may not include the vector facilities (VMX and VSX).
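The run-time selection can be pictured as a resolver that inspects the CPU's capabilities once and returns the most capable variant, testing the highest ISA level first. The sketch below is a hypothetical function-pointer model of such a resolver, not PVECLIB's code. The variant names are made up. The two bit values are assumed to be the Linux PowerPC `AT_HWCAP2` flags for PowerISA 2.07 (POWER8) and 3.0 (POWER9) from the kernel's `asm/cputable.h`, so check them against your system headers.

```c
/* Assumed Linux PowerPC AT_HWCAP2 bits (asm/cputable.h); verify locally. */
#define SKETCH_FEATURE2_ARCH_2_07 0x80000000UL /* PowerISA 2.07, POWER8 */
#define SKETCH_FEATURE2_ARCH_3_00 0x00800000UL /* PowerISA 3.0, POWER9 */

/* Hypothetical variants. In a real build each would live in an object
   compiled from the same source with a different -mcpu= target. */
typedef const char *(*variant_fn) (void);

static const char *variant_power7 (void) { return "power7"; }
static const char *variant_power8 (void) { return "power8"; }
static const char *variant_power9 (void) { return "power9"; }

/* Pick the best variant for a hwcap2 mask. Test the newest ISA first,
   because a POWER9 also reports the ISA 2.07 (POWER8) bit. */
static variant_fn resolve_variant (unsigned long hwcap2)
{
  if (hwcap2 & SKETCH_FEATURE2_ARCH_3_00)
    return variant_power9;
  if (hwcap2 & SKETCH_FEATURE2_ARCH_2_07)
    return variant_power8;
  return variant_power7; /* baseline: VMX + VSX (POWER7) */
}
```

A real IFUNC resolver would obtain the mask from the platform (for example `getauxval (AT_HWCAP2)` on Linux) instead of taking it as a parameter. The mask is a parameter here so the selection logic can be exercised on any host.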
## Configure and option flags

The project can use configure tests to define options like AM_CPPFLAGS
and AM_CFLAGS, but the user command-line options (CPPFLAGS and CFLAGS)
are always applied last and take precedence.
See: Automake "Flag Variables Ordering"
https://www.gnu.org/software/automake/manual/html_node/Flag-Variables-Ordering.html

So a configure flag like CFLAGS='-O3 -mcpu=power7' would be OK for
functional verification tests of the POWER7-specific implementations
of PVECLIB operations. But it would interfere with building the POWER8-
and POWER9-specific objects for the production version of libpvec.so.
Builds of production-level PVECLIB should therefore never
specify -mcpu= in CFLAGS.

On the other hand, if the user does not specify any CFLAGS, autoconf
will fill in a default value of '-O2 -g'. This is bad! PVECLIB needs the
global common subexpression, loop, and vector cost model optimizations
enabled by '-O3'. Also, '-g' generates huge debug tables for the vector
int512 run-time and slows down the build. If you need to profile or
debug with basic back-trace information, use '-g1'.

So unless you are involved in the functional testing of new PVECLIB
operations, the safe options are:

    CFLAGS='-m64 -g1 -O3'

The PVECLIB Makefile.am files include special macros for CPU-specific
run-time compiles. These macros exclude the user CFLAGS from those
compile commands.

However, if the compiler's default target does not support the
PowerISA vector facilities and an appropriate '-mcpu=' option is not
supplied, the compile will fail. So the PVECLIB configure.ac includes a
number of configure tests that detect this and provide appropriate
compile targets.

The current PVECLIB implementation assumes the target supports both the
VMX (Altivec) and VSX facilities. So the minimum targets are set
internally (PVECLIB_DEFAULT_CFLAG) to '-mcpu=power7' for BE and
'-mcpu=power8' for LE.

The PVECLIB configure.ac also includes configure tests for related
PowerISA facilities that can be leveraged for PVECLIB operations but
are not core functions. These include decimal floating-point and IEEE
128-bit binary floating-point. These are both target and compiler
support checks. The compiler checks are especially important for the
Clang compiler, as it currently lacks _DecimalN and _Float128 support.
Some PVECLIB operations will be disabled in this case.
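The kind of target check described above can be sketched with compiler-predefined macros. The code below is a hypothetical illustration, not PVECLIB's configure.ac tests. It reports the highest vector level the current compile target enables, using the GCC/Clang macros `__ALTIVEC__`, `__VSX__`, `__POWER8_VECTOR__` and `__POWER9_VECTOR__`. It also compiles an optional `__float128` operation only when GCC's PowerPC `__FLOAT128__` macro is defined. The enum, function and `SKETCH_` macro names are made up.

```c
#include <stdbool.h>

/* Hypothetical vector-level classification of the compile target. */
enum vec_level {
  VEC_NONE = 0, /* not PowerPC, or no VMX in the target */
  VEC_VMX,      /* Altivec only */
  VEC_VSX,      /* VMX + VSX, e.g. -mcpu=power7 */
  VEC_P8,       /* PowerISA 2.07 vector, e.g. -mcpu=power8 */
  VEC_P9        /* PowerISA 3.0 vector, e.g. -mcpu=power9 */
};

/* Resolved entirely at compile time from predefined macros. */
static enum vec_level target_vec_level (void)
{
#if defined(__POWER9_VECTOR__)
  return VEC_P9;
#elif defined(__POWER8_VECTOR__)
  return VEC_P8;
#elif defined(__VSX__)
  return VEC_VSX;
#elif defined(__ALTIVEC__)
  return VEC_VMX;
#else
  return VEC_NONE;
#endif
}

/* True if the target meets the VMX + VSX minimum described above. */
static bool target_meets_minimum (void)
{
  return target_vec_level () >= VEC_VSX;
}

/* Guard an optional operation the way a failed compiler check would:
   compile it only when __float128 is available on 64-bit PowerPC. */
#if defined(__powerpc64__) && defined(__FLOAT128__)
#define SKETCH_HAVE_FLOAT128 1
static __float128 sketch_f128_add (__float128 a, __float128 b)
{
  return a + b;
}
#else
#define SKETCH_HAVE_FLOAT128 0
#endif
```

One way a configure-style test could use this is to compile such a program with and without a candidate '-mcpu=' flag and keep the flag only when the minimum check would otherwise fail.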

The default compiler is 'gcc'. The project can be configured to use
the Clang / LLVM compiler with the CC=clang flag.

Run './configure' to verify the build tools and environment:

    $ ./configure CFLAGS='-O3 -g1'

On big-endian / biarch systems it is wise to explicitly specify 64-bit:

    $ ./configure CFLAGS='-m64 -O3 -g1' LDFLAGS='-m64'

To use the Advance Toolchain:

    $ ./configure CC=/opt/at13.0/bin/powerpc64le-linux-gnu-gcc \
        AR=/opt/at13.0/bin/powerpc64le-linux-gnu-ar \
        RANLIB=/opt/at13.0/bin/powerpc64le-linux-gnu-ranlib \
        CFLAGS='-m64 -O3 -g1' LDFLAGS='-m64'

Then run 'make' to perform the basic compile tests and build the
run-time libraries:

    $ make

and, optionally, run the functional verification tests:

    $ make check

and install the headers and libraries so your programs can use them:

    $ make install

If the included autotools don't match the versions installed on your
system, perform these steps:

    $ aclocal
    $ autoconf
    $ automake
## Usage

Once pveclib is installed on the POWER or OpenPOWER system,
simply include the appropriate header. For example:

    #include <pveclib/vec_int128_ppc.h>
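For instance, a caller might wrap one of the header's inline operations. The sketch below assumes the header provides the `vui128_t` type and `vec_mul10uq` (Vector Multiply-by-10 Unsigned Quadword). Check the wiki for the exact signatures. The wrapper name `times10_u128` and the `SKETCH_USE_PVECLIB` macro are hypothetical. The wrapper uses PVECLIB only when compiling for a VSX-enabled 64-bit POWER target with the header installed. Otherwise it falls back to plain `unsigned __int128` arithmetic so it also builds on other hosts.

```c
/* Use PVECLIB only on a VSX-enabled 64-bit POWER target where the
   installed header is visible; otherwise use scalar __int128. */
#if defined(__powerpc64__) && defined(__VSX__) && defined(__has_include)
#if __has_include(<pveclib/vec_int128_ppc.h>)
#include <pveclib/vec_int128_ppc.h>
#define SKETCH_USE_PVECLIB 1
#endif
#endif

/* Hypothetical wrapper: returns the low 128 bits of x * 10. */
static unsigned __int128 times10_u128 (unsigned __int128 x)
{
#ifdef SKETCH_USE_PVECLIB
  vui128_t v = { x };            /* place x in a one-element vector */
  vui128_t r = vec_mul10uq (v);  /* assumed PVECLIB operation */
  return r[0];
#else
  return x * 10u;                /* portable fallback, modulo 2^128 */
#endif
}
```

Keeping the PVECLIB call behind one small inline wrapper lets the rest of a program stay identical across targets, while the header picks the best instruction sequence for the '-mcpu=' level being compiled.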

The headers are organized by element type:

    vec_common_ppc.h; Typedefs and helper macros
    vec_f128_ppc.h; Operations on vector _Float128 values
    vec_f64_ppc.h; Operations on vector double values
    vec_f32_ppc.h; Operations on vector float values
    vec_int512_ppc.h; Operations on multi-quadword integer values
    vec_int128_ppc.h; Operations on vector __int128 values
    vec_int64_ppc.h; Operations on vector long int (64-bit) values
    vec_int32_ppc.h; Operations on vector int (32-bit) values
    vec_int16_ppc.h; Operations on vector short int (16-bit) values
    vec_char_ppc.h; Operations on vector char (8-bit) values
    vec_bcd_ppc.h; Operations on vectors of Binary Coded Decimal and Zoned Decimal values

Full documentation is linked off of:

    https://github.com/open-power-sdk/pveclib/wiki