clblast

CLBlast is a modern, lightweight, performant and tunable OpenCL BLAS library written in C++11. It is designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors, including desktop and laptop GPUs, embedded GPUs, and other accelerators. CLBlast implements BLAS routines: basic linear algebra subprograms operating on vectors and matrices. See the CLBlast website for performance reports on various devices as well as the latest CLBlast news.

The library is not tuned for all possible OpenCL devices: if out-of-the-box performance is poor, please run the tuners first.