libdivide

This package contains a header-only C/C++ library for optimizing integer division. Integer division is one of the slowest instructions on most CPUs: on current x64 CPUs, for example, a 64-bit integer division has a latency of up to 90 clock cycles, whereas a multiplication has a latency of only 3 clock cycles. libdivide lets you replace expensive integer division instructions with a sequence of shift, add, and multiply instructions that computes the same quotient much faster.
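To illustrate the general idea of trading a division for multiplies and shifts, here is a minimal sketch in C. It is not libdivide's implementation or API: the helper names `magic_u32` and `fast_div_u32` are hypothetical, and the sketch assumes unsigned 32-bit operands and a divisor of at least 2. It precomputes a 64-bit multiplier M = floor((2^64 - 1) / d) + 1 once, then obtains each quotient as floor(M * n / 2^64).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of libdivide): precompute the 64-bit
   multiplier M = floor((2^64 - 1) / d) + 1 for a divisor d >= 2.
   This is the only step that performs a real division. */
static uint64_t magic_u32(uint32_t d) {
    assert(d >= 2); /* d == 1 would wrap M to 0; d == 0 is undefined */
    return UINT64_MAX / d + 1;
}

/* Hypothetical helper: compute n / d as floor(M * n / 2^64) using two
   multiplies, one add, and shifts. The 96-bit product M * n is built
   from two 64-bit partial products, so no 128-bit type is required. */
static uint32_t fast_div_u32(uint32_t n, uint64_t magic) {
    uint64_t lo = (magic & 0xFFFFFFFFu) * n; /* low 32 bits of M times n  */
    uint64_t hi = (magic >> 32) * n;         /* high 32 bits of M times n */
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

The multiplier depends only on the divisor, so it pays off when the same runtime divisor is reused many times: call `magic_u32(d)` once outside a loop and `fast_div_u32(n, m)` for each element inside it.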

On current CPUs, libdivide can give a speedup of up to 10x for 64-bit integer division and up to 5x for 32-bit integer division. libdivide also supports SSE2, AVX2 and AVX512 vector division, which provides an even larger speedup.