x86: Make AVX2 SGR gatherless
Instead of using gathers, we can calculate the value of sgr_x_by_x[min(z, 255)] by computing 256 / (z + 1) in floating-point, with some clipping for z == 0 and z >= 255. Since the division only needs fairly low precision, it can be performed using an approximate reciprocal, which is significantly faster than a regular division.

Gather instructions are slow on all AMD CPUs, and on most Intel CPUs ever since microcode updates were issued as a workaround for the Gather Data Sampling side-channel vulnerability.
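For illustration, a minimal scalar sketch of the idea in C. The function name, the round-to-nearest step and the exact clipping behaviour are assumptions made for this example and are not taken from the actual assembly; in particular, the real table entry for z >= 255 may differ from what this sketch returns, and the AVX2 code uses an approximate reciprocal rather than an exact division.

```c
#include <stdint.h>

/* Hypothetical helper: approximates sgr_x_by_x[min(z, 255)] without a
 * table lookup by computing 256 / (z + 1) in floating-point. */
static uint8_t sgr_x_by_x_approx(uint32_t z)
{
    /* Clamp the index the same way the table lookup would. */
    const uint32_t zc = z > 255 ? 255 : z;

    /* Exact division here; a SIMD version would instead multiply by an
     * approximate reciprocal of (zc + 1). */
    const float q = 256.0f / (float)(zc + 1);

    /* Assumed rounding: round to nearest by adding 0.5 and truncating. */
    int v = (int)(q + 0.5f);

    /* z == 0 yields 256, which does not fit in 8 bits, so clip to 255. */
    if (v > 255)
        v = 255;
    return (uint8_t)v;
}
```

With these assumptions, z == 1 gives 128 and z == 3 gives 64, while z == 0 is clipped to 255.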