Commit 7072e79f authored by Henrik Gramner

x86: Make AVX2 SGR gatherless

Instead of using gathers we can calculate the value of
sgr_x_by_x[min(z, 255)] by doing 256 / (z + 1) in floating-point
with some clipping for z == 0 and z >= 255.
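
As a rough scalar illustration of the idea above (not the actual AVX2 code from this commit), the lookup can be replaced by a small function. The name `x_by_x_closed_form` is hypothetical, and the rounding mode and the exact clip values (255 for z == 0, 0 for z >= 255) are assumptions about what "some clipping" means here.

```c
/* Hypothetical scalar stand-in for sgr_x_by_x[min(z, 255)].
 * Assumptions (not confirmed by the commit message):
 *   - the quotient is rounded to the nearest integer
 *   - z == 0 clips to 255 (256 does not fit in 8 bits)
 *   - z >= 255 clips to 0 */
static unsigned x_by_x_closed_form(unsigned z)
{
    if (z >= 255)
        return 0;

    /* 256 / (z + 1) in single precision, as described in the text. */
    float q = 256.0f / (float)(z + 1);

    /* Round to nearest by adding 0.5 and truncating (q is positive). */
    unsigned v = (unsigned)(q + 0.5f);

    /* Only z == 0 yields 256, which is clipped to 255. */
    return v > 255 ? 255 : v;
}
```

Since this is pure arithmetic on z, a vectorized version needs no memory lookup and therefore no gather.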

As the division only needs fairly low precision, it can be
performed using an approximate reciprocal, which is significantly
faster than a regular division.
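
The following sketch shows why low precision is enough. It models an approximate reciprocal as the exact value scaled by a relative error and counts how often the rounded result changes. The bound of 1.5 * 2^-12 is the relative error Intel documents for RCPPS. The helper names, the round-to-nearest step and the clip to 255 are assumptions made for illustration, not taken from the commit.

```c
/* Relative error bound documented by Intel for RCPPS: 1.5 * 2^-12. */
#define RCP_MAX_REL_ERR (1.5f / 4096.0f)

/* Hypothetical model of an approximate reciprocal: the exact
 * reciprocal scaled by (1 + rel_err). */
static float approx_rcp(float x, float rel_err)
{
    return (1.0f / x) * (1.0f + rel_err);
}

/* 256 / (z + 1) via the modelled reciprocal, rounded to nearest and
 * clipped to 8 bits (rounding and clipping are assumptions). */
static unsigned x_by_x_from_rcp(unsigned z, float rel_err)
{
    if (z >= 255)
        return 0;
    float q = 256.0f * approx_rcp((float)(z + 1), rel_err);
    unsigned v = (unsigned)(q + 0.5f);
    return v > 255 ? 255 : v;
}

/* Counts the z in [0, 254] where the perturbed result differs from
 * the result of an exact single-precision division. */
static unsigned count_mismatches(float rel_err)
{
    unsigned mismatches = 0;
    for (unsigned z = 0; z < 255; z++) {
        unsigned exact = (unsigned)(256.0f / (float)(z + 1) + 0.5f);
        if (exact > 255)
            exact = 255;
        if (x_by_x_from_rcp(z, rel_err) != exact)
            mismatches++;
    }
    return mismatches;
}
```

Under this model, `count_mismatches(RCP_MAX_REL_ERR)` and `count_mismatches(-RCP_MAX_REL_ERR)` both return 0: for z + 1 <= 255 the quotient is never closer than 1 / (2 * (z + 1)) to a rounding boundary, while the reciprocal error shifts it by only about 0.09 / (z + 1). A much larger error, such as 1%, does change some entries.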

Gather instructions are slow on all AMD CPUs, and on most Intel
CPUs ever since microcode updates were issued as a workaround for
the Gather Data Sampling side-channel vulnerability.
parent 21d9f29d