x86: Make AVX2 SGR gatherless (7072e79f) · Commits · TeTai Wu / dav1d

Commit 7072e79f authored 6 months ago by Henrik Gramner

x86: Make AVX2 SGR gatherless

Instead of using gathers we can calculate the value of
sgr_x_by_x[min(z, 255)] by doing 256 / (z + 1) in floating-point
with some clipping for z == 0 and z >= 255.

As the required precision of the division is fairly small it can be
performed using an approximate reciprocal, which is significantly
faster than a regular division.
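The replacement described above can be sketched in scalar C. This is only an illustration, not the commit's code: the actual change is AVX2 assembly using an approximate reciprocal, and the helper name `sgr_x_by_x_approx`, the round-to-nearest step, and the exact clip values (255 for z == 0, 0 for z >= 255) are assumptions made for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the dav1d sources.
 * Approximates sgr_x_by_x[min(z, 255)] as 256 / (z + 1). */
uint8_t sgr_x_by_x_approx(uint32_t z)
{
    if (z == 0)
        return 255;   /* 256 / 1 does not fit in 8 bits (assumed clip value) */
    if (z >= 255)
        return 0;     /* assumed clip value for the top of the range */

    /* Exact float division here; the vector code would use an approximate
     * reciprocal instead. Adding 0.5f rounds to nearest (an assumption). */
    return (uint8_t)(256.0f / (float)(z + 1) + 0.5f);
}
```

With these assumptions, z = 1 gives 128 and z = 254 gives 1. Because the result only needs 8 bits of precision, a low-precision reciprocal is plausible in the vector version, which is the point the commit message makes.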

Gather instructions are slow on all AMD CPUs, and on most Intel
CPUs ever since microcode updates were issued as a workaround for
the Gather Data Sampling side-channel vulnerability.

parent 21d9f29d


Showing with 581 additions and 331 deletions
