AVX2: Swap shuffles with zen 2/3 friendly equivalents

Kyle Siefring requested to merge KyleSiefring/dav1d:zen2_3_friendly_perm1 into master

On Zen 2 and Zen 3, vpermq is slower than vperm2i128. Some of our assembly uses the former to swap the two 128-bit lanes of a vector where the latter would do the same job.

On current Intel CPUs, the two instructions are equally expensive, so there should be no impact there.
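
For readers who want to convince themselves that the two instructions are interchangeable for a lane swap, here is a minimal scalar sketch in C that models their documented immediate semantics. It is an illustration only, not code from this merge request: the type `vec256` and the functions `model_vpermq`, `model_vperm2i128`, and `lane_swaps_match` are hypothetical names, and the immediates `0x4E` and `0x01` are assumed to be the encodings used for a plain swap of the low and high 128-bit halves.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 256-bit register: four 64-bit elements. */
typedef struct { uint64_t q[4]; } vec256;

/* Model of vpermq dst, src, imm8:
 * each destination qword i is picked by the 2-bit field imm8[2i+1:2i]. */
static vec256 model_vpermq(vec256 src, uint8_t imm)
{
    vec256 dst;
    for (int i = 0; i < 4; i++)
        dst.q[i] = src.q[(imm >> (2 * i)) & 3];
    return dst;
}

/* Pick one 128-bit lane for vperm2i128 from a 4-bit control field:
 * bit 0 selects low/high lane, bit 1 selects source a/b, bit 3 zeroes the lane. */
static void select_lane(uint64_t dst[2], const vec256 *a, const vec256 *b,
                        unsigned ctrl)
{
    const vec256 *s = (ctrl & 2) ? b : a;
    unsigned base = (ctrl & 1) ? 2 : 0;
    if (ctrl & 8) {
        dst[0] = dst[1] = 0;
    } else {
        dst[0] = s->q[base];
        dst[1] = s->q[base + 1];
    }
}

/* Model of vperm2i128 dst, a, b, imm8:
 * imm8[3:0] controls the low destination lane, imm8[7:4] the high one. */
static vec256 model_vperm2i128(vec256 a, vec256 b, uint8_t imm)
{
    vec256 dst;
    select_lane(&dst.q[0], &a, &b, imm & 0xF);
    select_lane(&dst.q[2], &a, &b, imm >> 4);
    return dst;
}

/* Returns 1 if both lane-swap encodings give the same result for v. */
static int lane_swaps_match(vec256 v)
{
    vec256 x = model_vpermq(v, 0x4E);        /* qword order 2,3,0,1 */
    vec256 y = model_vperm2i128(v, v, 0x01); /* low <- high, high <- low */
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Calling `lane_swaps_match` on any input should return 1, since both encodings move elements 2 and 3 into the low half and elements 0 and 1 into the high half. The model says nothing about latency or throughput; it only checks that the results agree.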
