
Draft: aarch64: Test implementing sgr_x_by_x[] with fdiv

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-sgr-div into master

The test implementation is done in sgr_box5_vert_neon. It may be possible to tweak it a little further: this version uses 32-bit vector elements throughout, and we could first narrow things down to smaller elements, as the code did before this change, but the float steps still need 32-bit quantities. Overall, this doesn't seem to be beneficial compared to the current lookup-table implementation: it is slower on every core tested, by about 6.5% (Apple M3) to 13% (Cortex A73). The numbers below are timings, so lower is better.

                    Cortex A53         A55         A72         A73         A76    Apple M3
Before:
sgr_5x5_8bpc_neon:    258319.2    254398.7    195143.7    199321.0    117959.0       250.5
After:
sgr_5x5_8bpc_neon:    286970.0    275679.4    214980.5    224968.7    129278.1       266.8
