arm64: itx: Do the final calculation of adst8/adst16 in 32 bit to avoid too narrow clipping

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-itx-32bit into master

See issue #295 (closed); this fixes it for arm64.

Before:                                 Cortex A53      A72      A73
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:      332.0    248.0    247.1
inv_txfm_add_16x16_adst_adst_2_8bpc_neon:   1676.8   1197.0   1186.8
After:
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:      358.0    269.0    276.2
inv_txfm_add_16x16_adst_adst_2_8bpc_neon:   1785.2   1347.8   1312.1

The fix is probably only needed for adst in the first pass. However, we currently don't differentiate transforms between the first and second pass, and the additional code complexity from splitting the implementations isn't necessarily worth it (the speedup over the C code is still 8-10x).
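
To illustrate the clipping problem, here is a minimal C sketch; it is not the actual NEON code. It assumes the final adst step adds two 16-bit intermediates and scales the sum by 2896/4096 (roughly 1/sqrt(2)). The names `sat16`, `scale_sum_16bit` and `scale_sum_32bit`, and the exact constant, are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: saturate a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* 16-bit path (assumed shape of the old behaviour): the sum is
 * saturated to 16 bits *before* scaling, so a large sum is clipped
 * too early and the scaled result ends up too small. */
static int16_t scale_sum_16bit(int16_t a, int16_t b)
{
    int16_t sum = sat16((int32_t)a + b);
    return sat16(((int32_t)sum * 2896 + 2048) >> 12);
}

/* 32-bit path (assumed shape of the fix): the sum and the scaling
 * are done in 32 bits, and saturation happens only when narrowing
 * the final result. Note: >> on negative values relies on the
 * arithmetic shift that common compilers provide. */
static int16_t scale_sum_32bit(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;
    return sat16((sum * 2896 + 2048) >> 12);
}
```

For small inputs both paths agree, e.g. `scale_sum_16bit(100, 200)` and `scale_sum_32bit(100, 200)` both give 212. For inputs whose sum exceeds the 16-bit range, such as `(30000, 30000)`, the 16-bit path returns 23167 while the 32-bit path returns 32767, since the unclipped scaled value (42422) only saturates at the end.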

Also notifying @gramner

Edited by Martin Storsjö