
AArch64: Optimize Armv8.0 Neon path of HBD HV 6-tap filters

Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_hbd_hv_6tap_neon into master

The horizontal parts of the Armv8.0 Neon 6-tap HV subpel filters can be further optimized with some pointer arithmetic and by eliminating some EXT instructions from their data rearrangement code.
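As a rough scalar illustration of what the horizontal pass computes, and how pointer arithmetic lets a 6-tap filter skip work, the sketch below compares an 8-tap loop with a 6-tap loop on 16-bit (high bit depth) pixels. It assumes, for illustration only, that the 6-tap kernel is stored in an 8-entry array whose first and last taps are zero. The 6-tap loop can then start its source pointer one pixel later and read only the six non-zero taps. The function names and coefficients are hypothetical and this is not the patch's Neon code, which additionally saves EXT instructions when rearranging data in vector registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: 8-tap horizontal pass. Each output x
 * reads src[x - 3] .. src[x + 4]. */
static void filter_h_8tap(const uint16_t *src, int32_t *dst, int w,
                          const int8_t taps[8]) {
    assert(w > 0);
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 8; k++)
            sum += taps[k] * src[x + k - 3];
        dst[x] = sum;
    }
}

/* Same result for a kernel with taps[0] == taps[7] == 0 (an assumption of
 * this sketch): the base pointer starts at src - 2 instead of src - 3, so
 * only taps[1] .. taps[6] are applied, reading src[x - 2] .. src[x + 3]. */
static void filter_h_6tap(const uint16_t *src, int32_t *dst, int w,
                          const int8_t taps[8]) {
    assert(w > 0);
    const uint16_t *s = src - 2;
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 6; k++)
            sum += taps[k + 1] * s[x + k];
        dst[x] = sum;
    }
}
```

Both functions expect `src` to point at least 3 pixels into a buffer with 4 readable pixels past `src[w - 1]`; with zero outer taps they produce identical output while the 6-tap version performs 25% fewer multiply-accumulates.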

Relative runtime of the micro-benchmarks after this patch on several Cortex CPU cores (lower is better):

HBD mct hv        X1     A78     A76     A72     A55
 regular  w8:  0.952x  0.989x  0.924x  0.973x  0.976x
 regular w16:  0.961x  0.993x  0.928x  0.952x  0.971x
 regular w32:  0.964x  0.996x  0.930x  0.973x  0.972x
 regular w64:  0.963x  0.997x  0.930x  0.969x  0.974x
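To read the table, a relative runtime r can be converted into a speedup (1 / r) or a percentage of runtime saved ((1 - r) * 100). The small helper below is a sketch that assumes r is the ratio of runtime after the patch to runtime before it; the function names are hypothetical.

```c
#include <assert.h>

/* r = runtime_after / runtime_before (assumed); r < 1.0 means faster. */
static double speedup(double r) {
    assert(r > 0.0);
    return 1.0 / r;
}

/* Percentage of the original runtime that was saved. */
static double time_saved_pct(double r) {
    assert(r > 0.0);
    return (1.0 - r) * 100.0;
}
```

For example, the A76 `w8` entry of 0.924x corresponds to 7.6% less runtime, or a speedup of about 1.082x.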
