Skip to content
Snippets Groups Projects

AArch64: Add Neon implementation of load_tmvs

Merged Arpad Panyik requested to merge arpadpanyik-arm/dav1d:load_tmvs_neon into master

This patch adds a vectorised variant of the mv_projection calculation and a faster initialisation of motion vectors for load_tmvs_neon.

checkasm uplifts after this patch on some Neoverse and Cortex CPU cores compared to the C reference compiled with GCC-13 and Clang-19:

                     GCC    Clang
 AWS Graviton 4:   1.62x    1.59x
 Cortex-X4:        1.45x    1.46x
 Cortex-X3:        1.68x    1.69x
 Cortex-X1:        1.55x    1.52x
 Cortex-A720:      1.54x    1.57x
 Cortex-A715:      1.47x    1.55x
 Cortex-A78:       1.21x    1.18x
 Cortex-A76:       1.38x    1.37x
 Cortex-A72:       1.08x    1.11x
 Cortex-A520:      0.97x    1.18x
 Cortex-A510:      0.99x    1.14x
 Cortex-A55:       1.16x    1.23x

This patch increases the .text by ~660 bytes, but smaller than the reference implementation by about 0.5 KiB.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading