mc: Reduce stack use in {put,prep}_scaled_{bilin,8tap} (b129d9f2) · VideoLAN / dav1d

Commit b129d9f2 authored 4 months ago by Martin Storsjö

mc: Reduce stack use in {put,prep}_scaled_{bilin,8tap}

For the bilin cases, this seems to make the C code somewhat faster
(measured on x86_64; 7-25% faster with compiler autovectorization).
For 8tap, it doesn't make much of a difference at all.

Before:                                      GCC   Clang
mc_scaled_8tap_regular_w128_8bpc_c:     115155.5   98549.3
mc_scaled_8tap_regular_w128_8bpc_ssse3:  17936.0   18411.1
mc_scaled_bilinear_w128_8bpc_c:          40290.0   51812.9
mc_scaled_bilinear_w128_8bpc_ssse3:      18243.9   18177.0
After:
mc_scaled_8tap_regular_w128_8bpc_c:     116304.3   99453.2
mc_scaled_8tap_regular_w128_8bpc_ssse3:  18387.0   18077.3
mc_scaled_bilinear_w128_8bpc_c:          37381.4   41145.0
mc_scaled_bilinear_w128_8bpc_ssse3:      18423.8   18031.6

(Benchmarked with seed 0; the total runtime of the scaled
benchmarks is significantly affected by the random seed.)
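The percentages quoted in the first paragraph can be recomputed from the C rows of the table. This is a minimal sketch, assuming "x% faster" means before/after - 1 (the commit does not say which definition it uses); `speedup_pct`, `bench_row` and `print_speedups` are hypothetical helpers, not part of dav1d or checkasm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Speedup in percent, defined here (an assumption) as before/after - 1.
 * A positive value means the "after" build is faster. */
static double speedup_pct(double before, double after) {
    assert(after > 0.0);
    return (before / after - 1.0) * 100.0;
}

struct bench_row {
    const char *name;
    double gcc_before, clang_before, gcc_after, clang_after;
};

/* C-function rows copied from the benchmark table. */
static const struct bench_row rows[] = {
    { "mc_scaled_8tap_regular_w128_8bpc_c", 115155.5, 98549.3, 116304.3, 99453.2 },
    { "mc_scaled_bilinear_w128_8bpc_c",      40290.0, 51812.9,  37381.4, 41145.0 },
};

/* Print the per-compiler speedup for each row. */
static void print_speedups(void) {
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        printf("%-36s GCC %+6.1f%%  Clang %+6.1f%%\n", rows[i].name,
               speedup_pct(rows[i].gcc_before, rows[i].gcc_after),
               speedup_pct(rows[i].clang_before, rows[i].clang_after));
}
```

With this definition the bilinear C rows come out at roughly +7.8% (GCC) and +25.9% (Clang), and the 8tap C rows at about -1% for both compilers. Measured as a reduction in runtime instead, the Clang bilinear figure would be about 21%.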

This reduces the stack usage of these functions from around 65 KB
each to less than 1 KB for bilin and around 2 KB for 8tap.
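The commit message does not describe how the buffer shrinks. The sizes are consistent with replacing an intermediate buffer that holds every horizontally filtered source row of a block (for example 128 x (256 + 7) int16_t values, 67,328 bytes) with a ring holding only the rows the vertical filter currently needs (8 rows x 128 x 2 bytes = 2 KB for 8tap). The sketch below illustrates that idea under heavy simplifications and is not dav1d's code: `h_pass`, `vtaps`, `scaled_full` and `scaled_ring` are hypothetical, the horizontal pass is a plain two-pixel sum, a fixed vertical filter replaces per-row fractional filter selection, and `dy` is assumed to be the vertical step in 1/1024 source rows, capped at 2048 (2x downscaling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_W 128 /* widest block, as in the w128 benchmarks */
#define MAX_H 128
#define TAPS 8    /* rows touched by an 8-tap vertical filter */

/* Hypothetical fixed vertical taps (they sum to 64). Real scaled MC picks
 * a filter per output row from the fractional position; omitted here. */
static const int vtaps[TAPS] = { -1, 3, -10, 40, 40, -10, 3, -1 };

/* Hypothetical horizontal pass: one intermediate row from one source row.
 * Reads w + 1 source pixels. */
static void h_pass(int16_t *mid, const uint8_t *src_row, int w) {
    for (int x = 0; x < w; x++)
        mid[x] = (int16_t)(src_row[x] + src_row[x + 1]);
}

/* Whole-block buffer: filter every source row the block touches first.
 * MAX_W * (2 * MAX_H + TAPS - 1) int16_t = 67,328 bytes of stack. */
static void scaled_full(int16_t *dst, const uint8_t *src, ptrdiff_t stride,
                        int w, int h, int dy) {
    int16_t mid[MAX_W * (2 * MAX_H + TAPS - 1)];
    assert(w <= MAX_W && h <= MAX_H && dy > 0 && dy <= 2048);
    const int src_h = (((h - 1) * dy) >> 10) + TAPS; /* dy: 1/1024 rows */
    for (int r = 0; r < src_h; r++)
        h_pass(&mid[r * MAX_W], src + r * stride, w);
    for (int y = 0; y < h; y++) {
        const int base = (y * dy) >> 10; /* first source row for output y */
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += vtaps[k] * mid[(base + k) * MAX_W + x];
            dst[y * w + x] = (int16_t)((sum + 32) >> 6);
        }
    }
}

/* Ring buffer: keep only the TAPS most recent intermediate rows, with
 * source row r stored in slot r % TAPS. TAPS * MAX_W int16_t = 2,048 bytes. */
static void scaled_ring(int16_t *dst, const uint8_t *src, ptrdiff_t stride,
                        int w, int h, int dy) {
    int16_t ring[TAPS][MAX_W];
    int next = 0; /* first source row not yet filtered into the ring */
    assert(w <= MAX_W && h <= MAX_H && dy > 0 && dy <= 2048);
    for (int y = 0; y < h; y++) {
        const int base = (y * dy) >> 10;
        if (next < base)
            next = base; /* rows before base are never needed again */
        while (next < base + TAPS) { /* filter new rows on demand */
            h_pass(ring[next % TAPS], src + next * stride, w);
            next++;
        }
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += vtaps[k] * ring[(base + k) % TAPS][x];
            dst[y * w + x] = (int16_t)((sum + 32) >> 6);
        }
    }
}
```

Both variants produce identical output; only the scratch layout differs. Because `base` never decreases, each source row is horizontally filtered once in either variant, so the ring trades no extra work for the smaller footprint. A bilinear vertical filter only needs two rows, so the same structure would use 2 x 128 x 2 = 512 bytes, in line with the "less than 1 KB" figure for bilin.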

With this in place, the required stack space for dav1d should
be mostly identical across configurations; on x86_64 (both with
and without assembly), it can run with 62 KB of stack, and
on arm and aarch64, it can run with 58 KB of stack.
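The commit does not say how the 62 KB and 58 KB figures were measured. One way to check that a workload fits a given stack budget is to run it on a POSIX thread created with an explicit stack size. In this sketch `run_with_stack` and `small_workload` are hypothetical names, and `small_workload` merely stands in for a real decode call; the requested size is rounded up to `PTHREAD_STACK_MIN` and to the page size, so on platforms with a larger minimum the check is looser than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for a decode call: uses about 8 KB of stack and
 * stores the sum of its scratch buffer in *arg. */
static void *small_workload(void *arg) {
    volatile uint8_t scratch[8192];
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof(scratch); i++)
        scratch[i] = (uint8_t)i;
    for (size_t i = 0; i < sizeof(scratch); i++)
        sum += scratch[i];
    *(unsigned *)arg = sum;
    return NULL;
}

/* Run fn(arg) on a new thread with at least stack_size bytes of stack.
 * Returns 0 on success or a pthread error code. */
static int run_with_stack(size_t stack_size, void *(*fn)(void *), void *arg) {
    const long page = sysconf(_SC_PAGESIZE);
    pthread_attr_t attr;
    pthread_t thread;
    int err;

    assert(fn != NULL);
    if (stack_size < (size_t)PTHREAD_STACK_MIN)
        stack_size = (size_t)PTHREAD_STACK_MIN;
    if (page > 0) /* some systems require a multiple of the page size */
        stack_size = (stack_size + (size_t)page - 1) / (size_t)page * (size_t)page;

    if ((err = pthread_attr_init(&attr)) != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0)
        err = pthread_create(&thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err == 0)
        err = pthread_join(thread, NULL);
    return err;
}
```

For example, `run_with_stack(62 * 1024, small_workload, &out)` runs the workload with a 62 KB budget (on a typical 4 KB-page system, 64 KB after rounding). If the workload needs more stack than it is granted, it will typically crash on the thread's guard page instead of returning.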

parent cd5bfa12

Pipeline #551025 passed with stages in 35 minutes and 38 seconds

Showing with 134 additions and 93 deletions

Martin Storsjö (@mstorsjo) mentioned in issue #442 (closed) · 3 months ago
