Feat: fg compute block avg sse4
- The goal of this MR is to optimize fg_compute_block_avg, which is part of the function pp_process_frame.
- To benchmark, we decided to call the function pp_process_frame 100 times, with 10 iterations for each call. The code for the benchmark can be found here
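For readers without access to the benchmark source, the measurement scheme described above can be sketched roughly as follows. This is not the actual filmgraintest code: the names bench_avg_ms, bench_fn and count_calls are hypothetical, the workload is a stand-in for pp_process_frame, and the sketch assumes the reported figure is the total time averaged over the number of calls.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical workload signature; a real benchmark would wrap pp_process_frame. */
typedef void (*bench_fn)(void *ctx);

/* Stand-in workload: only counts how many times it was invoked. */
static void count_calls(void *ctx)
{
    ++*(int *)ctx;
}

/* Difference between two monotonic timestamps, in milliseconds. */
static double elapsed_ms(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) * 1e3 +
           (double)(t1->tv_nsec - t0->tv_nsec) / 1e6;
}

/* Run `calls` timed calls, each made of `iters` iterations of fn,
 * and return the average time per call in milliseconds. */
static double bench_avg_ms(bench_fn fn, void *ctx, int calls, int iters)
{
    assert(calls > 0 && iters > 0);
    double total_ms = 0.0;
    for (int c = 0; c < calls; c++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++)
            fn(ctx);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total_ms += elapsed_ms(&t0, &t1);
    }
    return total_ms / calls;
}
```

With this sketch, bench_avg_ms(count_calls, &n, 100, 10) would mirror the "100 calls of 10 iterations each" setup. A monotonic clock is used so that wall-clock adjustments do not distort the measurement.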
With the help of perf and Hotspot, we can see where the function pp_process_frame spends most of its computing time.
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/libovvc/.libs perf record ./examples/.libs/filmgraintest
> pp_process_frame : 6.932500 ms (average time / 100 calls of 10 iteration each)]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 1,041 MB perf.data (27262 samples) ]
fg_compute_block_avg accounts for 19.4% of the computing time.
Let's benchmark fg_compute_block_avg_sse4:
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/libovvc/.libs perf record ./examples/.libs/filmgraintest
> pp_process_frame : 6.181763 ms (average time / 100 calls of 10 iteration each)]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0,915 MB perf.data (23944 samples) ]
With the new version, fg_compute_block_avg_sse4 now accounts for 6.18% of the computing time.
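To give an idea of the kind of change involved, here is a minimal sketch of how a block average over 16-bit samples can be vectorized with SSE4.1. It is not the code of this MR: the function names (block_avg_scalar, block_avg_sse4, block_avg), the signature, and the int16_t sample layout are all assumptions made for illustration, and the 32-bit accumulators assume small blocks (for example up to 64x64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <smmintrin.h>
#define BLOCK_AVG_HAVE_SSE4 1
#endif

/* Hypothetical scalar reference: integer mean of a width x height block. */
static int32_t block_avg_scalar(const int16_t *src, ptrdiff_t stride,
                                int width, int height)
{
    assert(width > 0 && height > 0);
    int64_t sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sum += src[y * stride + x];
    return (int32_t)(sum / (width * height));
}

#ifdef BLOCK_AVG_HAVE_SSE4
/* Hypothetical SSE4.1 variant: 8 samples per iteration, widened to 32 bits. */
__attribute__((target("sse4.1")))
static int32_t block_avg_sse4(const int16_t *src, ptrdiff_t stride,
                              int width, int height)
{
    assert(width > 0 && height > 0);
    __m128i acc = _mm_setzero_si128(); /* four 32-bit partial sums */
    int64_t tail = 0;                  /* samples left over when width % 8 != 0 */
    for (int y = 0; y < height; y++) {
        const int16_t *row = src + y * stride;
        int x = 0;
        for (; x + 8 <= width; x += 8) {
            __m128i v = _mm_loadu_si128((const __m128i *)(row + x));
            /* Sign-extend the low and high four samples, then accumulate. */
            acc = _mm_add_epi32(acc, _mm_cvtepi16_epi32(v));
            acc = _mm_add_epi32(acc, _mm_cvtepi16_epi32(_mm_srli_si128(v, 8)));
        }
        for (; x < width; x++)
            tail += row[x];
    }
    /* Horizontal reduction of the four partial sums into lane 0. */
    acc = _mm_hadd_epi32(acc, acc);
    acc = _mm_hadd_epi32(acc, acc);
    int64_t sum = (int64_t)_mm_cvtsi128_si32(acc) + tail;
    return (int32_t)(sum / (width * height));
}
#endif

/* Use the SSE4.1 path when the CPU supports it, else the scalar one. */
static int32_t block_avg(const int16_t *src, ptrdiff_t stride,
                         int width, int height)
{
#ifdef BLOCK_AVG_HAVE_SSE4
    if (__builtin_cpu_supports("sse4.1"))
        return block_avg_sse4(src, stride, width, height);
#endif
    return block_avg_scalar(src, stride, width, height);
}
```

Keeping the scalar version next to the vectorized one makes it easy to check that both return the same value on the same block before comparing timings.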
Edited by Ferdinand Mom