WIP: Perform cdef on ~64x64 chunks.
@EwoutH Can you benchmark this?
Difficult to say, but it looks a tiny bit faster. The difference is within the margin of error, though.
| decoder | dav1d | dav1d | |
|---|---|---|---|
| Build | abb972a5 | a3bc6e98 | |
| Build date | 2019-03-28 | 2019-03-29 | |
| ISA | AVX2 | AVX2 | |
| Morocco MT | 172,86 | 173,07 | 100,1% |
| Morocco MT | 172,82 | 173,65 | 100,5% |
| Dua Lipa MT | 144,27 | 144,77 | 100,3% |
| Dua Lipa MT | 143,28 | 143,96 | 100,5% |
| Chimera MT | 194,24 | 192,76 | 99,2% |
| Chimera MT | 192,78 | 195,28 | 101,3% |

@EwoutH Can you bench again?
Some comments (partially duplicated from comments on IRC):
- I don't think the changes to cdef_dir are strictly necessary. Given that we already have several SIMD implementations, it might be good to be cautious with making changes unless there is a specific performance impact. I don't think that is the case here.
- In terms of the filter changes, I like the approach. This needs some careful performance measuring on various types of content, in particular because we now impose cdef setup overhead on skip blocks, but we don't duplicate edge preparation anymore. We will only know how these balance against each other by measuring it. In addition to total decoding time, we should also look at specific cycle timings for CDEF alone for higher precision. For that last one, please do single-threaded runs.
- I think LR and CDEF can now use the same buffer for their pre-cdef/post-deblock pixels. This might improve numbers.
- If the whole SB is skip, can we skip the calls to setup? This might improve numbers.
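To illustrate the last point, here is a minimal sketch of what skipping setup for an all-skip superblock could look like. It is not code from this merge request and not dav1d's API: `SbCdefInfo`, `noskip_mask`, `cdef_sb` and `count_setup` are hypothetical names, and the layout (one bit per 8x8 block of a 64x64 superblock, set when the block is not skip) is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical per-superblock info: one bit per 8x8 block of a 64x64
 * superblock, row-major. A set bit means the block is NOT skip. */
typedef struct {
    uint64_t noskip_mask;
} SbCdefInfo;

/* Stand-in for the real setup/edge-preparation routine. */
typedef void (*cdef_setup_fn)(void *ctx);

/* Example setup callback that only counts how often it was called. */
void count_setup(void *ctx) {
    ++*(int *)ctx;
}

/* Returns the number of 8x8 blocks that would be filtered.
 * If every block in the superblock is skip, setup is never called. */
int cdef_sb(const SbCdefInfo *sb, cdef_setup_fn setup, void *ctx) {
    if (sb->noskip_mask == 0)
        return 0; /* whole SB is skip: no setup, no filtering */

    setup(ctx); /* paid once per superblock that has work to do */

    int filtered = 0;
    for (int by = 0; by < 8; by++) {
        for (int bx = 0; bx < 8; bx++) {
            if ((sb->noskip_mask >> (by * 8 + bx)) & 1)
                filtered++; /* the per-block filter call would go here */
        }
    }
    return filtered;
}
```

Whether the early-out pays off depends on how often whole superblocks are skip in real content, which is what the measurements requested above would show.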
Edited by Ronald S. Bultje

    db 1, 2, 1, 2, 0, -1, 0, 0
    db 1, 2, 1, 2, 1, 2, 0, 0
    db 1, 2, 0, 1, 1, 2, 0, 0
    db 1, 2, 0, 0, 1, 2, 0, 0
    db 0, 1, 0, -1, 1, 2, 0, 0
    db 0, 0, 1, 2, 1, 2, 0, 0
    db 0, -1, 1, 2, 0, 1, 0, 0
    pw_128: times 2 dw 128
    pw_2048: times 2 dw 2048

    SECTION .text

    %macro cdef_setup_fn 1 ; w
    INIT_YMM avx2
    ; TODO: correct number of registers used
    cglobal cdef_setup_%1xh, 4, 15, 16, -4, \

changed this line in version 12 of the diff
- Resolved by Jean-Baptiste Kempf
@KyleSiefring Can you rebase?
added 5 commits
- 275f253a...8bbcd3f7 - 4 commits from branch videolan:master
- 69cc839a - Perform cdef on ~64x64 chunks
| decoder | dav1d | dav1d | |
|---|---|---|---|
| Build | 8bbcd3f7 | 69cc839a | |
| Build date | 2019-04-08 | 2019-04-08 | |
| ISA | AVX2 | AVX2 | |
| Morocco MT | 172,41 | 171,30 | 99,4% |
| Morocco MT | 172,13 | 170,93 | 99,3% |
| Dua Lipa MT | 142,86 | 143,50 | 100,5% |
| Dua Lipa MT | 142,71 | 143,14 | 100,3% |
| Chimera MT | 192,76 | 194,05 | 100,7% |
| Chimera MT | 192,17 | 193,87 | 100,9% |

Edited by Ewout ter Hoeven