ADS kernel implementation for SVE
This is the first in a series of changes we developed earlier this year while optimizing x264 for NVIDIA Grace.
Contains:
- Implementations of ADS kernels for SVE (any vector size)
- Implementations of ADS kernels for SVE2 (any vector size)
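For readers new to these kernels, here is a minimal scalar sketch of what an ADS (absolute differences of sums) kernel computes. It is written in the style of x264's C reference `ads4` as we understand it; x264 also has variants that use fewer sums for smaller partitions. The name `ads4_ref` is hypothetical. The layout is our assumption and is not taken from this merge request: `sums[8]` is taken to be the sub-block 8 columns to the right, and `sums[delta]` the sub-block below. By the triangle inequality, each |DC difference| is a lower bound on that sub-block's SAD. A candidate whose total plus motion-vector cost already reaches `thresh` can therefore be discarded without computing a full SAD.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of a 4-sum ADS kernel (not the merged code).
 * enc_dc:   DC sums of the four sub-blocks of the block being encoded
 * sums:     precomputed sub-block sums, one entry per candidate column
 * delta:    assumed offset from a sub-block's sum to the one below it
 * cost_mvx: motion-vector cost per candidate column
 * Writes the indices of surviving candidates to mvs and returns their count. */
static int ads4_ref( const int enc_dc[4], const uint16_t *sums, int delta,
                     const uint16_t *cost_mvx, int16_t *mvs,
                     int width, int thresh )
{
    int nmv = 0;
    for( int i = 0; i < width; i++, sums++ )
    {
        /* Lower bound on the SAD of candidate i plus its MV cost. */
        int ads = abs( enc_dc[0] - sums[0] )
                + abs( enc_dc[1] - sums[8] )
                + abs( enc_dc[2] - sums[delta] )
                + abs( enc_dc[3] - sums[delta + 8] )
                + cost_mvx[i];
        /* Keep only candidates that could still beat the threshold. */
        if( ads < thresh )
            mvs[nmv++] = (int16_t)i;
    }
    return nmv;
}
```

The conditional append at the end of the loop is the part that needs the most care in a vector implementation. The number of survivors per vector is data dependent, so the output cannot be written with a plain contiguous store.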
checkasm8
x264: using random seed 3804690644
x264: ARMv8
- intra pred : [OK]
- coeff_last : [OK]
- coeff_level_run : [OK]
x264: NEON
- pixel sad : [OK]
- pixel sad_aligned : [OK]
- pixel ssd : [OK]
- pixel satd : [OK]
- pixel sa8d : [OK]
- pixel sa8d_satd : [OK]
- pixel sad_x3 : [OK]
- pixel sad_x4 : [OK]
- pixel var : [OK]
- pixel var2 : [OK]
- pixel hadamard_ac : [OK]
- pixel vsad : [OK]
- pixel asd : [OK]
- intra satd_x3 : [OK]
- intra sad_x3 : [OK]
- ssd_nv12 : [OK]
- ssim : [OK]
- sub_dct4 : [OK]
- sub_dct8 : [OK]
- add_idct4 : [OK]
- add_idct8 : [OK]
- dct4x4dc : [OK]
- idct4x4dc : [OK]
- zigzag_interleave : [OK]
- zigzag_frame : [OK]
- zigzag_field : [OK]
- mc luma : [OK]
- mc chroma : [OK]
- mc wpredb : [OK]
- mc weight : [OK]
- mc offsetadd : [OK]
- mc offsetsub : [OK]
- store_interleave : [OK]
- plane_copy : [OK]
- hpel filter : [OK]
- lowres init : [OK]
- integral init : [OK]
- mbtree : [OK]
- memcpy aligned : [OK]
- memzero aligned : [OK]
- intra pred : [OK]
- deblock : [OK]
- quant : [OK]
- dequant : [OK]
- denoise dct : [OK]
- decimate_score : [OK]
- coeff_last : [OK]
- coeff_level_run : [OK]
- nal escape: [OK]
x264: SVE (128 bits)
- pixel ssd : [OK]
- pixel sa8d : [OK]
- pixel var : [OK]
- pixel hadamard_ac : [OK]
- esa ads: [OK]
- sub_dct4 : [OK]
- zigzag_interleave : [OK]
- mc wpredb : [OK]
- deblock : [OK]
x264: SVE2 (128 bits)
- add_idct4 : [OK]
x264: All tests passed Yeah :)
checkasm10
$ ./checkasm10
x264: using random seed 3721534389
x264: ARMv8
x264: NEON
- pixel sad : [OK]
- pixel ssd : [OK]
- pixel satd : [OK]
- pixel sa8d : [OK]
- pixel sa8d_satd : [OK]
- pixel sad_x3 : [OK]
- pixel sad_x4 : [OK]
- pixel var : [OK]
- pixel var2 : [OK]
- pixel hadamard_ac : [OK]
- pixel vsad : [OK]
- pixel asd : [OK]
- ssd_nv12 : [OK]
- ssim : [OK]
- mc luma : [OK]
- mc chroma : [OK]
- mc wpredb : [OK]
- mc weight : [OK]
- mc offsetadd : [OK]
- mc offsetsub : [OK]
- store_interleave : [OK]
- plane_copy : [OK]
- hpel filter : [OK]
- lowres init : [OK]
- integral init : [OK]
- mbtree : [OK]
- memcpy aligned : [OK]
- memzero aligned : [OK]
- quant : [OK]
- dequant : [OK]
- denoise dct : [OK]
- decimate_score : [OK]
- coeff_last : [OK]
- coeff_level_run : [OK]
- nal escape: [OK]
x264: SVE (128 bits)
- pixel ssd : [OK]
- esa ads: [OK]
x264: SVE2 (128 bits)
x264: All tests passed Yeah :)
Edited by Matthias Langer
Activity
Benchmarks: SVE2 implementation
8-bit
esa_ads_8x8_c: 183
esa_ads_8x8_sve: 96
esa_ads_8x16_c: 243
esa_ads_8x16_sve: 107
esa_ads_16x8_c: 241
esa_ads_16x8_sve: 107
esa_ads_16x16_c: 352
esa_ads_16x16_sve: 118
10-bit
esa_ads_8x8_c: 146
esa_ads_8x8_sve: 95
esa_ads_8x16_c: 212
esa_ads_8x16_sve: 105
esa_ads_16x8_c: 212
esa_ads_16x8_sve: 105
esa_ads_16x16_c: 352
esa_ads_16x16_sve: 117
Benchmarks: SVE implementation
8-bit
esa_ads_8x8_c: 180
esa_ads_8x8_sve: 99
esa_ads_8x16_c: 251
esa_ads_8x16_sve: 114
esa_ads_16x8_c: 247
esa_ads_16x8_sve: 113
esa_ads_16x16_c: 363
esa_ads_16x16_sve: 127
10-bit
esa_ads_8x8_c: 149
esa_ads_8x8_sve: 101
esa_ads_8x16_c: 211
esa_ads_8x16_sve: 113
esa_ads_16x8_c: 213
esa_ads_16x8_sve: 113
esa_ads_16x16_c: 349
esa_ads_16x16_sve: 127
Edited by Matthias Langer
added 1 commit
- e8cb6757 - Implement ESA ADS kernels using SVE and SVE2 assembly
added 1 commit
- d8022829 - Remove last remnant of deleted C implementation.
8 bits per pixel:
make -j checkasm
./checkasm8 --bench=esa_
...
x264: All tests passed Yeah :)
nop: 197
esa_ads_8x8_c: 182
esa_ads_8x8_sve: 74
esa_ads_8x8_sve2: 77
esa_ads_8x16_c: 210
esa_ads_8x16_sve: 85
esa_ads_8x16_sve2: 78
esa_ads_16x8_c: 211
esa_ads_16x8_sve: 84
esa_ads_16x8_sve2: 78
esa_ads_16x16_c: 361
esa_ads_16x16_sve: 111
esa_ads_16x16_sve2: 93
10 bits per pixel:
./checkasm10 --bench=esa_
x264: All tests passed Yeah :)
nop: 196
esa_ads_8x8_c: 170
esa_ads_8x8_sve: 73
esa_ads_8x8_sve2: 76
esa_ads_8x16_c: 209
esa_ads_8x16_sve: 84
esa_ads_8x16_sve2: 77
esa_ads_16x8_c: 209
esa_ads_16x8_sve: 84
esa_ads_16x8_sve2: 77
esa_ads_16x16_c: 362
esa_ads_16x16_sve: 111
esa_ads_16x16_sve2: 92
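The small C helper below turns the latest 8-bit run above into C-to-assembly speedups. The names `ads_bench`, `runs_8bit`, `speedup` and `print_speedups` are hypothetical. The ratios use the printed timings as-is and ignore the `nop` line.

```c
#include <stdio.h>

/* Timings copied from the latest "./checkasm8 --bench=esa_" run above
 * (lower is faster; units are whatever checkasm prints). */
struct ads_bench { const char *size; int c, sve, sve2; };

static const struct ads_bench runs_8bit[] = {
    { "8x8",   182,  74, 77 },
    { "8x16",  210,  85, 78 },
    { "16x8",  211,  84, 78 },
    { "16x16", 361, 111, 93 },
};

/* Speedup of an assembly version relative to the C reference. */
static double speedup( int c_time, int asm_time )
{
    return (double)c_time / asm_time;
}

/* Print one line per block size with the SVE and SVE2 speedups. */
static void print_speedups( void )
{
    for( size_t k = 0; k < sizeof(runs_8bit) / sizeof(runs_8bit[0]); k++ )
        printf( "%-5s  sve %.2fx  sve2 %.2fx\n", runs_8bit[k].size,
                speedup( runs_8bit[k].c, runs_8bit[k].sve ),
                speedup( runs_8bit[k].c, runs_8bit[k].sve2 ) );
}
```

For 16x16 this works out to about 3.25x for SVE and 3.88x for SVE2. At 8x8, the SVE version (2.46x) is slightly ahead of SVE2 (2.36x).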
I had a look at the actual assembly implementation here now as well, and it looks mostly good, thanks! Just a couple of comments.
If you want to, you could also add a few more comments in the implementation of the
if( ads < thresh ) mvs[nmv++] = i;
part. It's understandable if you stop and take the time to follow it closely, but a few comments would help future readers understand it faster.
added 24 commits
- d8022829...fe9e4a7f - 23 commits from branch videolan:master
- 0d18becf - Merge branch 'master' into ads_for_sve
added 1 commit
- ce8a529e - Separate and optimize ADS kernel implementations.
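The review comment above asks for more explanation of the `if( ads < thresh ) mvs[nmv++] = i;` step. In a vector kernel, that step becomes a stream compaction with four parts:

- compute a per-lane predicate,
- pack the indices of the active lanes to the front,
- store them,
- advance `nmv` by the number of active lanes.

The scalar C sketch below models that pattern. It uses a hypothetical fixed `LANES` count and a hypothetical `compact_candidates` helper that takes precomputed `ads` values. It is not a transcription of the merged assembly. On SVE, the packing and counting steps map naturally onto the `COMPACT` (32- and 64-bit elements only) and `CNTP` instructions, but the sketch does not assume the merged code uses them.

```c
#include <stdint.h>

#define LANES 4  /* hypothetical fixed lane count; on SVE it is only known at run time */

/* Hypothetical helper: append the indices i with ads[i] < thresh to mvs,
 * processing LANES candidates at a time the way a vector loop would.
 * Returns the number of indices written. */
static int compact_candidates( const int *ads, int width, int thresh, int16_t *mvs )
{
    int nmv = 0;
    for( int base = 0; base < width; base += LANES )
    {
        int packed[LANES];  /* indices of active lanes, packed to the front */
        int active = 0;     /* number of active lanes in this "vector" */
        for( int lane = 0; lane < LANES; lane++ )
        {
            int i = base + lane;
            /* Lanes past the end stay inactive, like a WHILELT loop predicate. */
            int pred = i < width && ads[i] < thresh;
            /* Write unconditionally; the slot is only kept (active advances)
             * when the predicate is true, which packs active indices. */
            packed[active] = i;
            active += pred;
        }
        /* Store the packed indices, then advance by the active-lane count. */
        for( int k = 0; k < active; k++ )
            mvs[nmv + k] = (int16_t)packed[k];
        nmv += active;
    }
    return nmv;
}
```

Writing the index unconditionally and then conditionally advancing the count keeps the per-lane work free of data-dependent control flow. That is the property a predicated SIMD implementation relies on.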