Update the `subWxH_dct` kernels for AArch64 NEON. The updated NEON `sub4x4_dct` (13) is now faster than the SVE version (19), which makes the SVE implementation redundant.
BEFORE                    => AFTER                     = IMPROVEMENT
--------------------------------------------------------------------------
sub4x4_dct_c:          67 => sub4x4_dct_c:          66 =
sub4x4_dct_neon:       51 => sub4x4_dct_neon:       13 = 51/13 = 3.92x
sub4x4_dct_sve:        19 => sub4x4_dct_sve:        19 = now redundant
sub8x8_dct_c:         321 => sub8x8_dct_c:         317 =
sub8x8_dct_neon:       69 => sub8x8_dct_neon:       63 = 69/63 = 1.10x
sub8x8_dct8_c:        540 => sub8x8_dct8_c:        534 =
sub8x8_dct8_neon:     110 => sub8x8_dct8_neon:     105 = 110/105 = 1.05x
sub8x8_dct_dc_c:      130 => sub8x8_dct_dc_c:      130 =
sub8x8_dct_dc_neon:    22 => sub8x8_dct_dc_neon:    18 = 22/18 = 1.22x
sub8x16_dct_dc_c:     283 => sub8x16_dct_dc_c:     280 =
sub8x16_dct_dc_neon:   51 => sub8x16_dct_dc_neon:   48 = 51/48 = 1.06x
sub16x16_dct_c:      1352 => sub16x16_dct_c:      1345 =
sub16x16_dct_neon:    318 => sub16x16_dct_neon:    297 = 318/297 = 1.07x
sub16x16_dct8_c:     2273 => sub16x16_dct8_c:     2279 =
sub16x16_dct8_neon:   499 => sub16x16_dct8_neon:   478 = 499/478 = 1.04x
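
For readers unfamiliar with what these kernels compute, here is a minimal scalar sketch of a `sub4x4_dct`-style operation: subtract two 4x4 pixel blocks, then apply the H.264 4x4 forward integer core transform to the residual. It assumes 8-bit pixels and caller-supplied strides. The function name `sub4x4_dct_sketch` and its signature are hypothetical and chosen for illustration; this is not the code touched by this merge request, and the NEON version may organise the work differently.

```c
#include <stdint.h>

/* Hypothetical scalar sketch (not the project's actual code):
 * residual of two 4x4 blocks followed by the H.264 4x4 forward
 * integer core transform. Assumes 8-bit pixels. */
static void sub4x4_dct_sketch(int16_t dct[16],
                              const uint8_t *pix1, int stride1,
                              const uint8_t *pix2, int stride2)
{
    int16_t d[16], tmp[16];

    /* Residual: d = pix1 - pix2 */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y * 4 + x] = (int16_t)(pix1[y * stride1 + x] - pix2[y * stride2 + x]);

    /* First pass: 1-D transform of each residual row, stored transposed. */
    for (int i = 0; i < 4; i++) {
        int s03 = d[i * 4 + 0] + d[i * 4 + 3];
        int s12 = d[i * 4 + 1] + d[i * 4 + 2];
        int d03 = d[i * 4 + 0] - d[i * 4 + 3];
        int d12 = d[i * 4 + 1] - d[i * 4 + 2];

        tmp[0 * 4 + i] = (int16_t)(s03 + s12);
        tmp[1 * 4 + i] = (int16_t)(2 * d03 + d12);
        tmp[2 * 4 + i] = (int16_t)(s03 - s12);
        tmp[3 * 4 + i] = (int16_t)(d03 - 2 * d12);
    }

    /* Second pass: the same 1-D transform on each row of the transposed
     * intermediate, so dct[i*4 + j] holds horizontal basis i, vertical basis j. */
    for (int i = 0; i < 4; i++) {
        int s03 = tmp[i * 4 + 0] + tmp[i * 4 + 3];
        int s12 = tmp[i * 4 + 1] + tmp[i * 4 + 2];
        int d03 = tmp[i * 4 + 0] - tmp[i * 4 + 3];
        int d12 = tmp[i * 4 + 1] - tmp[i * 4 + 2];

        dct[i * 4 + 0] = (int16_t)(s03 + s12);
        dct[i * 4 + 1] = (int16_t)(2 * d03 + d12);
        dct[i * 4 + 2] = (int16_t)(s03 - s12);
        dct[i * 4 + 3] = (int16_t)(d03 - 2 * d12);
    }
}
```

With a constant residual of 3 in every position, only the DC coefficient is non-zero and equals the sum of the residuals: `dct[0] == 48`. Each pass is four independent butterflies over 16-bit values, which is the kind of structure that maps naturally onto SIMD lanes.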