x86: Add 10bpc 8x32/32x8 itx AVX-512 (Ice Lake) asm

32x8 benefits more than 8x32, which is expected and in line with previous functions, as width-8 IDCTs are limited to YMM registers in the 2nd pass.

inv_txfm_add_8x32_dct_dct_0_10bpc_avx2: 36.5
inv_txfm_add_8x32_dct_dct_0_10bpc_avx512icl: 33.0
inv_txfm_add_8x32_dct_dct_1_10bpc_avx2: 163.2
inv_txfm_add_8x32_dct_dct_1_10bpc_avx512icl: 155.8
inv_txfm_add_8x32_dct_dct_2_10bpc_avx2: 199.5
inv_txfm_add_8x32_dct_dct_2_10bpc_avx512icl: 185.2
inv_txfm_add_8x32_dct_dct_4_10bpc_avx2: 285.5
inv_txfm_add_8x32_dct_dct_4_10bpc_avx512icl: 227.2
inv_txfm_add_32x8_dct_dct_0_10bpc_avx2: 21.9
inv_txfm_add_32x8_dct_dct_0_10bpc_avx512icl: 18.7
inv_txfm_add_32x8_dct_dct_1_10bpc_avx2: 348.5
inv_txfm_add_32x8_dct_dct_1_10bpc_avx512icl: 219.3
inv_txfm_add_32x8_dct_dct_2_10bpc_avx2: 348.5
inv_txfm_add_32x8_dct_dct_2_10bpc_avx512icl: 259.6
inv_txfm_add_32x8_dct_dct_4_10bpc_avx2: 348.5
inv_txfm_add_32x8_dct_dct_4_10bpc_avx512icl: 281.9
inv_txfm_add_8x32_identity_identity_2_10bpc_avx2: 28.4
inv_txfm_add_8x32_identity_identity_2_10bpc_avx512icl: 23.3
inv_txfm_add_8x32_identity_identity_4_10bpc_avx2: 53.2
inv_txfm_add_8x32_identity_identity_4_10bpc_avx512icl: 47.9
inv_txfm_add_32x8_identity_identity_2_10bpc_avx2: 30.4
inv_txfm_add_32x8_identity_identity_2_10bpc_avx512icl: 25.2
inv_txfm_add_32x8_identity_identity_4_10bpc_avx2: 51.9
inv_txfm_add_32x8_identity_identity_4_10bpc_avx512icl: 40.6