x86: add AVX512-IceLake implementation of HBD 64x32 DCT^2

inv_txfm_add_64x32_dct_dct_0_10bpc_c: 1760.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_0_10bpc_sse4: 271.1 ( 6.49x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx2: 121.3 (14.52x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl: 116.3 (15.14x)
inv_txfm_add_64x32_dct_dct_1_10bpc_c: 66507.4 ( 1.00x)
inv_txfm_add_64x32_dct_dct_1_10bpc_sse4: 3712.4 (17.91x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx2: 1830.5 (36.33x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl: 805.4 (82.58x)
inv_txfm_add_64x32_dct_dct_2_10bpc_c: 66491.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_2_10bpc_sse4: 5325.3 (12.49x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx2: 2578.5 (25.79x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl: 1394.5 (47.68x)
inv_txfm_add_64x32_dct_dct_3_10bpc_c: 66490.2 ( 1.00x)
inv_txfm_add_64x32_dct_dct_3_10bpc_sse4: 6418.5 (10.36x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx2: 3305.6 (20.11x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl: 2571.5 (25.86x)
inv_txfm_add_64x32_dct_dct_4_10bpc_c: 66508.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_4_10bpc_sse4: 8671.2 ( 7.67x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx2: 4054.2 (16.40x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl: 2691.6 (24.71x)