  1. May 02, 2023
  2. Apr 28, 2023
  3. Apr 27, 2023
  4. Apr 25, 2023
  5. Apr 23, 2023
  6. Apr 20, 2023
    • x86: add AVX512-IceLake implementation of HBD 64x64 DCT^2 · ad0f3e6a
      Ronald S. Bultje authored
      Also implement "fast3" path for pass2.dct64 (where 1/8th of the
      coefficients are non-zero), which affects 32x64 as well as 64x64.
      
      Before:
      inv_txfm_add_32x64_dct_dct_1_10bpc_c:          51008.6 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        3351.9 (15.22x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1419.5 (35.93x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    744.8 (68.49x)
      
      After:
      inv_txfm_add_32x64_dct_dct_1_10bpc_c:          51019.5 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        3276.1 (15.57x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1420.7 (35.91x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    668.3 (76.34x)
      
      (Not sure why the SSE4 speed changed.)
      
      And speed for 64x64:
      inv_txfm_add_64x64_dct_dct_0_10bpc_c:           3506.9 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_0_10bpc_sse4:         535.6 ( 6.55x)
      inv_txfm_add_64x64_dct_dct_0_10bpc_avx2:         223.5 (15.69x)
      inv_txfm_add_64x64_dct_dct_0_10bpc_avx512icl:    252.4 (13.89x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_c:         108353.7 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_sse4:        6551.9 (16.54x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_avx2:        2876.8 (37.66x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_avx512icl:   1310.1 (82.70x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_c:         108347.6 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_sse4:        7985.4 (13.57x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_avx2:        3561.8 (30.42x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_avx512icl:   1962.6 (55.20x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_c:         108455.5 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_sse4:        9709.0 (11.17x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_avx2:        4220.5 (25.70x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_avx512icl:   2991.1 (36.26x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_c:         108349.9 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_sse4:       11048.0 ( 9.81x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_avx2:        4898.1 (22.12x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_avx512icl:   3108.1 (34.86x)
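The parenthesized factors in the tables above are the C reference time divided by the time of the optimized version. A minimal sketch of that arithmetic follows; the helper names `speedup` and `percent_saved` are hypothetical and are not part of checkasm or dav1d.

```c
#include <stdio.h>

/* Speedup of an optimized function relative to a reference timing.
 * Hypothetical helper for illustration only. */
static double speedup(double ref_time, double opt_time) {
    return ref_time / opt_time;
}

/* Percentage of time saved when a timing drops from `before` to `after`.
 * Hypothetical helper for illustration only. */
static double percent_saved(double before, double after) {
    return 100.0 * (before - after) / before;
}

/* Print one benchmark row in a layout similar to the tables above. */
static void print_row(const char *name, double ref_time, double opt_time) {
    printf("%-48s %9.1f (%5.2fx)\n", name, opt_time, speedup(ref_time, opt_time));
}
```

With the 32x64 numbers, `speedup(51008.6, 744.8)` is about 68.49 and `speedup(51019.5, 668.3)` is about 76.34, matching the listed factors. `percent_saved(744.8, 668.3)` is roughly 10.3, so the AVX512 time for that case drops by about a tenth between the "Before" and "After" runs.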
  7. Apr 18, 2023
    • picture: allow storing an array of Dav1dITUTT35 entries · feeeccb6
      James Almer authored
      Nothing in the spec prevents a Temporal Unit from having more than one Metadata
      OBU of type ITU-T T.35, so export them as an array instead of only exporting
      the last one we parse.
      This is backwards compatible with the previous implementation, as users unaware
      of this change can ignore the n_itut_t35 field and still access the first (or
      only) entry in the array as they have been doing until now.
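The compatibility argument above can be sketched as follows. The `DemoITUTT35` and `DemoPicture` types are simplified stand-ins written for this illustration, not the real dav1d definitions; only the `n_itut_t35` field name comes from the commit message, and the `itut_t35` pointer name and the other fields are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one ITU-T T.35 metadata entry (illustrative layout only). */
typedef struct {
    uint8_t country_code;
    size_t payload_size;
    const uint8_t *payload;
} DemoITUTT35;

/* Stand-in for the part of the picture struct that carries the metadata. */
typedef struct {
    DemoITUTT35 *itut_t35; /* assumed name: points at the first array entry */
    size_t n_itut_t35;     /* number of entries in the array */
} DemoPicture;

/* Legacy-style access: ignores n_itut_t35 and reads the first (or only) entry. */
static const DemoITUTT35 *first_entry(const DemoPicture *p) {
    return p->itut_t35;
}

/* Array-aware access: walks every entry and sums the payload sizes. */
static size_t total_payload_size(const DemoPicture *p) {
    size_t total = 0;
    for (size_t i = 0; i < p->n_itut_t35; i++)
        total += p->itut_t35[i].payload_size;
    return total;
}
```

Code written against a single-entry pointer keeps working in this sketch because the pointer to the array is also a pointer to its first element; only callers that want the additional entries need to look at the count.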
    • x86: add AVX512-IceLake implementation of HBD 64x32 DCT^2 · 68d7a76d
      Ronald S. Bultje authored
      inv_txfm_add_64x32_dct_dct_0_10bpc_c:           1760.6 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_0_10bpc_sse4:         271.1 ( 6.49x)
      inv_txfm_add_64x32_dct_dct_0_10bpc_avx2:         121.3 (14.52x)
      inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl:    116.3 (15.14x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_c:          66507.4 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_sse4:        3712.4 (17.91x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_avx2:        1830.5 (36.33x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl:    805.4 (82.58x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_c:          66491.6 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_sse4:        5325.3 (12.49x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_avx2:        2578.5 (25.79x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl:   1394.5 (47.68x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_c:          66490.2 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_sse4:        6418.5 (10.36x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_avx2:        3305.6 (20.11x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl:   2571.5 (25.86x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_c:          66508.6 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_sse4:        8671.2 ( 7.67x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_avx2:        4054.2 (16.40x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl:   2691.6 (24.71x)
  8. Apr 13, 2023
    • x86: add AVX512-IceLake implementation of HBD 64x16 DCT^2 · 0b809a92
      Ronald S. Bultje authored
      inv_txfm_add_64x16_dct_dct_0_10bpc_c:            892.0 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_0_10bpc_sse4:         131.5 ( 6.78x)
      inv_txfm_add_64x16_dct_dct_0_10bpc_avx2:          63.4 (14.07x)
      inv_txfm_add_64x16_dct_dct_0_10bpc_avx512icl:     56.8 (15.71x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_c:          29253.7 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_sse4:        1639.7 (17.84x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_avx2:        1106.8 (26.43x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_avx512icl:    532.9 (54.89x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_c:          29249.8 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_sse4:        3065.6 ( 9.54x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_avx2:        1791.0 (16.33x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_avx512icl:   1108.0 (26.40x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_c:          29269.1 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_sse4:        3738.2 ( 7.83x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_avx2:        1790.9 (16.34x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_avx512icl:   1203.8 (24.31x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_c:          29337.7 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_sse4:        3749.7 ( 7.82x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_avx2:        1791.0 (16.38x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_avx512icl:   1203.8 (24.37x)
  9. Apr 12, 2023
    • x86: add AVX512-IceLake implementation of HBD 32x64 DCT^2 · 6ae57667
      Ronald S. Bultje authored
      inv_txfm_add_32x64_dct_dct_0_10bpc_c:           1783.5 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_0_10bpc_sse4:         243.3 ( 7.33x)
      inv_txfm_add_32x64_dct_dct_0_10bpc_avx2:         119.1 (14.97x)
      inv_txfm_add_32x64_dct_dct_0_10bpc_avx512icl:    142.6 (12.50x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_c:          50422.5 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        2880.5 (17.50x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1423.4 (35.43x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    741.6 (67.99x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_c:          50433.6 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_sse4:        4015.1 (12.56x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_avx2:        1767.7 (28.53x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_avx512icl:    960.8 (52.49x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_c:          50422.2 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_sse4:        4500.5 (11.20x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_avx2:        2111.7 (23.88x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_avx512icl:   1777.1 (28.37x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_c:          50444.2 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_sse4:        5592.8 ( 9.02x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_avx2:        2458.1 (20.52x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_avx512icl:   1867.2 (27.02x)
    • James Almer's avatar
      ed997f5f
  10. Apr 11, 2023
  11. Apr 08, 2023
    • picture: move Dav1dRef fields out of the public struct · 76e71ddf
      James Almer authored and Jean-Baptiste Kempf committed
      
      Dav1dRef is an opaque struct to the API user, and they have no business
      with these fields at all, so move them to the internal picture struct.
      
      Signed-off-by: James Almer <jamrial@gmail.com>
    • picture: allow storing an array of Dav1dITUTT35 entries · 62f8b887
      James Almer authored and Jean-Baptiste Kempf committed
      
      Nothing in the spec prevents a Temporal Unit from having more than one Metadata
      OBU of type ITU-T T.35, so export them as an array instead of only exporting
      the last one we parse.
      This is backwards compatible with the previous implementation, as users unaware
      of this change can ignore the n_itut_t35 field and still access the first (or
      only) entry in the array as they have been doing until now.
      
      Signed-off-by: James Almer <jamrial@gmail.com>
    • x86: add AVX512-IceLake implementation of HBD 16x64 DCT^2 · 5aa3b38f
      Ronald S. Bultje authored
      nop:                                              39.4
      inv_txfm_add_16x64_dct_dct_0_10bpc_c:           2208.0 ( 1.00x)
      inv_txfm_add_16x64_dct_dct_0_10bpc_sse4:         133.5 (16.54x)
      inv_txfm_add_16x64_dct_dct_0_10bpc_avx2:          71.3 (30.98x)
      inv_txfm_add_16x64_dct_dct_0_10bpc_avx512icl:    102.0 (21.66x)
      inv_txfm_add_16x64_dct_dct_1_10bpc_c:          25757.0 ( 1.00x)
      inv_txfm_add_16x64_dct_dct_1_10bpc_sse4:        1366.1 (18.85x)
      inv_txfm_add_16x64_dct_dct_1_10bpc_avx2:         657.6 (39.17x)
      inv_txfm_add_16x64_dct_dct_1_10bpc_avx512icl:    378.9 (67.98x)
      inv_txfm_add_16x64_dct_dct_2_10bpc_c:          25771.0 ( 1.00x)
      inv_txfm_add_16x64_dct_dct_2_10bpc_sse4:        1739.7 (14.81x)
      inv_txfm_add_16x64_dct_dct_2_10bpc_avx2:         772.1 (33.38x)
      inv_txfm_add_16x64_dct_dct_2_10bpc_avx512icl:    469.3 (54.92x)
      inv_txfm_add_16x64_dct_dct_3_10bpc_c:          25775.7 ( 1.00x)
      inv_txfm_add_16x64_dct_dct_3_10bpc_sse4:        1968.1 (13.10x)
      inv_txfm_add_16x64_dct_dct_3_10bpc_avx2:         886.5 (29.08x)
      inv_txfm_add_16x64_dct_dct_3_10bpc_avx512icl:    662.6 (38.90x)
      inv_txfm_add_16x64_dct_dct_4_10bpc_c:          25745.9 ( 1.00x)
      inv_txfm_add_16x64_dct_dct_4_10bpc_sse4:        2330.9 (11.05x)
      inv_txfm_add_16x64_dct_dct_4_10bpc_avx2:        1008.5 (25.53x)
      inv_txfm_add_16x64_dct_dct_4_10bpc_avx512icl:    662.3 (38.88x)
  12. Apr 06, 2023
  13. Mar 31, 2023
    • Matthias Dressel's avatar
      0207e0fe
    • x86/itx: Add 32x32 12bpc AVX2 idtx · f6d4c0c4
      Matthias Dressel authored
      inv_txfm_add_32x32_identity_identity_0_12bpc_c:      5785.8 ( 1.00x)
      inv_txfm_add_32x32_identity_identity_0_12bpc_avx2:     20.7 (279.65x)
      inv_txfm_add_32x32_identity_identity_1_12bpc_c:      5896.9 ( 1.00x)
      inv_txfm_add_32x32_identity_identity_1_12bpc_avx2:     20.7 (285.01x)
      inv_txfm_add_32x32_identity_identity_2_12bpc_c:      5799.5 ( 1.00x)
      inv_txfm_add_32x32_identity_identity_2_12bpc_avx2:     68.9 (84.20x)
      inv_txfm_add_32x32_identity_identity_3_12bpc_c:      5798.1 ( 1.00x)
      inv_txfm_add_32x32_identity_identity_3_12bpc_avx2:    140.6 (41.25x)
      inv_txfm_add_32x32_identity_identity_4_12bpc_c:      5803.3 ( 1.00x)
      inv_txfm_add_32x32_identity_identity_4_12bpc_avx2:    308.2 (18.83x)
    • x86/itx: Add 32x16 12bpc AVX2 idtx · 1e602b8b
      Matthias Dressel authored
      inv_txfm_add_32x16_identity_identity_0_12bpc_c:      4138.7 ( 1.00x)
      inv_txfm_add_32x16_identity_identity_0_12bpc_avx2:     30.4 (136.26x)
      inv_txfm_add_32x16_identity_identity_1_12bpc_c:      4147.5 ( 1.00x)
      inv_txfm_add_32x16_identity_identity_1_12bpc_avx2:     30.7 (135.25x)
      inv_txfm_add_32x16_identity_identity_2_12bpc_c:      4138.2 ( 1.00x)
      inv_txfm_add_32x16_identity_identity_2_12bpc_avx2:     98.9 (41.84x)
      inv_txfm_add_32x16_identity_identity_3_12bpc_c:      4136.6 ( 1.00x)
      inv_txfm_add_32x16_identity_identity_3_12bpc_avx2:    167.7 (24.67x)
      inv_txfm_add_32x16_identity_identity_4_12bpc_c:      4156.3 ( 1.00x)
      inv_txfm_add_32x16_identity_identity_4_12bpc_avx2:    242.1 (17.17x)
    • x86/itx: Add 16x32 12bpc AVX2 idtx · e6b194e7
      Matthias Dressel authored
      inv_txfm_add_16x32_identity_identity_0_12bpc_c:      4287.9 ( 1.00x)
      inv_txfm_add_16x32_identity_identity_0_12bpc_avx2:     31.4 (136.66x)
      inv_txfm_add_16x32_identity_identity_1_12bpc_c:      4293.7 ( 1.00x)
      inv_txfm_add_16x32_identity_identity_1_12bpc_avx2:     30.9 (139.07x)
      inv_txfm_add_16x32_identity_identity_2_12bpc_c:      4273.8 ( 1.00x)
      inv_txfm_add_16x32_identity_identity_2_12bpc_avx2:     97.3 (43.92x)
      inv_txfm_add_16x32_identity_identity_3_12bpc_c:      4269.0 ( 1.00x)
      inv_txfm_add_16x32_identity_identity_3_12bpc_avx2:    165.2 (25.83x)
      inv_txfm_add_16x32_identity_identity_4_12bpc_c:      4284.4 ( 1.00x)
      inv_txfm_add_16x32_identity_identity_4_12bpc_avx2:    235.2 (18.22x)
  14. Mar 25, 2023
  15. Mar 23, 2023
  16. Mar 21, 2023
  17. Mar 16, 2023
  18. Mar 13, 2023
  19. Mar 07, 2023
  20. Mar 06, 2023