  1. Jun 01, 2023
  2. May 31, 2023
  3. May 29, 2023
  4. May 26, 2023
  5. May 25, 2023
  6. May 24, 2023
  7. May 14, 2023
  8. May 12, 2023
  9. May 11, 2023
  10. May 06, 2023
    • Fix extern "C" declarations · cb5a095e
      Andrey Semashev authored
      Avoid wrapping external includes in extern "C" blocks. Also wrap all public headers in extern "C" blocks to allow them to be selectively included in C++ projects.
      
      Fixes #422.
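
The header layout this commit describes can be sketched roughly as follows. This is a minimal illustration, not a dav1d file: the header name, the `ExampleBuffer` type, and `example_buffer_size` are hypothetical. External includes sit outside the `extern "C"` block because, under a C++ compiler, system headers may declare templates or overloads, which cannot have C linkage.

```c
/* example_public.h: hypothetical public header, not an actual dav1d file. */
#ifndef EXAMPLE_PUBLIC_H
#define EXAMPLE_PUBLIC_H

/* External includes stay outside the extern "C" block. */
#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical type, used only to give the header some content. */
typedef struct ExampleBuffer {
    const uint8_t *data;
    size_t size;
} ExampleBuffer;

/* Returns the number of bytes in buf, or 0 if buf is NULL. */
static inline size_t example_buffer_size(const ExampleBuffer *buf) {
    return buf ? buf->size : 0;
}

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* EXAMPLE_PUBLIC_H */
```

With each public header wrapped this way, a C++ project can include any one of them on its own without adding its own `extern "C"` wrapper.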
  11. May 05, 2023
    • arm64: ipred: 8 bpc NEON implementation of the Z2 function · 8af8244a
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Relative speedup over C code:
                               Cortex A53    A55    A72    A73    A76   Apple M1
      intra_pred_z2_w4_8bpc_neon:    3.91   3.55   3.31   3.94   3.46   8.50
      intra_pred_z2_w8_8bpc_neon:    5.68   5.67   4.31   5.31   4.34   5.83
      intra_pred_z2_w16_8bpc_neon:   8.39   9.28   5.53   7.04   7.01   9.45
      intra_pred_z2_w32_8bpc_neon:   7.01   8.01   5.04   6.32   5.48   7.48
      intra_pred_z2_w64_8bpc_neon:   8.73  10.25   5.92   7.61   6.63  10.05
  12. May 04, 2023
    • threading: Fix a race on task_thread.init_done · f89dbc07
      Victorien Le Couviour--Tuffet authored
      Fixes a race in which the tasks inserted by the init task could all be
      executed and signal frame completion, letting another frame start before
      the init task had set init_done. The init task would then set it,
      preventing the init task of the new frame from being executed.
      
      This then triggered an assert further down in the task-picking loop.
      Credits to OSS-Fuzz.
  13. May 02, 2023
  14. Apr 28, 2023
  15. Apr 27, 2023
  16. Apr 25, 2023
  17. Apr 23, 2023
  18. Apr 20, 2023
    • x86: add AVX512-IceLake implementation of HBD 64x64 DCT^2 · ad0f3e6a
      Ronald S. Bultje authored
      Also implement "fast3" path for pass2.dct64 (where 1/8th of the
      coefficients are non-zero), which affects 32x64 as well as 64x64.
      
      Before:
      inv_txfm_add_32x64_dct_dct_1_10bpc_c:          51008.6 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        3351.9 (15.22x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1419.5 (35.93x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    744.8 (68.49x)
      
      After:
      inv_txfm_add_32x64_dct_dct_1_10bpc_c:          51019.5 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        3276.1 (15.57x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1420.7 (35.91x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    668.3 (76.34x)
      
      (Not sure why the SSE4 speed changed.)
      
      And speed for 64x64:
      inv_txfm_add_64x64_dct_dct_0_10bpc_c:           3506.9 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_0_10bpc_sse4:         535.6 ( 6.55x)
      inv_txfm_add_64x64_dct_dct_0_10bpc_avx2:         223.5 (15.69x)
      inv_txfm_add_64x64_dct_dct_0_10bpc_avx512icl:    252.4 (13.89x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_c:         108353.7 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_sse4:        6551.9 (16.54x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_avx2:        2876.8 (37.66x)
      inv_txfm_add_64x64_dct_dct_1_10bpc_avx512icl:   1310.1 (82.70x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_c:         108347.6 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_sse4:        7985.4 (13.57x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_avx2:        3561.8 (30.42x)
      inv_txfm_add_64x64_dct_dct_2_10bpc_avx512icl:   1962.6 (55.20x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_c:         108455.5 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_sse4:        9709.0 (11.17x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_avx2:        4220.5 (25.70x)
      inv_txfm_add_64x64_dct_dct_3_10bpc_avx512icl:   2991.1 (36.26x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_c:         108349.9 ( 1.00x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_sse4:       11048.0 ( 9.81x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_avx2:        4898.1 (22.12x)
      inv_txfm_add_64x64_dct_dct_4_10bpc_avx512icl:   3108.1 (34.86x)
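
The bracketed figures in the tables above appear to be the C timing divided by the timing on the same row. That reading is an assumption, but it matches the listed numbers. A minimal sketch of the arithmetic, using a hypothetical `speedup` helper:

```c
/* Hypothetical helper: ratio of a reference timing to an optimized timing.
 * Both values must be in the same unit, and opt_time must be non-zero. */
static double speedup(double ref_time, double opt_time) {
    return ref_time / opt_time;
}
```

For example, `speedup(51008.6, 744.8)` comes to about 68.49, the avx512icl figure in the "Before" table for 32x64. `speedup(744.8, 668.3)` is about 1.11, so the fast3 path makes that avx512icl case roughly 11% faster than before.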
  19. Apr 18, 2023
    • picture: allow storing an array of Dav1dITUTT35 entries · feeeccb6
      James Almer authored
      Nothing in the spec prevents a Temporal Unit from having more than one Metadata
      OBU of type ITU-T T.35, so export them as an array instead of only exporting
      the last one we parse.
      This is backwards compatible with the previous implementation, as users unaware
      of this change can ignore the n_itut_t35 field and still access the first (or
      only) entry in the array as they have been doing until now.
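
The backwards-compatibility argument can be sketched with stand-in types. `ExampleT35` and `ExamplePicture` below are hypothetical simplifications, not the real dav1d structs. The pointer field name `itut_t35` and the payload fields are assumptions; only `n_itut_t35` is named in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one ITU-T T.35 metadata entry. */
typedef struct ExampleT35 {
    uint8_t country_code;
    size_t payload_size;
    const uint8_t *payload;
} ExampleT35;

/* Hypothetical stand-in for the picture fields this commit concerns. */
typedef struct ExamplePicture {
    ExampleT35 *itut_t35; /* first entry of the array, or NULL if none */
    size_t n_itut_t35;    /* number of entries in the array */
} ExamplePicture;

/* New-style access: visit every entry in the array. */
static size_t total_t35_payload(const ExamplePicture *pic) {
    size_t total = 0;
    for (size_t i = 0; i < pic->n_itut_t35; i++)
        total += pic->itut_t35[i].payload_size;
    return total;
}

/* Old-style access: ignore the count and dereference the pointer,
 * which reaches the first (or only) entry. */
static size_t first_t35_payload(const ExamplePicture *pic) {
    return pic->itut_t35 ? pic->itut_t35->payload_size : 0;
}
```

Code written against a single-entry pointer keeps working in this sketch because a pointer to the first array element is dereferenced exactly as a pointer to a lone entry would be.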
    • x86: add AVX512-IceLake implementation of HBD 64x32 DCT^2 · 68d7a76d
      Ronald S. Bultje authored
      inv_txfm_add_64x32_dct_dct_0_10bpc_c:           1760.6 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_0_10bpc_sse4:         271.1 ( 6.49x)
      inv_txfm_add_64x32_dct_dct_0_10bpc_avx2:         121.3 (14.52x)
      inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl:    116.3 (15.14x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_c:          66507.4 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_sse4:        3712.4 (17.91x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_avx2:        1830.5 (36.33x)
      inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl:    805.4 (82.58x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_c:          66491.6 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_sse4:        5325.3 (12.49x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_avx2:        2578.5 (25.79x)
      inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl:   1394.5 (47.68x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_c:          66490.2 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_sse4:        6418.5 (10.36x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_avx2:        3305.6 (20.11x)
      inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl:   2571.5 (25.86x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_c:          66508.6 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_sse4:        8671.2 ( 7.67x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_avx2:        4054.2 (16.40x)
      inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl:   2691.6 (24.71x)
  20. Apr 13, 2023
    • x86: add AVX512-IceLake implementation of HBD 64x16 DCT^2 · 0b809a92
      Ronald S. Bultje authored
      inv_txfm_add_64x16_dct_dct_0_10bpc_c:            892.0 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_0_10bpc_sse4:         131.5 ( 6.78x)
      inv_txfm_add_64x16_dct_dct_0_10bpc_avx2:          63.4 (14.07x)
      inv_txfm_add_64x16_dct_dct_0_10bpc_avx512icl:     56.8 (15.71x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_c:          29253.7 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_sse4:        1639.7 (17.84x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_avx2:        1106.8 (26.43x)
      inv_txfm_add_64x16_dct_dct_1_10bpc_avx512icl:    532.9 (54.89x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_c:          29249.8 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_sse4:        3065.6 ( 9.54x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_avx2:        1791.0 (16.33x)
      inv_txfm_add_64x16_dct_dct_2_10bpc_avx512icl:   1108.0 (26.40x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_c:          29269.1 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_sse4:        3738.2 ( 7.83x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_avx2:        1790.9 (16.34x)
      inv_txfm_add_64x16_dct_dct_3_10bpc_avx512icl:   1203.8 (24.31x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_c:          29337.7 ( 1.00x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_sse4:        3749.7 ( 7.82x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_avx2:        1791.0 (16.38x)
      inv_txfm_add_64x16_dct_dct_4_10bpc_avx512icl:   1203.8 (24.37x)
  21. Apr 12, 2023
    • x86: add AVX512-IceLake implementation of HBD 32x64 DCT^2 · 6ae57667
      Ronald S. Bultje authored
      inv_txfm_add_32x64_dct_dct_0_10bpc_c:           1783.5 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_0_10bpc_sse4:         243.3 ( 7.33x)
      inv_txfm_add_32x64_dct_dct_0_10bpc_avx2:         119.1 (14.97x)
      inv_txfm_add_32x64_dct_dct_0_10bpc_avx512icl:    142.6 (12.50x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_c:          50422.5 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        2880.5 (17.50x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1423.4 (35.43x)
      inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    741.6 (67.99x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_c:          50433.6 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_sse4:        4015.1 (12.56x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_avx2:        1767.7 (28.53x)
      inv_txfm_add_32x64_dct_dct_2_10bpc_avx512icl:    960.8 (52.49x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_c:          50422.2 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_sse4:        4500.5 (11.20x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_avx2:        2111.7 (23.88x)
      inv_txfm_add_32x64_dct_dct_3_10bpc_avx512icl:   1777.1 (28.37x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_c:          50444.2 ( 1.00x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_sse4:        5592.8 ( 9.02x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_avx2:        2458.1 (20.52x)
      inv_txfm_add_32x64_dct_dct_4_10bpc_avx512icl:   1867.2 (27.02x)
    • James Almer
      ed997f5f