  1. Jun 20, 2020
  2. Jun 19, 2020
    • Henrik Gramner's avatar
      x86: Branch before waiting on popcnt in ipred_z AVX2 functions · bf7adb75
      Henrik Gramner authored and committed
      Some specific Haswell CPUs have a hardware bug where the popcnt
      instruction doesn't set the zero flag correctly, which causes the wrong
      branch to be taken.

      popcnt also has a 3-cycle latency on Intel CPUs, so branching on the
      input value instead of the output reduces the time wasted going down
      the wrong code path in case of a branch misprediction.
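A source-level sketch of the idea, for illustration only: because popcount(x) is zero exactly when x is zero, the early-out branch can test the input directly, so it does not have to wait for the popcount result (and does not depend on the zero flag popcnt sets). The helper names are hypothetical, and a C compiler is free to schedule this differently from the hand-written AVX2 code.

```c
#include <stdint.h>

/* Portable population count (stands in for the popcnt instruction). */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: returns the number of set bits, or -1 for the
 * early-out path. The branch depends only on the input `mask`, never on
 * the popcount result, so it can resolve before the count is available. */
static int count_or_skip(uint32_t mask) {
    if (mask == 0)          /* equivalent to popcount32(mask) == 0 */
        return -1;
    return popcount32(mask);
}
```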
    • Martin Storsjö's avatar
      arm32: Add a NEON implementation of MSAC · 53e7b21e
      Martin Storsjö authored
      Only use this when NEON can be used unconditionally, without
      runtime detection (i.e. when __ARM_NEON is defined).

      The speedup over the C code is very modest for the smaller functions
      (and the NEON version is actually a little slower than the C code
      on Cortex A7 for adapt4), but it is around 2x for adapt16.
      
                                    Cortex A7     A8     A9    A53    A72    A73
      msac_decode_bool_c:                41.1   43.0   43.0   37.3   26.2   31.3
      msac_decode_bool_neon:             40.2   42.0   37.2   32.8   19.9   25.5
      msac_decode_bool_adapt_c:          65.1   70.4   58.5   54.3   33.2   40.8
      msac_decode_bool_adapt_neon:       56.8   52.4   49.3   42.6   27.1   33.7
      msac_decode_bool_equi_c:           36.9   37.2   42.8   32.6   22.7   42.3
      msac_decode_bool_equi_neon:        34.9   35.1   36.4   29.7   19.5   36.4
      msac_decode_symbol_adapt4_c:      114.2  139.0  111.6   99.9   65.5   83.5
      msac_decode_symbol_adapt4_neon:   119.2  128.3   95.7   82.2   58.2   57.5
      msac_decode_symbol_adapt8_c:      176.0  207.9  164.0  154.4   88.0  117.0
      msac_decode_symbol_adapt8_neon:   128.3  130.3  110.7   85.1   59.9   61.4
      msac_decode_symbol_adapt16_c:     292.1  320.5  256.4  246.4  129.1  173.3
      msac_decode_symbol_adapt16_neon:  162.2  144.3  129.0  104.2   69.2   69.9
      
      (Omitting msac_decode_hi_tok from the benchmark, as the "C" version
      measured there uses the NEON version of msac_decode_symbol_adapt4.)
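The compile-time selection described above can be sketched with the preprocessor: the NEON routine is bound only when `__ARM_NEON` is defined, and otherwise the C version is used, with no runtime CPU detection in either case. The function names and the `MSAC_USE_NEON` macro are hypothetical placeholders, not dav1d's actual declarations.

```c
/* Hypothetical portable implementation (returns a tag for illustration). */
static const char *decode_symbol_c(void) { return "c"; }

#if defined(__ARM_NEON)
/* The compiler target guarantees NEON: use it unconditionally. */
#define MSAC_USE_NEON 1
/* Stands in for the assembly routine in a real build. */
static const char *decode_symbol_neon(void) { return "neon"; }
#define decode_symbol decode_symbol_neon
#else
/* No compile-time NEON guarantee: fall back to C, no runtime detection. */
#define MSAC_USE_NEON 0
#define decode_symbol decode_symbol_c
#endif
```

Callers always write `decode_symbol()`; which body it resolves to is fixed at build time.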
  3. Jun 18, 2020
    • Martin Storsjö's avatar
      arm64: msac: Add a special cased implementation of decode_hi_tok · 370200cd
      Martin Storsjö authored
      The speedup (over the normal version, which just calls the existing
      assembly version of symbol_adapt4) is not very impressive on
      bigger cores, but looks decent on small cores. It is an improvement
      in any case.
      
                                   Cortex A53    A72    A73
      msac_decode_hi_tok_c:             175.7  136.2  138.1
      msac_decode_hi_tok_neon:          146.8  129.4  125.9
    • Martin Storsjö's avatar
      arm64: msac: Clarify the register use in one macro · 078e7360
      Martin Storsjö authored
      Include the letter prefix when calling the macro, making it
      slightly less obscure.
    • Martin Storsjö's avatar
      cli: Avoid large intermediates in the windows get_time_nanos · 7949de70
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Multiplying the performance counter value (in its own time base)
      by the intended target time base, and only then dividing, reduces
      the available numeric range by a factor of the original time base
      times the new time base.
      
      On Windows 10 on ARM64, the performance counter frequency is
      19200000 (on x86_64 in a virtual machine, it's 10000000), making
      the calculation overflow every (1 << 64) / (19200000 * 1000000000)
      ≈ 960 seconds, i.e. 16 minutes, long before the actual uint64_t
      nanosecond return value wraps around.
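A portable sketch of the overflow and one way around it: split the tick count into whole seconds and a sub-second remainder before scaling. On Windows the two inputs would come from QueryPerformanceCounter and QueryPerformanceFrequency; here they are plain parameters, and the function names are hypothetical rather than the exact code in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Overflow-prone version: counter * 1e9 exceeds 2^64 after
 * 2^64 / (freq * 1e9) seconds (about 960 s for freq = 19200000). */
static uint64_t ticks_to_ns_naive(uint64_t counter, uint64_t freq) {
    return counter * UINT64_C(1000000000) / freq;
}

/* Divide first: the remainder is smaller than freq, so rem * 1e9 stays
 * below freq * 1e9 and cannot overflow for realistic counter frequencies. */
static uint64_t ticks_to_ns(uint64_t counter, uint64_t freq) {
    assert(freq > 0);
    const uint64_t seconds = counter / freq;
    const uint64_t rem     = counter % freq;
    return seconds * UINT64_C(1000000000) + rem * UINT64_C(1000000000) / freq;
}
```

With a 19.2 MHz counter, 1000 seconds of ticks already wraps the naive product, while the split version returns exactly 10^12 ns.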
    • Martin Storsjö's avatar
      cli: Get the elapsed time if printing progress, regardless of the fps value · 3e643b1f
      Martin Storsjö authored
      Even if we don't want to throttle decoding to realtime, and
      even if the file itself didn't contain a valid fps value, we
      may want to call the synchronize function to fetch the current
      elapsed decoding time, for displaying the fps value.
    • Jean-Baptiste Kempf's avatar
      Update NEWS for 0.7.1 · 8c763d21
      Jean-Baptiste Kempf authored
    • Victorien Le Couviour--Tuffet's avatar
      x86: Add put/prep_bilin_scaled AVX2 asm · a75ee78b
      Victorien Le Couviour--Tuffet authored
      Since scaled bilinear filtering is very rarely used, add a new table
      entry to mc_subpel_filters and jump to the put/prep_8tap_scaled code.

      AVX2 performance is therefore the same as for the 8-tap code. The
      speedup is much smaller, though, since the C code is a true bilinear
      code path that gets auto-vectorized. Even so, the AVX2 version is
      always faster.
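Why a table entry is enough: a bilinear filter is just an 8-tap filter whose only nonzero taps are the two center ones, so the generic 8-tap path computes the same result. This sketch assumes AV1-style taps that sum to 128, with the bilinear entry for subpel position f (in 1/16 pel) being {0, 0, 0, 128 - 8f, 8f, 0, 0, 0}; dav1d's actual mc_subpel_filters layout and precision may differ, and all names here are hypothetical.

```c
#include <stdint.h>

/* Generic 8-tap filter at one output position; uses src[x-3..x+4]
 * and rounds with 7-bit precision (taps sum to 128). */
static int filter_8tap(const uint8_t *src, int x, const int16_t taps[8]) {
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += taps[k] * src[x - 3 + k];
    return (sum + 64) >> 7;
}

/* Build an 8-tap entry encoding bilinear interpolation at position f
 * (0..15): only the two center taps are nonzero. */
static void bilinear_as_8tap(int f, int16_t taps[8]) {
    for (int k = 0; k < 8; k++)
        taps[k] = 0;
    taps[3] = (int16_t)(128 - 8 * f);
    taps[4] = (int16_t)(8 * f);
}

/* Direct bilinear interpolation between src[x] and src[x + 1]. */
static int filter_bilinear(const uint8_t *src, int x, int f) {
    return ((128 - 8 * f) * src[x] + 8 * f * src[x + 1] + 64) >> 7;
}
```

The 8-tap path still multiplies six zero taps, which is why reusing it costs the same as a real 8-tap filter.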
    • Victorien Le Couviour--Tuffet's avatar
      x86: Add prep_8tap_scaled AVX2 asm · ea74e3d5
      Victorien Le Couviour--Tuffet authored
      mct_scaled_8tap_regular_w4_8bpc_c: 872.1
      mct_scaled_8tap_regular_w4_8bpc_avx2: 125.6
      mct_scaled_8tap_regular_w4_dy1_8bpc_c: 886.3
      mct_scaled_8tap_regular_w4_dy1_8bpc_avx2: 84.0
      mct_scaled_8tap_regular_w4_dy2_8bpc_c: 1189.1
      mct_scaled_8tap_regular_w4_dy2_8bpc_avx2: 84.7
      
      mct_scaled_8tap_regular_w8_8bpc_c: 2261.0
      mct_scaled_8tap_regular_w8_8bpc_avx2: 306.2
      mct_scaled_8tap_regular_w8_dy1_8bpc_c: 2189.9
      mct_scaled_8tap_regular_w8_dy1_8bpc_avx2: 233.8
      mct_scaled_8tap_regular_w8_dy2_8bpc_c: 3060.3
      mct_scaled_8tap_regular_w8_dy2_8bpc_avx2: 282.8
      
      mct_scaled_8tap_regular_w16_8bpc_c: 4335.3
      mct_scaled_8tap_regular_w16_8bpc_avx2: 680.7
      mct_scaled_8tap_regular_w16_dy1_8bpc_c: 5137.2
      mct_scaled_8tap_regular_w16_dy1_8bpc_avx2: 578.6
      mct_scaled_8tap_regular_w16_dy2_8bpc_c: 7878.4
      mct_scaled_8tap_regular_w16_dy2_8bpc_avx2: 774.6
      
      mct_scaled_8tap_regular_w32_8bpc_c: 17871.9
      mct_scaled_8tap_regular_w32_8bpc_avx2: 2954.8
      mct_scaled_8tap_regular_w32_dy1_8bpc_c: 18594.7
      mct_scaled_8tap_regular_w32_dy1_8bpc_avx2: 2073.9
      mct_scaled_8tap_regular_w32_dy2_8bpc_c: 28696.0
      mct_scaled_8tap_regular_w32_dy2_8bpc_avx2: 2852.1
      
      mct_scaled_8tap_regular_w64_8bpc_c: 46967.5
      mct_scaled_8tap_regular_w64_8bpc_avx2: 7527.5
      mct_scaled_8tap_regular_w64_dy1_8bpc_c: 45564.2
      mct_scaled_8tap_regular_w64_dy1_8bpc_avx2: 5262.9
      mct_scaled_8tap_regular_w64_dy2_8bpc_c: 72793.3
      mct_scaled_8tap_regular_w64_dy2_8bpc_avx2: 7535.9
      
      mct_scaled_8tap_regular_w128_8bpc_c: 111190.8
      mct_scaled_8tap_regular_w128_8bpc_avx2: 19386.8
      mct_scaled_8tap_regular_w128_dy1_8bpc_c: 122625.0
      mct_scaled_8tap_regular_w128_dy1_8bpc_avx2: 15376.1
      mct_scaled_8tap_regular_w128_dy2_8bpc_c: 197120.6
      mct_scaled_8tap_regular_w128_dy2_8bpc_avx2: 21871.0
  4. Jun 16, 2020
    • Colin Lee's avatar
      Clean up fraction calculation · 07261e8c
      Colin Lee authored
    • Colin Lee's avatar
      Add clamping back to mv projection · 2d4711c9
      Colin Lee authored
      Clamping in the motion vector projection calculation is required by the
      spec. A rewrite of the function in commit aca57bf3 omitted the clamping;
      this commit re-adds it.
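A minimal sketch of projecting one motion vector component by num/den with a final clamp. It assumes division by den is done by multiplying with 16384 / den and rounding by 14 bits, and that the clamp bound is ±((1 << 14) - 1); both are assumptions to check against the projection process in the AV1 specification, and the names are hypothetical rather than dav1d's.

```c
#include <stdint.h>

/* Assumed clamp bound; verify against the AV1 specification. */
#define MV_PROJ_LIMIT ((1 << 14) - 1)

static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : v > hi ? hi : v;
}

/* Signed rounding right shift: rounds half away from zero. */
static int64_t round2_signed(int64_t x, int n) {
    const int64_t half = (int64_t)1 << (n - 1);
    return x >= 0 ? (x + half) >> n : -((-x + half) >> n);
}

/* Project one component by num/den (den in 1..31). The product is
 * computed in 64 bits, since mv * num * (16384 / den) can exceed int32. */
static int project_mv_component(int mv, int num, int den) {
    const int64_t div_mult = 16384 / den;
    const int64_t scaled = round2_signed((int64_t)mv * num * div_mult, 14);
    /* The clamp the rewrite dropped: keeps the result in the legal range. */
    return clamp_int((int)scaled, -MV_PROJ_LIMIT, MV_PROJ_LIMIT);
}
```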
    • Martin Storsjö's avatar
      arm64: itx: Simplify and clarify the sub_sp macro a little bit · 1e674fdb
      Martin Storsjö authored
      Add an .error case for Windows when subtracting more than 8 KB, and
      simplify the generic subtraction case.
    • Martin Storsjö's avatar
      arm: itx: Add NEON implementation of itx for 8 bpc · 3d6d7683
      Martin Storsjö authored
      For transforms up to size 8, the code processes vectors of up to 8
      elements at a time; for larger transforms, it uses vectors of 4
      elements.
      
      Overall, the speedup over C code seems to be around 8-14x for the
      larger transforms, and 10-19x for the smaller ones.
      
      Relative speedup over C code (built with GCC 7.5) for a few functions:
      
                                          Cortex A7     A8     A9    A53    A72    A73
      inv_txfm_add_4x4_dct_dct_0_8bpc_neon:    3.83   3.42   2.57   3.36   2.97   7.47
      inv_txfm_add_4x4_dct_dct_1_8bpc_neon:    7.25  13.53   8.38   8.82   7.96  12.37
      inv_txfm_add_8x8_dct_dct_0_8bpc_neon:    4.78   6.61   4.82   4.65   5.27   9.76
      inv_txfm_add_8x8_dct_dct_1_8bpc_neon:   10.20  19.07  13.07  14.69  11.45  15.50
      inv_txfm_add_16x16_dct_dct_0_8bpc_neon:  4.26   5.06   3.00   3.74   4.05   4.49
      inv_txfm_add_16x16_dct_dct_1_8bpc_neon: 10.51  16.02  13.57  14.03  12.86  18.16
      inv_txfm_add_16x16_dct_dct_2_8bpc_neon:  7.95  11.75   9.09  10.64  10.06  14.07
      inv_txfm_add_32x32_dct_dct_0_8bpc_neon:  5.31   5.58   3.14   4.18   4.80   4.57
      inv_txfm_add_32x32_dct_dct_1_8bpc_neon: 12.66  16.07  14.34  16.00  15.24  21.32
      inv_txfm_add_32x32_dct_dct_4_8bpc_neon:  8.25  10.69   8.90  10.59  10.41  14.39
      inv_txfm_add_64x64_dct_dct_0_8bpc_neon:  4.69   5.97   3.17   3.96   4.57   4.34
      inv_txfm_add_64x64_dct_dct_1_8bpc_neon: 11.47  12.68  10.18  14.73  14.20  17.95
      inv_txfm_add_64x64_dct_dct_4_8bpc_neon:  8.84  10.13   7.94  11.25  10.58  13.88
  5. Jun 11, 2020
    • Henrik Gramner's avatar
      Remove redundant memset in itx DSP initialization · d606dd24
      Henrik Gramner authored
      The struct is already zero-initialized when the function is called,
      except in the checkasm test, so move the zeroing there instead.
    • Matthias Dressel's avatar
      meson: Make docs generation subproject-safe · bc008834
      Matthias Dressel authored
      meson.source_root() returns the root of the parent project if dav1d is
      embedded as a subproject.
    • Victorien Le Couviour--Tuffet's avatar
      x86: Adapt SSSE3 prep_8tap to SSE2 · 22fb8a42
      Victorien Le Couviour--Tuffet authored
      ---------------------
      x86_64:
      ------------------------------------------
      mct_8tap_regular_w4_h_8bpc_c: 302.3
      mct_8tap_regular_w4_h_8bpc_sse2: 47.3
      mct_8tap_regular_w4_h_8bpc_ssse3: 19.5
      ---------------------
      mct_8tap_regular_w8_h_8bpc_c: 745.5
      mct_8tap_regular_w8_h_8bpc_sse2: 235.2
      mct_8tap_regular_w8_h_8bpc_ssse3: 70.4
      ---------------------
      mct_8tap_regular_w16_h_8bpc_c: 1844.3
      mct_8tap_regular_w16_h_8bpc_sse2: 755.6
      mct_8tap_regular_w16_h_8bpc_ssse3: 225.9
      ---------------------
      mct_8tap_regular_w32_h_8bpc_c: 6685.5
      mct_8tap_regular_w32_h_8bpc_sse2: 2954.4
      mct_8tap_regular_w32_h_8bpc_ssse3: 795.8
      ---------------------
      mct_8tap_regular_w64_h_8bpc_c: 15633.5
      mct_8tap_regular_w64_h_8bpc_sse2: 7120.4
      mct_8tap_regular_w64_h_8bpc_ssse3: 1900.4
      ---------------------
      mct_8tap_regular_w128_h_8bpc_c: 37772.1
      mct_8tap_regular_w128_h_8bpc_sse2: 17698.1
      mct_8tap_regular_w128_h_8bpc_ssse3: 4665.5
      ------------------------------------------
      mct_8tap_regular_w4_v_8bpc_c: 306.5
      mct_8tap_regular_w4_v_8bpc_sse2: 71.7
      mct_8tap_regular_w4_v_8bpc_ssse3: 37.9
      ---------------------
      mct_8tap_regular_w8_v_8bpc_c: 923.3
      mct_8tap_regular_w8_v_8bpc_sse2: 168.7
      mct_8tap_regular_w8_v_8bpc_ssse3: 71.3
      ---------------------
      mct_8tap_regular_w16_v_8bpc_c: 3040.1
      mct_8tap_regular_w16_v_8bpc_sse2: 505.1
      mct_8tap_regular_w16_v_8bpc_ssse3: 199.7
      ---------------------
      mct_8tap_regular_w32_v_8bpc_c: 12354.8
      mct_8tap_regular_w32_v_8bpc_sse2: 1942.0
      mct_8tap_regular_w32_v_8bpc_ssse3: 714.2
      ---------------------
      mct_8tap_regular_w64_v_8bpc_c: 29427.9
      mct_8tap_regular_w64_v_8bpc_sse2: 4637.4
      mct_8tap_regular_w64_v_8bpc_ssse3: 1829.2
      ---------------------
      mct_8tap_regular_w128_v_8bpc_c: 72756.9
      mct_8tap_regular_w128_v_8bpc_sse2: 11301.0
      mct_8tap_regular_w128_v_8bpc_ssse3: 5020.6
      ------------------------------------------
      mct_8tap_regular_w4_hv_8bpc_c: 876.9
      mct_8tap_regular_w4_hv_8bpc_sse2: 171.7
      mct_8tap_regular_w4_hv_8bpc_ssse3: 112.2
      ---------------------
      mct_8tap_regular_w8_hv_8bpc_c: 2215.1
      mct_8tap_regular_w8_hv_8bpc_sse2: 730.2
      mct_8tap_regular_w8_hv_8bpc_ssse3: 330.9
      ---------------------
      mct_8tap_regular_w16_hv_8bpc_c: 6075.5
      mct_8tap_regular_w16_hv_8bpc_sse2: 2252.1
      mct_8tap_regular_w16_hv_8bpc_ssse3: 973.4
      ---------------------
      mct_8tap_regular_w32_hv_8bpc_c: 22182.7
      mct_8tap_regular_w32_hv_8bpc_sse2: 7692.6
      mct_8tap_regular_w32_hv_8bpc_ssse3: 3599.8
      ---------------------
      mct_8tap_regular_w64_hv_8bpc_c: 50876.8
      mct_8tap_regular_w64_hv_8bpc_sse2: 18499.6
      mct_8tap_regular_w64_hv_8bpc_ssse3: 8815.6
      ---------------------
      mct_8tap_regular_w128_hv_8bpc_c: 122926.3
      mct_8tap_regular_w128_hv_8bpc_sse2: 45120.0
      mct_8tap_regular_w128_hv_8bpc_ssse3: 22085.7
      ------------------------------------------
    • Victorien Le Couviour--Tuffet's avatar
      x86: Adapt SSSE3 prep_bilin to SSE2 · 83956bf1
      Victorien Le Couviour--Tuffet authored
      ---------------------
      x86_64:
      ------------------------------------------
      mct_bilinear_w4_h_8bpc_c: 98.9
      mct_bilinear_w4_h_8bpc_sse2: 30.2
      mct_bilinear_w4_h_8bpc_ssse3: 11.5
      ---------------------
      mct_bilinear_w8_h_8bpc_c: 175.3
      mct_bilinear_w8_h_8bpc_sse2: 57.0
      mct_bilinear_w8_h_8bpc_ssse3: 19.7
      ---------------------
      mct_bilinear_w16_h_8bpc_c: 396.2
      mct_bilinear_w16_h_8bpc_sse2: 179.3
      mct_bilinear_w16_h_8bpc_ssse3: 50.9
      ---------------------
      mct_bilinear_w32_h_8bpc_c: 1311.2
      mct_bilinear_w32_h_8bpc_sse2: 718.8
      mct_bilinear_w32_h_8bpc_ssse3: 243.9
      ---------------------
      mct_bilinear_w64_h_8bpc_c: 2892.7
      mct_bilinear_w64_h_8bpc_sse2: 1746.0
      mct_bilinear_w64_h_8bpc_ssse3: 568.0
      ---------------------
      mct_bilinear_w128_h_8bpc_c: 7192.6
      mct_bilinear_w128_h_8bpc_sse2: 4339.8
      mct_bilinear_w128_h_8bpc_ssse3: 1619.2
      ------------------------------------------
      mct_bilinear_w4_v_8bpc_c: 129.7
      mct_bilinear_w4_v_8bpc_sse2: 26.6
      mct_bilinear_w4_v_8bpc_ssse3: 16.7
      ---------------------
      mct_bilinear_w8_v_8bpc_c: 233.3
      mct_bilinear_w8_v_8bpc_sse2: 55.0
      mct_bilinear_w8_v_8bpc_ssse3: 24.7
      ---------------------
      mct_bilinear_w16_v_8bpc_c: 498.9
      mct_bilinear_w16_v_8bpc_sse2: 146.0
      mct_bilinear_w16_v_8bpc_ssse3: 54.2
      ---------------------
      mct_bilinear_w32_v_8bpc_c: 1562.2
      mct_bilinear_w32_v_8bpc_sse2: 560.6
      mct_bilinear_w32_v_8bpc_ssse3: 201.0
      ---------------------
      mct_bilinear_w64_v_8bpc_c: 3221.3
      mct_bilinear_w64_v_8bpc_sse2: 1380.6
      mct_bilinear_w64_v_8bpc_ssse3: 499.3
      ---------------------
      mct_bilinear_w128_v_8bpc_c: 7357.7
      mct_bilinear_w128_v_8bpc_sse2: 3439.0
      mct_bilinear_w128_v_8bpc_ssse3: 1489.1
      ------------------------------------------
      mct_bilinear_w4_hv_8bpc_c: 185.0
      mct_bilinear_w4_hv_8bpc_sse2: 54.5
      mct_bilinear_w4_hv_8bpc_ssse3: 22.1
      ---------------------
      mct_bilinear_w8_hv_8bpc_c: 377.8
      mct_bilinear_w8_hv_8bpc_sse2: 104.3
      mct_bilinear_w8_hv_8bpc_ssse3: 35.8
      ---------------------
      mct_bilinear_w16_hv_8bpc_c: 1159.4
      mct_bilinear_w16_hv_8bpc_sse2: 311.0
      mct_bilinear_w16_hv_8bpc_ssse3: 106.3
      ---------------------
      mct_bilinear_w32_hv_8bpc_c: 4436.2
      mct_bilinear_w32_hv_8bpc_sse2: 1230.7
      mct_bilinear_w32_hv_8bpc_ssse3: 400.7
      ---------------------
      mct_bilinear_w64_hv_8bpc_c: 10627.7
      mct_bilinear_w64_hv_8bpc_sse2: 2934.2
      mct_bilinear_w64_hv_8bpc_ssse3: 957.2
      ---------------------
      mct_bilinear_w128_hv_8bpc_c: 26048.9
      mct_bilinear_w128_hv_8bpc_sse2: 7590.3
      mct_bilinear_w128_hv_8bpc_ssse3: 2947.0
      ------------------------------------------
  6. Jun 10, 2020
  7. Jun 09, 2020
  8. Jun 07, 2020
    • Niklas Haas's avatar
      CI: Enable coverage reports · 2b98fd28
      Niklas Haas authored and Jean-Baptiste Kempf committed
      Blacklisted some files not directly relevant to the codebase (such as
      tests, tools and debugging functions).
      
      The coverage HTML report gets attached as a build artifact, although
      unfortunately we can't link directly to the `index.html`. We also attach
      the coverage XML as a cobertura report, although I'm not sure if it does
      anything.
  9. Jun 04, 2020
  10. Jun 01, 2020
    • Victorien Le Couviour--Tuffet's avatar
      x86: Add put_8tap_scaled AVX2 asm · a755541f
      Victorien Le Couviour--Tuffet authored
      mc_scaled_8tap_regular_w2_8bpc_c: 764.4
      mc_scaled_8tap_regular_w2_8bpc_avx2: 191.3
      mc_scaled_8tap_regular_w2_dy1_8bpc_c: 705.8
      mc_scaled_8tap_regular_w2_dy1_8bpc_avx2: 89.5
      mc_scaled_8tap_regular_w2_dy2_8bpc_c: 964.0
      mc_scaled_8tap_regular_w2_dy2_8bpc_avx2: 120.3
      
      mc_scaled_8tap_regular_w4_8bpc_c: 1355.7
      mc_scaled_8tap_regular_w4_8bpc_avx2: 180.9
      mc_scaled_8tap_regular_w4_dy1_8bpc_c: 1233.2
      mc_scaled_8tap_regular_w4_dy1_8bpc_avx2: 115.3
      mc_scaled_8tap_regular_w4_dy2_8bpc_c: 1707.6
      mc_scaled_8tap_regular_w4_dy2_8bpc_avx2: 117.9
      
      mc_scaled_8tap_regular_w8_8bpc_c: 2483.2
      mc_scaled_8tap_regular_w8_8bpc_avx2: 294.8
      mc_scaled_8tap_regular_w8_dy1_8bpc_c: 2166.4
      mc_scaled_8tap_regular_w8_dy1_8bpc_avx2: 222.0
      mc_scaled_8tap_regular_w8_dy2_8bpc_c: 3133.7
      mc_scaled_8tap_regular_w8_dy2_8bpc_avx2: 292.6
      
      mc_scaled_8tap_regular_w16_8bpc_c: 5239.2
      mc_scaled_8tap_regular_w16_8bpc_avx2: 729.9
      mc_scaled_8tap_regular_w16_dy1_8bpc_c: 5156.5
      mc_scaled_8tap_regular_w16_dy1_8bpc_avx2: 602.2
      mc_scaled_8tap_regular_w16_dy2_8bpc_c: 8018.4
      mc_scaled_8tap_regular_w16_dy2_8bpc_avx2: 783.1
      
      mc_scaled_8tap_regular_w32_8bpc_c: 14745.0
      mc_scaled_8tap_regular_w32_8bpc_avx2: 2205.0
      mc_scaled_8tap_regular_w32_dy1_8bpc_c: 14862.3
      mc_scaled_8tap_regular_w32_dy1_8bpc_avx2: 1721.3
      mc_scaled_8tap_regular_w32_dy2_8bpc_c: 23607.6
      mc_scaled_8tap_regular_w32_dy2_8bpc_avx2: 2325.7
      
      mc_scaled_8tap_regular_w64_8bpc_c: 54891.7
      mc_scaled_8tap_regular_w64_8bpc_avx2: 8351.4
      mc_scaled_8tap_regular_w64_dy1_8bpc_c: 50249.0
      mc_scaled_8tap_regular_w64_dy1_8bpc_avx2: 5864.4
      mc_scaled_8tap_regular_w64_dy2_8bpc_c: 79400.1
      mc_scaled_8tap_regular_w64_dy2_8bpc_avx2: 8295.7
      
      mc_scaled_8tap_regular_w128_8bpc_c: 121046.8
      mc_scaled_8tap_regular_w128_8bpc_avx2: 21809.1
      mc_scaled_8tap_regular_w128_dy1_8bpc_c: 133720.4
      mc_scaled_8tap_regular_w128_dy1_8bpc_avx2: 16197.8
      mc_scaled_8tap_regular_w128_dy2_8bpc_c: 218774.8
      mc_scaled_8tap_regular_w128_dy2_8bpc_avx2: 22993.1
  11. May 28, 2020
    • Steve Lhomme's avatar
      meson: favor _aligned_malloc over posix_memalign · ed39e8fb
      Steve Lhomme authored
      posix_memalign is defined as a built-in by GCC in MSYS2, but it is not
      available when linking against the Universal C Runtime (UCRT), whereas
      _aligned_malloc is.

      This should only affect builds targeting Windows, since _aligned_malloc
      is Microsoft-specific.
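A sketch of the resulting allocator preference: use _aligned_malloc when it is available and fall back to posix_memalign otherwise. Here `_WIN32` and the `HAVE_ALIGNED_MALLOC` macro stand in for the meson configure check (a hypothetical simplification), and the wrapper names are illustrative. Note that the two functions take size and alignment in opposite orders, and that _aligned_malloc memory must be released with _aligned_free.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign */
#endif
#include <stdint.h>
#include <stdlib.h>

#if defined(_WIN32)
#include <malloc.h>
#define HAVE_ALIGNED_MALLOC 1 /* hypothetical: stands in for a meson check */
#endif

/* Allocate `size` bytes aligned to `align` (a power of two and a
 * multiple of sizeof(void *)). Returns NULL on failure. */
static void *aligned_alloc_compat(size_t align, size_t size) {
#if defined(HAVE_ALIGNED_MALLOC)
    return _aligned_malloc(size, align);   /* size first, then alignment */
#else
    void *ptr = NULL;
    return posix_memalign(&ptr, align, size) == 0 ? ptr : NULL;
#endif
}

/* Release memory with the matching deallocator. */
static void aligned_free_compat(void *ptr) {
#if defined(HAVE_ALIGNED_MALLOC)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

static int is_aligned(const void *ptr, size_t align) {
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```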
  12. May 26, 2020
  13. May 25, 2020
    • Niklas Haas's avatar
      dav1dplay: allow resizing the window · a1e7a329
      Niklas Haas authored
      libplacebo v66 added helper functions that make preserving the aspect
      ratio trivial in this case. However, we still need to clear the FBO to
      black if the image doesn't cover it fully.
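The geometry the libplacebo helpers take care of can be sketched generically: fit the image into the window while preserving its aspect ratio, center it, and note whether any window area is left uncovered (and therefore needs clearing to black). This is not libplacebo's API; the types and function names are hypothetical.

```c
/* A destination rectangle inside the window. */
typedef struct { int x, y, w, h; } Rect;

/* Fit an img_w x img_h image into a win_w x win_h window, preserving the
 * aspect ratio and centering it. */
static Rect fit_aspect(int img_w, int img_h, int win_w, int win_h) {
    Rect r;
    /* Compare img_w/img_h with win_w/win_h without floating point. */
    if ((long long)img_w * win_h > (long long)img_h * win_w) {
        r.w = win_w;                                   /* width-limited */
        r.h = (int)((long long)win_w * img_h / img_w);
    } else {
        r.h = win_h;                                   /* height-limited */
        r.w = (int)((long long)win_h * img_w / img_h);
    }
    r.x = (win_w - r.w) / 2;
    r.y = (win_h - r.h) / 2;
    return r;
}

/* If this returns 0, the bars around the image must be cleared. */
static int covers_window(Rect r, int win_w, int win_h) {
    return r.x == 0 && r.y == 0 && r.w == win_w && r.h == win_h;
}
```

For example, a 1920x1080 image in a 1000x1000 window becomes a 1000x562 rectangle at y = 219, with bars above and below.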
  14. May 20, 2020
    • Niklas Haas's avatar
      dav1dplay: don't freeze on render errors · df40d36d
      Niklas Haas authored
      Returning from this function when pl_render_image() fails is the wrong
      thing to do, since that leaves the swapchain frame acquired but never
      submitted. Instead, just clear the target FBO to solid red (to make it
      obvious that something went wrong) and continue with presentation.
      Tag: 0.7.0
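The control-flow rule behind this fix can be sketched with stub functions: every acquired frame is submitted, and a render failure only changes what gets drawn. The swapchain type and functions below are hypothetical stand-ins that merely count calls; `render_ok` simulates the result of pl_render_image().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a swapchain; it only counts calls. */
typedef struct { int acquired, submitted, cleared_red; } Swapchain;

static void acquire_frame(Swapchain *sc) { sc->acquired++; }
static void submit_frame(Swapchain *sc)  { sc->submitted++; }
static void clear_to_red(Swapchain *sc)  { sc->cleared_red++; }

/* Render one frame. On failure, do not return early (that would leave the
 * acquired frame unsubmitted); clear to red instead and still present. */
static void render_one(Swapchain *sc, bool render_ok) {
    acquire_frame(sc);
    if (!render_ok)
        clear_to_red(sc);   /* visible signal that rendering failed */
    submit_frame(sc);       /* always paired with the acquire above */
}
```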
  15. May 19, 2020
  16. May 18, 2020
    • Niklas Haas's avatar
      dav1dplay: support on-GPU film grain synthesis · cbe05cf4
      Niklas Haas authored
      Annoying minor differences in this struct layout mean we can't just
      memcpy the entire thing. Oh well.
      
      Note: technically, PL_API_VER 33 added this API, but PL_API_VER 63 is
      the minimum version of libplacebo that doesn't have glaring bugs when
      generating chroma grain, so we require that as a minimum instead.
      
      (I tested this version on some 4:2:2 and 4:2:0, 8-bit and 10-bit grain
      samples I had lying around and made sure the output was identical up to
      differences in rounding / dithering.)
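When two structs carry the same parameters with different field order or integer widths, a single memcpy would scramble the data, so each field has to be copied individually. Both structs below are hypothetical and only loosely modeled on film grain parameters; they are not the real Dav1dFilmGrainData or libplacebo definitions.

```c
#include <stdint.h>

/* Hypothetical source layout. */
typedef struct {
    uint16_t random_seed;
    uint8_t  num_y_points;
    uint8_t  y_points[14][2];
    int      scaling_shift;
} SrcGrain;

/* Hypothetical destination layout: same data, different order and widths. */
typedef struct {
    int      scaling_shift;
    int      num_y_points;
    uint8_t  y_points[14][2];
    uint64_t random_seed;
} DstGrain;

/* Copy field by field, widening integers where the types differ. */
static void copy_grain(DstGrain *dst, const SrcGrain *src) {
    dst->random_seed   = src->random_seed;
    dst->scaling_shift = src->scaling_shift;
    dst->num_y_points  = src->num_y_points;
    for (int i = 0; i < 14; i++) {
        dst->y_points[i][0] = src->y_points[i][0];
        dst->y_points[i][1] = src->y_points[i][1];
    }
}
```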
    • Niklas Haas's avatar
      dav1dplay: handle all supported csps/reprs/bitdepths · 7bbebdb4
      Niklas Haas authored
      Generalize the code to set the right pl_image metadata based on the
      values signaled in the Dav1dPictureParameters / Dav1dSequenceHeader.
      
      Some values are not mapped, in which case stdout will be spammed.
      Whatever. Hopefully somebody sees that error spam and opens a bug report
      for libplacebo to implement it.
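A sketch of mapping signaled values with an error-printing fallback, using the ITU-T H.273 matrix_coefficients codes that AV1 sequence headers carry (1 = BT.709, 5/6 = BT.601 variants, 9/10 = BT.2020 non-constant/constant luminance). The target enum and function are hypothetical; dav1dplay maps onto libplacebo's own types instead.

```c
#include <stdio.h>

/* Hypothetical renderer-side enum. */
typedef enum { CSP_UNKNOWN, CSP_BT601, CSP_BT709, CSP_BT2020_NC, CSP_BT2020_C } Csp;

/* Map H.273 matrix_coefficients codes to the renderer enum. Unmapped
 * codes are reported on stdout and fall back to CSP_UNKNOWN. */
static Csp map_matrix_coefficients(int mc) {
    switch (mc) {
    case 1:  return CSP_BT709;
    case 5:  /* BT.470 BG */
    case 6:  return CSP_BT601;     /* SMPTE 170M */
    case 9:  return CSP_BT2020_NC;
    case 10: return CSP_BT2020_C;
    default:
        printf("unmapped matrix coefficients: %d\n", mc);
        return CSP_UNKNOWN;
    }
}
```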
    • Niklas Haas's avatar
      dav1dplay: move and simplify pl_image generation · f01fd0f1
      Niklas Haas authored
      Having the pl_image generation live in upload_planes() rather than
      render() will make it easier to set the correct pl_image metadata based
      on the Dav1dPicture headers going forward. Rename the function so that
      its name makes more sense semantically.
      
      Reduce some code duplication by turning per-plane fields into arrays
      wherever appropriate.
      
      As an aside, also apply the correct chroma location rather than
      hard-coding it as PL_CHROMA_LEFT.
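Turning per-plane fields into arrays lets one loop handle luma and both chroma planes. This hypothetical sketch computes plane dimensions from the chroma subsampling shifts, rounding chroma sizes up; it ignores monochrome streams and is not dav1dplay's actual code.

```c
/* Hypothetical per-plane description: arrays replace separate y/u/v fields. */
typedef struct {
    int w[3], h[3];
} PlaneDims;

/* ss_x / ss_y are chroma subsampling shifts: 1/1 for 4:2:0,
 * 1/0 for 4:2:2, 0/0 for 4:4:4. Plane 0 (luma) is never subsampled. */
static PlaneDims plane_dims(int w, int h, int ss_x, int ss_y) {
    PlaneDims d;
    for (int p = 0; p < 3; p++) {
        const int sx = p ? ss_x : 0, sy = p ? ss_y : 0;
        d.w[p] = (w + sx) >> sx;   /* round up for odd sizes */
        d.h[p] = (h + sy) >> sy;
    }
    return d;
}
```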
    • Niklas Haas's avatar
      dav1dplay: don't write directly to iparams.extensions · 3bb0aed1
      Niklas Haas authored
      This field was turned into a const array in upstream libplacebo, so
      writing to it directly generates warnings due to the implicit cast.
      Rewrite the code to keep the mutable array in a separate variable,
      `extensions`, and only set `iparams.extensions` to it, rather than
      manipulating it directly.
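The pattern can be sketched in plain C: fill a separate, mutable array, then publish it through the const-qualified field. The struct and function below are hypothetical, not libplacebo's real parameter struct. Converting `const char **` to `const char *const *` only adds a qualifier, so C accepts it without a cast or warning.

```c
#include <stddef.h>

/* Hypothetical params struct whose field became a read-only view. */
typedef struct {
    const char *const *extensions;
    int num_extensions;
} InstanceParams;

/* Append names to a caller-owned mutable array (which must outlive
 * iparams), then point the const field at it. */
static void set_extensions(InstanceParams *iparams, const char **extensions,
                           int cap, const char *const *wanted, int num_wanted) {
    int n = 0;
    for (int i = 0; i < num_wanted && n < cap; i++)
        extensions[n++] = wanted[i];
    iparams->extensions = extensions;   /* implicit const-adding conversion */
    iparams->num_extensions = n;
}
```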
  17. May 16, 2020