  1. Jan 29, 2020
    • checkasm: Increase buffer alignment to 64-byte on x86-64 · 9c29f229
      Henrik Gramner authored
      Required for AVX-512.
      9c29f229
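As a rough illustration of what 64-byte alignment means for a test buffer, the sketch below allocates memory whose address is a multiple of 64, the width in bytes of one 512-bit ZMM register. The helper names `alloc_aligned_buf` and `is_aligned` are hypothetical and are not taken from checkasm, which may align its buffers differently (for example on the stack). C11 `aligned_alloc` is assumed to be available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_ALIGN 64 /* bytes in one 512-bit ZMM register */

static_assert((BUF_ALIGN & (BUF_ALIGN - 1)) == 0,
              "alignment must be a power of two");

/* Hypothetical helper: allocate at least `size` bytes aligned to BUF_ALIGN.
 * aligned_alloc() expects the size to be a multiple of the alignment,
 * so the request is rounded up first. */
static void *alloc_aligned_buf(size_t size) {
    size_t padded = (size + BUF_ALIGN - 1) & ~(size_t)(BUF_ALIGN - 1);
    return aligned_alloc(BUF_ALIGN, padded);
}

/* Returns nonzero if `p` is aligned to `align` (a power of two). */
static int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A buffer obtained this way can be released with `free()`. Aligned 512-bit loads and stores fault on addresses that are not 64-byte aligned, which is why the smaller alignment used for AVX2 is not enough.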
    • arm: cdef: Add special cased versions for pri_strength/sec_strength being zero · 361a3c8e
      Martin Storsjö authored
      Before:
      ARM32:                    Cortex A7      A8      A9     A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    964.6   599.5   707.9   601.2   465.1   405.2
      cdef_filter_4x8_8bpc_neon:   1726.0  1066.2  1238.7  1041.7   798.6   725.3
      cdef_filter_8x8_8bpc_neon:   2974.4  1671.8  1943.9  1806.1  1229.8  1242.1
      ARM64:
      cdef_filter_4x4_8bpc_neon:                            569.2   337.8   348.7
      cdef_filter_4x8_8bpc_neon:                           1031.1   623.3   633.6
      cdef_filter_8x8_8bpc_neon:                           1847.5  1097.7  1117.5
      
      After:
      ARM32:                    Cortex A7      A8      A9     A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    798.4   524.2   617.3   506.8   432.4   361.1
      cdef_filter_4x8_8bpc_neon:   1394.7   910.4  1054.0   863.6   730.2   632.2
      cdef_filter_8x8_8bpc_neon:   2364.6  1453.8  1675.1  1466.0  1086.4  1107.7
      ARM64:
      cdef_filter_4x4_8bpc_neon:                            461.7   303.1   308.6
      cdef_filter_4x8_8bpc_neon:                            833.0   547.5   556.0
      cdef_filter_8x8_8bpc_neon:                           1459.3   934.1   967.9
      361a3c8e
    • arm: cdef: Fix some comment typos · 6ad9bd5f
      Martin Storsjö authored
      6ad9bd5f
  2. Jan 28, 2020
  3. Jan 27, 2020
  4. Jan 25, 2020
  5. Jan 21, 2020
  6. Jan 20, 2020
  7. Jan 15, 2020
  8. Jan 14, 2020
  9. Jan 10, 2020
    • SSSE3 implementations of film grain · 8ff89463
      Ronald S. Bultje authored
      gen_grain_y_ar0_8bpc_c: 84853.3
      gen_grain_y_ar0_8bpc_ssse3: 23528.0
      gen_grain_y_ar1_8bpc_c: 140775.5
      gen_grain_y_ar1_8bpc_ssse3: 70410.2
      gen_grain_y_ar2_8bpc_c: 251311.3
      gen_grain_y_ar2_8bpc_ssse3: 95222.2
      gen_grain_y_ar3_8bpc_c: 394763.0
      gen_grain_y_ar3_8bpc_ssse3: 103541.9
      
      gen_grain_uv_ar0_8bpc_420_c: 29773.7
      gen_grain_uv_ar0_8bpc_420_ssse3: 7068.9
      gen_grain_uv_ar1_8bpc_420_c: 46113.2
      gen_grain_uv_ar1_8bpc_420_ssse3: 22148.1
      gen_grain_uv_ar2_8bpc_420_c: 70061.4
      gen_grain_uv_ar2_8bpc_420_ssse3: 25479.0
      gen_grain_uv_ar3_8bpc_420_c: 113826.0
      gen_grain_uv_ar3_8bpc_420_ssse3: 30004.9
      
      fguv_32x32xn_8bpc_420_csfl0_c: 8148.9
      fguv_32x32xn_8bpc_420_csfl0_ssse3: 1371.3
      fguv_32x32xn_8bpc_420_csfl1_c: 6391.9
      fguv_32x32xn_8bpc_420_csfl1_ssse3: 1034.8
      
      fgy_32x32xn_8bpc_c: 14201.3
      fgy_32x32xn_8bpc_ssse3: 3443.0
      8ff89463
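To read the film grain figures as speedups, divide the C time by the SSSE3 time. The sketch below does that, assuming the numbers are checkasm timings where lower is faster. The function name `speedup` is made up for illustration.

```c
#include <assert.h>

/* Speedup of an optimized routine over the C reference:
 * a value of 2.0 means the optimized version takes half the time. */
static double speedup(double c_time, double simd_time) {
    assert(simd_time > 0.0);
    return c_time / simd_time;
}
```

For example, `speedup(84853.3, 23528.0)` for gen_grain_y_ar0 comes out at roughly 3.6x, and `speedup(14201.3, 3443.0)` for fgy_32x32xn at roughly 4.1x.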
    • Reduce scope of NO_SANITIZE usage · e79e5ceb
      Dale Curtis authored
      dav1d_open() is part of the public API and should be sanitized. Limit the
      sanitizer disable to just the problematic dlsym() method.
      e79e5ceb
    • Add a workaround for -fsanitize=cfi + dlsym() issue · c192e0db
      Henrik Gramner authored and Henrik Gramner committed
      CFI will SIGILL when calling a function pointer obtained through
      dlsym(), regardless of whether or not the signature is correct.
      
      See https://bugs.llvm.org/show_bug.cgi?id=44500
      c192e0db
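The two sanitizer commits above describe a pattern that can be sketched as follows: the indirect call through a dlsym() pointer is isolated in a small helper that opts out of CFI, while the public entry point stays instrumented. This is a minimal sketch under stated assumptions, not the project's actual code. The `NO_SANITIZE` macro body, the function names, and the choice of `strlen` as the looked-up symbol are all illustrative. `no_sanitize("cfi-icall")` is a Clang attribute, so the macro expands to nothing on other compilers.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

#if defined(__clang__)
#define NO_SANITIZE(x) __attribute__((no_sanitize(x)))
#else
#define NO_SANITIZE(x) /* no-op on compilers without this attribute */
#endif

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: the only function that opts out of CFI indirect-call
 * checks, because it calls through a pointer obtained from dlsym(). */
NO_SANITIZE("cfi-icall")
static size_t call_strlen_via_dlsym(const char *s) {
    assert(s != NULL);
    void *self = dlopen(NULL, RTLD_LAZY); /* handle for the running program */
    if (!self)
        return 0;
    strlen_fn fn = (strlen_fn)dlsym(self, "strlen");
    size_t n = fn ? fn(s) : 0; /* indirect call that CFI would reject */
    dlclose(self);
    return n;
}

/* Hypothetical public entry point: carries no attribute, so it is still
 * checked by the sanitizer. */
size_t public_api_length(const char *s) {
    return call_strlen_via_dlsym(s);
}
```

Keeping the attribute on the helper alone means the rest of the public function still gets CFI coverage. On older glibc versions this needs linking with `-ldl`.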
  10. Jan 09, 2020
    • x86: add prep_bilin AVX512 asm · 5462c2a8
      Victorien Le Couviour--Tuffet authored and Ronald S. Bultje committed
      ------------------------------------------
      mct_bilinear_w4_0_8bpc_avx2:      3.8
      mct_bilinear_w4_0_8bpc_avx512icl: 3.7
      ---------------------
      mct_bilinear_w8_0_8bpc_avx2:      5.0
      mct_bilinear_w8_0_8bpc_avx512icl: 4.8
      ---------------------
      mct_bilinear_w16_0_8bpc_avx2:      8.5
      mct_bilinear_w16_0_8bpc_avx512icl: 7.1
      ---------------------
      mct_bilinear_w32_0_8bpc_avx2:      29.5
      mct_bilinear_w32_0_8bpc_avx512icl: 17.1
      ---------------------
      mct_bilinear_w64_0_8bpc_avx2:      68.1
      mct_bilinear_w64_0_8bpc_avx512icl: 34.7
      ---------------------
      mct_bilinear_w128_0_8bpc_avx2:      180.5
      mct_bilinear_w128_0_8bpc_avx512icl: 138.0
      ------------------------------------------
      mct_bilinear_w4_h_8bpc_avx2:      4.0
      mct_bilinear_w4_h_8bpc_avx512icl: 3.9
      ---------------------
      mct_bilinear_w8_h_8bpc_avx2:      5.3
      mct_bilinear_w8_h_8bpc_avx512icl: 5.0
      ---------------------
      mct_bilinear_w16_h_8bpc_avx2:      11.7
      mct_bilinear_w16_h_8bpc_avx512icl:  7.5
      ---------------------
      mct_bilinear_w32_h_8bpc_avx2:      41.8
      mct_bilinear_w32_h_8bpc_avx512icl: 20.3
      ---------------------
      mct_bilinear_w64_h_8bpc_avx2:      94.9
      mct_bilinear_w64_h_8bpc_avx512icl: 35.0
      ---------------------
      mct_bilinear_w128_h_8bpc_avx2:      240.1
      mct_bilinear_w128_h_8bpc_avx512icl: 143.8
      ------------------------------------------
      mct_bilinear_w4_v_8bpc_avx2:      4.1
      mct_bilinear_w4_v_8bpc_avx512icl: 4.0
      ---------------------
      mct_bilinear_w8_v_8bpc_avx2:      6.0
      mct_bilinear_w8_v_8bpc_avx512icl: 5.4
      ---------------------
      mct_bilinear_w16_v_8bpc_avx2:      10.3
      mct_bilinear_w16_v_8bpc_avx512icl:  8.9
      ---------------------
      mct_bilinear_w32_v_8bpc_avx2:      29.5
      mct_bilinear_w32_v_8bpc_avx512icl: 25.9
      ---------------------
      mct_bilinear_w64_v_8bpc_avx2:      64.3
      mct_bilinear_w64_v_8bpc_avx512icl: 41.3
      ---------------------
      mct_bilinear_w128_v_8bpc_avx2:      198.2
      mct_bilinear_w128_v_8bpc_avx512icl: 139.6
      ------------------------------------------
      mct_bilinear_w4_hv_8bpc_avx2:      5.6
      mct_bilinear_w4_hv_8bpc_avx512icl: 5.2
      ---------------------
      mct_bilinear_w8_hv_8bpc_avx2:      8.3
      mct_bilinear_w8_hv_8bpc_avx512icl: 7.0
      ---------------------
      mct_bilinear_w16_hv_8bpc_avx2:      19.4
      mct_bilinear_w16_hv_8bpc_avx512icl: 12.1
      ---------------------
      mct_bilinear_w32_hv_8bpc_avx2:      69.1
      mct_bilinear_w32_hv_8bpc_avx512icl: 32.5
      ---------------------
      mct_bilinear_w64_hv_8bpc_avx2:      164.4
      mct_bilinear_w64_hv_8bpc_avx512icl:  71.1
      ---------------------
      mct_bilinear_w128_hv_8bpc_avx2:      405.2
      mct_bilinear_w128_hv_8bpc_avx512icl: 193.1
      ------------------------------------------
      5462c2a8
    • x86: add avx512icl cpu flag to x86inc.asm · 40891aab
      Victorien Le Couviour--Tuffet authored and Ronald S. Bultje committed
      40891aab
    • checkasm: x86: ensure all SIMD lanes are turned on at all times · 430967a6
      Victorien Le Couviour--Tuffet authored and Ronald S. Bultje committed
      YMM and ZMM registers on x86 are turned off to save power when they haven't
      been used for some period of time. When they are used there will be a
      "warmup" period during which performance will be reduced and inconsistent
      which is problematic when trying to benchmark individual functions.
      
      Periodically issue "dummy" instructions that use those registers to
      prevent them from being powered down. The end result is more consistent
      benchmark results.
      
      Credits to Henrik Gramner's commit
      1878c7f2af0a9c73e291488209109782c428cfcf from x264.
      430967a6
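The warmup idea described in this commit can be sketched in C as below. This is an assumption-laden illustration, not the checkasm implementation, which is written in assembly. The names `simd_warmup`, `bench`, `dummy_kernel` and the interval of 64 iterations are invented here, and the real code may pick a different trigger for "periodically". The inline assembly touches one YMM register and runs only on x86-64 builds with GCC or Clang when the CPU reports AVX support.

```c
#include <assert.h>

#define WARMUP_INTERVAL 64 /* assumed value, chosen for illustration */
static_assert(WARMUP_INTERVAL > 0, "interval must be positive");

static unsigned warmup_calls; /* counts how often the warmup ran */

/* Issue a dummy 256-bit instruction so the wide SIMD lanes stay powered. */
static void simd_warmup(void) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx"))
        __asm__ volatile("vxorps %%ymm0, %%ymm0, %%ymm0\n\t"
                         "vzeroupper" ::: "xmm0");
#endif
    warmup_calls++;
}

/* Stand-in for the function being benchmarked. */
static void dummy_kernel(void) {}

/* Run `fn` `runs` times, issuing a warmup every WARMUP_INTERVAL iterations.
 * A real harness would also read a timer around each call to fn(). */
static void bench(void (*fn)(void), unsigned runs) {
    for (unsigned i = 0; i < runs; i++) {
        if (i % WARMUP_INTERVAL == 0)
            simd_warmup();
        fn();
    }
}
```

With 200 runs the warmup fires at iterations 0, 64, 128 and 192, so the lanes are touched before the first measurement and at regular intervals after that.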
  11. Jan 08, 2020
  12. Jan 07, 2020
  13. Jan 05, 2020
    • arm64: msac: Avoid 32 bit intermediates in symbol_adapt · 8d574f70
      Martin Storsjö authored
      This gives small gains on A72 and A73, and on A53 on symbol_adapt16.
      
      Before:                      Cortex A53    A72    A73
      msac_decode_symbol_adapt4_neon:    63.2   52.8   53.3
      msac_decode_symbol_adapt8_neon:    68.5   57.9   55.7
      msac_decode_symbol_adapt16_neon:   92.8   59.7   62.8
      After:
      msac_decode_symbol_adapt4_neon:    63.3   48.3   50.0
      msac_decode_symbol_adapt8_neon:    68.7   55.5   54.0
      msac_decode_symbol_adapt16_neon:   88.6   58.8   60.0
      8d574f70
  14. Jan 02, 2020
  15. Jan 01, 2020
  16. Dec 31, 2019
  17. Dec 29, 2019
  18. Dec 28, 2019