  1. Sep 17, 2020
  2. Sep 15, 2020
    • Wan-Teh Chang's avatar
      Ban op->idc that may drop all layer-specific OBUs · 50e876c6
      Wan-Teh Chang authored
      If c->operating_point_idc is nonzero and either bits 0-7 or bits 8-11 in
      it are all 0s, it will cause dav1d_parse_obus() to drop all
      layer-specific OBUs. Prohibit any op->idc with such properties because
      it could be selected as c->operating_point_idc.
      50e876c6
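The condition described above can be sketched as a small predicate. This is only an illustration: the helper name `op_idc_is_valid` and the mask constants are assumptions introduced here, not the code from the commit. In AV1, bits 0-7 of an operating point's idc select temporal layers and bits 8-11 select spatial layers.

```c
#include <stdbool.h>

/* Hypothetical masks for the two bit ranges named in the commit message. */
#define IDC_BITS_0_7  0x0ffu  /* bits 0-7  */
#define IDC_BITS_8_11 0xf00u  /* bits 8-11 */

/* Hypothetical helper: returns false for an idc that is nonzero but has
 * one of the two bit ranges entirely zero, which is the case the commit
 * prohibits. An idc of 0 is accepted. */
bool op_idc_is_valid(unsigned idc)
{
    if (idc == 0)
        return true;
    return (idc & IDC_BITS_0_7) != 0 && (idc & IDC_BITS_8_11) != 0;
}
```

For example, `op_idc_is_valid(0x101)` is true, while `op_idc_is_valid(0x100)` and `op_idc_is_valid(0x001)` are both false.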
  3. Sep 06, 2020
  4. Sep 03, 2020
    • Martin Storsjö's avatar
      arm32: mc: NEON implementation of put/prep 8tap/bilin for 16 bpc · 856662b4
      Martin Storsjö authored
      Examples of checkasm benchmarks:
                                        Cortex A7      A8      A9     A53     A72     A73
      mc_8tap_regular_w8_0_16bpc_neon:      158.7   106.2   167.0   127.9    55.0    77.2
      mc_8tap_regular_w8_h_16bpc_neon:     1000.8   557.5   749.2   609.2   401.4   485.4
      mc_8tap_regular_w8_hv_16bpc_neon:    2278.9  1255.4  1352.5  1277.2   867.8   915.9
      mc_8tap_regular_w8_v_16bpc_neon:     1060.0   393.6   485.5   448.3   298.0   298.2
      mc_bilinear_w8_0_16bpc_neon:          159.7    96.6   161.1   123.7    55.4    74.7
      mc_bilinear_w8_h_16bpc_neon:          342.3   250.8   352.9   239.0   158.4   165.1
      mc_bilinear_w8_hv_16bpc_neon:         587.7   373.8   469.0   339.8   244.4   247.5
      mc_bilinear_w8_v_16bpc_neon:          285.8   189.3   284.9   180.4   103.4   100.9
      mct_8tap_regular_w8_0_16bpc_neon:     233.0   136.6   229.3   169.3    86.2    98.3
      mct_8tap_regular_w8_h_16bpc_neon:    1106.8   588.3   817.9   654.1   406.4   489.8
      mct_8tap_regular_w8_hv_16bpc_neon:   2473.3  1326.3  1428.2  1373.7   903.3   951.1
      mct_8tap_regular_w8_v_16bpc_neon:    1266.0   474.1   581.3   505.9   382.0   373.4
      mct_bilinear_w8_0_16bpc_neon:         232.9   126.2   225.0   166.3    86.2    91.7
      mct_bilinear_w8_h_16bpc_neon:         380.6   270.6   386.0   259.7   154.1   151.9
      mct_bilinear_w8_hv_16bpc_neon:        631.4   409.2   509.4   372.1   243.1   244.1
      mct_bilinear_w8_v_16bpc_neon:         349.5   233.5   347.9   212.4   138.7   138.4
      
      For comparison, the corresponding numbers for the existing arm64
      implementation:
      
                                                               Cortex A53     A72     A73
      mc_8tap_regular_w8_0_16bpc_neon:                               94.1    48.9    62.3
      mc_8tap_regular_w8_h_16bpc_neon:                              570.4   388.1   467.3
      mc_8tap_regular_w8_hv_16bpc_neon:                            1035.8   775.0   891.2
      mc_8tap_regular_w8_v_16bpc_neon:                              399.8   284.5   278.2
      mc_bilinear_w8_0_16bpc_neon:                                   90.0    44.3    57.4
      mc_bilinear_w8_h_16bpc_neon:                                  191.7   158.7   156.4
      mc_bilinear_w8_hv_16bpc_neon:                                 295.6   235.0   244.9
      mc_bilinear_w8_v_16bpc_neon:                                  147.2    99.0    88.8
      mct_8tap_regular_w8_0_16bpc_neon:                             139.4    78.4    84.9
      mct_8tap_regular_w8_h_16bpc_neon:                             612.3   395.9   478.6
      mct_8tap_regular_w8_hv_16bpc_neon:                           1113.0   804.3   963.5
      mct_8tap_regular_w8_v_16bpc_neon:                             462.1   370.8   353.3
      mct_bilinear_w8_0_16bpc_neon:                                 135.6    77.0    80.5
      mct_bilinear_w8_h_16bpc_neon:                                 210.8   159.2   141.7
      mct_bilinear_w8_hv_16bpc_neon:                                325.7   238.4   227.3
      mct_bilinear_w8_v_16bpc_neon:                                 180.7   136.7   129.5
      856662b4
    • Martin Storsjö's avatar
      arm64: mc: Apply tuning from w4/w8 case to w2 case in 16 bpc 8tap_hv · 4ae3f5f7
      Martin Storsjö authored
      Narrowing the intermediates from the horizontal pass is beneficial here
      as well on most cores (with a small slowdown on A53). It also makes the
      code more consistent between the cases.
      
      (The corresponding change in the upcoming arm32 version helps on A7, A8,
      A9, A72 and A73, on some of them a lot, and makes A53 only marginally
      slower.)
      
      Before:                        Cortex A53     A72     A73
      mc_8tap_regular_w2_hv_16bpc_neon:   457.7   301.0   317.1
      After:
      mc_8tap_regular_w2_hv_16bpc_neon:   472.0   276.0   284.3
      4ae3f5f7
    • Martin Storsjö's avatar
      arm: mc: Avoid an unnecessary mov in 8tap_hv w2 · 65a1aafd
      Martin Storsjö authored
      This matches how the same logic is written for w4 and above.
      65a1aafd
    • Martin Storsjö's avatar
      arm32: mc: Use narrower vext.8 in 8tap_w4_h · ea7e13e7
      Martin Storsjö authored
      The previous form was a leftover from how it had to be written on
      aarch64.
      ea7e13e7
    • Martin Storsjö's avatar
      arm64: mc: Use more descriptive element specifiers for loads/stores in 16 bpc put_neon · 13fad75d
      Martin Storsjö authored
      For loads of a half/full register, the actual size of the elements
      doesn't matter, but a descriptive specifier makes the code more readable.
      13fad75d
  5. Sep 01, 2020
    • Henrik Gramner's avatar
      cli: Use proper integer math in Y4M PAR calculations · 3bfe8c7c
      Henrik Gramner authored and Henrik Gramner committed
      The previous floating-point implementation produced results that were
      sometimes slightly off due to rounding errors.
      
      For example, a frame size of 432x240 with a render size of 176x240
      previously resulted in a PAR of 98:240 instead of the correct 11:27.
      
      Also reduce fractions to produce more readable numbers.
      3bfe8c7c
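The integer approach can be sketched as follows, assuming the pixel aspect ratio is (render width x frame height) : (render height x frame width), reduced by the greatest common divisor. The function names `gcd_u64` and `compute_par` are hypothetical and are not taken from the dav1d CLI.

```c
#include <stdint.h>

/* Greatest common divisor via the Euclidean algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b) {
        const uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: computes the reduced pixel aspect ratio using only
 * integer math. 64-bit products avoid overflow for large dimensions. */
void compute_par(unsigned frame_w, unsigned frame_h,
                 unsigned render_w, unsigned render_h,
                 uint64_t *par_num, uint64_t *par_den)
{
    uint64_t num = (uint64_t)render_w * frame_h;
    uint64_t den = (uint64_t)render_h * frame_w;
    const uint64_t g = gcd_u64(num, den);
    if (g) { /* g is 0 only if both products are 0 */
        num /= g;
        den /= g;
    }
    *par_num = num;
    *par_den = den;
}
```

With the example from the commit message (frame size 432x240, render size 176x240), the products are 42240 and 103680, their GCD is 3840, and the result is 11:27.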
  6. Aug 30, 2020
  7. Aug 29, 2020
    • Martin Storsjö's avatar
      arm32: mc: NEON implementation of avg/mask/w_avg for 16 bpc · 80aa7823
      Martin Storsjö authored
                            Cortex A7       A8       A9      A53      A72      A73
      avg_w4_16bpc_neon:        131.4     81.8    117.3    111.0     50.9     58.8
      avg_w8_16bpc_neon:        291.9    173.1    293.1    230.9    114.7    128.8
      avg_w16_16bpc_neon:       803.3    480.1    821.4    645.8    345.7    384.9
      avg_w32_16bpc_neon:      3350.0   1833.1   3188.1   2343.5   1343.9   1500.6
      avg_w64_16bpc_neon:      8185.9   4390.6  10448.2   6078.8   3303.6   3466.7
      avg_w128_16bpc_neon:    22384.3  10901.2  33721.9  16782.7   8165.1   8416.5
      w_avg_w4_16bpc_neon:      251.3    165.8    203.9    158.3     99.6    106.9
      w_avg_w8_16bpc_neon:      638.4    427.8    555.7    365.1    283.2    277.4
      w_avg_w16_16bpc_neon:    1912.3   1257.5   1623.4   1056.5    879.5    841.8
      w_avg_w32_16bpc_neon:    7461.3   4889.6   6383.8   3966.3   3286.8   3296.8
      w_avg_w64_16bpc_neon:   18689.3  11698.1  18487.3  10134.1   8156.2   7939.5
      w_avg_w128_16bpc_neon:  48776.6  28989.0  53203.3  26004.1  20055.2  20049.4
      mask_w4_16bpc_neon:       298.6    189.2    242.3    191.6    115.2    129.6
      mask_w8_16bpc_neon:       768.6    501.5    646.1    432.4    302.9    326.8
      mask_w16_16bpc_neon:     2320.5   1480.9   1873.0   1270.2    932.2    976.1
      mask_w32_16bpc_neon:     9412.0   5791.9   7348.5   4875.1   3896.4   3821.1
      mask_w64_16bpc_neon:    23385.9  13875.6  21383.8  12235.9   9469.2   9160.2
      mask_w128_16bpc_neon:   60466.4  34762.6  61055.9  31214.0  23299.0  23324.5
      
      For comparison, the corresponding numbers for the existing arm64
      implementation:
      
                                                      Cortex A53      A72      A73
      avg_w4_16bpc_neon:                                    78.0     38.5     50.0
      avg_w8_16bpc_neon:                                   198.3    105.4    117.8
      avg_w16_16bpc_neon:                                  614.9    339.9    376.7
      avg_w32_16bpc_neon:                                 2313.8   1391.1   1487.7
      avg_w64_16bpc_neon:                                 5733.3   3269.1   3648.4
      avg_w128_16bpc_neon:                               15105.9   8143.5   8970.4
      w_avg_w4_16bpc_neon:                                 119.2     87.7     92.9
      w_avg_w8_16bpc_neon:                                 322.9    252.3    263.5
      w_avg_w16_16bpc_neon:                               1016.8    794.0    828.6
      w_avg_w32_16bpc_neon:                               3910.9   3159.6   3308.3
      w_avg_w64_16bpc_neon:                               9499.6   7933.9   8026.5
      w_avg_w128_16bpc_neon:                             24508.3  19502.0  20389.8
      mask_w4_16bpc_neon:                                  138.9     98.7    106.7
      mask_w8_16bpc_neon:                                  375.5    301.1    302.7
      mask_w16_16bpc_neon:                                1217.2   1064.6    954.4
      mask_w32_16bpc_neon:                                4821.0   4018.4   3825.7
      mask_w64_16bpc_neon:                               12262.7   9471.3   9169.7
      mask_w128_16bpc_neon:                              31356.6  22657.6  23324.5
      80aa7823
  8. Aug 28, 2020
  9. Aug 22, 2020
  10. Aug 21, 2020
  11. Aug 07, 2020
    • Martin Storsjö's avatar
      checkasm: Add ifdefs around the readtime check · 5bbd9632
      Martin Storsjö authored
      This fixes building in configurations where no readtime implementation
      is available at all, such as MSVC targeting 32 bit ARM.
      
      This was missed when the check was added in
      95a19254.
      5bbd9632
    • Martin Storsjö's avatar
      checkasm: Enforce declare_func to be outside of check_func · 0b824944
      Martin Storsjö authored
      Move the declaration of func_ref/func_new into declare_func. This
      enforces that declare_func is a scope outside of/before check_func.
      
      This ensures that if the signal handler is triggered, we rewind
      to a scope outside of check_func, where check_func makes sure we
      don't rerun the test that just triggered the signal handler.
      0b824944
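A minimal sketch of why the rewind point has to sit outside the per-test check. It is not checkasm's implementation: all names (`rewind_point`, `simulated_crash`, `should_run`, `run_tests`) are hypothetical, and a plain `longjmp` stands in for the real signal handler so that the example stays portable.

```c
#include <setjmp.h>
#include <stdbool.h>

#define NUM_TESTS 2

static jmp_buf rewind_point;            /* set up before any check runs */
static bool already_failed[NUM_TESTS];  /* tests that must not be rerun */

/* Stand-in for a signal handler: record the failure, then rewind. */
static void simulated_crash(int test)
{
    already_failed[test] = true;
    longjmp(rewind_point, 1);
}

/* Plays the role of the check: skip a test that crashed earlier. */
static bool should_run(int test)
{
    return !already_failed[test];
}

/* Runs all tests, where test 0 "crashes". Returns how many completed. */
int run_tests(void)
{
    volatile int completed = 0;

    /* The rewind point is established in the outer scope, so after a
     * crash execution resumes here and every check is evaluated again. */
    (void)setjmp(rewind_point);

    for (int t = 0; t < NUM_TESTS; t++) {
        if (!should_run(t))
            continue;            /* crashed before: do not rerun */
        if (t == 0)
            simulated_crash(t);  /* never returns */
        completed++;
    }
    return completed;
}
```

Here `run_tests()` returns 1: test 0 crashes once and is skipped after the rewind, and test 1 completes. If the rewind point were inside the check itself, the crashing test would be re-entered instead of skipped.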
  12. Aug 06, 2020
  13. Aug 05, 2020
  14. Jul 20, 2020
  15. Jul 13, 2020
  16. Jul 10, 2020
  17. Jul 09, 2020