  1. Oct 21, 2020
  2. Oct 02, 2020
  3. Oct 01, 2020
  4. Sep 27, 2020
  5. Sep 24, 2020
  6. Sep 20, 2020
  7. Sep 17, 2020
  8. Sep 15, 2020
    • Wan-Teh Chang's avatar
      Ban op->idc that may drop all layer-specific OBUs · 50e876c6
      Wan-Teh Chang authored
      If c->operating_point_idc is nonzero and either bits 0-7 or bits 8-11 in
      it are all 0s, it will cause dav1d_parse_obus() to drop all
      layer-specific OBUs. Prohibit any op->idc with such properties because
      it could be selected as c->operating_point_idc.
      50e876c6
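
The condition described in this commit message can be sketched as a small predicate. This is only an illustration under stated assumptions, not dav1d's actual code: the helper name `op_idc_is_acceptable` is hypothetical, and the masks simply encode "bits 0-7" and "bits 8-11" as named in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of dav1d): returns false for an
 * operating point idc that is nonzero but has either bits 0-7 or
 * bits 8-11 all cleared, which is the case the commit prohibits. */
static bool op_idc_is_acceptable(uint32_t idc)
{
    if (idc == 0)
        return true; /* a zero idc is not covered by the ban */

    const uint32_t bits_0_7  = idc & 0x0ffu; /* bits 0-7  */
    const uint32_t bits_8_11 = idc & 0xf00u; /* bits 8-11 */

    /* Both groups must contain at least one set bit. */
    return bits_0_7 != 0 && bits_8_11 != 0;
}
```

Under these assumptions, `0x101` passes the check, while `0x100` and `0x001` are rejected because one of the two bit groups is empty.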
  9. Sep 06, 2020
  10. Sep 03, 2020
    • Martin Storsjö's avatar
      arm32: mc: NEON implementation of put/prep 8tap/bilin for 16 bpc · 856662b4
      Martin Storsjö authored
      Examples of checkasm benchmarks:
                                        Cortex A7      A8      A9     A53     A72     A73
      mc_8tap_regular_w8_0_16bpc_neon:      158.7   106.2   167.0   127.9    55.0    77.2
      mc_8tap_regular_w8_h_16bpc_neon:     1000.8   557.5   749.2   609.2   401.4   485.4
      mc_8tap_regular_w8_hv_16bpc_neon:    2278.9  1255.4  1352.5  1277.2   867.8   915.9
      mc_8tap_regular_w8_v_16bpc_neon:     1060.0   393.6   485.5   448.3   298.0   298.2
      mc_bilinear_w8_0_16bpc_neon:          159.7    96.6   161.1   123.7    55.4    74.7
      mc_bilinear_w8_h_16bpc_neon:          342.3   250.8   352.9   239.0   158.4   165.1
      mc_bilinear_w8_hv_16bpc_neon:         587.7   373.8   469.0   339.8   244.4   247.5
      mc_bilinear_w8_v_16bpc_neon:          285.8   189.3   284.9   180.4   103.4   100.9
      mct_8tap_regular_w8_0_16bpc_neon:     233.0   136.6   229.3   169.3    86.2    98.3
      mct_8tap_regular_w8_h_16bpc_neon:    1106.8   588.3   817.9   654.1   406.4   489.8
      mct_8tap_regular_w8_hv_16bpc_neon:   2473.3  1326.3  1428.2  1373.7   903.3   951.1
      mct_8tap_regular_w8_v_16bpc_neon:    1266.0   474.1   581.3   505.9   382.0   373.4
      mct_bilinear_w8_0_16bpc_neon:         232.9   126.2   225.0   166.3    86.2    91.7
      mct_bilinear_w8_h_16bpc_neon:         380.6   270.6   386.0   259.7   154.1   151.9
      mct_bilinear_w8_hv_16bpc_neon:        631.4   409.2   509.4   372.1   243.1   244.1
      mct_bilinear_w8_v_16bpc_neon:         349.5   233.5   347.9   212.4   138.7   138.4
      
      For comparison, the corresponding numbers for the existing arm64
      implementation:
      
                                                               Cortex A53     A72     A73
      mc_8tap_regular_w8_0_16bpc_neon:                               94.1    48.9    62.3
      mc_8tap_regular_w8_h_16bpc_neon:                              570.4   388.1   467.3
      mc_8tap_regular_w8_hv_16bpc_neon:                            1035.8   775.0   891.2
      mc_8tap_regular_w8_v_16bpc_neon:                              399.8   284.5   278.2
      mc_bilinear_w8_0_16bpc_neon:                                   90.0    44.3    57.4
      mc_bilinear_w8_h_16bpc_neon:                                  191.7   158.7   156.4
      mc_bilinear_w8_hv_16bpc_neon:                                 295.6   235.0   244.9
      mc_bilinear_w8_v_16bpc_neon:                                  147.2    99.0    88.8
      mct_8tap_regular_w8_0_16bpc_neon:                             139.4    78.4    84.9
      mct_8tap_regular_w8_h_16bpc_neon:                             612.3   395.9   478.6
      mct_8tap_regular_w8_hv_16bpc_neon:                           1113.0   804.3   963.5
      mct_8tap_regular_w8_v_16bpc_neon:                             462.1   370.8   353.3
      mct_bilinear_w8_0_16bpc_neon:                                 135.6    77.0    80.5
      mct_bilinear_w8_h_16bpc_neon:                                 210.8   159.2   141.7
      mct_bilinear_w8_hv_16bpc_neon:                                325.7   238.4   227.3
      mct_bilinear_w8_v_16bpc_neon:                                 180.7   136.7   129.5
      856662b4
    • Martin Storsjö's avatar
      arm64: mc: Apply tuning from w4/w8 case to w2 case in 16 bpc 8tap_hv · 4ae3f5f7
      Martin Storsjö authored
      Narrowing the intermediates from the horizontal pass is beneficial
      here as well (on most cores; it causes a small slowdown on A53).
      This increases consistency in the code between the cases.
      
      (The corresponding change in the upcoming arm32 version is beneficial
      on all tested cores except A53: it helps on A7, A8, A9, A72 and A73,
      on some of them by a lot, and is only marginally slower on A53.)
      
      Before:                        Cortex A53     A72     A73
      mc_8tap_regular_w2_hv_16bpc_neon:   457.7   301.0   317.1
      After:
      mc_8tap_regular_w2_hv_16bpc_neon:   472.0   276.0   284.3
      4ae3f5f7
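
To put the before/after numbers above in relative terms, the change can be computed as a percentage. This is a minimal sketch; the function name `percent_change` is hypothetical, and it assumes that lower checkasm numbers mean faster code, which matches the commit's own description of A53 as a small slowdown.

```c
/* Hypothetical helper: relative change from `before` to `after`,
 * in percent. Positive means the number grew (slower, assuming
 * lower benchmark numbers are better), negative means it shrank. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the figures from the commit message, `percent_change(457.7, 472.0)` is roughly +3.1 (A53), `percent_change(301.0, 276.0)` is roughly -8.3 (A72), and `percent_change(317.1, 284.3)` is roughly -10.3 (A73).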
    • Martin Storsjö's avatar
      arm: mc: Avoid an unnecessary mov in 8tap_hv w2 · 65a1aafd
      Martin Storsjö authored
      This matches how the same logic is written for w4 and above.
      65a1aafd
    • Martin Storsjö's avatar
      arm32: mc: Use narrower vext.8 in 8tap_w4_h · ea7e13e7
      Martin Storsjö authored
      The previous form was a leftover from how it had to be written on
      aarch64.
      ea7e13e7
    • Martin Storsjö's avatar
      arm64: mc: Use more descriptive element specifiers for loads/stores in 16 bpc put_neon · 13fad75d
      Martin Storsjö authored
      For loads of a half/full register, the actual size of the elements
      doesn't matter, but it makes the code more readable and understandable.
      13fad75d
  11. Sep 01, 2020
    • Henrik Gramner's avatar
      cli: Use proper integer math in Y4M PAR calculations · 3bfe8c7c
      Henrik Gramner authored and Henrik Gramner committed
      The previous floating-point implementation produced results that were
      sometimes slightly off due to rounding errors.
      
      For example, a frame size of 432x240 with a render size of 176x240
      previously resulted in a PAR of 98:240 instead of the correct 11:27.
      
      Also reduce fractions to produce more readable numbers.
      3bfe8c7c
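
The integer approach can be illustrated with the example from this commit message. The sketch below is not the code from the commit: the function names are hypothetical, and it assumes the PAR is the ratio (render width x frame height) : (render height x frame width), which is consistent with the 11:27 result quoted above.

```c
#include <stdint.h>

/* Greatest common divisor via the Euclidean algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        const uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: computes a reduced pixel aspect ratio using
 * integer math only. The formula is an assumption for illustration. */
static void reduced_par(uint32_t frame_w, uint32_t frame_h,
                        uint32_t render_w, uint32_t render_h,
                        uint64_t *num, uint64_t *den)
{
    const uint64_t n = (uint64_t)render_w * frame_h;
    const uint64_t d = (uint64_t)render_h * frame_w;
    const uint64_t g = gcd_u64(n, d);

    if (g == 0) { /* both products are zero; nothing to reduce */
        *num = 0;
        *den = 0;
        return;
    }
    *num = n / g; /* exact division, so no rounding error */
    *den = d / g;
}
```

For a 432x240 frame with a 176x240 render size, the products are 42240 and 103680, their greatest common divisor is 3840, and the reduced ratio is 11:27.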
  12. Aug 30, 2020
  13. Aug 29, 2020
    • Martin Storsjö's avatar
      arm32: mc: NEON implementation of avg/mask/w_avg for 16 bpc · 80aa7823
      Martin Storsjö authored
                            Cortex A7       A8       A9      A53      A72      A73
      avg_w4_16bpc_neon:        131.4     81.8    117.3    111.0     50.9     58.8
      avg_w8_16bpc_neon:        291.9    173.1    293.1    230.9    114.7    128.8
      avg_w16_16bpc_neon:       803.3    480.1    821.4    645.8    345.7    384.9
      avg_w32_16bpc_neon:      3350.0   1833.1   3188.1   2343.5   1343.9   1500.6
      avg_w64_16bpc_neon:      8185.9   4390.6  10448.2   6078.8   3303.6   3466.7
      avg_w128_16bpc_neon:    22384.3  10901.2  33721.9  16782.7   8165.1   8416.5
      w_avg_w4_16bpc_neon:      251.3    165.8    203.9    158.3     99.6    106.9
      w_avg_w8_16bpc_neon:      638.4    427.8    555.7    365.1    283.2    277.4
      w_avg_w16_16bpc_neon:    1912.3   1257.5   1623.4   1056.5    879.5    841.8
      w_avg_w32_16bpc_neon:    7461.3   4889.6   6383.8   3966.3   3286.8   3296.8
      w_avg_w64_16bpc_neon:   18689.3  11698.1  18487.3  10134.1   8156.2   7939.5
      w_avg_w128_16bpc_neon:  48776.6  28989.0  53203.3  26004.1  20055.2  20049.4
      mask_w4_16bpc_neon:       298.6    189.2    242.3    191.6    115.2    129.6
      mask_w8_16bpc_neon:       768.6    501.5    646.1    432.4    302.9    326.8
      mask_w16_16bpc_neon:     2320.5   1480.9   1873.0   1270.2    932.2    976.1
      mask_w32_16bpc_neon:     9412.0   5791.9   7348.5   4875.1   3896.4   3821.1
      mask_w64_16bpc_neon:    23385.9  13875.6  21383.8  12235.9   9469.2   9160.2
      mask_w128_16bpc_neon:   60466.4  34762.6  61055.9  31214.0  23299.0  23324.5
      
      For comparison, the corresponding numbers for the existing arm64
      implementation:
      
                                                      Cortex A53      A72      A73
      avg_w4_16bpc_neon:                                    78.0     38.5     50.0
      avg_w8_16bpc_neon:                                   198.3    105.4    117.8
      avg_w16_16bpc_neon:                                  614.9    339.9    376.7
      avg_w32_16bpc_neon:                                 2313.8   1391.1   1487.7
      avg_w64_16bpc_neon:                                 5733.3   3269.1   3648.4
      avg_w128_16bpc_neon:                               15105.9   8143.5   8970.4
      w_avg_w4_16bpc_neon:                                 119.2     87.7     92.9
      w_avg_w8_16bpc_neon:                                 322.9    252.3    263.5
      w_avg_w16_16bpc_neon:                               1016.8    794.0    828.6
      w_avg_w32_16bpc_neon:                               3910.9   3159.6   3308.3
      w_avg_w64_16bpc_neon:                               9499.6   7933.9   8026.5
      w_avg_w128_16bpc_neon:                             24508.3  19502.0  20389.8
      mask_w4_16bpc_neon:                                  138.9     98.7    106.7
      mask_w8_16bpc_neon:                                  375.5    301.1    302.7
      mask_w16_16bpc_neon:                                1217.2   1064.6    954.4
      mask_w32_16bpc_neon:                                4821.0   4018.4   3825.7
      mask_w64_16bpc_neon:                               12262.7   9471.3   9169.7
      mask_w128_16bpc_neon:                              31356.6  22657.6  23324.5
      80aa7823
  14. Aug 28, 2020
  15. Aug 22, 2020
  16. Aug 21, 2020
  17. Aug 07, 2020