  1. Dec 18, 2019
    • Don't assume dlsym exists on linux · 14d586ac
      Martin Storsjö authored
      After checking if -ldl exists, use it for checking for the dlsym
      function.
      
      This fixes building in environments where the dlsym function is
      unavailable. (My testcase is NDK builds with -static, where dlsym
      isn't available for static linking, only if linking dynamically.)
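The effect of such a build-time check on the C side can be sketched as follows. This is only an illustration, assuming the build system defines a macro (here called `HAVE_DLSYM`, a name not stated in the commit message) when the dlsym check succeeds; `find_optional_symbol` is a hypothetical helper, not a function from the project.

```c
#include <stddef.h>

#ifdef HAVE_DLSYM          /* assumed to be set by the build system */
#include <dlfcn.h>
#endif

/* Hypothetical helper: look up an optional symbol at run time.
 * When dlsym is unavailable (e.g. a fully static link), the lookup
 * is compiled out and the caller sees NULL, so it can fall back. */
static void *find_optional_symbol(void *handle, const char *name)
{
#ifdef HAVE_DLSYM
    return dlsym(handle, name);
#else
    (void)handle;
    (void)name;
    return NULL;
#endif
}
```

With the macro undefined, the translation unit has no reference to dlsym at all, so a static link cannot fail on it.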
  2. Dec 17, 2019
  3. Dec 14, 2019
  4. Dec 13, 2019
  5. Dec 05, 2019
  6. Dec 02, 2019
  7. Nov 30, 2019
  8. Nov 27, 2019
    • Avoid excessive L2 collisions with certain frame widths · 82eda83a
      Henrik Gramner authored and Henrik Gramner committed
      Memory addresses with certain power-of-two offsets will map to the
      same set of cache lines. Using such offsets as strides will cause
      excessive cache evictions resulting in more cache misses.
      
      Avoid this by adding a small padding when the stride is a multiple
      of 1024 (somewhat arbitrarily chosen as the specific number depends
      on the hardware implementation) when allocating picture buffers.
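The padding rule described above can be sketched in C as follows. This is a minimal sketch under stated assumptions: the helper name `padded_stride`, the 64-byte row alignment, and the 64-byte padding amount are chosen for illustration; the commit message only says that a small padding is added when the stride is a multiple of 1024.

```c
#include <stddef.h>

/* Hypothetical helper: compute a row stride in bytes for a picture buffer.
 * Assumptions (not from the commit message): rows are aligned to 64 bytes
 * and the extra padding is 64 bytes. */
static size_t padded_stride(size_t width_bytes)
{
    /* Round the row width up to the next multiple of 64 bytes. */
    size_t stride = (width_bytes + 63) & ~(size_t)63;

    /* A stride that is a multiple of 1024 makes successive rows map to
     * the same set of cache lines; nudge it off that boundary. */
    if ((stride & 1023) == 0)
        stride += 64;

    return stride;
}
```

Under these assumptions, a 1920-byte row keeps a stride of 1920, while a 2048-byte row gets a stride of 2112, so successive rows no longer sit at the same power-of-two offset.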
  9. Nov 26, 2019
  10. Nov 24, 2019
  11. Nov 23, 2019
  12. Nov 22, 2019
  13. Nov 21, 2019
  14. Nov 17, 2019
  15. Nov 16, 2019
  16. Nov 15, 2019
  17. Nov 12, 2019
    • arm: 64: loopfilter: Avoid nested ifdefs where easily possible · dcbbf775
      Martin Storsjö authored
      This was requested in the review of the arm32 version of the same.
    • arm: 64: loopfilter: Fix a typo in a macro parameter condition · 564482b6
      Martin Storsjö authored
      This removes one redundant instruction for loop filters smaller
      than 16.
    • arm64: loopfilter: Reorder instructions and tweak register use to match the arm32 port · 3069ab94
      Martin Storsjö authored
      This doesn't change performance measurably, but eases potential
      future maintenance of the code.
    • Martin Storsjö's avatar
      abd07c67
    • arm: 32: Port the arm64 NEON loopfilter to arm32 · 9a100261
      Martin Storsjö authored
      The code is a fairly exact 1:1 port of the ARM64 code, but operating
      on 8 pixels at a time, instead of 16.
      
      Relative speedup over C code according to checkasm:
                             Cortex A7     A8     A9    A53    A72    A73
      lpf_h_sb_uv_w4_8bpc_neon:   1.36   1.40   1.25   1.71   1.55   1.59
      lpf_h_sb_uv_w6_8bpc_neon:   2.18   2.11   1.74   2.65   2.32   2.34
      lpf_h_sb_y_w4_8bpc_neon:    1.48   1.43   1.20   1.91   1.49   1.64
      lpf_h_sb_y_w8_8bpc_neon:    2.34   2.05   1.78   2.84   2.35   2.69
      lpf_h_sb_y_w16_8bpc_neon:   2.13   1.83   1.63   2.51   2.10   2.35
      lpf_v_sb_uv_w4_8bpc_neon:   1.69   1.66   1.60   2.16   2.24   2.24
      lpf_v_sb_uv_w6_8bpc_neon:   2.68   2.43   2.22   3.53   3.44   3.35
      lpf_v_sb_y_w4_8bpc_neon:    1.74   1.74   1.43   2.34   2.14   2.18
      lpf_v_sb_y_w8_8bpc_neon:    2.92   2.47   2.19   3.55   3.22   3.54
      lpf_v_sb_y_w16_8bpc_neon:   2.62   2.19   1.98   3.25   2.80   3.10
      
      Comparison to the original ARM64 assembly:
      ARM64:                        A53     A72     A73
      lpf_h_sb_uv_w4_8bpc_neon:   702.5   518.2   529.1
      lpf_h_sb_uv_w6_8bpc_neon:  1007.3   672.6   736.6
      lpf_h_sb_y_w4_8bpc_neon:   1652.8  1261.2  1276.5
      lpf_h_sb_y_w8_8bpc_neon:   2144.7  1559.8  1638.7
      lpf_h_sb_y_w16_8bpc_neon:  2318.3  1757.2  1792.8
      lpf_v_sb_uv_w4_8bpc_neon:   447.1   302.0   292.4
      lpf_v_sb_uv_w6_8bpc_neon:   600.0   397.7   406.9
      lpf_v_sb_y_w4_8bpc_neon:   1212.6   840.1   818.4
      lpf_v_sb_y_w8_8bpc_neon:   1623.3  1167.4  1156.7
      lpf_v_sb_y_w16_8bpc_neon:  1694.9  1237.9  1182.3
      ARM32:                        A53     A72     A73
      lpf_h_sb_uv_w4_8bpc_neon:   821.2   501.1   500.8
      lpf_h_sb_uv_w6_8bpc_neon:  1232.0   715.7   746.6
      lpf_h_sb_y_w4_8bpc_neon:   2208.1  1373.2  1414.7
      lpf_h_sb_y_w8_8bpc_neon:   3138.3  1843.1  1915.2
      lpf_h_sb_y_w16_8bpc_neon:  3293.1  1842.5  1975.9
      lpf_v_sb_uv_w4_8bpc_neon:   619.9   326.7   324.9
      lpf_v_sb_uv_w6_8bpc_neon:   855.9   446.7   468.2
      lpf_v_sb_y_w4_8bpc_neon:   1737.6   935.5  1007.0
      lpf_v_sb_y_w8_8bpc_neon:   2346.7  1232.8  1298.3
      lpf_v_sb_y_w16_8bpc_neon:  2353.4  1283.4  1379.9
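To read the ARM64 and ARM32 tables side by side, one can divide each ARM32 figure by the matching ARM64 figure. The sketch below does this for the Cortex-A53 column only. It assumes the figures are checkasm timings where lower is faster; the array and function names are made up for illustration.

```c
#include <stddef.h>

/* Cortex-A53 figures copied from the tables above, in row order:
 * h_uv_w4, h_uv_w6, h_y_w4, h_y_w8, h_y_w16,
 * v_uv_w4, v_uv_w6, v_y_w4, v_y_w8, v_y_w16. */
static const double arm64_a53[10] = {
     702.5, 1007.3, 1652.8, 2144.7, 2318.3,
     447.1,  600.0, 1212.6, 1623.3, 1694.9
};
static const double arm32_a53[10] = {
     821.2, 1232.0, 2208.1, 3138.3, 3293.1,
     619.9,  855.9, 1737.6, 2346.7, 2353.4
};

/* Ratio of the ARM32 figure to the ARM64 figure for row i (0..9).
 * A value above 1.0 means the ARM32 figure is larger for that function. */
static double a53_ratio(size_t i)
{
    return arm32_a53[i] / arm64_a53[i];
}
```

For example, lpf_h_sb_y_w8_8bpc_neon (row 3) gives 3138.3 / 2144.7, roughly 1.46. The ARM32 code operates on 8 pixels at a time instead of 16, so a ratio above 1.0 on the same core is expected.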