  1. Jan 03, 2022
  2. Jan 01, 2022
    • Expose dav1d_apply_grain as part of the public API · 2a183945
      Niklas Haas authored
      This change is motivated by a desire to be able to toggle between CPU
      and GPU film grain synthesis in players such as VLC. Because VLC
      initializes the codec before the vout (and, indeed, the active vout
      module may change in the middle of decoding), it cannot make the
      decision of whether to apply film grain in libdav1d as part of codec
      initialization. It needs to be decided on a frame-by-frame basis
      depending on whether the currently active vout supports film grain
      synthesis or not.
      
      Using the new API, users like VLC can simply set `apply_grain` to 0 and
      then manually call `dav1d_apply_grain` whenever the vout does not
      support GPU film grain synthesis. As a side note, `dav1d_apply_grain`
      could also technically be called from dedicated worker threads,
      something that libdav1d does not currently do internally.
      
      The alternative to this solution would have been to allow changing
      Dav1dSettings at runtime, but that would be more invasive and a proper
      API would also need to take other settings into consideration, some of
      which can't be changed as easily as `apply_grain`. This commit
      represents a stop-gap solution.
      
      Bump the minor version to allow clients to depend on this API.
      2a183945
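The per-frame decision described in this commit message can be sketched as follows. This is a minimal illustration only: `DecoderSettings`, `Picture`, `VideoOutput`, `cpu_apply_grain` and `present_frame` are hypothetical stand-ins, not the libdav1d types or signatures; the real declarations are in the dav1d public headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the libdav1d types or signatures. */
typedef struct { int apply_grain; } DecoderSettings;
typedef struct { bool has_film_grain; bool grain_applied; } Picture;
typedef struct { bool supports_gpu_grain; } VideoOutput;

/* The player disables in-decoder grain at codec initialization,
 * because the video output is not known yet at that point. */
DecoderSettings player_settings(void) {
    DecoderSettings s = { .apply_grain = 0 };
    return s;
}

/* Stand-in for the CPU grain pass that the commit exposes publicly. */
int cpu_apply_grain(Picture *out, const Picture *in) {
    assert(out != NULL && in != NULL);
    *out = *in;
    out->grain_applied = in->has_film_grain;
    return 0;
}

/* Decide per frame, based on the currently active video output.
 * Returns 0 on success; *out is the picture to hand to the output. */
int present_frame(const VideoOutput *vout, const Picture *decoded, Picture *out) {
    assert(vout != NULL && decoded != NULL && out != NULL);
    if (vout->supports_gpu_grain || !decoded->has_film_grain) {
        *out = *decoded;  /* the output synthesizes grain, or none is needed */
        return 0;
    }
    return cpu_apply_grain(out, decoded);  /* CPU fallback */
}
```

Because the choice is made in `present_frame` rather than at initialization, a video output that changes in the middle of decoding only affects the frames presented after the change.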
  3. Dec 29, 2021
  4. Dec 28, 2021
  5. Dec 13, 2021
    • x86: Add 10-bit sgr AVX-512 (Ice Lake) asm · b430f8ff
      Henrik Gramner authored and committed
      b430f8ff
    • x86: Add 8-bit mc(t)_scaled SSSE3 32-bit asm · 42ad602d
      Victorien Le Couviour--Tuffet authored
      mc_scaled_8tap_regular_w2_8bpc_c: 1070.7
      mc_scaled_8tap_regular_w2_8bpc_ssse3: 253.0
      mc_scaled_8tap_regular_w2_dy1_8bpc_c: 1079.9
      mc_scaled_8tap_regular_w2_dy1_8bpc_ssse3: 114.8
      mc_scaled_8tap_regular_w2_dy2_8bpc_c: 1466.1
      mc_scaled_8tap_regular_w2_dy2_8bpc_ssse3: 145.7
      mc_scaled_8tap_regular_w4_8bpc_c: 1965.4
      mc_scaled_8tap_regular_w4_8bpc_ssse3: 251.4
      mc_scaled_8tap_regular_w4_dy1_8bpc_c: 1989.4
      mc_scaled_8tap_regular_w4_dy1_8bpc_ssse3: 166.1
      mc_scaled_8tap_regular_w4_dy2_8bpc_c: 2728.8
      mc_scaled_8tap_regular_w4_dy2_8bpc_ssse3: 163.4
      mc_scaled_8tap_regular_w8_8bpc_c: 3670.1
      mc_scaled_8tap_regular_w8_8bpc_ssse3: 477.0
      mc_scaled_8tap_regular_w8_dy1_8bpc_c: 3651.1
      mc_scaled_8tap_regular_w8_dy1_8bpc_ssse3: 464.8
      mc_scaled_8tap_regular_w8_dy2_8bpc_c: 5079.6
      mc_scaled_8tap_regular_w8_dy2_8bpc_ssse3: 494.0
      mc_scaled_8tap_regular_w16_8bpc_c: 8366.9
      mc_scaled_8tap_regular_w16_8bpc_ssse3: 1197.4
      mc_scaled_8tap_regular_w16_dy1_8bpc_c: 9088.5
      mc_scaled_8tap_regular_w16_dy1_8bpc_ssse3: 1212.6
      mc_scaled_8tap_regular_w16_dy2_8bpc_c: 13166.1
      mc_scaled_8tap_regular_w16_dy2_8bpc_ssse3: 1301.4
      mc_scaled_8tap_regular_w32_8bpc_c: 29883.7
      mc_scaled_8tap_regular_w32_8bpc_ssse3: 3990.3
      mc_scaled_8tap_regular_w32_dy1_8bpc_c: 23404.1
      mc_scaled_8tap_regular_w32_dy1_8bpc_ssse3: 3617.4
      mc_scaled_8tap_regular_w32_dy2_8bpc_c: 36248.3
      mc_scaled_8tap_regular_w32_dy2_8bpc_ssse3: 3949.3
      mc_scaled_8tap_regular_w64_8bpc_c: 57228.6
      mc_scaled_8tap_regular_w64_8bpc_ssse3: 9359.4
      mc_scaled_8tap_regular_w64_dy1_8bpc_c: 87271.8
      mc_scaled_8tap_regular_w64_dy1_8bpc_ssse3: 12472.7
      mc_scaled_8tap_regular_w64_dy2_8bpc_c: 135050.9
      mc_scaled_8tap_regular_w64_dy2_8bpc_ssse3: 13585.4
      mc_scaled_8tap_regular_w128_8bpc_c: 219123.0
      mc_scaled_8tap_regular_w128_8bpc_ssse3: 31867.7
      mc_scaled_8tap_regular_w128_dy1_8bpc_c: 240143.3
      mc_scaled_8tap_regular_w128_dy1_8bpc_ssse3: 35275.7
      mc_scaled_8tap_regular_w128_dy2_8bpc_c: 376357.7
      mc_scaled_8tap_regular_w128_dy2_8bpc_ssse3: 39411.4
      
      mct_scaled_8tap_regular_w4_8bpc_c: 1178.7
      mct_scaled_8tap_regular_w4_8bpc_ssse3: 176.8
      mct_scaled_8tap_regular_w4_dy1_8bpc_c: 1354.8
      mct_scaled_8tap_regular_w4_dy1_8bpc_ssse3: 131.5
      mct_scaled_8tap_regular_w4_dy2_8bpc_c: 1832.2
      mct_scaled_8tap_regular_w4_dy2_8bpc_ssse3: 123.0
      mct_scaled_8tap_regular_w8_8bpc_c: 3547.6
      mct_scaled_8tap_regular_w8_8bpc_ssse3: 526.0
      mct_scaled_8tap_regular_w8_dy1_8bpc_c: 3683.8
      mct_scaled_8tap_regular_w8_dy1_8bpc_ssse3: 513.8
      mct_scaled_8tap_regular_w8_dy2_8bpc_c: 5260.7
      mct_scaled_8tap_regular_w8_dy2_8bpc_ssse3: 566.1
      mct_scaled_8tap_regular_w16_8bpc_c: 8424.5
      mct_scaled_8tap_regular_w16_8bpc_ssse3: 1340.0
      mct_scaled_8tap_regular_w16_dy1_8bpc_c: 9515.8
      mct_scaled_8tap_regular_w16_dy1_8bpc_ssse3: 1337.0
      mct_scaled_8tap_regular_w16_dy2_8bpc_c: 14247.3
      mct_scaled_8tap_regular_w16_dy2_8bpc_ssse3: 1492.7
      mct_scaled_8tap_regular_w32_8bpc_c: 32059.9
      mct_scaled_8tap_regular_w32_8bpc_ssse3: 5177.5
      mct_scaled_8tap_regular_w32_dy1_8bpc_c: 32557.6
      mct_scaled_8tap_regular_w32_dy1_8bpc_ssse3: 4889.9
      mct_scaled_8tap_regular_w32_dy2_8bpc_c: 50844.2
      mct_scaled_8tap_regular_w32_dy2_8bpc_ssse3: 5667.1
      mct_scaled_8tap_regular_w64_8bpc_c: 59903.1
      mct_scaled_8tap_regular_w64_8bpc_ssse3: 10453.6
      mct_scaled_8tap_regular_w64_dy1_8bpc_c: 80298.8
      mct_scaled_8tap_regular_w64_dy1_8bpc_ssse3: 12597.8
      mct_scaled_8tap_regular_w64_dy2_8bpc_c: 127244.8
      mct_scaled_8tap_regular_w64_dy2_8bpc_ssse3: 14677.9
      mct_scaled_8tap_regular_w128_8bpc_c: 280097.0
      mct_scaled_8tap_regular_w128_8bpc_ssse3: 41989.3
      mct_scaled_8tap_regular_w128_dy1_8bpc_c: 208913.2
      mct_scaled_8tap_regular_w128_dy1_8bpc_ssse3: 35525.2
      mct_scaled_8tap_regular_w128_dy2_8bpc_c: 341367.6
      mct_scaled_8tap_regular_w128_dy2_8bpc_ssse3: 41449.0
      42ad602d
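The figures in this commit message can be turned into speedup ratios with a few lines of C. This sketch assumes the numbers are per-call timings where lower is better (the commit message does not state the unit); `BenchRow`, `speedup` and `print_speedups` are names chosen for illustration, and the three rows are copied from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One C-versus-asm pair taken from the benchmark list. */
typedef struct {
    const char *name;
    double c_time;    /* value reported for the _c variant */
    double simd_time; /* value reported for the _ssse3 variant */
} BenchRow;

/* Ratio of the C figure to the asm figure, assuming lower is better. */
double speedup(double c_time, double simd_time) {
    assert(simd_time > 0.0);
    return c_time / simd_time;
}

/* Print the ratio for a few rows copied from the commit message. */
void print_speedups(void) {
    static const BenchRow rows[] = {
        { "mc_scaled_8tap_regular_w2",       1070.7,   253.0   },
        { "mc_scaled_8tap_regular_w128_dy2", 376357.7, 39411.4 },
        { "mct_scaled_8tap_regular_w4_dy2",  1832.2,   123.0   },
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-34s %.2fx\n", rows[i].name,
               speedup(rows[i].c_time, rows[i].simd_time));
}
```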
  6. Dec 04, 2021
    • x86/itx: Add 16x8 12bpc AVX2 transforms · e8a3f99d
      Matthias Dressel authored
      inv_txfm_add_16x8_adst_adst_0_12bpc_c: 4517.9
      inv_txfm_add_16x8_adst_adst_0_12bpc_avx2: 432.4
      inv_txfm_add_16x8_adst_adst_1_12bpc_c: 4510.9
      inv_txfm_add_16x8_adst_adst_1_12bpc_avx2: 432.4
      inv_txfm_add_16x8_adst_adst_2_12bpc_c: 4498.6
      inv_txfm_add_16x8_adst_adst_2_12bpc_avx2: 432.4
      inv_txfm_add_16x8_adst_dct_0_12bpc_c: 4553.8
      inv_txfm_add_16x8_adst_dct_0_12bpc_avx2: 389.1
      inv_txfm_add_16x8_adst_dct_1_12bpc_c: 4543.3
      inv_txfm_add_16x8_adst_dct_1_12bpc_avx2: 389.1
      inv_txfm_add_16x8_adst_dct_2_12bpc_c: 4538.4
      inv_txfm_add_16x8_adst_dct_2_12bpc_avx2: 389.1
      inv_txfm_add_16x8_adst_flipadst_0_12bpc_c: 4532.6
      inv_txfm_add_16x8_adst_flipadst_0_12bpc_avx2: 435.4
      inv_txfm_add_16x8_adst_flipadst_1_12bpc_c: 4520.4
      inv_txfm_add_16x8_adst_flipadst_1_12bpc_avx2: 435.4
      inv_txfm_add_16x8_adst_flipadst_2_12bpc_c: 4516.2
      inv_txfm_add_16x8_adst_flipadst_2_12bpc_avx2: 435.4
      inv_txfm_add_16x8_adst_identity_0_12bpc_c: 3502.3
      inv_txfm_add_16x8_adst_identity_0_12bpc_avx2: 255.9
      inv_txfm_add_16x8_adst_identity_1_12bpc_c: 3492.9
      inv_txfm_add_16x8_adst_identity_1_12bpc_avx2: 256.3
      inv_txfm_add_16x8_adst_identity_2_12bpc_c: 3471.4
      inv_txfm_add_16x8_adst_identity_2_12bpc_avx2: 256.7
      inv_txfm_add_16x8_dct_adst_0_12bpc_c: 4563.2
      inv_txfm_add_16x8_dct_adst_0_12bpc_avx2: 383.6
      inv_txfm_add_16x8_dct_adst_1_12bpc_c: 4573.1
      inv_txfm_add_16x8_dct_adst_1_12bpc_avx2: 383.9
      inv_txfm_add_16x8_dct_adst_2_12bpc_c: 4562.2
      inv_txfm_add_16x8_dct_adst_2_12bpc_avx2: 383.7
      inv_txfm_add_16x8_dct_dct_0_12bpc_c: 514.0
      inv_txfm_add_16x8_dct_dct_0_12bpc_avx2: 25.0
      inv_txfm_add_16x8_dct_dct_1_12bpc_c: 4540.5
      inv_txfm_add_16x8_dct_dct_1_12bpc_avx2: 340.4
      inv_txfm_add_16x8_dct_dct_2_12bpc_c: 4563.0
      inv_txfm_add_16x8_dct_dct_2_12bpc_avx2: 339.3
      inv_txfm_add_16x8_dct_flipadst_0_12bpc_c: 4568.0
      inv_txfm_add_16x8_dct_flipadst_0_12bpc_avx2: 385.9
      inv_txfm_add_16x8_dct_flipadst_1_12bpc_c: 4577.5
      inv_txfm_add_16x8_dct_flipadst_1_12bpc_avx2: 385.8
      inv_txfm_add_16x8_dct_flipadst_2_12bpc_c: 4573.8
      inv_txfm_add_16x8_dct_flipadst_2_12bpc_avx2: 385.8
      inv_txfm_add_16x8_dct_identity_0_12bpc_c: 3549.9
      inv_txfm_add_16x8_dct_identity_0_12bpc_avx2: 212.1
      inv_txfm_add_16x8_dct_identity_1_12bpc_c: 3538.7
      inv_txfm_add_16x8_dct_identity_1_12bpc_avx2: 212.1
      inv_txfm_add_16x8_dct_identity_2_12bpc_c: 3539.7
      inv_txfm_add_16x8_dct_identity_2_12bpc_avx2: 212.1
      inv_txfm_add_16x8_flipadst_adst_0_12bpc_c: 4495.3
      inv_txfm_add_16x8_flipadst_adst_0_12bpc_avx2: 431.4
      inv_txfm_add_16x8_flipadst_adst_1_12bpc_c: 4496.3
      inv_txfm_add_16x8_flipadst_adst_1_12bpc_avx2: 431.4
      inv_txfm_add_16x8_flipadst_adst_2_12bpc_c: 4499.2
      inv_txfm_add_16x8_flipadst_adst_2_12bpc_avx2: 431.3
      inv_txfm_add_16x8_flipadst_dct_0_12bpc_c: 4506.9
      inv_txfm_add_16x8_flipadst_dct_0_12bpc_avx2: 386.3
      inv_txfm_add_16x8_flipadst_dct_1_12bpc_c: 4512.9
      inv_txfm_add_16x8_flipadst_dct_1_12bpc_avx2: 386.0
      inv_txfm_add_16x8_flipadst_dct_2_12bpc_c: 4503.2
      inv_txfm_add_16x8_flipadst_dct_2_12bpc_avx2: 386.0
      inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_c: 4509.1
      inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_avx2: 432.2
      inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_c: 4519.0
      inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_avx2: 432.1
      inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_c: 4518.3
      inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_avx2: 432.1
      inv_txfm_add_16x8_flipadst_identity_0_12bpc_c: 3511.0
      inv_txfm_add_16x8_flipadst_identity_0_12bpc_avx2: 257.1
      inv_txfm_add_16x8_flipadst_identity_1_12bpc_c: 3518.5
      inv_txfm_add_16x8_flipadst_identity_1_12bpc_avx2: 257.2
      inv_txfm_add_16x8_flipadst_identity_2_12bpc_c: 3521.7
      inv_txfm_add_16x8_flipadst_identity_2_12bpc_avx2: 257.1
      inv_txfm_add_16x8_identity_adst_0_12bpc_c: 3166.8
      inv_txfm_add_16x8_identity_adst_0_12bpc_avx2: 268.6
      inv_txfm_add_16x8_identity_adst_1_12bpc_c: 3157.9
      inv_txfm_add_16x8_identity_adst_1_12bpc_avx2: 268.6
      inv_txfm_add_16x8_identity_adst_2_12bpc_c: 3156.5
      inv_txfm_add_16x8_identity_adst_2_12bpc_avx2: 268.6
      inv_txfm_add_16x8_identity_dct_0_12bpc_c: 3187.4
      inv_txfm_add_16x8_identity_dct_0_12bpc_avx2: 224.4
      inv_txfm_add_16x8_identity_dct_1_12bpc_c: 3185.8
      inv_txfm_add_16x8_identity_dct_1_12bpc_avx2: 224.4
      inv_txfm_add_16x8_identity_dct_2_12bpc_c: 3190.8
      inv_txfm_add_16x8_identity_dct_2_12bpc_avx2: 224.4
      inv_txfm_add_16x8_identity_flipadst_0_12bpc_c: 3167.7
      inv_txfm_add_16x8_identity_flipadst_0_12bpc_avx2: 269.7
      inv_txfm_add_16x8_identity_flipadst_1_12bpc_c: 3174.1
      inv_txfm_add_16x8_identity_flipadst_1_12bpc_avx2: 269.8
      inv_txfm_add_16x8_identity_flipadst_2_12bpc_c: 3174.7
      inv_txfm_add_16x8_identity_flipadst_2_12bpc_avx2: 269.7
      inv_txfm_add_16x8_identity_identity_0_12bpc_c: 2153.3
      inv_txfm_add_16x8_identity_identity_0_12bpc_avx2: 99.1
      inv_txfm_add_16x8_identity_identity_1_12bpc_c: 2143.6
      inv_txfm_add_16x8_identity_identity_1_12bpc_avx2: 99.3
      inv_txfm_add_16x8_identity_identity_2_12bpc_c: 2145.9
      inv_txfm_add_16x8_identity_identity_2_12bpc_avx2: 98.6
      e8a3f99d
    • x86/itx: Add 8x16 12bpc AVX2 transforms · 23e8405c
      Matthias Dressel authored
      inv_txfm_add_8x16_adst_adst_0_12bpc_c: 4440.4
      inv_txfm_add_8x16_adst_adst_0_12bpc_avx2: 354.3
      inv_txfm_add_8x16_adst_adst_1_12bpc_c: 4437.3
      inv_txfm_add_8x16_adst_adst_1_12bpc_avx2: 354.3
      inv_txfm_add_8x16_adst_adst_2_12bpc_c: 4438.8
      inv_txfm_add_8x16_adst_adst_2_12bpc_avx2: 442.6
      inv_txfm_add_8x16_adst_dct_0_12bpc_c: 4507.3
      inv_txfm_add_8x16_adst_dct_0_12bpc_avx2: 310.0
      inv_txfm_add_8x16_adst_dct_1_12bpc_c: 4500.3
      inv_txfm_add_8x16_adst_dct_1_12bpc_avx2: 310.0
      inv_txfm_add_8x16_adst_dct_2_12bpc_c: 4516.1
      inv_txfm_add_8x16_adst_dct_2_12bpc_avx2: 399.5
      inv_txfm_add_8x16_adst_flipadst_0_12bpc_c: 4457.3
      inv_txfm_add_8x16_adst_flipadst_0_12bpc_avx2: 355.6
      inv_txfm_add_8x16_adst_flipadst_1_12bpc_c: 4441.3
      inv_txfm_add_8x16_adst_flipadst_1_12bpc_avx2: 355.6
      inv_txfm_add_8x16_adst_flipadst_2_12bpc_c: 4448.9
      inv_txfm_add_8x16_adst_flipadst_2_12bpc_avx2: 445.5
      inv_txfm_add_8x16_adst_identity_0_12bpc_c: 3204.0
      inv_txfm_add_8x16_adst_identity_0_12bpc_avx2: 173.1
      inv_txfm_add_8x16_adst_identity_1_12bpc_c: 3207.1
      inv_txfm_add_8x16_adst_identity_1_12bpc_avx2: 173.6
      inv_txfm_add_8x16_adst_identity_2_12bpc_c: 3210.4
      inv_txfm_add_8x16_adst_identity_2_12bpc_avx2: 261.2
      inv_txfm_add_8x16_dct_adst_0_12bpc_c: 4484.2
      inv_txfm_add_8x16_dct_adst_0_12bpc_avx2: 334.0
      inv_txfm_add_8x16_dct_adst_1_12bpc_c: 4503.8
      inv_txfm_add_8x16_dct_adst_1_12bpc_avx2: 334.6
      inv_txfm_add_8x16_dct_adst_2_12bpc_c: 4490.7
      inv_txfm_add_8x16_dct_adst_2_12bpc_avx2: 395.6
      inv_txfm_add_8x16_dct_dct_0_12bpc_c: 419.9
      inv_txfm_add_8x16_dct_dct_0_12bpc_avx2: 37.6
      inv_txfm_add_8x16_dct_dct_1_12bpc_c: 4482.6
      inv_txfm_add_8x16_dct_dct_1_12bpc_avx2: 284.6
      inv_txfm_add_8x16_dct_dct_2_12bpc_c: 4468.7
      inv_txfm_add_8x16_dct_dct_2_12bpc_avx2: 348.3
      inv_txfm_add_8x16_dct_flipadst_0_12bpc_c: 4468.4
      inv_txfm_add_8x16_dct_flipadst_0_12bpc_avx2: 333.6
      inv_txfm_add_8x16_dct_flipadst_1_12bpc_c: 4463.5
      inv_txfm_add_8x16_dct_flipadst_1_12bpc_avx2: 333.5
      inv_txfm_add_8x16_dct_flipadst_2_12bpc_c: 4459.4
      inv_txfm_add_8x16_dct_flipadst_2_12bpc_avx2: 397.4
      inv_txfm_add_8x16_dct_identity_0_12bpc_c: 3237.1
      inv_txfm_add_8x16_dct_identity_0_12bpc_avx2: 149.6
      inv_txfm_add_8x16_dct_identity_1_12bpc_c: 3229.9
      inv_txfm_add_8x16_dct_identity_1_12bpc_avx2: 148.6
      inv_txfm_add_8x16_dct_identity_2_12bpc_c: 3225.6
      inv_txfm_add_8x16_dct_identity_2_12bpc_avx2: 211.3
      inv_txfm_add_8x16_flipadst_adst_0_12bpc_c: 4532.1
      inv_txfm_add_8x16_flipadst_adst_0_12bpc_avx2: 356.2
      inv_txfm_add_8x16_flipadst_adst_1_12bpc_c: 4527.6
      inv_txfm_add_8x16_flipadst_adst_1_12bpc_avx2: 356.1
      inv_txfm_add_8x16_flipadst_adst_2_12bpc_c: 4532.5
      inv_txfm_add_8x16_flipadst_adst_2_12bpc_avx2: 440.0
      inv_txfm_add_8x16_flipadst_dct_0_12bpc_c: 4571.6
      inv_txfm_add_8x16_flipadst_dct_0_12bpc_avx2: 310.3
      inv_txfm_add_8x16_flipadst_dct_1_12bpc_c: 4554.5
      inv_txfm_add_8x16_flipadst_dct_1_12bpc_avx2: 309.7
      inv_txfm_add_8x16_flipadst_dct_2_12bpc_c: 4554.3
      inv_txfm_add_8x16_flipadst_dct_2_12bpc_avx2: 399.9
      inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_c: 4497.2
      inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_avx2: 355.9
      inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_c: 4486.2
      inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_avx2: 355.6
      inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_c: 4493.4
      inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_avx2: 446.0
      inv_txfm_add_8x16_flipadst_identity_0_12bpc_c: 3265.7
      inv_txfm_add_8x16_flipadst_identity_0_12bpc_avx2: 173.8
      inv_txfm_add_8x16_flipadst_identity_1_12bpc_c: 3270.8
      inv_txfm_add_8x16_flipadst_identity_1_12bpc_avx2: 173.5
      inv_txfm_add_8x16_flipadst_identity_2_12bpc_c: 3271.8
      inv_txfm_add_8x16_flipadst_identity_2_12bpc_avx2: 261.6
      inv_txfm_add_8x16_identity_adst_0_12bpc_c: 3295.3
      inv_txfm_add_8x16_identity_adst_0_12bpc_avx2: 302.5
      inv_txfm_add_8x16_identity_adst_1_12bpc_c: 3303.1
      inv_txfm_add_8x16_identity_adst_1_12bpc_avx2: 303.0
      inv_txfm_add_8x16_identity_adst_2_12bpc_c: 3304.6
      inv_txfm_add_8x16_identity_adst_2_12bpc_avx2: 303.1
      inv_txfm_add_8x16_identity_dct_0_12bpc_c: 3298.9
      inv_txfm_add_8x16_identity_dct_0_12bpc_avx2: 257.8
      inv_txfm_add_8x16_identity_dct_1_12bpc_c: 3308.1
      inv_txfm_add_8x16_identity_dct_1_12bpc_avx2: 259.2
      inv_txfm_add_8x16_identity_dct_2_12bpc_c: 3306.6
      inv_txfm_add_8x16_identity_dct_2_12bpc_avx2: 259.2
      inv_txfm_add_8x16_identity_flipadst_0_12bpc_c: 3294.7
      inv_txfm_add_8x16_identity_flipadst_0_12bpc_avx2: 302.2
      inv_txfm_add_8x16_identity_flipadst_1_12bpc_c: 3292.5
      inv_txfm_add_8x16_identity_flipadst_1_12bpc_avx2: 302.2
      inv_txfm_add_8x16_identity_flipadst_2_12bpc_c: 3275.4
      inv_txfm_add_8x16_identity_flipadst_2_12bpc_avx2: 303.3
      inv_txfm_add_8x16_identity_identity_0_12bpc_c: 2044.6
      inv_txfm_add_8x16_identity_identity_0_12bpc_avx2: 116.2
      inv_txfm_add_8x16_identity_identity_1_12bpc_c: 2059.9
      inv_txfm_add_8x16_identity_identity_1_12bpc_avx2: 117.0
      inv_txfm_add_8x16_identity_identity_2_12bpc_c: 2048.4
      inv_txfm_add_8x16_identity_identity_2_12bpc_avx2: 116.2
      23e8405c
  7. Dec 03, 2021
    • Fix lr line buffer padding · 7b99b0e1
      Henrik Gramner authored and committed
      Some cdef asm functions access memory before the start of the buffer.
      
      There are two lr line buffers allocated, but only one of them had the
      correct padding applied.
      7b99b0e1
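The kind of layout this fix relies on can be sketched generically: when code may read a few bytes before the start of a line buffer, every such buffer needs padding in front of it, not just the first one. The sketch below is not the dav1d allocation code; `LineBuffers`, `LINE_PAD` and both functions are hypothetical, and the padding size is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { LINE_PAD = 64 };  /* hypothetical padding in bytes, not dav1d's value */

typedef struct {
    uint8_t *base;     /* start of the single allocation */
    uint8_t *line[2];  /* usable start of each line buffer */
} LineBuffers;

/* Allocate two line buffers from one block, each preceded by LINE_PAD
 * bytes, so that reads slightly before either start stay in bounds. */
int line_buffers_alloc(LineBuffers *lb, size_t line_size) {
    assert(lb != NULL);
    const size_t stride = LINE_PAD + line_size;
    lb->base = calloc(2, stride);
    if (!lb->base) return -1;
    for (int i = 0; i < 2; i++)
        lb->line[i] = lb->base + (size_t)i * stride + LINE_PAD;
    return 0;
}

void line_buffers_free(LineBuffers *lb) {
    assert(lb != NULL);
    free(lb->base);
    lb->base = NULL;
    lb->line[0] = lb->line[1] = NULL;
}
```

Computing both pointers in the same loop makes it hard to pad one buffer and forget the other, which is the class of mistake the commit message describes.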
    • AArch64 Neon: Replace XTN, XTN2 pairs with single UZP1 · 19ff99ea
      Jonathan Wright authored and Martin Storsjö committed
      It is often necessary to narrow the elements in a pair of Neon
      vectors to half the current width, before combining the results. This
      is usually achieved with a pair of XTN/XTN2 instructions. However, it
      is possible to achieve the same outcome with a single 'unzip' (UZP1)
      instruction.
      
      This patch changes all sequential AArch64 Neon XTN, XTN2 instruction
      pairs to use a single UZP1 instruction.
      
      Change-Id: I2a9fad3082d2cf363b1edce9ef0b8d547ec6c41a
      19ff99ea
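The equivalence this commit relies on can be checked with a scalar model in portable C. The sketch assumes 16-bit lanes narrowed to 8 bits and the usual lane layout in which byte 2*i of a register is the low byte of 16-bit lane i; the function names are chosen for illustration and the code models the instructions rather than using Neon intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of XTN followed by XTN2: truncate each 16-bit lane to 8 bits;
 * lanes from a fill the low half, lanes from b fill the high half. */
void narrow_xtn_pair(uint8_t out[16], const uint16_t a[8], const uint16_t b[8]) {
    assert(out && a && b);
    for (int i = 0; i < 8; i++) {
        out[i]     = (uint8_t)a[i];
        out[i + 8] = (uint8_t)b[i];
    }
}

/* Model of UZP1 on byte lanes: view both inputs as 16 bytes each and
 * keep the even-indexed bytes of a, then the even-indexed bytes of b. */
void narrow_uzp1(uint8_t out[16], const uint16_t a[8], const uint16_t b[8]) {
    assert(out && a && b);
    uint8_t bytes[32];
    for (int i = 0; i < 8; i++) {
        bytes[2 * i]          = (uint8_t)(a[i] & 0xff);  /* low byte  */
        bytes[2 * i + 1]      = (uint8_t)(a[i] >> 8);    /* high byte */
        bytes[16 + 2 * i]     = (uint8_t)(b[i] & 0xff);
        bytes[16 + 2 * i + 1] = (uint8_t)(b[i] >> 8);
    }
    for (int i = 0; i < 16; i++)
        out[i] = bytes[2 * i];
}

/* Returns 1 if both models produce the same 16 bytes for sample input. */
int xtn_pair_matches_uzp1(void) {
    uint16_t a[8], b[8];
    uint8_t via_xtn[16], via_uzp1[16];
    for (int i = 0; i < 8; i++) {
        a[i] = (uint16_t)(0x1100 * (i + 1) + i);
        b[i] = (uint16_t)(0xA0F0 + 3 * i);
    }
    narrow_xtn_pair(via_xtn, a, b);
    narrow_uzp1(via_uzp1, a, b);
    return memcmp(via_xtn, via_uzp1, sizeof via_xtn) == 0;
}
```

Keeping only the even-indexed bytes discards the high byte of every 16-bit lane, which is exactly what truncating narrowing does, so one instruction covers both halves of the result.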
    • AArch64 Neon: Use CMLT instead of SSHR to compute sign · 4e412738
      Jonathan Wright authored and Martin Storsjö committed
      The CMLT instruction has twice the throughput of SSHR on all modern
      out-of-order Arm cores. The Software Optimization Guides (SWOG) for
      the Cortex-A76, Cortex-A77 and Neoverse-N1 cores are being updated to
      reflect this. (The current version of the SWOG for these cores states
      that CMLT and SSHR both have the same execution throughput.)
      
      This patch changes all instances of sign computation to use CMLT
      instead of SSHR.
      
      Change-Id: Ice5747fee4e3bdd98ae8fbc036d735f55e492249
      4e412738
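Why the two instructions are interchangeable for sign computation can be shown with a scalar model. The sketch assumes 16-bit lanes (the commit message does not name a lane width) and models the instructions in portable C rather than using Neon intrinsics; the function names are chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SSHR by 15 on a 16-bit lane: an arithmetic right shift by
 * (width - 1) replicates the sign bit into every bit position.
 * Written via unsigned arithmetic to avoid an implementation-defined
 * right shift of a negative value in C. */
int16_t sign_mask_sshr(int16_t x) {
    return (int16_t)-(int)((uint16_t)x >> 15);
}

/* Model of CMLT against zero: all ones if x < 0, otherwise zero. */
int16_t sign_mask_cmlt(int16_t x) {
    return (int16_t)(x < 0 ? -1 : 0);
}

/* Exhaustively check that both models agree for every 16-bit value. */
void check_sign_masks(void) {
    for (int v = INT16_MIN; v <= INT16_MAX; v++)
        assert(sign_mask_sshr((int16_t)v) == sign_mask_cmlt((int16_t)v));
}
```

Both forms yield the same mask of 0 or -1, so the substitution changes only which instruction is issued, not the result.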
  8. Dec 02, 2021
  9. Nov 29, 2021
    • x86/itx: Add 16x4 12bpc AVX2 transforms · 7be12857
      Matthias Dressel authored
      inv_txfm_add_16x4_adst_adst_0_12bpc_c: 1756.6
      inv_txfm_add_16x4_adst_adst_0_12bpc_avx2: 182.4
      inv_txfm_add_16x4_adst_adst_1_12bpc_c: 1756.0
      inv_txfm_add_16x4_adst_adst_1_12bpc_avx2: 182.5
      inv_txfm_add_16x4_adst_adst_2_12bpc_c: 1763.2
      inv_txfm_add_16x4_adst_adst_2_12bpc_avx2: 182.4
      inv_txfm_add_16x4_adst_dct_0_12bpc_c: 1863.6
      inv_txfm_add_16x4_adst_dct_0_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_dct_1_12bpc_c: 1864.1
      inv_txfm_add_16x4_adst_dct_1_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_dct_2_12bpc_c: 1861.3
      inv_txfm_add_16x4_adst_dct_2_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_flipadst_0_12bpc_c: 1768.6
      inv_txfm_add_16x4_adst_flipadst_0_12bpc_avx2: 184.1
      inv_txfm_add_16x4_adst_flipadst_1_12bpc_c: 1768.8
      inv_txfm_add_16x4_adst_flipadst_1_12bpc_avx2: 184.5
      inv_txfm_add_16x4_adst_flipadst_2_12bpc_c: 1769.3
      inv_txfm_add_16x4_adst_flipadst_2_12bpc_avx2: 184.7
      inv_txfm_add_16x4_adst_identity_0_12bpc_c: 1686.6
      inv_txfm_add_16x4_adst_identity_0_12bpc_avx2: 145.4
      inv_txfm_add_16x4_adst_identity_1_12bpc_c: 1685.8
      inv_txfm_add_16x4_adst_identity_1_12bpc_avx2: 145.8
      inv_txfm_add_16x4_adst_identity_2_12bpc_c: 1681.7
      inv_txfm_add_16x4_adst_identity_2_12bpc_avx2: 145.8
      inv_txfm_add_16x4_dct_adst_0_12bpc_c: 1783.4
      inv_txfm_add_16x4_dct_adst_0_12bpc_avx2: 167.7
      inv_txfm_add_16x4_dct_adst_1_12bpc_c: 1789.1
      inv_txfm_add_16x4_dct_adst_1_12bpc_avx2: 167.9
      inv_txfm_add_16x4_dct_adst_2_12bpc_c: 1788.0
      inv_txfm_add_16x4_dct_adst_2_12bpc_avx2: 169.8
      inv_txfm_add_16x4_dct_dct_0_12bpc_c: 209.5
      inv_txfm_add_16x4_dct_dct_0_12bpc_avx2: 21.6
      inv_txfm_add_16x4_dct_dct_1_12bpc_c: 1894.3
      inv_txfm_add_16x4_dct_dct_1_12bpc_avx2: 156.8
      inv_txfm_add_16x4_dct_dct_2_12bpc_c: 1892.0
      inv_txfm_add_16x4_dct_dct_2_12bpc_avx2: 156.8
      inv_txfm_add_16x4_dct_flipadst_0_12bpc_c: 1784.7
      inv_txfm_add_16x4_dct_flipadst_0_12bpc_avx2: 167.2
      inv_txfm_add_16x4_dct_flipadst_1_12bpc_c: 1796.7
      inv_txfm_add_16x4_dct_flipadst_1_12bpc_avx2: 168.6
      inv_txfm_add_16x4_dct_flipadst_2_12bpc_c: 1788.9
      inv_txfm_add_16x4_dct_flipadst_2_12bpc_avx2: 168.9
      inv_txfm_add_16x4_dct_identity_0_12bpc_c: 1712.7
      inv_txfm_add_16x4_dct_identity_0_12bpc_avx2: 128.8
      inv_txfm_add_16x4_dct_identity_1_12bpc_c: 1714.8
      inv_txfm_add_16x4_dct_identity_1_12bpc_avx2: 128.8
      inv_txfm_add_16x4_dct_identity_2_12bpc_c: 1710.2
      inv_txfm_add_16x4_dct_identity_2_12bpc_avx2: 128.8
      inv_txfm_add_16x4_flipadst_adst_0_12bpc_c: 1763.6
      inv_txfm_add_16x4_flipadst_adst_0_12bpc_avx2: 186.6
      inv_txfm_add_16x4_flipadst_adst_1_12bpc_c: 1761.1
      inv_txfm_add_16x4_flipadst_adst_1_12bpc_avx2: 185.6
      inv_txfm_add_16x4_flipadst_adst_2_12bpc_c: 1761.8
      inv_txfm_add_16x4_flipadst_adst_2_12bpc_avx2: 187.0
      inv_txfm_add_16x4_flipadst_dct_0_12bpc_c: 1864.4
      inv_txfm_add_16x4_flipadst_dct_0_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_dct_1_12bpc_c: 1862.7
      inv_txfm_add_16x4_flipadst_dct_1_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_dct_2_12bpc_c: 1860.2
      inv_txfm_add_16x4_flipadst_dct_2_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_c: 1760.4
      inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_avx2: 185.3
      inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_c: 1761.8
      inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_avx2: 185.3
      inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_c: 1766.5
      inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_avx2: 184.9
      inv_txfm_add_16x4_flipadst_identity_0_12bpc_c: 1673.0
      inv_txfm_add_16x4_flipadst_identity_0_12bpc_avx2: 143.1
      inv_txfm_add_16x4_flipadst_identity_1_12bpc_c: 1673.2
      inv_txfm_add_16x4_flipadst_identity_1_12bpc_avx2: 143.1
      inv_txfm_add_16x4_flipadst_identity_2_12bpc_c: 1681.6
      inv_txfm_add_16x4_flipadst_identity_2_12bpc_avx2: 143.2
      inv_txfm_add_16x4_identity_adst_0_12bpc_c: 1128.7
      inv_txfm_add_16x4_identity_adst_0_12bpc_avx2: 102.8
      inv_txfm_add_16x4_identity_adst_1_12bpc_c: 1131.3
      inv_txfm_add_16x4_identity_adst_1_12bpc_avx2: 101.3
      inv_txfm_add_16x4_identity_adst_2_12bpc_c: 1127.5
      inv_txfm_add_16x4_identity_adst_2_12bpc_avx2: 99.1
      inv_txfm_add_16x4_identity_dct_0_12bpc_c: 1228.3
      inv_txfm_add_16x4_identity_dct_0_12bpc_avx2: 88.3
      inv_txfm_add_16x4_identity_dct_1_12bpc_c: 1220.5
      inv_txfm_add_16x4_identity_dct_1_12bpc_avx2: 88.0
      inv_txfm_add_16x4_identity_dct_2_12bpc_c: 1227.3
      inv_txfm_add_16x4_identity_dct_2_12bpc_avx2: 88.1
      inv_txfm_add_16x4_identity_flipadst_0_12bpc_c: 1142.4
      inv_txfm_add_16x4_identity_flipadst_0_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_flipadst_1_12bpc_c: 1134.1
      inv_txfm_add_16x4_identity_flipadst_1_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_flipadst_2_12bpc_c: 1136.4
      inv_txfm_add_16x4_identity_flipadst_2_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_identity_0_12bpc_c: 1056.1
      inv_txfm_add_16x4_identity_identity_0_12bpc_avx2: 61.6
      inv_txfm_add_16x4_identity_identity_1_12bpc_c: 1064.6
      inv_txfm_add_16x4_identity_identity_1_12bpc_avx2: 62.9
      inv_txfm_add_16x4_identity_identity_2_12bpc_c: 1067.5
      inv_txfm_add_16x4_identity_identity_2_12bpc_avx2: 63.5
      7be12857
    • x86/itx: Add 4x16 12bpc AVX2 transforms · f64b2c22
      Matthias Dressel authored
      inv_txfm_add_4x16_adst_adst_0_12bpc_c: 1799.1
      inv_txfm_add_4x16_adst_adst_0_12bpc_avx2: 178.8
      inv_txfm_add_4x16_adst_adst_1_12bpc_c: 1795.0
      inv_txfm_add_4x16_adst_adst_1_12bpc_avx2: 179.1
      inv_txfm_add_4x16_adst_adst_2_12bpc_c: 1806.6
      inv_txfm_add_4x16_adst_adst_2_12bpc_avx2: 179.3
      inv_txfm_add_4x16_adst_dct_0_12bpc_c: 1824.8
      inv_txfm_add_4x16_adst_dct_0_12bpc_avx2: 166.8
      inv_txfm_add_4x16_adst_dct_1_12bpc_c: 1828.2
      inv_txfm_add_4x16_adst_dct_1_12bpc_avx2: 166.7
      inv_txfm_add_4x16_adst_dct_2_12bpc_c: 1830.9
      inv_txfm_add_4x16_adst_dct_2_12bpc_avx2: 165.6
      inv_txfm_add_4x16_adst_flipadst_0_12bpc_c: 1797.9
      inv_txfm_add_4x16_adst_flipadst_0_12bpc_avx2: 179.6
      inv_txfm_add_4x16_adst_flipadst_1_12bpc_c: 1795.9
      inv_txfm_add_4x16_adst_flipadst_1_12bpc_avx2: 180.6
      inv_txfm_add_4x16_adst_flipadst_2_12bpc_c: 1791.6
      inv_txfm_add_4x16_adst_flipadst_2_12bpc_avx2: 180.1
      inv_txfm_add_4x16_adst_identity_0_12bpc_c: 1163.7
      inv_txfm_add_4x16_adst_identity_0_12bpc_avx2: 78.6
      inv_txfm_add_4x16_adst_identity_1_12bpc_c: 1163.4
      inv_txfm_add_4x16_adst_identity_1_12bpc_avx2: 78.9
      inv_txfm_add_4x16_adst_identity_2_12bpc_c: 1164.3
      inv_txfm_add_4x16_adst_identity_2_12bpc_avx2: 78.8
      inv_txfm_add_4x16_dct_adst_0_12bpc_c: 1914.8
      inv_txfm_add_4x16_dct_adst_0_12bpc_avx2: 177.0
      inv_txfm_add_4x16_dct_adst_1_12bpc_c: 1904.8
      inv_txfm_add_4x16_dct_adst_1_12bpc_avx2: 177.3
      inv_txfm_add_4x16_dct_adst_2_12bpc_c: 1905.4
      inv_txfm_add_4x16_dct_adst_2_12bpc_avx2: 176.4
      inv_txfm_add_4x16_dct_dct_0_12bpc_c: 217.1
      inv_txfm_add_4x16_dct_dct_0_12bpc_avx2: 26.6
      inv_txfm_add_4x16_dct_dct_1_12bpc_c: 1955.1
      inv_txfm_add_4x16_dct_dct_1_12bpc_avx2: 162.3
      inv_txfm_add_4x16_dct_dct_2_12bpc_c: 1948.9
      inv_txfm_add_4x16_dct_dct_2_12bpc_avx2: 162.2
      inv_txfm_add_4x16_dct_flipadst_0_12bpc_c: 1922.8
      inv_txfm_add_4x16_dct_flipadst_0_12bpc_avx2: 180.6
      inv_txfm_add_4x16_dct_flipadst_1_12bpc_c: 1919.7
      inv_txfm_add_4x16_dct_flipadst_1_12bpc_avx2: 180.1
      inv_txfm_add_4x16_dct_flipadst_2_12bpc_c: 1912.0
      inv_txfm_add_4x16_dct_flipadst_2_12bpc_avx2: 180.1
      inv_txfm_add_4x16_dct_identity_0_12bpc_c: 1276.4
      inv_txfm_add_4x16_dct_identity_0_12bpc_avx2: 75.4
      inv_txfm_add_4x16_dct_identity_1_12bpc_c: 1277.5
      inv_txfm_add_4x16_dct_identity_1_12bpc_avx2: 75.4
      inv_txfm_add_4x16_dct_identity_2_12bpc_c: 1270.1
      inv_txfm_add_4x16_dct_identity_2_12bpc_avx2: 75.3
      inv_txfm_add_4x16_flipadst_adst_0_12bpc_c: 1802.8
      inv_txfm_add_4x16_flipadst_adst_0_12bpc_avx2: 180.8
      inv_txfm_add_4x16_flipadst_adst_1_12bpc_c: 1804.8
      inv_txfm_add_4x16_flipadst_adst_1_12bpc_avx2: 180.7
      inv_txfm_add_4x16_flipadst_adst_2_12bpc_c: 1800.6
      inv_txfm_add_4x16_flipadst_adst_2_12bpc_avx2: 181.2
      inv_txfm_add_4x16_flipadst_dct_0_12bpc_c: 1842.5
      inv_txfm_add_4x16_flipadst_dct_0_12bpc_avx2: 165.1
      inv_txfm_add_4x16_flipadst_dct_1_12bpc_c: 1837.8
      inv_txfm_add_4x16_flipadst_dct_1_12bpc_avx2: 164.4
      inv_txfm_add_4x16_flipadst_dct_2_12bpc_c: 1841.6
      inv_txfm_add_4x16_flipadst_dct_2_12bpc_avx2: 166.1
      inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_c: 1812.4
      inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_avx2: 182.0
      inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_c: 1803.9
      inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_avx2: 181.2
      inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_c: 1809.9
      inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_avx2: 183.2
      inv_txfm_add_4x16_flipadst_identity_0_12bpc_c: 1170.5
      inv_txfm_add_4x16_flipadst_identity_0_12bpc_avx2: 78.4
      inv_txfm_add_4x16_flipadst_identity_1_12bpc_c: 1172.1
      inv_txfm_add_4x16_flipadst_identity_1_12bpc_avx2: 80.0
      inv_txfm_add_4x16_flipadst_identity_2_12bpc_c: 1170.9
      inv_txfm_add_4x16_flipadst_identity_2_12bpc_avx2: 78.6
      inv_txfm_add_4x16_identity_adst_0_12bpc_c: 1705.4
      inv_txfm_add_4x16_identity_adst_0_12bpc_avx2: 162.6
      inv_txfm_add_4x16_identity_adst_1_12bpc_c: 1714.5
      inv_txfm_add_4x16_identity_adst_1_12bpc_avx2: 162.6
      inv_txfm_add_4x16_identity_adst_2_12bpc_c: 1703.1
      inv_txfm_add_4x16_identity_adst_2_12bpc_avx2: 162.5
      inv_txfm_add_4x16_identity_dct_0_12bpc_c: 1775.0
      inv_txfm_add_4x16_identity_dct_0_12bpc_avx2: 150.5
      inv_txfm_add_4x16_identity_dct_1_12bpc_c: 1753.0
      inv_txfm_add_4x16_identity_dct_1_12bpc_avx2: 150.6
      inv_txfm_add_4x16_identity_dct_2_12bpc_c: 1759.6
      inv_txfm_add_4x16_identity_dct_2_12bpc_avx2: 149.8
      inv_txfm_add_4x16_identity_flipadst_0_12bpc_c: 1727.5
      inv_txfm_add_4x16_identity_flipadst_0_12bpc_avx2: 160.3
      inv_txfm_add_4x16_identity_flipadst_1_12bpc_c: 1739.8
      inv_txfm_add_4x16_identity_flipadst_1_12bpc_avx2: 160.9
      inv_txfm_add_4x16_identity_flipadst_2_12bpc_c: 1728.3
      inv_txfm_add_4x16_identity_flipadst_2_12bpc_avx2: 159.9
      inv_txfm_add_4x16_identity_identity_0_12bpc_c: 1098.6
      inv_txfm_add_4x16_identity_identity_0_12bpc_avx2: 60.4
      inv_txfm_add_4x16_identity_identity_1_12bpc_c: 1095.4
      inv_txfm_add_4x16_identity_identity_1_12bpc_avx2: 61.3
      inv_txfm_add_4x16_identity_identity_2_12bpc_c: 1111.6
      inv_txfm_add_4x16_identity_identity_2_12bpc_avx2: 60.6
      f64b2c22
    • x86/itx: Convert 8bpc WHT to SSE2 · 00f92f2c
      Matthias Dressel authored
      WHT uses no SSSE3 instructions. The 16bpc variant is already SSE2.
      00f92f2c
  10. Nov 18, 2021
  11. Nov 15, 2021
  12. Nov 13, 2021
    • x86/itx: Add 8x8 12bpc AVX2 transforms · 31820a5e
      Matthias Dressel authored
      inv_txfm_add_8x8_adst_adst_0_12bpc_c: 1997.9
      inv_txfm_add_8x8_adst_adst_0_12bpc_avx2: 185.7
      inv_txfm_add_8x8_adst_adst_1_12bpc_c: 2009.8
      inv_txfm_add_8x8_adst_adst_1_12bpc_avx2: 185.7
      inv_txfm_add_8x8_adst_dct_0_12bpc_c: 1991.0
      inv_txfm_add_8x8_adst_dct_0_12bpc_avx2: 161.3
      inv_txfm_add_8x8_adst_dct_1_12bpc_c: 1977.0
      inv_txfm_add_8x8_adst_dct_1_12bpc_avx2: 161.4
      inv_txfm_add_8x8_adst_flipadst_0_12bpc_c: 2017.6
      inv_txfm_add_8x8_adst_flipadst_0_12bpc_avx2: 184.2
      inv_txfm_add_8x8_adst_flipadst_1_12bpc_c: 2018.9
      inv_txfm_add_8x8_adst_flipadst_1_12bpc_avx2: 184.2
      inv_txfm_add_8x8_adst_identity_0_12bpc_c: 1407.2
      inv_txfm_add_8x8_adst_identity_0_12bpc_avx2: 95.7
      inv_txfm_add_8x8_adst_identity_1_12bpc_c: 1405.9
      inv_txfm_add_8x8_adst_identity_1_12bpc_avx2: 95.8
      inv_txfm_add_8x8_dct_adst_0_12bpc_c: 2024.2
      inv_txfm_add_8x8_dct_adst_0_12bpc_avx2: 156.9
      inv_txfm_add_8x8_dct_adst_1_12bpc_c: 2018.8
      inv_txfm_add_8x8_dct_adst_1_12bpc_avx2: 160.1
      inv_txfm_add_8x8_dct_dct_0_12bpc_c: 213.0
      inv_txfm_add_8x8_dct_dct_0_12bpc_avx2: 24.8
      inv_txfm_add_8x8_dct_dct_1_12bpc_c: 2008.6
      inv_txfm_add_8x8_dct_dct_1_12bpc_avx2: 139.0
      inv_txfm_add_8x8_dct_flipadst_0_12bpc_c: 2012.3
      inv_txfm_add_8x8_dct_flipadst_0_12bpc_avx2: 159.2
      inv_txfm_add_8x8_dct_flipadst_1_12bpc_c: 2005.1
      inv_txfm_add_8x8_dct_flipadst_1_12bpc_avx2: 158.7
      inv_txfm_add_8x8_dct_identity_0_12bpc_c: 1470.4
      inv_txfm_add_8x8_dct_identity_0_12bpc_avx2: 71.7
      inv_txfm_add_8x8_dct_identity_1_12bpc_c: 1477.8
      inv_txfm_add_8x8_dct_identity_1_12bpc_avx2: 70.7
      inv_txfm_add_8x8_flipadst_adst_0_12bpc_c: 2006.1
      inv_txfm_add_8x8_flipadst_adst_0_12bpc_avx2: 183.6
      inv_txfm_add_8x8_flipadst_adst_1_12bpc_c: 1987.6
      inv_txfm_add_8x8_flipadst_adst_1_12bpc_avx2: 183.6
      inv_txfm_add_8x8_flipadst_dct_0_12bpc_c: 1986.6
      inv_txfm_add_8x8_flipadst_dct_0_12bpc_avx2: 163.0
      inv_txfm_add_8x8_flipadst_dct_1_12bpc_c: 1979.3
      inv_txfm_add_8x8_flipadst_dct_1_12bpc_avx2: 163.1
      inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_c: 2004.0
      inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_avx2: 184.3
      inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_c: 2003.9
      inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_avx2: 184.3
      inv_txfm_add_8x8_flipadst_identity_0_12bpc_c: 1433.5
      inv_txfm_add_8x8_flipadst_identity_0_12bpc_avx2: 95.3
      inv_txfm_add_8x8_flipadst_identity_1_12bpc_c: 1425.4
      inv_txfm_add_8x8_flipadst_identity_1_12bpc_avx2: 96.3
      inv_txfm_add_8x8_identity_adst_0_12bpc_c: 1456.5
      inv_txfm_add_8x8_identity_adst_0_12bpc_avx2: 115.8
      inv_txfm_add_8x8_identity_adst_1_12bpc_c: 1453.5
      inv_txfm_add_8x8_identity_adst_1_12bpc_avx2: 115.8
      inv_txfm_add_8x8_identity_dct_0_12bpc_c: 1450.0
      inv_txfm_add_8x8_identity_dct_0_12bpc_avx2: 93.5
      inv_txfm_add_8x8_identity_dct_1_12bpc_c: 1447.5
      inv_txfm_add_8x8_identity_dct_1_12bpc_avx2: 94.3
      inv_txfm_add_8x8_identity_flipadst_0_12bpc_c: 1451.7
      inv_txfm_add_8x8_identity_flipadst_0_12bpc_avx2: 114.0
      inv_txfm_add_8x8_identity_flipadst_1_12bpc_c: 1456.4
      inv_txfm_add_8x8_identity_flipadst_1_12bpc_avx2: 114.0
      inv_txfm_add_8x8_identity_identity_0_12bpc_c: 892.3
      inv_txfm_add_8x8_identity_identity_0_12bpc_avx2: 33.7
      inv_txfm_add_8x8_identity_identity_1_12bpc_c: 897.2
      inv_txfm_add_8x8_identity_identity_1_12bpc_avx2: 33.1
      31820a5e
    • x86/itx: Add 8x4 12bpc AVX2 transforms · 53cf6a3b
      Matthias Dressel authored
      inv_txfm_add_8x4_adst_adst_0_12bpc_c: 882.1
      inv_txfm_add_8x4_adst_adst_0_12bpc_avx2: 113.7
      inv_txfm_add_8x4_adst_adst_1_12bpc_c: 882.5
      inv_txfm_add_8x4_adst_adst_1_12bpc_avx2: 113.8
      inv_txfm_add_8x4_adst_dct_0_12bpc_c: 928.0
      inv_txfm_add_8x4_adst_dct_0_12bpc_avx2: 109.2
      inv_txfm_add_8x4_adst_dct_1_12bpc_c: 924.9
      inv_txfm_add_8x4_adst_dct_1_12bpc_avx2: 109.2
      inv_txfm_add_8x4_adst_flipadst_0_12bpc_c: 889.9
      inv_txfm_add_8x4_adst_flipadst_0_12bpc_avx2: 114.3
      inv_txfm_add_8x4_adst_flipadst_1_12bpc_c: 886.0
      inv_txfm_add_8x4_adst_flipadst_1_12bpc_avx2: 114.8
      inv_txfm_add_8x4_adst_identity_0_12bpc_c: 832.2
      inv_txfm_add_8x4_adst_identity_0_12bpc_avx2: 88.8
      inv_txfm_add_8x4_adst_identity_1_12bpc_c: 834.6
      inv_txfm_add_8x4_adst_identity_1_12bpc_avx2: 89.0
      inv_txfm_add_8x4_dct_adst_0_12bpc_c: 870.3
      inv_txfm_add_8x4_dct_adst_0_12bpc_avx2: 96.3
      inv_txfm_add_8x4_dct_adst_1_12bpc_c: 884.6
      inv_txfm_add_8x4_dct_adst_1_12bpc_avx2: 96.3
      inv_txfm_add_8x4_dct_dct_0_12bpc_c: 116.1
      inv_txfm_add_8x4_dct_dct_0_12bpc_avx2: 24.5
      inv_txfm_add_8x4_dct_dct_1_12bpc_c: 925.1
      inv_txfm_add_8x4_dct_dct_1_12bpc_avx2: 92.3
      inv_txfm_add_8x4_dct_flipadst_0_12bpc_c: 882.7
      inv_txfm_add_8x4_dct_flipadst_0_12bpc_avx2: 97.0
      inv_txfm_add_8x4_dct_flipadst_1_12bpc_c: 882.1
      inv_txfm_add_8x4_dct_flipadst_1_12bpc_avx2: 97.0
      inv_txfm_add_8x4_dct_identity_0_12bpc_c: 827.5
      inv_txfm_add_8x4_dct_identity_0_12bpc_avx2: 72.4
      inv_txfm_add_8x4_dct_identity_1_12bpc_c: 827.8
      inv_txfm_add_8x4_dct_identity_1_12bpc_avx2: 73.8
      inv_txfm_add_8x4_flipadst_adst_0_12bpc_c: 899.5
      inv_txfm_add_8x4_flipadst_adst_0_12bpc_avx2: 113.2
      inv_txfm_add_8x4_flipadst_adst_1_12bpc_c: 898.8
      inv_txfm_add_8x4_flipadst_adst_1_12bpc_avx2: 113.3
      inv_txfm_add_8x4_flipadst_dct_0_12bpc_c: 945.7
      inv_txfm_add_8x4_flipadst_dct_0_12bpc_avx2: 108.3
      inv_txfm_add_8x4_flipadst_dct_1_12bpc_c: 945.6
      inv_txfm_add_8x4_flipadst_dct_1_12bpc_avx2: 108.3
      inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_c: 903.6
      inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_avx2: 113.9
      inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_c: 902.8
      inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_avx2: 114.2
      inv_txfm_add_8x4_flipadst_identity_0_12bpc_c: 856.6
      inv_txfm_add_8x4_flipadst_identity_0_12bpc_avx2: 88.3
      inv_txfm_add_8x4_flipadst_identity_1_12bpc_c: 848.8
      inv_txfm_add_8x4_flipadst_identity_1_12bpc_avx2: 87.4
      inv_txfm_add_8x4_identity_adst_0_12bpc_c: 583.2
      inv_txfm_add_8x4_identity_adst_0_12bpc_avx2: 69.6
      inv_txfm_add_8x4_identity_adst_1_12bpc_c: 584.3
      inv_txfm_add_8x4_identity_adst_1_12bpc_avx2: 69.6
      inv_txfm_add_8x4_identity_dct_0_12bpc_c: 632.9
      inv_txfm_add_8x4_identity_dct_0_12bpc_avx2: 65.3
      inv_txfm_add_8x4_identity_dct_1_12bpc_c: 629.6
      inv_txfm_add_8x4_identity_dct_1_12bpc_avx2: 65.8
      inv_txfm_add_8x4_identity_flipadst_0_12bpc_c: 587.0
      inv_txfm_add_8x4_identity_flipadst_0_12bpc_avx2: 71.0
      inv_txfm_add_8x4_identity_flipadst_1_12bpc_c: 586.9
      inv_txfm_add_8x4_identity_flipadst_1_12bpc_avx2: 71.0
      inv_txfm_add_8x4_identity_identity_0_12bpc_c: 533.0
      inv_txfm_add_8x4_identity_identity_0_12bpc_avx2: 45.3
      inv_txfm_add_8x4_identity_identity_1_12bpc_c: 539.7
      inv_txfm_add_8x4_identity_identity_1_12bpc_avx2: 45.9
      53cf6a3b
    • x86/itx: Add 4x8 12bpc AVX2 transforms · 241753f5
      Matthias Dressel authored
      inv_txfm_add_4x8_adst_adst_0_12bpc_c: 900.8
      inv_txfm_add_4x8_adst_adst_0_12bpc_avx2: 118.8
      inv_txfm_add_4x8_adst_adst_1_12bpc_c: 893.7
      inv_txfm_add_4x8_adst_adst_1_12bpc_avx2: 118.8
      inv_txfm_add_4x8_adst_dct_0_12bpc_c: 890.2
      inv_txfm_add_4x8_adst_dct_0_12bpc_avx2: 104.8
      inv_txfm_add_4x8_adst_dct_1_12bpc_c: 887.4
      inv_txfm_add_4x8_adst_dct_1_12bpc_avx2: 104.8
      inv_txfm_add_4x8_adst_flipadst_0_12bpc_c: 919.6
      inv_txfm_add_4x8_adst_flipadst_0_12bpc_avx2: 116.6
      inv_txfm_add_4x8_adst_flipadst_1_12bpc_c: 912.1
      inv_txfm_add_4x8_adst_flipadst_1_12bpc_avx2: 116.6
      inv_txfm_add_4x8_adst_identity_0_12bpc_c: 613.5
      inv_txfm_add_4x8_adst_identity_0_12bpc_avx2: 42.8
      inv_txfm_add_4x8_adst_identity_1_12bpc_c: 608.7
      inv_txfm_add_4x8_adst_identity_1_12bpc_avx2: 43.3
      inv_txfm_add_4x8_dct_adst_0_12bpc_c: 951.7
      inv_txfm_add_4x8_dct_adst_0_12bpc_avx2: 113.8
      inv_txfm_add_4x8_dct_adst_1_12bpc_c: 949.0
      inv_txfm_add_4x8_dct_adst_1_12bpc_avx2: 113.1
      inv_txfm_add_4x8_dct_dct_0_12bpc_c: 118.6
      inv_txfm_add_4x8_dct_dct_0_12bpc_avx2: 24.5
      inv_txfm_add_4x8_dct_dct_1_12bpc_c: 942.4
      inv_txfm_add_4x8_dct_dct_1_12bpc_avx2: 99.2
      inv_txfm_add_4x8_dct_flipadst_0_12bpc_c: 959.3
      inv_txfm_add_4x8_dct_flipadst_0_12bpc_avx2: 113.9
      inv_txfm_add_4x8_dct_flipadst_1_12bpc_c: 964.1
      inv_txfm_add_4x8_dct_flipadst_1_12bpc_avx2: 114.3
      inv_txfm_add_4x8_dct_identity_0_12bpc_c: 659.9
      inv_txfm_add_4x8_dct_identity_0_12bpc_avx2: 41.9
      inv_txfm_add_4x8_dct_identity_1_12bpc_c: 658.6
      inv_txfm_add_4x8_dct_identity_1_12bpc_avx2: 41.6
      inv_txfm_add_4x8_flipadst_adst_0_12bpc_c: 906.6
      inv_txfm_add_4x8_flipadst_adst_0_12bpc_avx2: 117.3
      inv_txfm_add_4x8_flipadst_adst_1_12bpc_c: 907.7
      inv_txfm_add_4x8_flipadst_adst_1_12bpc_avx2: 117.3
      inv_txfm_add_4x8_flipadst_dct_0_12bpc_c: 890.3
      inv_txfm_add_4x8_flipadst_dct_0_12bpc_avx2: 104.6
      inv_txfm_add_4x8_flipadst_dct_1_12bpc_c: 895.6
      inv_txfm_add_4x8_flipadst_dct_1_12bpc_avx2: 104.6
      inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_c: 902.9
      inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_avx2: 116.5
      inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_c: 915.0
      inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_avx2: 116.4
      inv_txfm_add_4x8_flipadst_identity_0_12bpc_c: 618.6
      inv_txfm_add_4x8_flipadst_identity_0_12bpc_avx2: 45.3
      inv_txfm_add_4x8_flipadst_identity_1_12bpc_c: 618.1
      inv_txfm_add_4x8_flipadst_identity_1_12bpc_avx2: 44.0
      inv_txfm_add_4x8_identity_adst_0_12bpc_c: 829.7
      inv_txfm_add_4x8_identity_adst_0_12bpc_avx2: 107.4
      inv_txfm_add_4x8_identity_adst_1_12bpc_c: 831.7
      inv_txfm_add_4x8_identity_adst_1_12bpc_avx2: 107.8
      inv_txfm_add_4x8_identity_dct_0_12bpc_c: 823.2
      inv_txfm_add_4x8_identity_dct_0_12bpc_avx2: 90.7
      inv_txfm_add_4x8_identity_dct_1_12bpc_c: 824.1
      inv_txfm_add_4x8_identity_dct_1_12bpc_avx2: 90.7
      inv_txfm_add_4x8_identity_flipadst_0_12bpc_c: 853.4
      inv_txfm_add_4x8_identity_flipadst_0_12bpc_avx2: 106.8
      inv_txfm_add_4x8_identity_flipadst_1_12bpc_c: 852.2
      inv_txfm_add_4x8_identity_flipadst_1_12bpc_avx2: 106.8
      inv_txfm_add_4x8_identity_identity_0_12bpc_c: 543.2
      inv_txfm_add_4x8_identity_identity_0_12bpc_avx2: 36.4
      inv_txfm_add_4x8_identity_identity_1_12bpc_c: 544.8
      inv_txfm_add_4x8_identity_identity_1_12bpc_avx2: 36.6
      241753f5
  13. Nov 12, 2021
  14. Nov 11, 2021
  15. Nov 10, 2021
  16. Nov 05, 2021
  17. Nov 02, 2021
  18. Nov 01, 2021
  19. Oct 31, 2021
  20. Oct 29, 2021
  21. Oct 28, 2021
    • meson: Check for the pthread_getaffinity_np function before deciding to use it · 8c94f95c
      Martin Storsjö authored
      Use the check result instead of hardcoding which OSes provide the
      function.
      
      This also requires checking for the pthread_np.h header and including
      it while testing for functions in meson, but allows getting rid of the
      hardcoded OS conditions in the source.
      
      This fixes building for Android, if _GNU_SOURCE happens to be defined.
      (It gets defined if building with a slightly nonstandard cross file
      that defines "system = 'linux'", but it could also have been set by the
      caller.)
      8c94f95c
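The approach described in this commit can be sketched in C as follows. This is a minimal illustration, not the project's actual source: the macro names `HAVE_PTHREAD_GETAFFINITY_NP` and `HAVE_PTHREAD_NP_H` and the helper `affinity_cpu_count` are assumptions standing in for whatever the build system defines after its function and header checks.

```c
#include <pthread.h>
#include <sched.h>
#ifdef HAVE_PTHREAD_NP_H      /* assumed macro: set when the header check succeeds */
#include <pthread_np.h>
#endif

/* Hypothetical helper: number of CPUs the calling thread may run on.
 * The affinity query is compiled in only when the build-time check found
 * the function (assumed macro HAVE_PTHREAD_GETAFFINITY_NP) and the libc
 * provides CPU_COUNT; no OS names are tested. */
static int affinity_cpu_count(void) {
#if defined(HAVE_PTHREAD_GETAFFINITY_NP) && defined(CPU_COUNT)
    cpu_set_t set;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) == 0)
        return CPU_COUNT(&set);
#endif
    return 1; /* conservative fallback when the function is unavailable */
}
```

Because the guard depends on a detected feature rather than on `__linux__` or similar, a target that reports itself as Linux but lacks the function would simply take the fallback path.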
  22. Oct 27, 2021
  23. Oct 18, 2021
    • x86/itx: Add 12-bit 4x4 transforms in AVX2 · eb0308bc
      Matthias Dressel authored
      Refactor itx into separate 10-bit and 12-bit functions to avoid
      conditional jumps.
      
      inv_txfm_add_4x4_adst_adst_0_12bpc_c: 370.9
      inv_txfm_add_4x4_adst_adst_0_12bpc_avx2: 68.6
      inv_txfm_add_4x4_adst_adst_1_12bpc_c: 371.0
      inv_txfm_add_4x4_adst_adst_1_12bpc_avx2: 68.7
      inv_txfm_add_4x4_adst_dct_0_12bpc_c: 413.1
      inv_txfm_add_4x4_adst_dct_0_12bpc_avx2: 69.2
      inv_txfm_add_4x4_adst_dct_1_12bpc_c: 412.7
      inv_txfm_add_4x4_adst_dct_1_12bpc_avx2: 68.8
      inv_txfm_add_4x4_adst_flipadst_0_12bpc_c: 378.5
      inv_txfm_add_4x4_adst_flipadst_0_12bpc_avx2: 74.9
      inv_txfm_add_4x4_adst_flipadst_1_12bpc_c: 378.1
      inv_txfm_add_4x4_adst_flipadst_1_12bpc_avx2: 74.6
      inv_txfm_add_4x4_adst_identity_0_12bpc_c: 347.8
      inv_txfm_add_4x4_adst_identity_0_12bpc_avx2: 48.8
      inv_txfm_add_4x4_adst_identity_1_12bpc_c: 342.7
      inv_txfm_add_4x4_adst_identity_1_12bpc_avx2: 49.0
      inv_txfm_add_4x4_dct_adst_0_12bpc_c: 399.2
      inv_txfm_add_4x4_dct_adst_0_12bpc_avx2: 73.1
      inv_txfm_add_4x4_dct_adst_1_12bpc_c: 398.7
      inv_txfm_add_4x4_dct_adst_1_12bpc_avx2: 72.2
      inv_txfm_add_4x4_dct_dct_0_12bpc_c: 69.6
      inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9
      inv_txfm_add_4x4_dct_dct_1_12bpc_c: 420.5
      inv_txfm_add_4x4_dct_dct_1_12bpc_avx2: 72.2
      inv_txfm_add_4x4_dct_flipadst_0_12bpc_c: 405.5
      inv_txfm_add_4x4_dct_flipadst_0_12bpc_avx2: 75.9
      inv_txfm_add_4x4_dct_flipadst_1_12bpc_c: 404.2
      inv_txfm_add_4x4_dct_flipadst_1_12bpc_avx2: 75.6
      inv_txfm_add_4x4_dct_identity_0_12bpc_c: 374.1
      inv_txfm_add_4x4_dct_identity_0_12bpc_avx2: 51.6
      inv_txfm_add_4x4_dct_identity_1_12bpc_c: 368.0
      inv_txfm_add_4x4_dct_identity_1_12bpc_avx2: 51.8
      inv_txfm_add_4x4_flipadst_adst_0_12bpc_c: 368.0
      inv_txfm_add_4x4_flipadst_adst_0_12bpc_avx2: 69.2
      inv_txfm_add_4x4_flipadst_adst_1_12bpc_c: 370.7
      inv_txfm_add_4x4_flipadst_adst_1_12bpc_avx2: 70.4
      inv_txfm_add_4x4_flipadst_dct_0_12bpc_c: 393.7
      inv_txfm_add_4x4_flipadst_dct_0_12bpc_avx2: 70.1
      inv_txfm_add_4x4_flipadst_dct_1_12bpc_c: 392.9
      inv_txfm_add_4x4_flipadst_dct_1_12bpc_avx2: 69.6
      inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_c: 382.2
      inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_avx2: 74.6
      inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_c: 381.3
      inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_avx2: 74.9
      inv_txfm_add_4x4_flipadst_identity_0_12bpc_c: 346.7
      inv_txfm_add_4x4_flipadst_identity_0_12bpc_avx2: 48.2
      inv_txfm_add_4x4_flipadst_identity_1_12bpc_c: 347.9
      inv_txfm_add_4x4_flipadst_identity_1_12bpc_avx2: 48.7
      inv_txfm_add_4x4_identity_adst_0_12bpc_c: 344.7
      inv_txfm_add_4x4_identity_adst_0_12bpc_avx2: 59.8
      inv_txfm_add_4x4_identity_adst_1_12bpc_c: 340.5
      inv_txfm_add_4x4_identity_adst_1_12bpc_avx2: 59.2
      inv_txfm_add_4x4_identity_dct_0_12bpc_c: 369.8
      inv_txfm_add_4x4_identity_dct_0_12bpc_avx2: 59.3
      inv_txfm_add_4x4_identity_dct_1_12bpc_c: 369.5
      inv_txfm_add_4x4_identity_dct_1_12bpc_avx2: 59.2
      inv_txfm_add_4x4_identity_flipadst_0_12bpc_c: 353.4
      inv_txfm_add_4x4_identity_flipadst_0_12bpc_avx2: 65.6
      inv_txfm_add_4x4_identity_flipadst_1_12bpc_c: 350.9
      inv_txfm_add_4x4_identity_flipadst_1_12bpc_avx2: 65.9
      inv_txfm_add_4x4_identity_identity_0_12bpc_c: 326.1
      inv_txfm_add_4x4_identity_identity_0_12bpc_avx2: 39.5
      inv_txfm_add_4x4_identity_identity_1_12bpc_c: 321.6
      inv_txfm_add_4x4_identity_identity_1_12bpc_avx2: 39.5
      eb0308bc
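The idea of splitting by bit depth to avoid conditional jumps can be illustrated with a small C sketch. The real change is in x86 assembly, so this only shows the general technique: every name here (`clip_pixel`, `add_residual_10bpc`, `add_residual_12bpc`, `select_add_residual`) is hypothetical, and the only facts relied on are the maximum pixel values for 10-bit (1023) and 12-bit (4095) content.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a value to [0, max]. */
static inline int clip_pixel(int v, int max) {
    return v < 0 ? 0 : (v > max ? max : v);
}

/* One entry point per bit depth. The pixel range is a compile-time
 * constant in each, so no bit-depth test runs inside the loop. */
static void add_residual_10bpc(uint16_t *dst, const int32_t *res, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = (uint16_t)clip_pixel(dst[i] + res[i], 1023);
}

static void add_residual_12bpc(uint16_t *dst, const int32_t *res, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = (uint16_t)clip_pixel(dst[i] + res[i], 4095);
}

typedef void (*add_residual_fn)(uint16_t *dst, const int32_t *res, int n);

/* The bit depth is examined once, when the function pointer is chosen. */
static add_residual_fn select_add_residual(int bitdepth) {
    return bitdepth == 12 ? add_residual_12bpc : add_residual_10bpc;
}
```

The caller picks the function once per stream or frame and then calls through the pointer, which moves the bit-depth decision out of the per-block path.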
    • x86/itx: Rename rax to r6 · 4cdfe691
      Matthias Dressel authored
      Use numerical GPR references everywhere for consistency.
      4cdfe691