  1. Dec 04, 2021
    • x86/itx: Add 16x8 12bpc AVX2 transforms · e8a3f99d
      Matthias Dressel authored
      inv_txfm_add_16x8_adst_adst_0_12bpc_c: 4517.9
      inv_txfm_add_16x8_adst_adst_0_12bpc_avx2: 432.4
      inv_txfm_add_16x8_adst_adst_1_12bpc_c: 4510.9
      inv_txfm_add_16x8_adst_adst_1_12bpc_avx2: 432.4
      inv_txfm_add_16x8_adst_adst_2_12bpc_c: 4498.6
      inv_txfm_add_16x8_adst_adst_2_12bpc_avx2: 432.4
      inv_txfm_add_16x8_adst_dct_0_12bpc_c: 4553.8
      inv_txfm_add_16x8_adst_dct_0_12bpc_avx2: 389.1
      inv_txfm_add_16x8_adst_dct_1_12bpc_c: 4543.3
      inv_txfm_add_16x8_adst_dct_1_12bpc_avx2: 389.1
      inv_txfm_add_16x8_adst_dct_2_12bpc_c: 4538.4
      inv_txfm_add_16x8_adst_dct_2_12bpc_avx2: 389.1
      inv_txfm_add_16x8_adst_flipadst_0_12bpc_c: 4532.6
      inv_txfm_add_16x8_adst_flipadst_0_12bpc_avx2: 435.4
      inv_txfm_add_16x8_adst_flipadst_1_12bpc_c: 4520.4
      inv_txfm_add_16x8_adst_flipadst_1_12bpc_avx2: 435.4
      inv_txfm_add_16x8_adst_flipadst_2_12bpc_c: 4516.2
      inv_txfm_add_16x8_adst_flipadst_2_12bpc_avx2: 435.4
      inv_txfm_add_16x8_adst_identity_0_12bpc_c: 3502.3
      inv_txfm_add_16x8_adst_identity_0_12bpc_avx2: 255.9
      inv_txfm_add_16x8_adst_identity_1_12bpc_c: 3492.9
      inv_txfm_add_16x8_adst_identity_1_12bpc_avx2: 256.3
      inv_txfm_add_16x8_adst_identity_2_12bpc_c: 3471.4
      inv_txfm_add_16x8_adst_identity_2_12bpc_avx2: 256.7
      inv_txfm_add_16x8_dct_adst_0_12bpc_c: 4563.2
      inv_txfm_add_16x8_dct_adst_0_12bpc_avx2: 383.6
      inv_txfm_add_16x8_dct_adst_1_12bpc_c: 4573.1
      inv_txfm_add_16x8_dct_adst_1_12bpc_avx2: 383.9
      inv_txfm_add_16x8_dct_adst_2_12bpc_c: 4562.2
      inv_txfm_add_16x8_dct_adst_2_12bpc_avx2: 383.7
      inv_txfm_add_16x8_dct_dct_0_12bpc_c: 514.0
      inv_txfm_add_16x8_dct_dct_0_12bpc_avx2: 25.0
      inv_txfm_add_16x8_dct_dct_1_12bpc_c: 4540.5
      inv_txfm_add_16x8_dct_dct_1_12bpc_avx2: 340.4
      inv_txfm_add_16x8_dct_dct_2_12bpc_c: 4563.0
      inv_txfm_add_16x8_dct_dct_2_12bpc_avx2: 339.3
      inv_txfm_add_16x8_dct_flipadst_0_12bpc_c: 4568.0
      inv_txfm_add_16x8_dct_flipadst_0_12bpc_avx2: 385.9
      inv_txfm_add_16x8_dct_flipadst_1_12bpc_c: 4577.5
      inv_txfm_add_16x8_dct_flipadst_1_12bpc_avx2: 385.8
      inv_txfm_add_16x8_dct_flipadst_2_12bpc_c: 4573.8
      inv_txfm_add_16x8_dct_flipadst_2_12bpc_avx2: 385.8
      inv_txfm_add_16x8_dct_identity_0_12bpc_c: 3549.9
      inv_txfm_add_16x8_dct_identity_0_12bpc_avx2: 212.1
      inv_txfm_add_16x8_dct_identity_1_12bpc_c: 3538.7
      inv_txfm_add_16x8_dct_identity_1_12bpc_avx2: 212.1
      inv_txfm_add_16x8_dct_identity_2_12bpc_c: 3539.7
      inv_txfm_add_16x8_dct_identity_2_12bpc_avx2: 212.1
      inv_txfm_add_16x8_flipadst_adst_0_12bpc_c: 4495.3
      inv_txfm_add_16x8_flipadst_adst_0_12bpc_avx2: 431.4
      inv_txfm_add_16x8_flipadst_adst_1_12bpc_c: 4496.3
      inv_txfm_add_16x8_flipadst_adst_1_12bpc_avx2: 431.4
      inv_txfm_add_16x8_flipadst_adst_2_12bpc_c: 4499.2
      inv_txfm_add_16x8_flipadst_adst_2_12bpc_avx2: 431.3
      inv_txfm_add_16x8_flipadst_dct_0_12bpc_c: 4506.9
      inv_txfm_add_16x8_flipadst_dct_0_12bpc_avx2: 386.3
      inv_txfm_add_16x8_flipadst_dct_1_12bpc_c: 4512.9
      inv_txfm_add_16x8_flipadst_dct_1_12bpc_avx2: 386.0
      inv_txfm_add_16x8_flipadst_dct_2_12bpc_c: 4503.2
      inv_txfm_add_16x8_flipadst_dct_2_12bpc_avx2: 386.0
      inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_c: 4509.1
      inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_avx2: 432.2
      inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_c: 4519.0
      inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_avx2: 432.1
      inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_c: 4518.3
      inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_avx2: 432.1
      inv_txfm_add_16x8_flipadst_identity_0_12bpc_c: 3511.0
      inv_txfm_add_16x8_flipadst_identity_0_12bpc_avx2: 257.1
      inv_txfm_add_16x8_flipadst_identity_1_12bpc_c: 3518.5
      inv_txfm_add_16x8_flipadst_identity_1_12bpc_avx2: 257.2
      inv_txfm_add_16x8_flipadst_identity_2_12bpc_c: 3521.7
      inv_txfm_add_16x8_flipadst_identity_2_12bpc_avx2: 257.1
      inv_txfm_add_16x8_identity_adst_0_12bpc_c: 3166.8
      inv_txfm_add_16x8_identity_adst_0_12bpc_avx2: 268.6
      inv_txfm_add_16x8_identity_adst_1_12bpc_c: 3157.9
      inv_txfm_add_16x8_identity_adst_1_12bpc_avx2: 268.6
      inv_txfm_add_16x8_identity_adst_2_12bpc_c: 3156.5
      inv_txfm_add_16x8_identity_adst_2_12bpc_avx2: 268.6
      inv_txfm_add_16x8_identity_dct_0_12bpc_c: 3187.4
      inv_txfm_add_16x8_identity_dct_0_12bpc_avx2: 224.4
      inv_txfm_add_16x8_identity_dct_1_12bpc_c: 3185.8
      inv_txfm_add_16x8_identity_dct_1_12bpc_avx2: 224.4
      inv_txfm_add_16x8_identity_dct_2_12bpc_c: 3190.8
      inv_txfm_add_16x8_identity_dct_2_12bpc_avx2: 224.4
      inv_txfm_add_16x8_identity_flipadst_0_12bpc_c: 3167.7
      inv_txfm_add_16x8_identity_flipadst_0_12bpc_avx2: 269.7
      inv_txfm_add_16x8_identity_flipadst_1_12bpc_c: 3174.1
      inv_txfm_add_16x8_identity_flipadst_1_12bpc_avx2: 269.8
      inv_txfm_add_16x8_identity_flipadst_2_12bpc_c: 3174.7
      inv_txfm_add_16x8_identity_flipadst_2_12bpc_avx2: 269.7
      inv_txfm_add_16x8_identity_identity_0_12bpc_c: 2153.3
      inv_txfm_add_16x8_identity_identity_0_12bpc_avx2: 99.1
      inv_txfm_add_16x8_identity_identity_1_12bpc_c: 2143.6
      inv_txfm_add_16x8_identity_identity_1_12bpc_avx2: 99.3
      inv_txfm_add_16x8_identity_identity_2_12bpc_c: 2145.9
      inv_txfm_add_16x8_identity_identity_2_12bpc_avx2: 98.6
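The paired `_c` / `_avx2` figures in these listings are easier to compare as ratios. The following is a minimal sketch, assuming each line has the form `name_<impl>: value`, that lower values are better, and that both implementations are measured in the same unit. The helper names `parse_bench` and `speedups` are hypothetical and not part of the project.

```python
def parse_bench(lines):
    """Group 'name_<impl>: value' lines as {base_name: {impl: value}}."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        name, value = line.rsplit(":", 1)
        # The implementation tag is assumed to be the last '_'-separated token.
        base, impl = name.strip().rsplit("_", 1)
        results.setdefault(base, {})[impl] = float(value)
    return results


def speedups(results, baseline="c", optimized="avx2"):
    """Return baseline/optimized ratios for entries that have both figures."""
    return {
        base: impls[baseline] / impls[optimized]
        for base, impls in results.items()
        if baseline in impls and optimized in impls
    }


# Four lines copied from the listing above.
sample = [
    "inv_txfm_add_16x8_adst_adst_0_12bpc_c: 4517.9",
    "inv_txfm_add_16x8_adst_adst_0_12bpc_avx2: 432.4",
    "inv_txfm_add_16x8_dct_dct_0_12bpc_c: 514.0",
    "inv_txfm_add_16x8_dct_dct_0_12bpc_avx2: 25.0",
]

for name, ratio in speedups(parse_bench(sample)).items():
    print(f"{name}: {ratio:.1f}x")
```

The same two functions can be applied unchanged to the other block sizes listed in this log, since they only rely on the trailing implementation suffix.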
    • x86/itx: Add 8x16 12bpc AVX2 transforms · 23e8405c
      Matthias Dressel authored
      inv_txfm_add_8x16_adst_adst_0_12bpc_c: 4440.4
      inv_txfm_add_8x16_adst_adst_0_12bpc_avx2: 354.3
      inv_txfm_add_8x16_adst_adst_1_12bpc_c: 4437.3
      inv_txfm_add_8x16_adst_adst_1_12bpc_avx2: 354.3
      inv_txfm_add_8x16_adst_adst_2_12bpc_c: 4438.8
      inv_txfm_add_8x16_adst_adst_2_12bpc_avx2: 442.6
      inv_txfm_add_8x16_adst_dct_0_12bpc_c: 4507.3
      inv_txfm_add_8x16_adst_dct_0_12bpc_avx2: 310.0
      inv_txfm_add_8x16_adst_dct_1_12bpc_c: 4500.3
      inv_txfm_add_8x16_adst_dct_1_12bpc_avx2: 310.0
      inv_txfm_add_8x16_adst_dct_2_12bpc_c: 4516.1
      inv_txfm_add_8x16_adst_dct_2_12bpc_avx2: 399.5
      inv_txfm_add_8x16_adst_flipadst_0_12bpc_c: 4457.3
      inv_txfm_add_8x16_adst_flipadst_0_12bpc_avx2: 355.6
      inv_txfm_add_8x16_adst_flipadst_1_12bpc_c: 4441.3
      inv_txfm_add_8x16_adst_flipadst_1_12bpc_avx2: 355.6
      inv_txfm_add_8x16_adst_flipadst_2_12bpc_c: 4448.9
      inv_txfm_add_8x16_adst_flipadst_2_12bpc_avx2: 445.5
      inv_txfm_add_8x16_adst_identity_0_12bpc_c: 3204.0
      inv_txfm_add_8x16_adst_identity_0_12bpc_avx2: 173.1
      inv_txfm_add_8x16_adst_identity_1_12bpc_c: 3207.1
      inv_txfm_add_8x16_adst_identity_1_12bpc_avx2: 173.6
      inv_txfm_add_8x16_adst_identity_2_12bpc_c: 3210.4
      inv_txfm_add_8x16_adst_identity_2_12bpc_avx2: 261.2
      inv_txfm_add_8x16_dct_adst_0_12bpc_c: 4484.2
      inv_txfm_add_8x16_dct_adst_0_12bpc_avx2: 334.0
      inv_txfm_add_8x16_dct_adst_1_12bpc_c: 4503.8
      inv_txfm_add_8x16_dct_adst_1_12bpc_avx2: 334.6
      inv_txfm_add_8x16_dct_adst_2_12bpc_c: 4490.7
      inv_txfm_add_8x16_dct_adst_2_12bpc_avx2: 395.6
      inv_txfm_add_8x16_dct_dct_0_12bpc_c: 419.9
      inv_txfm_add_8x16_dct_dct_0_12bpc_avx2: 37.6
      inv_txfm_add_8x16_dct_dct_1_12bpc_c: 4482.6
      inv_txfm_add_8x16_dct_dct_1_12bpc_avx2: 284.6
      inv_txfm_add_8x16_dct_dct_2_12bpc_c: 4468.7
      inv_txfm_add_8x16_dct_dct_2_12bpc_avx2: 348.3
      inv_txfm_add_8x16_dct_flipadst_0_12bpc_c: 4468.4
      inv_txfm_add_8x16_dct_flipadst_0_12bpc_avx2: 333.6
      inv_txfm_add_8x16_dct_flipadst_1_12bpc_c: 4463.5
      inv_txfm_add_8x16_dct_flipadst_1_12bpc_avx2: 333.5
      inv_txfm_add_8x16_dct_flipadst_2_12bpc_c: 4459.4
      inv_txfm_add_8x16_dct_flipadst_2_12bpc_avx2: 397.4
      inv_txfm_add_8x16_dct_identity_0_12bpc_c: 3237.1
      inv_txfm_add_8x16_dct_identity_0_12bpc_avx2: 149.6
      inv_txfm_add_8x16_dct_identity_1_12bpc_c: 3229.9
      inv_txfm_add_8x16_dct_identity_1_12bpc_avx2: 148.6
      inv_txfm_add_8x16_dct_identity_2_12bpc_c: 3225.6
      inv_txfm_add_8x16_dct_identity_2_12bpc_avx2: 211.3
      inv_txfm_add_8x16_flipadst_adst_0_12bpc_c: 4532.1
      inv_txfm_add_8x16_flipadst_adst_0_12bpc_avx2: 356.2
      inv_txfm_add_8x16_flipadst_adst_1_12bpc_c: 4527.6
      inv_txfm_add_8x16_flipadst_adst_1_12bpc_avx2: 356.1
      inv_txfm_add_8x16_flipadst_adst_2_12bpc_c: 4532.5
      inv_txfm_add_8x16_flipadst_adst_2_12bpc_avx2: 440.0
      inv_txfm_add_8x16_flipadst_dct_0_12bpc_c: 4571.6
      inv_txfm_add_8x16_flipadst_dct_0_12bpc_avx2: 310.3
      inv_txfm_add_8x16_flipadst_dct_1_12bpc_c: 4554.5
      inv_txfm_add_8x16_flipadst_dct_1_12bpc_avx2: 309.7
      inv_txfm_add_8x16_flipadst_dct_2_12bpc_c: 4554.3
      inv_txfm_add_8x16_flipadst_dct_2_12bpc_avx2: 399.9
      inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_c: 4497.2
      inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_avx2: 355.9
      inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_c: 4486.2
      inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_avx2: 355.6
      inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_c: 4493.4
      inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_avx2: 446.0
      inv_txfm_add_8x16_flipadst_identity_0_12bpc_c: 3265.7
      inv_txfm_add_8x16_flipadst_identity_0_12bpc_avx2: 173.8
      inv_txfm_add_8x16_flipadst_identity_1_12bpc_c: 3270.8
      inv_txfm_add_8x16_flipadst_identity_1_12bpc_avx2: 173.5
      inv_txfm_add_8x16_flipadst_identity_2_12bpc_c: 3271.8
      inv_txfm_add_8x16_flipadst_identity_2_12bpc_avx2: 261.6
      inv_txfm_add_8x16_identity_adst_0_12bpc_c: 3295.3
      inv_txfm_add_8x16_identity_adst_0_12bpc_avx2: 302.5
      inv_txfm_add_8x16_identity_adst_1_12bpc_c: 3303.1
      inv_txfm_add_8x16_identity_adst_1_12bpc_avx2: 303.0
      inv_txfm_add_8x16_identity_adst_2_12bpc_c: 3304.6
      inv_txfm_add_8x16_identity_adst_2_12bpc_avx2: 303.1
      inv_txfm_add_8x16_identity_dct_0_12bpc_c: 3298.9
      inv_txfm_add_8x16_identity_dct_0_12bpc_avx2: 257.8
      inv_txfm_add_8x16_identity_dct_1_12bpc_c: 3308.1
      inv_txfm_add_8x16_identity_dct_1_12bpc_avx2: 259.2
      inv_txfm_add_8x16_identity_dct_2_12bpc_c: 3306.6
      inv_txfm_add_8x16_identity_dct_2_12bpc_avx2: 259.2
      inv_txfm_add_8x16_identity_flipadst_0_12bpc_c: 3294.7
      inv_txfm_add_8x16_identity_flipadst_0_12bpc_avx2: 302.2
      inv_txfm_add_8x16_identity_flipadst_1_12bpc_c: 3292.5
      inv_txfm_add_8x16_identity_flipadst_1_12bpc_avx2: 302.2
      inv_txfm_add_8x16_identity_flipadst_2_12bpc_c: 3275.4
      inv_txfm_add_8x16_identity_flipadst_2_12bpc_avx2: 303.3
      inv_txfm_add_8x16_identity_identity_0_12bpc_c: 2044.6
      inv_txfm_add_8x16_identity_identity_0_12bpc_avx2: 116.2
      inv_txfm_add_8x16_identity_identity_1_12bpc_c: 2059.9
      inv_txfm_add_8x16_identity_identity_1_12bpc_avx2: 117.0
      inv_txfm_add_8x16_identity_identity_2_12bpc_c: 2048.4
      inv_txfm_add_8x16_identity_identity_2_12bpc_avx2: 116.2
  2. Dec 03, 2021
    • Fix lr line buffer padding · 7b99b0e1
      Henrik Gramner authored and committed
      Some cdef asm functions access memory before the start of the buffer.
      
      There are two lr line buffers allocated, but only one of them had the
      correct padding applied.
    • AArch64 Neon: Replace XTN, XTN2 pairs with single UZP1 · 19ff99ea
      Jonathan Wright authored and Martin Storsjö committed
      It is often necessary to narrow the elements in a pair of Neon
      vectors to half the current width, before combining the results. This
      is usually achieved with a pair of XTN/XTN2 instructions. However, it
      is possible to achieve the same outcome with a single 'unzip' (UZP1)
      instruction.
      
      This patch changes all sequential AArch64 Neon XTN, XTN2 instruction
      pairs to use a single UZP1 instruction.
      
      Change-Id: I2a9fad3082d2cf363b1edce9ef0b8d547ec6c41a
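The equivalence described in this commit message can be checked with a scalar model. The sketch below is an illustration in Python, not the patch's assembly: it assumes two source vectors of eight 16-bit lanes narrowed to 8 bits, and models UZP1 as operating on the byte view of the same registers. The function names are hypothetical.

```python
import struct


def xtn_pair(a, b):
    """Model XTN + XTN2: keep the low 8 bits of each 16-bit lane.

    XTN fills the low half of the destination from a,
    XTN2 fills the high half from b.
    """
    return [x & 0xFF for x in a] + [x & 0xFF for x in b]


def as_byte_lanes(v):
    """View eight 16-bit lanes as sixteen byte lanes (little-endian lane order)."""
    return list(struct.pack("<8H", *v))


def uzp1_bytes(a, b):
    """Model UZP1 on byte lanes: even-indexed bytes of a, then of b."""
    return as_byte_lanes(a)[0::2] + as_byte_lanes(b)[0::2]


a = [0x1234, 0x00FF, 0xABCD, 0x8001, 0x0000, 0xFFFF, 0x7F80, 0x0102]
b = [0xDEAD, 0xBEEF, 0x0011, 0x2233, 0x4455, 0x6677, 0x8899, 0xAABB]

# The even-indexed byte of each 16-bit lane is its low byte,
# so both sequences produce the same sixteen bytes.
assert xtn_pair(a, b) == uzp1_bytes(a, b)
```

The model only demonstrates that the results match; it says nothing about instruction throughput or latency on any particular core.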
    • AArch64 Neon: Use CMLT instead of SSHR to compute sign · 4e412738
      Jonathan Wright authored and Martin Storsjö committed
      The CMLT instruction has twice the throughput of SSHR on all modern
      out-of-order Arm cores. The Software Optimization Guides (SWOG) for
      the Cortex-A76, Cortex-A77 and Neoverse-N1 cores are being updated to
      reflect this. (The current version of the SWOG for these cores states
      that CMLT and SSHR both have the same execution throughput.)
      
      This patch changes all instances of sign computation to use CMLT
      instead of SSHR.
      
      Change-Id: Ice5747fee4e3bdd98ae8fbc036d735f55e492249
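The substitution works because both instructions yield the same per-lane sign mask. The sketch below models that in Python for signed 16-bit lanes; the lane width and the function names are assumptions chosen for illustration, and the code makes no claim about throughput.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def sshr_sign(x):
    """Model SSHR by LANE_BITS - 1: arithmetic shift of a signed lane.

    Python's >> on a negative int is an arithmetic shift, so the result
    is -1 for negative inputs and 0 otherwise; masking gives the lane bits.
    """
    return (x >> (LANE_BITS - 1)) & LANE_MASK


def cmlt_zero(x):
    """Model CMLT against zero: all ones if the lane is negative, else zero."""
    return LANE_MASK if x < 0 else 0


lanes = [-32768, -1234, -1, 0, 1, 32767]

# Both approaches produce 0xFFFF for negative lanes and 0x0000 otherwise.
assert [sshr_sign(x) for x in lanes] == [cmlt_zero(x) for x in lanes]
```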
  3. Dec 02, 2021
  4. Nov 29, 2021
    • x86/itx: Add 16x4 12bpc AVX2 transforms · 7be12857
      Matthias Dressel authored
      inv_txfm_add_16x4_adst_adst_0_12bpc_c: 1756.6
      inv_txfm_add_16x4_adst_adst_0_12bpc_avx2: 182.4
      inv_txfm_add_16x4_adst_adst_1_12bpc_c: 1756.0
      inv_txfm_add_16x4_adst_adst_1_12bpc_avx2: 182.5
      inv_txfm_add_16x4_adst_adst_2_12bpc_c: 1763.2
      inv_txfm_add_16x4_adst_adst_2_12bpc_avx2: 182.4
      inv_txfm_add_16x4_adst_dct_0_12bpc_c: 1863.6
      inv_txfm_add_16x4_adst_dct_0_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_dct_1_12bpc_c: 1864.1
      inv_txfm_add_16x4_adst_dct_1_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_dct_2_12bpc_c: 1861.3
      inv_txfm_add_16x4_adst_dct_2_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_flipadst_0_12bpc_c: 1768.6
      inv_txfm_add_16x4_adst_flipadst_0_12bpc_avx2: 184.1
      inv_txfm_add_16x4_adst_flipadst_1_12bpc_c: 1768.8
      inv_txfm_add_16x4_adst_flipadst_1_12bpc_avx2: 184.5
      inv_txfm_add_16x4_adst_flipadst_2_12bpc_c: 1769.3
      inv_txfm_add_16x4_adst_flipadst_2_12bpc_avx2: 184.7
      inv_txfm_add_16x4_adst_identity_0_12bpc_c: 1686.6
      inv_txfm_add_16x4_adst_identity_0_12bpc_avx2: 145.4
      inv_txfm_add_16x4_adst_identity_1_12bpc_c: 1685.8
      inv_txfm_add_16x4_adst_identity_1_12bpc_avx2: 145.8
      inv_txfm_add_16x4_adst_identity_2_12bpc_c: 1681.7
      inv_txfm_add_16x4_adst_identity_2_12bpc_avx2: 145.8
      inv_txfm_add_16x4_dct_adst_0_12bpc_c: 1783.4
      inv_txfm_add_16x4_dct_adst_0_12bpc_avx2: 167.7
      inv_txfm_add_16x4_dct_adst_1_12bpc_c: 1789.1
      inv_txfm_add_16x4_dct_adst_1_12bpc_avx2: 167.9
      inv_txfm_add_16x4_dct_adst_2_12bpc_c: 1788.0
      inv_txfm_add_16x4_dct_adst_2_12bpc_avx2: 169.8
      inv_txfm_add_16x4_dct_dct_0_12bpc_c: 209.5
      inv_txfm_add_16x4_dct_dct_0_12bpc_avx2: 21.6
      inv_txfm_add_16x4_dct_dct_1_12bpc_c: 1894.3
      inv_txfm_add_16x4_dct_dct_1_12bpc_avx2: 156.8
      inv_txfm_add_16x4_dct_dct_2_12bpc_c: 1892.0
      inv_txfm_add_16x4_dct_dct_2_12bpc_avx2: 156.8
      inv_txfm_add_16x4_dct_flipadst_0_12bpc_c: 1784.7
      inv_txfm_add_16x4_dct_flipadst_0_12bpc_avx2: 167.2
      inv_txfm_add_16x4_dct_flipadst_1_12bpc_c: 1796.7
      inv_txfm_add_16x4_dct_flipadst_1_12bpc_avx2: 168.6
      inv_txfm_add_16x4_dct_flipadst_2_12bpc_c: 1788.9
      inv_txfm_add_16x4_dct_flipadst_2_12bpc_avx2: 168.9
      inv_txfm_add_16x4_dct_identity_0_12bpc_c: 1712.7
      inv_txfm_add_16x4_dct_identity_0_12bpc_avx2: 128.8
      inv_txfm_add_16x4_dct_identity_1_12bpc_c: 1714.8
      inv_txfm_add_16x4_dct_identity_1_12bpc_avx2: 128.8
      inv_txfm_add_16x4_dct_identity_2_12bpc_c: 1710.2
      inv_txfm_add_16x4_dct_identity_2_12bpc_avx2: 128.8
      inv_txfm_add_16x4_flipadst_adst_0_12bpc_c: 1763.6
      inv_txfm_add_16x4_flipadst_adst_0_12bpc_avx2: 186.6
      inv_txfm_add_16x4_flipadst_adst_1_12bpc_c: 1761.1
      inv_txfm_add_16x4_flipadst_adst_1_12bpc_avx2: 185.6
      inv_txfm_add_16x4_flipadst_adst_2_12bpc_c: 1761.8
      inv_txfm_add_16x4_flipadst_adst_2_12bpc_avx2: 187.0
      inv_txfm_add_16x4_flipadst_dct_0_12bpc_c: 1864.4
      inv_txfm_add_16x4_flipadst_dct_0_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_dct_1_12bpc_c: 1862.7
      inv_txfm_add_16x4_flipadst_dct_1_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_dct_2_12bpc_c: 1860.2
      inv_txfm_add_16x4_flipadst_dct_2_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_c: 1760.4
      inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_avx2: 185.3
      inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_c: 1761.8
      inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_avx2: 185.3
      inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_c: 1766.5
      inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_avx2: 184.9
      inv_txfm_add_16x4_flipadst_identity_0_12bpc_c: 1673.0
      inv_txfm_add_16x4_flipadst_identity_0_12bpc_avx2: 143.1
      inv_txfm_add_16x4_flipadst_identity_1_12bpc_c: 1673.2
      inv_txfm_add_16x4_flipadst_identity_1_12bpc_avx2: 143.1
      inv_txfm_add_16x4_flipadst_identity_2_12bpc_c: 1681.6
      inv_txfm_add_16x4_flipadst_identity_2_12bpc_avx2: 143.2
      inv_txfm_add_16x4_identity_adst_0_12bpc_c: 1128.7
      inv_txfm_add_16x4_identity_adst_0_12bpc_avx2: 102.8
      inv_txfm_add_16x4_identity_adst_1_12bpc_c: 1131.3
      inv_txfm_add_16x4_identity_adst_1_12bpc_avx2: 101.3
      inv_txfm_add_16x4_identity_adst_2_12bpc_c: 1127.5
      inv_txfm_add_16x4_identity_adst_2_12bpc_avx2: 99.1
      inv_txfm_add_16x4_identity_dct_0_12bpc_c: 1228.3
      inv_txfm_add_16x4_identity_dct_0_12bpc_avx2: 88.3
      inv_txfm_add_16x4_identity_dct_1_12bpc_c: 1220.5
      inv_txfm_add_16x4_identity_dct_1_12bpc_avx2: 88.0
      inv_txfm_add_16x4_identity_dct_2_12bpc_c: 1227.3
      inv_txfm_add_16x4_identity_dct_2_12bpc_avx2: 88.1
      inv_txfm_add_16x4_identity_flipadst_0_12bpc_c: 1142.4
      inv_txfm_add_16x4_identity_flipadst_0_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_flipadst_1_12bpc_c: 1134.1
      inv_txfm_add_16x4_identity_flipadst_1_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_flipadst_2_12bpc_c: 1136.4
      inv_txfm_add_16x4_identity_flipadst_2_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_identity_0_12bpc_c: 1056.1
      inv_txfm_add_16x4_identity_identity_0_12bpc_avx2: 61.6
      inv_txfm_add_16x4_identity_identity_1_12bpc_c: 1064.6
      inv_txfm_add_16x4_identity_identity_1_12bpc_avx2: 62.9
      inv_txfm_add_16x4_identity_identity_2_12bpc_c: 1067.5
      inv_txfm_add_16x4_identity_identity_2_12bpc_avx2: 63.5
    • x86/itx: Add 4x16 12bpc AVX2 transforms · f64b2c22
      Matthias Dressel authored
      inv_txfm_add_4x16_adst_adst_0_12bpc_c: 1799.1
      inv_txfm_add_4x16_adst_adst_0_12bpc_avx2: 178.8
      inv_txfm_add_4x16_adst_adst_1_12bpc_c: 1795.0
      inv_txfm_add_4x16_adst_adst_1_12bpc_avx2: 179.1
      inv_txfm_add_4x16_adst_adst_2_12bpc_c: 1806.6
      inv_txfm_add_4x16_adst_adst_2_12bpc_avx2: 179.3
      inv_txfm_add_4x16_adst_dct_0_12bpc_c: 1824.8
      inv_txfm_add_4x16_adst_dct_0_12bpc_avx2: 166.8
      inv_txfm_add_4x16_adst_dct_1_12bpc_c: 1828.2
      inv_txfm_add_4x16_adst_dct_1_12bpc_avx2: 166.7
      inv_txfm_add_4x16_adst_dct_2_12bpc_c: 1830.9
      inv_txfm_add_4x16_adst_dct_2_12bpc_avx2: 165.6
      inv_txfm_add_4x16_adst_flipadst_0_12bpc_c: 1797.9
      inv_txfm_add_4x16_adst_flipadst_0_12bpc_avx2: 179.6
      inv_txfm_add_4x16_adst_flipadst_1_12bpc_c: 1795.9
      inv_txfm_add_4x16_adst_flipadst_1_12bpc_avx2: 180.6
      inv_txfm_add_4x16_adst_flipadst_2_12bpc_c: 1791.6
      inv_txfm_add_4x16_adst_flipadst_2_12bpc_avx2: 180.1
      inv_txfm_add_4x16_adst_identity_0_12bpc_c: 1163.7
      inv_txfm_add_4x16_adst_identity_0_12bpc_avx2: 78.6
      inv_txfm_add_4x16_adst_identity_1_12bpc_c: 1163.4
      inv_txfm_add_4x16_adst_identity_1_12bpc_avx2: 78.9
      inv_txfm_add_4x16_adst_identity_2_12bpc_c: 1164.3
      inv_txfm_add_4x16_adst_identity_2_12bpc_avx2: 78.8
      inv_txfm_add_4x16_dct_adst_0_12bpc_c: 1914.8
      inv_txfm_add_4x16_dct_adst_0_12bpc_avx2: 177.0
      inv_txfm_add_4x16_dct_adst_1_12bpc_c: 1904.8
      inv_txfm_add_4x16_dct_adst_1_12bpc_avx2: 177.3
      inv_txfm_add_4x16_dct_adst_2_12bpc_c: 1905.4
      inv_txfm_add_4x16_dct_adst_2_12bpc_avx2: 176.4
      inv_txfm_add_4x16_dct_dct_0_12bpc_c: 217.1
      inv_txfm_add_4x16_dct_dct_0_12bpc_avx2: 26.6
      inv_txfm_add_4x16_dct_dct_1_12bpc_c: 1955.1
      inv_txfm_add_4x16_dct_dct_1_12bpc_avx2: 162.3
      inv_txfm_add_4x16_dct_dct_2_12bpc_c: 1948.9
      inv_txfm_add_4x16_dct_dct_2_12bpc_avx2: 162.2
      inv_txfm_add_4x16_dct_flipadst_0_12bpc_c: 1922.8
      inv_txfm_add_4x16_dct_flipadst_0_12bpc_avx2: 180.6
      inv_txfm_add_4x16_dct_flipadst_1_12bpc_c: 1919.7
      inv_txfm_add_4x16_dct_flipadst_1_12bpc_avx2: 180.1
      inv_txfm_add_4x16_dct_flipadst_2_12bpc_c: 1912.0
      inv_txfm_add_4x16_dct_flipadst_2_12bpc_avx2: 180.1
      inv_txfm_add_4x16_dct_identity_0_12bpc_c: 1276.4
      inv_txfm_add_4x16_dct_identity_0_12bpc_avx2: 75.4
      inv_txfm_add_4x16_dct_identity_1_12bpc_c: 1277.5
      inv_txfm_add_4x16_dct_identity_1_12bpc_avx2: 75.4
      inv_txfm_add_4x16_dct_identity_2_12bpc_c: 1270.1
      inv_txfm_add_4x16_dct_identity_2_12bpc_avx2: 75.3
      inv_txfm_add_4x16_flipadst_adst_0_12bpc_c: 1802.8
      inv_txfm_add_4x16_flipadst_adst_0_12bpc_avx2: 180.8
      inv_txfm_add_4x16_flipadst_adst_1_12bpc_c: 1804.8
      inv_txfm_add_4x16_flipadst_adst_1_12bpc_avx2: 180.7
      inv_txfm_add_4x16_flipadst_adst_2_12bpc_c: 1800.6
      inv_txfm_add_4x16_flipadst_adst_2_12bpc_avx2: 181.2
      inv_txfm_add_4x16_flipadst_dct_0_12bpc_c: 1842.5
      inv_txfm_add_4x16_flipadst_dct_0_12bpc_avx2: 165.1
      inv_txfm_add_4x16_flipadst_dct_1_12bpc_c: 1837.8
      inv_txfm_add_4x16_flipadst_dct_1_12bpc_avx2: 164.4
      inv_txfm_add_4x16_flipadst_dct_2_12bpc_c: 1841.6
      inv_txfm_add_4x16_flipadst_dct_2_12bpc_avx2: 166.1
      inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_c: 1812.4
      inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_avx2: 182.0
      inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_c: 1803.9
      inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_avx2: 181.2
      inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_c: 1809.9
      inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_avx2: 183.2
      inv_txfm_add_4x16_flipadst_identity_0_12bpc_c: 1170.5
      inv_txfm_add_4x16_flipadst_identity_0_12bpc_avx2: 78.4
      inv_txfm_add_4x16_flipadst_identity_1_12bpc_c: 1172.1
      inv_txfm_add_4x16_flipadst_identity_1_12bpc_avx2: 80.0
      inv_txfm_add_4x16_flipadst_identity_2_12bpc_c: 1170.9
      inv_txfm_add_4x16_flipadst_identity_2_12bpc_avx2: 78.6
      inv_txfm_add_4x16_identity_adst_0_12bpc_c: 1705.4
      inv_txfm_add_4x16_identity_adst_0_12bpc_avx2: 162.6
      inv_txfm_add_4x16_identity_adst_1_12bpc_c: 1714.5
      inv_txfm_add_4x16_identity_adst_1_12bpc_avx2: 162.6
      inv_txfm_add_4x16_identity_adst_2_12bpc_c: 1703.1
      inv_txfm_add_4x16_identity_adst_2_12bpc_avx2: 162.5
      inv_txfm_add_4x16_identity_dct_0_12bpc_c: 1775.0
      inv_txfm_add_4x16_identity_dct_0_12bpc_avx2: 150.5
      inv_txfm_add_4x16_identity_dct_1_12bpc_c: 1753.0
      inv_txfm_add_4x16_identity_dct_1_12bpc_avx2: 150.6
      inv_txfm_add_4x16_identity_dct_2_12bpc_c: 1759.6
      inv_txfm_add_4x16_identity_dct_2_12bpc_avx2: 149.8
      inv_txfm_add_4x16_identity_flipadst_0_12bpc_c: 1727.5
      inv_txfm_add_4x16_identity_flipadst_0_12bpc_avx2: 160.3
      inv_txfm_add_4x16_identity_flipadst_1_12bpc_c: 1739.8
      inv_txfm_add_4x16_identity_flipadst_1_12bpc_avx2: 160.9
      inv_txfm_add_4x16_identity_flipadst_2_12bpc_c: 1728.3
      inv_txfm_add_4x16_identity_flipadst_2_12bpc_avx2: 159.9
      inv_txfm_add_4x16_identity_identity_0_12bpc_c: 1098.6
      inv_txfm_add_4x16_identity_identity_0_12bpc_avx2: 60.4
      inv_txfm_add_4x16_identity_identity_1_12bpc_c: 1095.4
      inv_txfm_add_4x16_identity_identity_1_12bpc_avx2: 61.3
      inv_txfm_add_4x16_identity_identity_2_12bpc_c: 1111.6
      inv_txfm_add_4x16_identity_identity_2_12bpc_avx2: 60.6
    • x86/itx: Convert 8bpc WHT to SSE2 · 00f92f2c
      Matthias Dressel authored
      WHT uses no SSSE3 instructions. The 16bpc variant is already SSE2.
  5. Nov 18, 2021
  6. Nov 15, 2021
  7. Nov 13, 2021
    • x86/itx: Add 8x8 12bpc AVX2 transforms · 31820a5e
      Matthias Dressel authored
      inv_txfm_add_8x8_adst_adst_0_12bpc_c: 1997.9
      inv_txfm_add_8x8_adst_adst_0_12bpc_avx2: 185.7
      inv_txfm_add_8x8_adst_adst_1_12bpc_c: 2009.8
      inv_txfm_add_8x8_adst_adst_1_12bpc_avx2: 185.7
      inv_txfm_add_8x8_adst_dct_0_12bpc_c: 1991.0
      inv_txfm_add_8x8_adst_dct_0_12bpc_avx2: 161.3
      inv_txfm_add_8x8_adst_dct_1_12bpc_c: 1977.0
      inv_txfm_add_8x8_adst_dct_1_12bpc_avx2: 161.4
      inv_txfm_add_8x8_adst_flipadst_0_12bpc_c: 2017.6
      inv_txfm_add_8x8_adst_flipadst_0_12bpc_avx2: 184.2
      inv_txfm_add_8x8_adst_flipadst_1_12bpc_c: 2018.9
      inv_txfm_add_8x8_adst_flipadst_1_12bpc_avx2: 184.2
      inv_txfm_add_8x8_adst_identity_0_12bpc_c: 1407.2
      inv_txfm_add_8x8_adst_identity_0_12bpc_avx2: 95.7
      inv_txfm_add_8x8_adst_identity_1_12bpc_c: 1405.9
      inv_txfm_add_8x8_adst_identity_1_12bpc_avx2: 95.8
      inv_txfm_add_8x8_dct_adst_0_12bpc_c: 2024.2
      inv_txfm_add_8x8_dct_adst_0_12bpc_avx2: 156.9
      inv_txfm_add_8x8_dct_adst_1_12bpc_c: 2018.8
      inv_txfm_add_8x8_dct_adst_1_12bpc_avx2: 160.1
      inv_txfm_add_8x8_dct_dct_0_12bpc_c: 213.0
      inv_txfm_add_8x8_dct_dct_0_12bpc_avx2: 24.8
      inv_txfm_add_8x8_dct_dct_1_12bpc_c: 2008.6
      inv_txfm_add_8x8_dct_dct_1_12bpc_avx2: 139.0
      inv_txfm_add_8x8_dct_flipadst_0_12bpc_c: 2012.3
      inv_txfm_add_8x8_dct_flipadst_0_12bpc_avx2: 159.2
      inv_txfm_add_8x8_dct_flipadst_1_12bpc_c: 2005.1
      inv_txfm_add_8x8_dct_flipadst_1_12bpc_avx2: 158.7
      inv_txfm_add_8x8_dct_identity_0_12bpc_c: 1470.4
      inv_txfm_add_8x8_dct_identity_0_12bpc_avx2: 71.7
      inv_txfm_add_8x8_dct_identity_1_12bpc_c: 1477.8
      inv_txfm_add_8x8_dct_identity_1_12bpc_avx2: 70.7
      inv_txfm_add_8x8_flipadst_adst_0_12bpc_c: 2006.1
      inv_txfm_add_8x8_flipadst_adst_0_12bpc_avx2: 183.6
      inv_txfm_add_8x8_flipadst_adst_1_12bpc_c: 1987.6
      inv_txfm_add_8x8_flipadst_adst_1_12bpc_avx2: 183.6
      inv_txfm_add_8x8_flipadst_dct_0_12bpc_c: 1986.6
      inv_txfm_add_8x8_flipadst_dct_0_12bpc_avx2: 163.0
      inv_txfm_add_8x8_flipadst_dct_1_12bpc_c: 1979.3
      inv_txfm_add_8x8_flipadst_dct_1_12bpc_avx2: 163.1
      inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_c: 2004.0
      inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_avx2: 184.3
      inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_c: 2003.9
      inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_avx2: 184.3
      inv_txfm_add_8x8_flipadst_identity_0_12bpc_c: 1433.5
      inv_txfm_add_8x8_flipadst_identity_0_12bpc_avx2: 95.3
      inv_txfm_add_8x8_flipadst_identity_1_12bpc_c: 1425.4
      inv_txfm_add_8x8_flipadst_identity_1_12bpc_avx2: 96.3
      inv_txfm_add_8x8_identity_adst_0_12bpc_c: 1456.5
      inv_txfm_add_8x8_identity_adst_0_12bpc_avx2: 115.8
      inv_txfm_add_8x8_identity_adst_1_12bpc_c: 1453.5
      inv_txfm_add_8x8_identity_adst_1_12bpc_avx2: 115.8
      inv_txfm_add_8x8_identity_dct_0_12bpc_c: 1450.0
      inv_txfm_add_8x8_identity_dct_0_12bpc_avx2: 93.5
      inv_txfm_add_8x8_identity_dct_1_12bpc_c: 1447.5
      inv_txfm_add_8x8_identity_dct_1_12bpc_avx2: 94.3
      inv_txfm_add_8x8_identity_flipadst_0_12bpc_c: 1451.7
      inv_txfm_add_8x8_identity_flipadst_0_12bpc_avx2: 114.0
      inv_txfm_add_8x8_identity_flipadst_1_12bpc_c: 1456.4
      inv_txfm_add_8x8_identity_flipadst_1_12bpc_avx2: 114.0
      inv_txfm_add_8x8_identity_identity_0_12bpc_c: 892.3
      inv_txfm_add_8x8_identity_identity_0_12bpc_avx2: 33.7
      inv_txfm_add_8x8_identity_identity_1_12bpc_c: 897.2
      inv_txfm_add_8x8_identity_identity_1_12bpc_avx2: 33.1
    • x86/itx: Add 8x4 12bpc AVX2 transforms · 53cf6a3b
      Matthias Dressel authored
      inv_txfm_add_8x4_adst_adst_0_12bpc_c: 882.1
      inv_txfm_add_8x4_adst_adst_0_12bpc_avx2: 113.7
      inv_txfm_add_8x4_adst_adst_1_12bpc_c: 882.5
      inv_txfm_add_8x4_adst_adst_1_12bpc_avx2: 113.8
      inv_txfm_add_8x4_adst_dct_0_12bpc_c: 928.0
      inv_txfm_add_8x4_adst_dct_0_12bpc_avx2: 109.2
      inv_txfm_add_8x4_adst_dct_1_12bpc_c: 924.9
      inv_txfm_add_8x4_adst_dct_1_12bpc_avx2: 109.2
      inv_txfm_add_8x4_adst_flipadst_0_12bpc_c: 889.9
      inv_txfm_add_8x4_adst_flipadst_0_12bpc_avx2: 114.3
      inv_txfm_add_8x4_adst_flipadst_1_12bpc_c: 886.0
      inv_txfm_add_8x4_adst_flipadst_1_12bpc_avx2: 114.8
      inv_txfm_add_8x4_adst_identity_0_12bpc_c: 832.2
      inv_txfm_add_8x4_adst_identity_0_12bpc_avx2: 88.8
      inv_txfm_add_8x4_adst_identity_1_12bpc_c: 834.6
      inv_txfm_add_8x4_adst_identity_1_12bpc_avx2: 89.0
      inv_txfm_add_8x4_dct_adst_0_12bpc_c: 870.3
      inv_txfm_add_8x4_dct_adst_0_12bpc_avx2: 96.3
      inv_txfm_add_8x4_dct_adst_1_12bpc_c: 884.6
      inv_txfm_add_8x4_dct_adst_1_12bpc_avx2: 96.3
      inv_txfm_add_8x4_dct_dct_0_12bpc_c: 116.1
      inv_txfm_add_8x4_dct_dct_0_12bpc_avx2: 24.5
      inv_txfm_add_8x4_dct_dct_1_12bpc_c: 925.1
      inv_txfm_add_8x4_dct_dct_1_12bpc_avx2: 92.3
      inv_txfm_add_8x4_dct_flipadst_0_12bpc_c: 882.7
      inv_txfm_add_8x4_dct_flipadst_0_12bpc_avx2: 97.0
      inv_txfm_add_8x4_dct_flipadst_1_12bpc_c: 882.1
      inv_txfm_add_8x4_dct_flipadst_1_12bpc_avx2: 97.0
      inv_txfm_add_8x4_dct_identity_0_12bpc_c: 827.5
      inv_txfm_add_8x4_dct_identity_0_12bpc_avx2: 72.4
      inv_txfm_add_8x4_dct_identity_1_12bpc_c: 827.8
      inv_txfm_add_8x4_dct_identity_1_12bpc_avx2: 73.8
      inv_txfm_add_8x4_flipadst_adst_0_12bpc_c: 899.5
      inv_txfm_add_8x4_flipadst_adst_0_12bpc_avx2: 113.2
      inv_txfm_add_8x4_flipadst_adst_1_12bpc_c: 898.8
      inv_txfm_add_8x4_flipadst_adst_1_12bpc_avx2: 113.3
      inv_txfm_add_8x4_flipadst_dct_0_12bpc_c: 945.7
      inv_txfm_add_8x4_flipadst_dct_0_12bpc_avx2: 108.3
      inv_txfm_add_8x4_flipadst_dct_1_12bpc_c: 945.6
      inv_txfm_add_8x4_flipadst_dct_1_12bpc_avx2: 108.3
      inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_c: 903.6
      inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_avx2: 113.9
      inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_c: 902.8
      inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_avx2: 114.2
      inv_txfm_add_8x4_flipadst_identity_0_12bpc_c: 856.6
      inv_txfm_add_8x4_flipadst_identity_0_12bpc_avx2: 88.3
      inv_txfm_add_8x4_flipadst_identity_1_12bpc_c: 848.8
      inv_txfm_add_8x4_flipadst_identity_1_12bpc_avx2: 87.4
      inv_txfm_add_8x4_identity_adst_0_12bpc_c: 583.2
      inv_txfm_add_8x4_identity_adst_0_12bpc_avx2: 69.6
      inv_txfm_add_8x4_identity_adst_1_12bpc_c: 584.3
      inv_txfm_add_8x4_identity_adst_1_12bpc_avx2: 69.6
      inv_txfm_add_8x4_identity_dct_0_12bpc_c: 632.9
      inv_txfm_add_8x4_identity_dct_0_12bpc_avx2: 65.3
      inv_txfm_add_8x4_identity_dct_1_12bpc_c: 629.6
      inv_txfm_add_8x4_identity_dct_1_12bpc_avx2: 65.8
      inv_txfm_add_8x4_identity_flipadst_0_12bpc_c: 587.0
      inv_txfm_add_8x4_identity_flipadst_0_12bpc_avx2: 71.0
      inv_txfm_add_8x4_identity_flipadst_1_12bpc_c: 586.9
      inv_txfm_add_8x4_identity_flipadst_1_12bpc_avx2: 71.0
      inv_txfm_add_8x4_identity_identity_0_12bpc_c: 533.0
      inv_txfm_add_8x4_identity_identity_0_12bpc_avx2: 45.3
      inv_txfm_add_8x4_identity_identity_1_12bpc_c: 539.7
      inv_txfm_add_8x4_identity_identity_1_12bpc_avx2: 45.9
    • x86/itx: Add 4x8 12bpc AVX2 transforms · 241753f5
      Matthias Dressel authored
      inv_txfm_add_4x8_adst_adst_0_12bpc_c: 900.8
      inv_txfm_add_4x8_adst_adst_0_12bpc_avx2: 118.8
      inv_txfm_add_4x8_adst_adst_1_12bpc_c: 893.7
      inv_txfm_add_4x8_adst_adst_1_12bpc_avx2: 118.8
      inv_txfm_add_4x8_adst_dct_0_12bpc_c: 890.2
      inv_txfm_add_4x8_adst_dct_0_12bpc_avx2: 104.8
      inv_txfm_add_4x8_adst_dct_1_12bpc_c: 887.4
      inv_txfm_add_4x8_adst_dct_1_12bpc_avx2: 104.8
      inv_txfm_add_4x8_adst_flipadst_0_12bpc_c: 919.6
      inv_txfm_add_4x8_adst_flipadst_0_12bpc_avx2: 116.6
      inv_txfm_add_4x8_adst_flipadst_1_12bpc_c: 912.1
      inv_txfm_add_4x8_adst_flipadst_1_12bpc_avx2: 116.6
      inv_txfm_add_4x8_adst_identity_0_12bpc_c: 613.5
      inv_txfm_add_4x8_adst_identity_0_12bpc_avx2: 42.8
      inv_txfm_add_4x8_adst_identity_1_12bpc_c: 608.7
      inv_txfm_add_4x8_adst_identity_1_12bpc_avx2: 43.3
      inv_txfm_add_4x8_dct_adst_0_12bpc_c: 951.7
      inv_txfm_add_4x8_dct_adst_0_12bpc_avx2: 113.8
      inv_txfm_add_4x8_dct_adst_1_12bpc_c: 949.0
      inv_txfm_add_4x8_dct_adst_1_12bpc_avx2: 113.1
      inv_txfm_add_4x8_dct_dct_0_12bpc_c: 118.6
      inv_txfm_add_4x8_dct_dct_0_12bpc_avx2: 24.5
      inv_txfm_add_4x8_dct_dct_1_12bpc_c: 942.4
      inv_txfm_add_4x8_dct_dct_1_12bpc_avx2: 99.2
      inv_txfm_add_4x8_dct_flipadst_0_12bpc_c: 959.3
      inv_txfm_add_4x8_dct_flipadst_0_12bpc_avx2: 113.9
      inv_txfm_add_4x8_dct_flipadst_1_12bpc_c: 964.1
      inv_txfm_add_4x8_dct_flipadst_1_12bpc_avx2: 114.3
      inv_txfm_add_4x8_dct_identity_0_12bpc_c: 659.9
      inv_txfm_add_4x8_dct_identity_0_12bpc_avx2: 41.9
      inv_txfm_add_4x8_dct_identity_1_12bpc_c: 658.6
      inv_txfm_add_4x8_dct_identity_1_12bpc_avx2: 41.6
      inv_txfm_add_4x8_flipadst_adst_0_12bpc_c: 906.6
      inv_txfm_add_4x8_flipadst_adst_0_12bpc_avx2: 117.3
      inv_txfm_add_4x8_flipadst_adst_1_12bpc_c: 907.7
      inv_txfm_add_4x8_flipadst_adst_1_12bpc_avx2: 117.3
      inv_txfm_add_4x8_flipadst_dct_0_12bpc_c: 890.3
      inv_txfm_add_4x8_flipadst_dct_0_12bpc_avx2: 104.6
      inv_txfm_add_4x8_flipadst_dct_1_12bpc_c: 895.6
      inv_txfm_add_4x8_flipadst_dct_1_12bpc_avx2: 104.6
      inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_c: 902.9
      inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_avx2: 116.5
      inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_c: 915.0
      inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_avx2: 116.4
      inv_txfm_add_4x8_flipadst_identity_0_12bpc_c: 618.6
      inv_txfm_add_4x8_flipadst_identity_0_12bpc_avx2: 45.3
      inv_txfm_add_4x8_flipadst_identity_1_12bpc_c: 618.1
      inv_txfm_add_4x8_flipadst_identity_1_12bpc_avx2: 44.0
      inv_txfm_add_4x8_identity_adst_0_12bpc_c: 829.7
      inv_txfm_add_4x8_identity_adst_0_12bpc_avx2: 107.4
      inv_txfm_add_4x8_identity_adst_1_12bpc_c: 831.7
      inv_txfm_add_4x8_identity_adst_1_12bpc_avx2: 107.8
      inv_txfm_add_4x8_identity_dct_0_12bpc_c: 823.2
      inv_txfm_add_4x8_identity_dct_0_12bpc_avx2: 90.7
      inv_txfm_add_4x8_identity_dct_1_12bpc_c: 824.1
      inv_txfm_add_4x8_identity_dct_1_12bpc_avx2: 90.7
      inv_txfm_add_4x8_identity_flipadst_0_12bpc_c: 853.4
      inv_txfm_add_4x8_identity_flipadst_0_12bpc_avx2: 106.8
      inv_txfm_add_4x8_identity_flipadst_1_12bpc_c: 852.2
      inv_txfm_add_4x8_identity_flipadst_1_12bpc_avx2: 106.8
      inv_txfm_add_4x8_identity_identity_0_12bpc_c: 543.2
      inv_txfm_add_4x8_identity_identity_0_12bpc_avx2: 36.4
      inv_txfm_add_4x8_identity_identity_1_12bpc_c: 544.8
      inv_txfm_add_4x8_identity_identity_1_12bpc_avx2: 36.6
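The paired `_c` / `_avx2` timings above are easier to compare as ratios. The following is a minimal sketch, assuming each line has the form `name: value`, that lower values are faster, and that both numbers in a pair use the same unit. The helper name `speedups` is hypothetical and not part of the project. The sample lines are copied from the 4x8 `dct_dct` results in this commit.

```python
def speedups(lines):
    """Map each benchmark base name to its C-time / AVX2-time ratio."""
    timings = {}
    for line in lines:
        # Split on the last ": " so the name may contain any characters.
        name, _, value = line.strip().rpartition(": ")
        timings[name] = float(value)

    ratios = {}
    for name, c_time in timings.items():
        if name.endswith("_c"):
            base = name[:-2]  # drop the "_c" suffix
            simd_time = timings.get(base + "_avx2")
            if simd_time:  # skip missing or zero entries
                ratios[base] = c_time / simd_time
    return ratios


sample = [
    "inv_txfm_add_4x8_dct_dct_0_12bpc_c: 118.6",
    "inv_txfm_add_4x8_dct_dct_0_12bpc_avx2: 24.5",
    "inv_txfm_add_4x8_dct_dct_1_12bpc_c: 942.4",
    "inv_txfm_add_4x8_dct_dct_1_12bpc_avx2: 99.2",
]

for name, ratio in speedups(sample).items():
    print(f"{name}: {ratio:.1f}x")
```

For these two pairs the ratios come out at roughly 4.8x and 9.5x. The same function can be applied to any of the other benchmark blocks in this log.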
  8. Nov 12, 2021
  9. Nov 11, 2021
  10. Nov 10, 2021
  11. Nov 05, 2021
  12. Nov 02, 2021
  13. Nov 01, 2021
  14. Oct 31, 2021
  15. Oct 29, 2021
  16. Oct 28, 2021
    • meson: Check for the pthread_getaffinity_np function before deciding to use it · 8c94f95c
      Martin Storsjö authored
      Use the check result instead of hardcoding which OSes provide the
      function.
      
      This also requires checking for the pthread_np.h header and including
      it while testing for functions in meson, but allows getting rid of the
      hardcoded OS conditions in the source.
      
      This fixes building for Android, if _GNU_SOURCE happens to be defined.
      (It gets defined if building with a slightly nonstandard cross file
      that defines "system = 'linux'", but it could also have been set by the
      caller.)
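The idea in this commit, probing for a function instead of assuming it exists on certain operating systems, can be illustrated outside of meson and C. The sketch below is a Python analogy only, not the project's build logic. It tests whether the standard library exposes `os.sched_getaffinity` rather than checking the platform name. The helper name `usable_cpu_count` is hypothetical.

```python
import os


def usable_cpu_count() -> int:
    """Return the number of CPUs this process may run on.

    Probes for the capability itself (os.sched_getaffinity) instead of
    checking the OS name, which mirrors testing for the function rather
    than hardcoding a list of platforms.
    """
    if hasattr(os, "sched_getaffinity"):
        # Affinity API is present: count the CPUs in this process's mask.
        return len(os.sched_getaffinity(0))
    # Affinity API is missing on this platform: fall back to the total count.
    return os.cpu_count() or 1


print(usable_cpu_count())
```

A platform check would need updating whenever a new OS gains or lacks the function. The capability probe does not, which is the property the commit message describes.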
  17. Oct 27, 2021
  18. Oct 18, 2021
    • x86/itx: Add 12-bit 4x4 transforms in AVX2 · eb0308bc
      Matthias Dressel authored
      Refactors itx into separate 10-bit and 12-bit functions to avoid
      conditional jumps.
      
      inv_txfm_add_4x4_adst_adst_0_12bpc_c: 370.9
      inv_txfm_add_4x4_adst_adst_0_12bpc_avx2: 68.6
      inv_txfm_add_4x4_adst_adst_1_12bpc_c: 371.0
      inv_txfm_add_4x4_adst_adst_1_12bpc_avx2: 68.7
      inv_txfm_add_4x4_adst_dct_0_12bpc_c: 413.1
      inv_txfm_add_4x4_adst_dct_0_12bpc_avx2: 69.2
      inv_txfm_add_4x4_adst_dct_1_12bpc_c: 412.7
      inv_txfm_add_4x4_adst_dct_1_12bpc_avx2: 68.8
      inv_txfm_add_4x4_adst_flipadst_0_12bpc_c: 378.5
      inv_txfm_add_4x4_adst_flipadst_0_12bpc_avx2: 74.9
      inv_txfm_add_4x4_adst_flipadst_1_12bpc_c: 378.1
      inv_txfm_add_4x4_adst_flipadst_1_12bpc_avx2: 74.6
      inv_txfm_add_4x4_adst_identity_0_12bpc_c: 347.8
      inv_txfm_add_4x4_adst_identity_0_12bpc_avx2: 48.8
      inv_txfm_add_4x4_adst_identity_1_12bpc_c: 342.7
      inv_txfm_add_4x4_adst_identity_1_12bpc_avx2: 49.0
      inv_txfm_add_4x4_dct_adst_0_12bpc_c: 399.2
      inv_txfm_add_4x4_dct_adst_0_12bpc_avx2: 73.1
      inv_txfm_add_4x4_dct_adst_1_12bpc_c: 398.7
      inv_txfm_add_4x4_dct_adst_1_12bpc_avx2: 72.2
      inv_txfm_add_4x4_dct_dct_0_12bpc_c: 69.6
      inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9
      inv_txfm_add_4x4_dct_dct_1_12bpc_c: 420.5
      inv_txfm_add_4x4_dct_dct_1_12bpc_avx2: 72.2
      inv_txfm_add_4x4_dct_flipadst_0_12bpc_c: 405.5
      inv_txfm_add_4x4_dct_flipadst_0_12bpc_avx2: 75.9
      inv_txfm_add_4x4_dct_flipadst_1_12bpc_c: 404.2
      inv_txfm_add_4x4_dct_flipadst_1_12bpc_avx2: 75.6
      inv_txfm_add_4x4_dct_identity_0_12bpc_c: 374.1
      inv_txfm_add_4x4_dct_identity_0_12bpc_avx2: 51.6
      inv_txfm_add_4x4_dct_identity_1_12bpc_c: 368.0
      inv_txfm_add_4x4_dct_identity_1_12bpc_avx2: 51.8
      inv_txfm_add_4x4_flipadst_adst_0_12bpc_c: 368.0
      inv_txfm_add_4x4_flipadst_adst_0_12bpc_avx2: 69.2
      inv_txfm_add_4x4_flipadst_adst_1_12bpc_c: 370.7
      inv_txfm_add_4x4_flipadst_adst_1_12bpc_avx2: 70.4
      inv_txfm_add_4x4_flipadst_dct_0_12bpc_c: 393.7
      inv_txfm_add_4x4_flipadst_dct_0_12bpc_avx2: 70.1
      inv_txfm_add_4x4_flipadst_dct_1_12bpc_c: 392.9
      inv_txfm_add_4x4_flipadst_dct_1_12bpc_avx2: 69.6
      inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_c: 382.2
      inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_avx2: 74.6
      inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_c: 381.3
      inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_avx2: 74.9
      inv_txfm_add_4x4_flipadst_identity_0_12bpc_c: 346.7
      inv_txfm_add_4x4_flipadst_identity_0_12bpc_avx2: 48.2
      inv_txfm_add_4x4_flipadst_identity_1_12bpc_c: 347.9
      inv_txfm_add_4x4_flipadst_identity_1_12bpc_avx2: 48.7
      inv_txfm_add_4x4_identity_adst_0_12bpc_c: 344.7
      inv_txfm_add_4x4_identity_adst_0_12bpc_avx2: 59.8
      inv_txfm_add_4x4_identity_adst_1_12bpc_c: 340.5
      inv_txfm_add_4x4_identity_adst_1_12bpc_avx2: 59.2
      inv_txfm_add_4x4_identity_dct_0_12bpc_c: 369.8
      inv_txfm_add_4x4_identity_dct_0_12bpc_avx2: 59.3
      inv_txfm_add_4x4_identity_dct_1_12bpc_c: 369.5
      inv_txfm_add_4x4_identity_dct_1_12bpc_avx2: 59.2
      inv_txfm_add_4x4_identity_flipadst_0_12bpc_c: 353.4
      inv_txfm_add_4x4_identity_flipadst_0_12bpc_avx2: 65.6
      inv_txfm_add_4x4_identity_flipadst_1_12bpc_c: 350.9
      inv_txfm_add_4x4_identity_flipadst_1_12bpc_avx2: 65.9
      inv_txfm_add_4x4_identity_identity_0_12bpc_c: 326.1
      inv_txfm_add_4x4_identity_identity_0_12bpc_avx2: 39.5
      inv_txfm_add_4x4_identity_identity_1_12bpc_c: 321.6
      inv_txfm_add_4x4_identity_identity_1_12bpc_avx2: 39.5
    • x86/itx: Rename rax to r6 · 4cdfe691
      Matthias Dressel authored
      Use numerical GPR references everywhere for consistency.
    • x86/itx: Name constants more explicit · 1ea40afd
      Matthias Dressel authored
      Give some constants a more explicit name to avoid confusion when 12bpc
      support is added.
    • x86: Add splat_mv AVX-512 (Ice Lake) asm · 8baea7b1
      Henrik Gramner authored
    • Victorien Le Couviour--Tuffet's avatar
      82d6d950
    • x86: Add sgr AVX-512 (Ice Lake) asm · 05682126
      Henrik Gramner authored
    • Henrik Gramner's avatar
      bf0f4690
    • Henrik Gramner's avatar
      ef216e17