  1. Nov 29, 2021
    • Matthias Dressel's avatar
      x86/itx: Add 16x4 12bpc AVX2 transforms · 7be12857
      Matthias Dressel authored
      inv_txfm_add_16x4_adst_adst_0_12bpc_c: 1756.6
      inv_txfm_add_16x4_adst_adst_0_12bpc_avx2: 182.4
      inv_txfm_add_16x4_adst_adst_1_12bpc_c: 1756.0
      inv_txfm_add_16x4_adst_adst_1_12bpc_avx2: 182.5
      inv_txfm_add_16x4_adst_adst_2_12bpc_c: 1763.2
      inv_txfm_add_16x4_adst_adst_2_12bpc_avx2: 182.4
      inv_txfm_add_16x4_adst_dct_0_12bpc_c: 1863.6
      inv_txfm_add_16x4_adst_dct_0_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_dct_1_12bpc_c: 1864.1
      inv_txfm_add_16x4_adst_dct_1_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_dct_2_12bpc_c: 1861.3
      inv_txfm_add_16x4_adst_dct_2_12bpc_avx2: 176.0
      inv_txfm_add_16x4_adst_flipadst_0_12bpc_c: 1768.6
      inv_txfm_add_16x4_adst_flipadst_0_12bpc_avx2: 184.1
      inv_txfm_add_16x4_adst_flipadst_1_12bpc_c: 1768.8
      inv_txfm_add_16x4_adst_flipadst_1_12bpc_avx2: 184.5
      inv_txfm_add_16x4_adst_flipadst_2_12bpc_c: 1769.3
      inv_txfm_add_16x4_adst_flipadst_2_12bpc_avx2: 184.7
      inv_txfm_add_16x4_adst_identity_0_12bpc_c: 1686.6
      inv_txfm_add_16x4_adst_identity_0_12bpc_avx2: 145.4
      inv_txfm_add_16x4_adst_identity_1_12bpc_c: 1685.8
      inv_txfm_add_16x4_adst_identity_1_12bpc_avx2: 145.8
      inv_txfm_add_16x4_adst_identity_2_12bpc_c: 1681.7
      inv_txfm_add_16x4_adst_identity_2_12bpc_avx2: 145.8
      inv_txfm_add_16x4_dct_adst_0_12bpc_c: 1783.4
      inv_txfm_add_16x4_dct_adst_0_12bpc_avx2: 167.7
      inv_txfm_add_16x4_dct_adst_1_12bpc_c: 1789.1
      inv_txfm_add_16x4_dct_adst_1_12bpc_avx2: 167.9
      inv_txfm_add_16x4_dct_adst_2_12bpc_c: 1788.0
      inv_txfm_add_16x4_dct_adst_2_12bpc_avx2: 169.8
      inv_txfm_add_16x4_dct_dct_0_12bpc_c: 209.5
      inv_txfm_add_16x4_dct_dct_0_12bpc_avx2: 21.6
      inv_txfm_add_16x4_dct_dct_1_12bpc_c: 1894.3
      inv_txfm_add_16x4_dct_dct_1_12bpc_avx2: 156.8
      inv_txfm_add_16x4_dct_dct_2_12bpc_c: 1892.0
      inv_txfm_add_16x4_dct_dct_2_12bpc_avx2: 156.8
      inv_txfm_add_16x4_dct_flipadst_0_12bpc_c: 1784.7
      inv_txfm_add_16x4_dct_flipadst_0_12bpc_avx2: 167.2
      inv_txfm_add_16x4_dct_flipadst_1_12bpc_c: 1796.7
      inv_txfm_add_16x4_dct_flipadst_1_12bpc_avx2: 168.6
      inv_txfm_add_16x4_dct_flipadst_2_12bpc_c: 1788.9
      inv_txfm_add_16x4_dct_flipadst_2_12bpc_avx2: 168.9
      inv_txfm_add_16x4_dct_identity_0_12bpc_c: 1712.7
      inv_txfm_add_16x4_dct_identity_0_12bpc_avx2: 128.8
      inv_txfm_add_16x4_dct_identity_1_12bpc_c: 1714.8
      inv_txfm_add_16x4_dct_identity_1_12bpc_avx2: 128.8
      inv_txfm_add_16x4_dct_identity_2_12bpc_c: 1710.2
      inv_txfm_add_16x4_dct_identity_2_12bpc_avx2: 128.8
      inv_txfm_add_16x4_flipadst_adst_0_12bpc_c: 1763.6
      inv_txfm_add_16x4_flipadst_adst_0_12bpc_avx2: 186.6
      inv_txfm_add_16x4_flipadst_adst_1_12bpc_c: 1761.1
      inv_txfm_add_16x4_flipadst_adst_1_12bpc_avx2: 185.6
      inv_txfm_add_16x4_flipadst_adst_2_12bpc_c: 1761.8
      inv_txfm_add_16x4_flipadst_adst_2_12bpc_avx2: 187.0
      inv_txfm_add_16x4_flipadst_dct_0_12bpc_c: 1864.4
      inv_txfm_add_16x4_flipadst_dct_0_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_dct_1_12bpc_c: 1862.7
      inv_txfm_add_16x4_flipadst_dct_1_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_dct_2_12bpc_c: 1860.2
      inv_txfm_add_16x4_flipadst_dct_2_12bpc_avx2: 176.8
      inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_c: 1760.4
      inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_avx2: 185.3
      inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_c: 1761.8
      inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_avx2: 185.3
      inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_c: 1766.5
      inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_avx2: 184.9
      inv_txfm_add_16x4_flipadst_identity_0_12bpc_c: 1673.0
      inv_txfm_add_16x4_flipadst_identity_0_12bpc_avx2: 143.1
      inv_txfm_add_16x4_flipadst_identity_1_12bpc_c: 1673.2
      inv_txfm_add_16x4_flipadst_identity_1_12bpc_avx2: 143.1
      inv_txfm_add_16x4_flipadst_identity_2_12bpc_c: 1681.6
      inv_txfm_add_16x4_flipadst_identity_2_12bpc_avx2: 143.2
      inv_txfm_add_16x4_identity_adst_0_12bpc_c: 1128.7
      inv_txfm_add_16x4_identity_adst_0_12bpc_avx2: 102.8
      inv_txfm_add_16x4_identity_adst_1_12bpc_c: 1131.3
      inv_txfm_add_16x4_identity_adst_1_12bpc_avx2: 101.3
      inv_txfm_add_16x4_identity_adst_2_12bpc_c: 1127.5
      inv_txfm_add_16x4_identity_adst_2_12bpc_avx2: 99.1
      inv_txfm_add_16x4_identity_dct_0_12bpc_c: 1228.3
      inv_txfm_add_16x4_identity_dct_0_12bpc_avx2: 88.3
      inv_txfm_add_16x4_identity_dct_1_12bpc_c: 1220.5
      inv_txfm_add_16x4_identity_dct_1_12bpc_avx2: 88.0
      inv_txfm_add_16x4_identity_dct_2_12bpc_c: 1227.3
      inv_txfm_add_16x4_identity_dct_2_12bpc_avx2: 88.1
      inv_txfm_add_16x4_identity_flipadst_0_12bpc_c: 1142.4
      inv_txfm_add_16x4_identity_flipadst_0_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_flipadst_1_12bpc_c: 1134.1
      inv_txfm_add_16x4_identity_flipadst_1_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_flipadst_2_12bpc_c: 1136.4
      inv_txfm_add_16x4_identity_flipadst_2_12bpc_avx2: 100.3
      inv_txfm_add_16x4_identity_identity_0_12bpc_c: 1056.1
      inv_txfm_add_16x4_identity_identity_0_12bpc_avx2: 61.6
      inv_txfm_add_16x4_identity_identity_1_12bpc_c: 1064.6
      inv_txfm_add_16x4_identity_identity_1_12bpc_avx2: 62.9
      inv_txfm_add_16x4_identity_identity_2_12bpc_c: 1067.5
      inv_txfm_add_16x4_identity_identity_2_12bpc_avx2: 63.5
      7be12857
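The figures in these commit messages pair each C reference timing with its AVX2 counterpart. A minimal sketch, assuming the `name: value` layout shown above and that lower values are better, of how a per-transform speedup could be derived from such output. The helper name `speedups` and the `sample` list are hypothetical and not part of the repository:

```python
def speedups(lines):
    """Map each benchmark base name to c_time / avx2_time."""
    timings = {}
    for line in lines:
        # Split "inv_txfm_add_..._c: 1756.6" into name and value.
        name, _, value = line.strip().rpartition(": ")
        if not name:
            continue  # not a "name: value" line
        # The implementation suffix ("c" or "avx2") follows the last underscore.
        base, _, impl = name.rpartition("_")
        timings.setdefault(base, {})[impl] = float(value)
    # Keep only benchmarks that have both a C and an AVX2 timing.
    return {
        base: t["c"] / t["avx2"]
        for base, t in timings.items()
        if "c" in t and "avx2" in t
    }


# Two pairs copied from the 16x4 commit message above.
sample = [
    "inv_txfm_add_16x4_adst_adst_0_12bpc_c: 1756.6",
    "inv_txfm_add_16x4_adst_adst_0_12bpc_avx2: 182.4",
    "inv_txfm_add_16x4_dct_dct_0_12bpc_c: 209.5",
    "inv_txfm_add_16x4_dct_dct_0_12bpc_avx2: 21.6",
]

for name, ratio in speedups(sample).items():
    print(f"{name}: {ratio:.1f}x")
```

For these two pairs the ratios come out at roughly 9.6x and 9.7x.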
    • Matthias Dressel's avatar
      x86/itx: Add 4x16 12bpc AVX2 transforms · f64b2c22
      Matthias Dressel authored
      inv_txfm_add_4x16_adst_adst_0_12bpc_c: 1799.1
      inv_txfm_add_4x16_adst_adst_0_12bpc_avx2: 178.8
      inv_txfm_add_4x16_adst_adst_1_12bpc_c: 1795.0
      inv_txfm_add_4x16_adst_adst_1_12bpc_avx2: 179.1
      inv_txfm_add_4x16_adst_adst_2_12bpc_c: 1806.6
      inv_txfm_add_4x16_adst_adst_2_12bpc_avx2: 179.3
      inv_txfm_add_4x16_adst_dct_0_12bpc_c: 1824.8
      inv_txfm_add_4x16_adst_dct_0_12bpc_avx2: 166.8
      inv_txfm_add_4x16_adst_dct_1_12bpc_c: 1828.2
      inv_txfm_add_4x16_adst_dct_1_12bpc_avx2: 166.7
      inv_txfm_add_4x16_adst_dct_2_12bpc_c: 1830.9
      inv_txfm_add_4x16_adst_dct_2_12bpc_avx2: 165.6
      inv_txfm_add_4x16_adst_flipadst_0_12bpc_c: 1797.9
      inv_txfm_add_4x16_adst_flipadst_0_12bpc_avx2: 179.6
      inv_txfm_add_4x16_adst_flipadst_1_12bpc_c: 1795.9
      inv_txfm_add_4x16_adst_flipadst_1_12bpc_avx2: 180.6
      inv_txfm_add_4x16_adst_flipadst_2_12bpc_c: 1791.6
      inv_txfm_add_4x16_adst_flipadst_2_12bpc_avx2: 180.1
      inv_txfm_add_4x16_adst_identity_0_12bpc_c: 1163.7
      inv_txfm_add_4x16_adst_identity_0_12bpc_avx2: 78.6
      inv_txfm_add_4x16_adst_identity_1_12bpc_c: 1163.4
      inv_txfm_add_4x16_adst_identity_1_12bpc_avx2: 78.9
      inv_txfm_add_4x16_adst_identity_2_12bpc_c: 1164.3
      inv_txfm_add_4x16_adst_identity_2_12bpc_avx2: 78.8
      inv_txfm_add_4x16_dct_adst_0_12bpc_c: 1914.8
      inv_txfm_add_4x16_dct_adst_0_12bpc_avx2: 177.0
      inv_txfm_add_4x16_dct_adst_1_12bpc_c: 1904.8
      inv_txfm_add_4x16_dct_adst_1_12bpc_avx2: 177.3
      inv_txfm_add_4x16_dct_adst_2_12bpc_c: 1905.4
      inv_txfm_add_4x16_dct_adst_2_12bpc_avx2: 176.4
      inv_txfm_add_4x16_dct_dct_0_12bpc_c: 217.1
      inv_txfm_add_4x16_dct_dct_0_12bpc_avx2: 26.6
      inv_txfm_add_4x16_dct_dct_1_12bpc_c: 1955.1
      inv_txfm_add_4x16_dct_dct_1_12bpc_avx2: 162.3
      inv_txfm_add_4x16_dct_dct_2_12bpc_c: 1948.9
      inv_txfm_add_4x16_dct_dct_2_12bpc_avx2: 162.2
      inv_txfm_add_4x16_dct_flipadst_0_12bpc_c: 1922.8
      inv_txfm_add_4x16_dct_flipadst_0_12bpc_avx2: 180.6
      inv_txfm_add_4x16_dct_flipadst_1_12bpc_c: 1919.7
      inv_txfm_add_4x16_dct_flipadst_1_12bpc_avx2: 180.1
      inv_txfm_add_4x16_dct_flipadst_2_12bpc_c: 1912.0
      inv_txfm_add_4x16_dct_flipadst_2_12bpc_avx2: 180.1
      inv_txfm_add_4x16_dct_identity_0_12bpc_c: 1276.4
      inv_txfm_add_4x16_dct_identity_0_12bpc_avx2: 75.4
      inv_txfm_add_4x16_dct_identity_1_12bpc_c: 1277.5
      inv_txfm_add_4x16_dct_identity_1_12bpc_avx2: 75.4
      inv_txfm_add_4x16_dct_identity_2_12bpc_c: 1270.1
      inv_txfm_add_4x16_dct_identity_2_12bpc_avx2: 75.3
      inv_txfm_add_4x16_flipadst_adst_0_12bpc_c: 1802.8
      inv_txfm_add_4x16_flipadst_adst_0_12bpc_avx2: 180.8
      inv_txfm_add_4x16_flipadst_adst_1_12bpc_c: 1804.8
      inv_txfm_add_4x16_flipadst_adst_1_12bpc_avx2: 180.7
      inv_txfm_add_4x16_flipadst_adst_2_12bpc_c: 1800.6
      inv_txfm_add_4x16_flipadst_adst_2_12bpc_avx2: 181.2
      inv_txfm_add_4x16_flipadst_dct_0_12bpc_c: 1842.5
      inv_txfm_add_4x16_flipadst_dct_0_12bpc_avx2: 165.1
      inv_txfm_add_4x16_flipadst_dct_1_12bpc_c: 1837.8
      inv_txfm_add_4x16_flipadst_dct_1_12bpc_avx2: 164.4
      inv_txfm_add_4x16_flipadst_dct_2_12bpc_c: 1841.6
      inv_txfm_add_4x16_flipadst_dct_2_12bpc_avx2: 166.1
      inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_c: 1812.4
      inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_avx2: 182.0
      inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_c: 1803.9
      inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_avx2: 181.2
      inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_c: 1809.9
      inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_avx2: 183.2
      inv_txfm_add_4x16_flipadst_identity_0_12bpc_c: 1170.5
      inv_txfm_add_4x16_flipadst_identity_0_12bpc_avx2: 78.4
      inv_txfm_add_4x16_flipadst_identity_1_12bpc_c: 1172.1
      inv_txfm_add_4x16_flipadst_identity_1_12bpc_avx2: 80.0
      inv_txfm_add_4x16_flipadst_identity_2_12bpc_c: 1170.9
      inv_txfm_add_4x16_flipadst_identity_2_12bpc_avx2: 78.6
      inv_txfm_add_4x16_identity_adst_0_12bpc_c: 1705.4
      inv_txfm_add_4x16_identity_adst_0_12bpc_avx2: 162.6
      inv_txfm_add_4x16_identity_adst_1_12bpc_c: 1714.5
      inv_txfm_add_4x16_identity_adst_1_12bpc_avx2: 162.6
      inv_txfm_add_4x16_identity_adst_2_12bpc_c: 1703.1
      inv_txfm_add_4x16_identity_adst_2_12bpc_avx2: 162.5
      inv_txfm_add_4x16_identity_dct_0_12bpc_c: 1775.0
      inv_txfm_add_4x16_identity_dct_0_12bpc_avx2: 150.5
      inv_txfm_add_4x16_identity_dct_1_12bpc_c: 1753.0
      inv_txfm_add_4x16_identity_dct_1_12bpc_avx2: 150.6
      inv_txfm_add_4x16_identity_dct_2_12bpc_c: 1759.6
      inv_txfm_add_4x16_identity_dct_2_12bpc_avx2: 149.8
      inv_txfm_add_4x16_identity_flipadst_0_12bpc_c: 1727.5
      inv_txfm_add_4x16_identity_flipadst_0_12bpc_avx2: 160.3
      inv_txfm_add_4x16_identity_flipadst_1_12bpc_c: 1739.8
      inv_txfm_add_4x16_identity_flipadst_1_12bpc_avx2: 160.9
      inv_txfm_add_4x16_identity_flipadst_2_12bpc_c: 1728.3
      inv_txfm_add_4x16_identity_flipadst_2_12bpc_avx2: 159.9
      inv_txfm_add_4x16_identity_identity_0_12bpc_c: 1098.6
      inv_txfm_add_4x16_identity_identity_0_12bpc_avx2: 60.4
      inv_txfm_add_4x16_identity_identity_1_12bpc_c: 1095.4
      inv_txfm_add_4x16_identity_identity_1_12bpc_avx2: 61.3
      inv_txfm_add_4x16_identity_identity_2_12bpc_c: 1111.6
      inv_txfm_add_4x16_identity_identity_2_12bpc_avx2: 60.6
      f64b2c22
    • Matthias Dressel's avatar
      x86/itx: Convert 8bpc WHT to SSE2 · 00f92f2c
      Matthias Dressel authored
      WHT uses no SSSE3 instructions. The 16bpc variant is already SSE2.
      00f92f2c
  2. Nov 18, 2021
  3. Nov 15, 2021
  4. Nov 13, 2021
    • Matthias Dressel's avatar
      x86/itx: Add 8x8 12bpc AVX2 transforms · 31820a5e
      Matthias Dressel authored
      inv_txfm_add_8x8_adst_adst_0_12bpc_c: 1997.9
      inv_txfm_add_8x8_adst_adst_0_12bpc_avx2: 185.7
      inv_txfm_add_8x8_adst_adst_1_12bpc_c: 2009.8
      inv_txfm_add_8x8_adst_adst_1_12bpc_avx2: 185.7
      inv_txfm_add_8x8_adst_dct_0_12bpc_c: 1991.0
      inv_txfm_add_8x8_adst_dct_0_12bpc_avx2: 161.3
      inv_txfm_add_8x8_adst_dct_1_12bpc_c: 1977.0
      inv_txfm_add_8x8_adst_dct_1_12bpc_avx2: 161.4
      inv_txfm_add_8x8_adst_flipadst_0_12bpc_c: 2017.6
      inv_txfm_add_8x8_adst_flipadst_0_12bpc_avx2: 184.2
      inv_txfm_add_8x8_adst_flipadst_1_12bpc_c: 2018.9
      inv_txfm_add_8x8_adst_flipadst_1_12bpc_avx2: 184.2
      inv_txfm_add_8x8_adst_identity_0_12bpc_c: 1407.2
      inv_txfm_add_8x8_adst_identity_0_12bpc_avx2: 95.7
      inv_txfm_add_8x8_adst_identity_1_12bpc_c: 1405.9
      inv_txfm_add_8x8_adst_identity_1_12bpc_avx2: 95.8
      inv_txfm_add_8x8_dct_adst_0_12bpc_c: 2024.2
      inv_txfm_add_8x8_dct_adst_0_12bpc_avx2: 156.9
      inv_txfm_add_8x8_dct_adst_1_12bpc_c: 2018.8
      inv_txfm_add_8x8_dct_adst_1_12bpc_avx2: 160.1
      inv_txfm_add_8x8_dct_dct_0_12bpc_c: 213.0
      inv_txfm_add_8x8_dct_dct_0_12bpc_avx2: 24.8
      inv_txfm_add_8x8_dct_dct_1_12bpc_c: 2008.6
      inv_txfm_add_8x8_dct_dct_1_12bpc_avx2: 139.0
      inv_txfm_add_8x8_dct_flipadst_0_12bpc_c: 2012.3
      inv_txfm_add_8x8_dct_flipadst_0_12bpc_avx2: 159.2
      inv_txfm_add_8x8_dct_flipadst_1_12bpc_c: 2005.1
      inv_txfm_add_8x8_dct_flipadst_1_12bpc_avx2: 158.7
      inv_txfm_add_8x8_dct_identity_0_12bpc_c: 1470.4
      inv_txfm_add_8x8_dct_identity_0_12bpc_avx2: 71.7
      inv_txfm_add_8x8_dct_identity_1_12bpc_c: 1477.8
      inv_txfm_add_8x8_dct_identity_1_12bpc_avx2: 70.7
      inv_txfm_add_8x8_flipadst_adst_0_12bpc_c: 2006.1
      inv_txfm_add_8x8_flipadst_adst_0_12bpc_avx2: 183.6
      inv_txfm_add_8x8_flipadst_adst_1_12bpc_c: 1987.6
      inv_txfm_add_8x8_flipadst_adst_1_12bpc_avx2: 183.6
      inv_txfm_add_8x8_flipadst_dct_0_12bpc_c: 1986.6
      inv_txfm_add_8x8_flipadst_dct_0_12bpc_avx2: 163.0
      inv_txfm_add_8x8_flipadst_dct_1_12bpc_c: 1979.3
      inv_txfm_add_8x8_flipadst_dct_1_12bpc_avx2: 163.1
      inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_c: 2004.0
      inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_avx2: 184.3
      inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_c: 2003.9
      inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_avx2: 184.3
      inv_txfm_add_8x8_flipadst_identity_0_12bpc_c: 1433.5
      inv_txfm_add_8x8_flipadst_identity_0_12bpc_avx2: 95.3
      inv_txfm_add_8x8_flipadst_identity_1_12bpc_c: 1425.4
      inv_txfm_add_8x8_flipadst_identity_1_12bpc_avx2: 96.3
      inv_txfm_add_8x8_identity_adst_0_12bpc_c: 1456.5
      inv_txfm_add_8x8_identity_adst_0_12bpc_avx2: 115.8
      inv_txfm_add_8x8_identity_adst_1_12bpc_c: 1453.5
      inv_txfm_add_8x8_identity_adst_1_12bpc_avx2: 115.8
      inv_txfm_add_8x8_identity_dct_0_12bpc_c: 1450.0
      inv_txfm_add_8x8_identity_dct_0_12bpc_avx2: 93.5
      inv_txfm_add_8x8_identity_dct_1_12bpc_c: 1447.5
      inv_txfm_add_8x8_identity_dct_1_12bpc_avx2: 94.3
      inv_txfm_add_8x8_identity_flipadst_0_12bpc_c: 1451.7
      inv_txfm_add_8x8_identity_flipadst_0_12bpc_avx2: 114.0
      inv_txfm_add_8x8_identity_flipadst_1_12bpc_c: 1456.4
      inv_txfm_add_8x8_identity_flipadst_1_12bpc_avx2: 114.0
      inv_txfm_add_8x8_identity_identity_0_12bpc_c: 892.3
      inv_txfm_add_8x8_identity_identity_0_12bpc_avx2: 33.7
      inv_txfm_add_8x8_identity_identity_1_12bpc_c: 897.2
      inv_txfm_add_8x8_identity_identity_1_12bpc_avx2: 33.1
      31820a5e
    • Matthias Dressel's avatar
      x86/itx: Add 8x4 12bpc AVX2 transforms · 53cf6a3b
      Matthias Dressel authored
      inv_txfm_add_8x4_adst_adst_0_12bpc_c: 882.1
      inv_txfm_add_8x4_adst_adst_0_12bpc_avx2: 113.7
      inv_txfm_add_8x4_adst_adst_1_12bpc_c: 882.5
      inv_txfm_add_8x4_adst_adst_1_12bpc_avx2: 113.8
      inv_txfm_add_8x4_adst_dct_0_12bpc_c: 928.0
      inv_txfm_add_8x4_adst_dct_0_12bpc_avx2: 109.2
      inv_txfm_add_8x4_adst_dct_1_12bpc_c: 924.9
      inv_txfm_add_8x4_adst_dct_1_12bpc_avx2: 109.2
      inv_txfm_add_8x4_adst_flipadst_0_12bpc_c: 889.9
      inv_txfm_add_8x4_adst_flipadst_0_12bpc_avx2: 114.3
      inv_txfm_add_8x4_adst_flipadst_1_12bpc_c: 886.0
      inv_txfm_add_8x4_adst_flipadst_1_12bpc_avx2: 114.8
      inv_txfm_add_8x4_adst_identity_0_12bpc_c: 832.2
      inv_txfm_add_8x4_adst_identity_0_12bpc_avx2: 88.8
      inv_txfm_add_8x4_adst_identity_1_12bpc_c: 834.6
      inv_txfm_add_8x4_adst_identity_1_12bpc_avx2: 89.0
      inv_txfm_add_8x4_dct_adst_0_12bpc_c: 870.3
      inv_txfm_add_8x4_dct_adst_0_12bpc_avx2: 96.3
      inv_txfm_add_8x4_dct_adst_1_12bpc_c: 884.6
      inv_txfm_add_8x4_dct_adst_1_12bpc_avx2: 96.3
      inv_txfm_add_8x4_dct_dct_0_12bpc_c: 116.1
      inv_txfm_add_8x4_dct_dct_0_12bpc_avx2: 24.5
      inv_txfm_add_8x4_dct_dct_1_12bpc_c: 925.1
      inv_txfm_add_8x4_dct_dct_1_12bpc_avx2: 92.3
      inv_txfm_add_8x4_dct_flipadst_0_12bpc_c: 882.7
      inv_txfm_add_8x4_dct_flipadst_0_12bpc_avx2: 97.0
      inv_txfm_add_8x4_dct_flipadst_1_12bpc_c: 882.1
      inv_txfm_add_8x4_dct_flipadst_1_12bpc_avx2: 97.0
      inv_txfm_add_8x4_dct_identity_0_12bpc_c: 827.5
      inv_txfm_add_8x4_dct_identity_0_12bpc_avx2: 72.4
      inv_txfm_add_8x4_dct_identity_1_12bpc_c: 827.8
      inv_txfm_add_8x4_dct_identity_1_12bpc_avx2: 73.8
      inv_txfm_add_8x4_flipadst_adst_0_12bpc_c: 899.5
      inv_txfm_add_8x4_flipadst_adst_0_12bpc_avx2: 113.2
      inv_txfm_add_8x4_flipadst_adst_1_12bpc_c: 898.8
      inv_txfm_add_8x4_flipadst_adst_1_12bpc_avx2: 113.3
      inv_txfm_add_8x4_flipadst_dct_0_12bpc_c: 945.7
      inv_txfm_add_8x4_flipadst_dct_0_12bpc_avx2: 108.3
      inv_txfm_add_8x4_flipadst_dct_1_12bpc_c: 945.6
      inv_txfm_add_8x4_flipadst_dct_1_12bpc_avx2: 108.3
      inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_c: 903.6
      inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_avx2: 113.9
      inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_c: 902.8
      inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_avx2: 114.2
      inv_txfm_add_8x4_flipadst_identity_0_12bpc_c: 856.6
      inv_txfm_add_8x4_flipadst_identity_0_12bpc_avx2: 88.3
      inv_txfm_add_8x4_flipadst_identity_1_12bpc_c: 848.8
      inv_txfm_add_8x4_flipadst_identity_1_12bpc_avx2: 87.4
      inv_txfm_add_8x4_identity_adst_0_12bpc_c: 583.2
      inv_txfm_add_8x4_identity_adst_0_12bpc_avx2: 69.6
      inv_txfm_add_8x4_identity_adst_1_12bpc_c: 584.3
      inv_txfm_add_8x4_identity_adst_1_12bpc_avx2: 69.6
      inv_txfm_add_8x4_identity_dct_0_12bpc_c: 632.9
      inv_txfm_add_8x4_identity_dct_0_12bpc_avx2: 65.3
      inv_txfm_add_8x4_identity_dct_1_12bpc_c: 629.6
      inv_txfm_add_8x4_identity_dct_1_12bpc_avx2: 65.8
      inv_txfm_add_8x4_identity_flipadst_0_12bpc_c: 587.0
      inv_txfm_add_8x4_identity_flipadst_0_12bpc_avx2: 71.0
      inv_txfm_add_8x4_identity_flipadst_1_12bpc_c: 586.9
      inv_txfm_add_8x4_identity_flipadst_1_12bpc_avx2: 71.0
      inv_txfm_add_8x4_identity_identity_0_12bpc_c: 533.0
      inv_txfm_add_8x4_identity_identity_0_12bpc_avx2: 45.3
      inv_txfm_add_8x4_identity_identity_1_12bpc_c: 539.7
      inv_txfm_add_8x4_identity_identity_1_12bpc_avx2: 45.9
      53cf6a3b
    • Matthias Dressel's avatar
      x86/itx: Add 4x8 12bpc AVX2 transforms · 241753f5
      Matthias Dressel authored
      inv_txfm_add_4x8_adst_adst_0_12bpc_c: 900.8
      inv_txfm_add_4x8_adst_adst_0_12bpc_avx2: 118.8
      inv_txfm_add_4x8_adst_adst_1_12bpc_c: 893.7
      inv_txfm_add_4x8_adst_adst_1_12bpc_avx2: 118.8
      inv_txfm_add_4x8_adst_dct_0_12bpc_c: 890.2
      inv_txfm_add_4x8_adst_dct_0_12bpc_avx2: 104.8
      inv_txfm_add_4x8_adst_dct_1_12bpc_c: 887.4
      inv_txfm_add_4x8_adst_dct_1_12bpc_avx2: 104.8
      inv_txfm_add_4x8_adst_flipadst_0_12bpc_c: 919.6
      inv_txfm_add_4x8_adst_flipadst_0_12bpc_avx2: 116.6
      inv_txfm_add_4x8_adst_flipadst_1_12bpc_c: 912.1
      inv_txfm_add_4x8_adst_flipadst_1_12bpc_avx2: 116.6
      inv_txfm_add_4x8_adst_identity_0_12bpc_c: 613.5
      inv_txfm_add_4x8_adst_identity_0_12bpc_avx2: 42.8
      inv_txfm_add_4x8_adst_identity_1_12bpc_c: 608.7
      inv_txfm_add_4x8_adst_identity_1_12bpc_avx2: 43.3
      inv_txfm_add_4x8_dct_adst_0_12bpc_c: 951.7
      inv_txfm_add_4x8_dct_adst_0_12bpc_avx2: 113.8
      inv_txfm_add_4x8_dct_adst_1_12bpc_c: 949.0
      inv_txfm_add_4x8_dct_adst_1_12bpc_avx2: 113.1
      inv_txfm_add_4x8_dct_dct_0_12bpc_c: 118.6
      inv_txfm_add_4x8_dct_dct_0_12bpc_avx2: 24.5
      inv_txfm_add_4x8_dct_dct_1_12bpc_c: 942.4
      inv_txfm_add_4x8_dct_dct_1_12bpc_avx2: 99.2
      inv_txfm_add_4x8_dct_flipadst_0_12bpc_c: 959.3
      inv_txfm_add_4x8_dct_flipadst_0_12bpc_avx2: 113.9
      inv_txfm_add_4x8_dct_flipadst_1_12bpc_c: 964.1
      inv_txfm_add_4x8_dct_flipadst_1_12bpc_avx2: 114.3
      inv_txfm_add_4x8_dct_identity_0_12bpc_c: 659.9
      inv_txfm_add_4x8_dct_identity_0_12bpc_avx2: 41.9
      inv_txfm_add_4x8_dct_identity_1_12bpc_c: 658.6
      inv_txfm_add_4x8_dct_identity_1_12bpc_avx2: 41.6
      inv_txfm_add_4x8_flipadst_adst_0_12bpc_c: 906.6
      inv_txfm_add_4x8_flipadst_adst_0_12bpc_avx2: 117.3
      inv_txfm_add_4x8_flipadst_adst_1_12bpc_c: 907.7
      inv_txfm_add_4x8_flipadst_adst_1_12bpc_avx2: 117.3
      inv_txfm_add_4x8_flipadst_dct_0_12bpc_c: 890.3
      inv_txfm_add_4x8_flipadst_dct_0_12bpc_avx2: 104.6
      inv_txfm_add_4x8_flipadst_dct_1_12bpc_c: 895.6
      inv_txfm_add_4x8_flipadst_dct_1_12bpc_avx2: 104.6
      inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_c: 902.9
      inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_avx2: 116.5
      inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_c: 915.0
      inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_avx2: 116.4
      inv_txfm_add_4x8_flipadst_identity_0_12bpc_c: 618.6
      inv_txfm_add_4x8_flipadst_identity_0_12bpc_avx2: 45.3
      inv_txfm_add_4x8_flipadst_identity_1_12bpc_c: 618.1
      inv_txfm_add_4x8_flipadst_identity_1_12bpc_avx2: 44.0
      inv_txfm_add_4x8_identity_adst_0_12bpc_c: 829.7
      inv_txfm_add_4x8_identity_adst_0_12bpc_avx2: 107.4
      inv_txfm_add_4x8_identity_adst_1_12bpc_c: 831.7
      inv_txfm_add_4x8_identity_adst_1_12bpc_avx2: 107.8
      inv_txfm_add_4x8_identity_dct_0_12bpc_c: 823.2
      inv_txfm_add_4x8_identity_dct_0_12bpc_avx2: 90.7
      inv_txfm_add_4x8_identity_dct_1_12bpc_c: 824.1
      inv_txfm_add_4x8_identity_dct_1_12bpc_avx2: 90.7
      inv_txfm_add_4x8_identity_flipadst_0_12bpc_c: 853.4
      inv_txfm_add_4x8_identity_flipadst_0_12bpc_avx2: 106.8
      inv_txfm_add_4x8_identity_flipadst_1_12bpc_c: 852.2
      inv_txfm_add_4x8_identity_flipadst_1_12bpc_avx2: 106.8
      inv_txfm_add_4x8_identity_identity_0_12bpc_c: 543.2
      inv_txfm_add_4x8_identity_identity_0_12bpc_avx2: 36.4
      inv_txfm_add_4x8_identity_identity_1_12bpc_c: 544.8
      inv_txfm_add_4x8_identity_identity_1_12bpc_avx2: 36.6
      241753f5
  5. Nov 12, 2021
  6. Nov 11, 2021
  7. Nov 10, 2021
  8. Nov 05, 2021
  9. Nov 02, 2021
  10. Nov 01, 2021
  11. Oct 31, 2021
  12. Oct 29, 2021
  13. Oct 28, 2021
    • Martin Storsjö's avatar
      meson: Check for the pthread_getaffinity_np function before deciding to use it · 8c94f95c
      Martin Storsjö authored
      Use the check result instead of hardcoding what OSes have got the
      function.
      
      This also requires checking for the pthread_np.h header and including
      it while testing for functions in meson, but allows getting rid of the
      hardcoded OS conditions in the source.
      
      This fixes building for Android, if _GNU_SOURCE happens to be defined.
      (It gets defined if building with a slightly nonstandard cross file
      that defines "system = 'linux'", but it could also have been set by the
      caller.)
      8c94f95c
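The idea in this commit, probing for the function instead of hardcoding which OSes provide it, can be illustrated outside of meson. The following is only a Python analogue under that assumption, not the build logic from the commit; `usable_cpu_count` is a hypothetical name:

```python
import os


def usable_cpu_count() -> int:
    """Return how many CPUs the current process may run on."""
    # Probe for the affinity function itself rather than testing the OS name.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback for platforms where the affinity call is not available.
    return os.cpu_count() or 1


print(usable_cpu_count())
```

The capability check keeps working on platforms that gain or lack the function, with no list of OS names to maintain.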
  14. Oct 27, 2021
  15. Oct 18, 2021
    • Matthias Dressel's avatar
      x86/itx: Add 12-bit 4x4 transforms in AVX2 · eb0308bc
      Matthias Dressel authored
      Refactors itx into separate 10, 12 bit functions to prevent conditional
      jumps.
      
      inv_txfm_add_4x4_adst_adst_0_12bpc_c: 370.9
      inv_txfm_add_4x4_adst_adst_0_12bpc_avx2: 68.6
      inv_txfm_add_4x4_adst_adst_1_12bpc_c: 371.0
      inv_txfm_add_4x4_adst_adst_1_12bpc_avx2: 68.7
      inv_txfm_add_4x4_adst_dct_0_12bpc_c: 413.1
      inv_txfm_add_4x4_adst_dct_0_12bpc_avx2: 69.2
      inv_txfm_add_4x4_adst_dct_1_12bpc_c: 412.7
      inv_txfm_add_4x4_adst_dct_1_12bpc_avx2: 68.8
      inv_txfm_add_4x4_adst_flipadst_0_12bpc_c: 378.5
      inv_txfm_add_4x4_adst_flipadst_0_12bpc_avx2: 74.9
      inv_txfm_add_4x4_adst_flipadst_1_12bpc_c: 378.1
      inv_txfm_add_4x4_adst_flipadst_1_12bpc_avx2: 74.6
      inv_txfm_add_4x4_adst_identity_0_12bpc_c: 347.8
      inv_txfm_add_4x4_adst_identity_0_12bpc_avx2: 48.8
      inv_txfm_add_4x4_adst_identity_1_12bpc_c: 342.7
      inv_txfm_add_4x4_adst_identity_1_12bpc_avx2: 49.0
      inv_txfm_add_4x4_dct_adst_0_12bpc_c: 399.2
      inv_txfm_add_4x4_dct_adst_0_12bpc_avx2: 73.1
      inv_txfm_add_4x4_dct_adst_1_12bpc_c: 398.7
      inv_txfm_add_4x4_dct_adst_1_12bpc_avx2: 72.2
      inv_txfm_add_4x4_dct_dct_0_12bpc_c: 69.6
      inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9
      inv_txfm_add_4x4_dct_dct_1_12bpc_c: 420.5
      inv_txfm_add_4x4_dct_dct_1_12bpc_avx2: 72.2
      inv_txfm_add_4x4_dct_flipadst_0_12bpc_c: 405.5
      inv_txfm_add_4x4_dct_flipadst_0_12bpc_avx2: 75.9
      inv_txfm_add_4x4_dct_flipadst_1_12bpc_c: 404.2
      inv_txfm_add_4x4_dct_flipadst_1_12bpc_avx2: 75.6
      inv_txfm_add_4x4_dct_identity_0_12bpc_c: 374.1
      inv_txfm_add_4x4_dct_identity_0_12bpc_avx2: 51.6
      inv_txfm_add_4x4_dct_identity_1_12bpc_c: 368.0
      inv_txfm_add_4x4_dct_identity_1_12bpc_avx2: 51.8
      inv_txfm_add_4x4_flipadst_adst_0_12bpc_c: 368.0
      inv_txfm_add_4x4_flipadst_adst_0_12bpc_avx2: 69.2
      inv_txfm_add_4x4_flipadst_adst_1_12bpc_c: 370.7
      inv_txfm_add_4x4_flipadst_adst_1_12bpc_avx2: 70.4
      inv_txfm_add_4x4_flipadst_dct_0_12bpc_c: 393.7
      inv_txfm_add_4x4_flipadst_dct_0_12bpc_avx2: 70.1
      inv_txfm_add_4x4_flipadst_dct_1_12bpc_c: 392.9
      inv_txfm_add_4x4_flipadst_dct_1_12bpc_avx2: 69.6
      inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_c: 382.2
      inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_avx2: 74.6
      inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_c: 381.3
      inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_avx2: 74.9
      inv_txfm_add_4x4_flipadst_identity_0_12bpc_c: 346.7
      inv_txfm_add_4x4_flipadst_identity_0_12bpc_avx2: 48.2
      inv_txfm_add_4x4_flipadst_identity_1_12bpc_c: 347.9
      inv_txfm_add_4x4_flipadst_identity_1_12bpc_avx2: 48.7
      inv_txfm_add_4x4_identity_adst_0_12bpc_c: 344.7
      inv_txfm_add_4x4_identity_adst_0_12bpc_avx2: 59.8
      inv_txfm_add_4x4_identity_adst_1_12bpc_c: 340.5
      inv_txfm_add_4x4_identity_adst_1_12bpc_avx2: 59.2
      inv_txfm_add_4x4_identity_dct_0_12bpc_c: 369.8
      inv_txfm_add_4x4_identity_dct_0_12bpc_avx2: 59.3
      inv_txfm_add_4x4_identity_dct_1_12bpc_c: 369.5
      inv_txfm_add_4x4_identity_dct_1_12bpc_avx2: 59.2
      inv_txfm_add_4x4_identity_flipadst_0_12bpc_c: 353.4
      inv_txfm_add_4x4_identity_flipadst_0_12bpc_avx2: 65.6
      inv_txfm_add_4x4_identity_flipadst_1_12bpc_c: 350.9
      inv_txfm_add_4x4_identity_flipadst_1_12bpc_avx2: 65.9
      inv_txfm_add_4x4_identity_identity_0_12bpc_c: 326.1
      inv_txfm_add_4x4_identity_identity_0_12bpc_avx2: 39.5
      inv_txfm_add_4x4_identity_identity_1_12bpc_c: 321.6
      inv_txfm_add_4x4_identity_identity_1_12bpc_avx2: 39.5
      eb0308bc
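The refactoring described in this commit, separate 10-bit and 12-bit functions so that no bit-depth branch is needed inside them, can be sketched in general terms. The snippet below is a hedged Python illustration of specializing by bit depth up front, not a transcription of the assembly; `make_add_clip` and the two derived names are hypothetical:

```python
def make_add_clip(bitdepth: int):
    """Build an add-and-clip function specialized for one bit depth."""
    # The pixel maximum is fixed once, when the specialized function is created.
    pixel_max = (1 << bitdepth) - 1

    def add_clip(pixel: int, residual: int) -> int:
        # No bit-depth check here: the limit is already baked in.
        return min(max(pixel + residual, 0), pixel_max)

    return add_clip


# One specialized function per bit depth, chosen once by the caller.
add_clip_10bpc = make_add_clip(10)
add_clip_12bpc = make_add_clip(12)

print(add_clip_10bpc(1000, 100))  # clipped to the 10-bit maximum
print(add_clip_12bpc(1000, 100))  # fits within the 12-bit range
```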
    • Matthias Dressel's avatar
      x86/itx: Rename rax to r6 · 4cdfe691
      Matthias Dressel authored
      Use numerical GPR references everywhere for consistency.
      4cdfe691
    • Matthias Dressel's avatar
      x86/itx: Name constants more explicit · 1ea40afd
      Matthias Dressel authored
      Give some constants a more explicit name to avoid confusion when 12bpc
      support is added.
      1ea40afd
    • Henrik Gramner's avatar
      x86: Add splat_mv AVX-512 (Ice Lake) asm · 8baea7b1
      Henrik Gramner authored
      8baea7b1
    • Victorien Le Couviour--Tuffet's avatar
      82d6d950
    • Henrik Gramner's avatar
      x86: Add sgr AVX-512 (Ice Lake) asm · 05682126
      Henrik Gramner authored
      05682126
    • Henrik Gramner's avatar
      bf0f4690
    • Henrik Gramner's avatar
      ef216e17
    • Henrik Gramner's avatar
      5740c1d6
    • Henrik Gramner's avatar
      x86: Maintain frame thread coefficient buffer alignment · bddef4e0
      Henrik Gramner authored
      Realign the buffer if necessary to maintain 64-byte alignment.
      bddef4e0
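Rounding an offset up to a 64-byte boundary is commonly done with a mask. A minimal sketch of that arithmetic, assuming a power-of-two alignment; `align_up` and `ALIGNMENT` are hypothetical names and this is not the code from the commit:

```python
ALIGNMENT = 64


def align_up(offset: int, alignment: int = ALIGNMENT) -> int:
    """Round a non-negative offset up to the next multiple of alignment."""
    # The mask trick is only valid for positive powers of two.
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


print(align_up(100))  # next 64-byte boundary at or after 100
```

An offset that is already aligned is returned unchanged, which matches the "if necessary" wording of the commit message.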
    • Henrik Gramner's avatar
      x86: Add blend AVX-512 (Ice Lake) asm · c19e0a98
      Henrik Gramner authored
      Also make some minor optimizations to the AVX2 asm.
      c19e0a98
    • Henrik Gramner's avatar
      822d00ae
    • Henrik Gramner's avatar
      x86: Add mc 8-tap AVX-512 (Ice Lake) asm · f7624f99
      Henrik Gramner authored
      f7624f99
    • Henrik Gramner's avatar
      8fc719c6