  1. Oct 17, 2024
  2. Oct 16, 2024
    • NEWS: add itx to riscv list · c3fa1db3
      Nathan E. Egge authored
    • riscv64/itx: Replace vwadd+vnsra with vnclip · 789a1f65
      Nathan E. Egge authored
      The vnclip instruction performs a fixed-point rounding right shift with
      saturation and can replace vwadd followed by vnsra in idct_4, idct_8,
      idct_16, iadst_8 and iadst_16.
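      A minimal scalar sketch in C of why the substitution is possible. It
      assumes the replaced vwadd adds the rounding constant 1 << (shift - 1)
      and that vnclip runs in round-to-nearest-up mode; neither detail is
      stated in this message. The function names are hypothetical, and each
      models one 32-bit lane narrowed to 16 bits. The two results agree as
      long as neither the add nor the narrowing overflows; otherwise the old
      sequence wraps while the vnclip model saturates.

```c
#include <stdint.h>

/* Floor (arithmetic) right shift written without relying on
 * implementation-defined shifts of negative values. */
static int64_t asr64(int64_t v, unsigned s)
{
    return v >= 0 ? v >> s : ~(~v >> s);
}

/* Model of the old pair for one lane (hypothetical helper):
 * add the rounding constant in 32 bits with wraparound, shift right
 * arithmetically, then keep only the low 16 bits. Requires shift >= 1. */
static int16_t add_then_nsra(int32_t acc, unsigned shift)
{
    uint32_t biased = (uint32_t)acc + (UINT32_C(1) << (shift - 1));
    int64_t wide = biased >= UINT32_C(0x80000000)
                 ? (int64_t)biased - (INT64_C(1) << 32)
                 : (int64_t)biased;
    uint32_t low16 = (uint32_t)(asr64(wide, shift) & 0xFFFF);
    return (int16_t)(low16 >= 0x8000u ? (int32_t)low16 - 0x10000
                                      : (int32_t)low16);
}

/* Model of a narrowing clip for one lane (hypothetical helper):
 * round to nearest (ties up), shift, then saturate to the int16_t range.
 * Requires shift >= 1. */
static int16_t nclip_rnu(int32_t acc, unsigned shift)
{
    int64_t rounded = asr64((int64_t)acc + (INT64_C(1) << (shift - 1)), shift);
    if (rounded > INT16_MAX) return INT16_MAX;
    if (rounded < INT16_MIN) return INT16_MIN;
    return (int16_t)rounded;
}
```

      With an assumed shift of 12, both helpers map 289600 to 71, while
      200000000 saturates to 32767 in nclip_rnu and wraps to -16708 in
      add_then_nsra.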
      Including 572c5a66 (which applies the same change to iadst_4), these
      commits give the following average improvements across all modified 2D
      transform functions:
      
                Kendryte K230     SpacemiT K1
      
         4x4       -5.50%           -4.44%
         8x8       -9.78%           -7.62%
        16x16      -9.70%           -9.04%
         4x8       -8.39%           -7.54%
         8x4       -8.10%           -4.66%
         4x16      -8.16%           -7.74%
        16x4       -8.07%           -6.96%
         8x16      -9.11%           -7.43%
        16x8       -9.87%           -7.81%
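
      The Delta column in the per-function tables is consistent with the
      relative change (New - Old) / Old expressed in percent; the units of
      Old and New are not stated here. A small sketch in C, using a
      hypothetical helper name:

```c
/* Relative change in percent (hypothetical helper);
 * a negative result means New is lower than Old. */
static double delta_percent(double old_value, double new_value)
{
    return (new_value - old_value) / old_value * 100.0;
}
```

      For the first Kendryte K230 row, delta_percent(99.0, 93.4) evaluates
      to about -5.66, matching the table.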
      
      Kendryte K230                                      Old     New     Delta
      
      inv_txfm_add_4x4_adst_adst_0_8bpc_rvv              99.0    93.4   -5.66%
      inv_txfm_add_4x4_adst_adst_1_8bpc_rvv              99.0    93.4   -5.66%
      inv_txfm_add_4x4_adst_dct_0_8bpc_rvv               93.4    87.2   -6.64%
      inv_txfm_add_4x4_adst_dct_1_8bpc_rvv               93.5    87.2   -6.74%
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv         100.3    94.9   -5.38%
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv         100.3    94.9   -5.38%
      inv_txfm_add_4x4_adst_identity_0_8bpc_rvv          80.5    77.2   -4.10%
      inv_txfm_add_4x4_adst_identity_1_8bpc_rvv          80.5    77.2   -4.10%
      inv_txfm_add_4x4_dct_adst_0_8bpc_rvv               94.1    88.5   -5.95%
      inv_txfm_add_4x4_dct_adst_1_8bpc_rvv               94.1    88.5   -5.95%
      inv_txfm_add_4x4_dct_dct_0_8bpc_rvv                40.3    40.3    0.00%
      inv_txfm_add_4x4_dct_dct_1_8bpc_rvv                92.2    82.1  -10.95%
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv           95.3    89.9   -5.67%
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv           95.3    89.9   -5.67%
      inv_txfm_add_4x4_dct_identity_0_8bpc_rvv           75.5    73.3   -2.91%
      inv_txfm_add_4x4_dct_identity_1_8bpc_rvv           75.5    73.3   -2.91%
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv         100.3    94.7   -5.58%
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv         100.3    94.7   -5.58%
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv           94.8    88.4   -6.75%
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv           94.8    88.5   -6.65%
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv     105.0    96.0   -8.57%
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv     105.0    95.9   -8.67%
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv      81.6    78.5   -3.80%
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv      81.6    78.4   -3.92%
      inv_txfm_add_4x4_identity_adst_0_8bpc_rvv          80.3    77.8   -3.11%
      inv_txfm_add_4x4_identity_adst_1_8bpc_rvv          80.3    77.8   -3.11%
      inv_txfm_add_4x4_identity_dct_0_8bpc_rvv           77.2    71.7   -7.12%
      inv_txfm_add_4x4_identity_dct_1_8bpc_rvv           77.2    71.7   -7.12%
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv      81.5    79.2   -2.82%
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv      81.6    79.2   -2.94%
      inv_txfm_add_4x4_identity_identity_0_8bpc_rvv      62.8    61.6   -1.91%
      inv_txfm_add_4x4_identity_identity_1_8bpc_rvv      62.8    61.6   -1.91%
      inv_txfm_add_4x4_wht_wht_0_8bpc_rvv                67.8    67.8    0.00%
      inv_txfm_add_4x4_wht_wht_1_8bpc_rvv                67.8    67.8    0.00%
      
      inv_txfm_add_8x8_adst_adst_0_8bpc_rvv             403.1   356.1  -11.66%
      inv_txfm_add_8x8_adst_adst_1_8bpc_rvv             403.1   356.0  -11.68%
      inv_txfm_add_8x8_adst_dct_0_8bpc_rvv              360.2   323.2  -10.27%
      inv_txfm_add_8x8_adst_dct_1_8bpc_rvv              360.2   323.2  -10.27%
      inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv         405.2   358.4  -11.55%
      inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv         405.2   358.4  -11.55%
      inv_txfm_add_8x8_adst_identity_0_8bpc_rvv         284.3   261.0   -8.20%
      inv_txfm_add_8x8_adst_identity_1_8bpc_rvv         284.4   260.9   -8.26%
      inv_txfm_add_8x8_dct_adst_0_8bpc_rvv              360.2   322.0  -10.61%
      inv_txfm_add_8x8_dct_adst_1_8bpc_rvv              360.0   321.9  -10.58%
      inv_txfm_add_8x8_dct_dct_0_8bpc_rvv                76.6    77.0    0.52%
      inv_txfm_add_8x8_dct_dct_1_8bpc_rvv               317.2   289.0   -8.89%
      inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv          363.7   324.3  -10.83%
      inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv          363.8   324.3  -10.86%
      inv_txfm_add_8x8_dct_identity_0_8bpc_rvv          241.2   226.9   -5.93%
      inv_txfm_add_8x8_dct_identity_1_8bpc_rvv          241.3   227.0   -5.93%
      inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv         404.9   358.0  -11.58%
      inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv         405.0   358.1  -11.58%
      inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv          365.1   323.8  -11.31%
      inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv          365.2   323.9  -11.31%
      inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv     407.2   359.6  -11.69%
      inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv     406.4   359.5  -11.54%
      inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv     285.8   261.9   -8.36%
      inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv     285.9   261.8   -8.43%
      inv_txfm_add_8x8_identity_adst_0_8bpc_rvv         269.9   244.5   -9.41%
      inv_txfm_add_8x8_identity_adst_1_8bpc_rvv         269.8   244.5   -9.38%
      inv_txfm_add_8x8_identity_dct_0_8bpc_rvv          225.5   209.6   -7.05%
      inv_txfm_add_8x8_identity_dct_1_8bpc_rvv          225.6   209.5   -7.14%
      inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv     270.5   246.5   -8.87%
      inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv     270.5   246.5   -8.87%
      inv_txfm_add_8x8_identity_identity_0_8bpc_rvv     146.5   145.4   -0.75%
      inv_txfm_add_8x8_identity_identity_1_8bpc_rvv     146.4   145.4   -0.68%
      
      inv_txfm_add_16x16_adst_adst_0_8bpc_rvv          1363.4  1212.0  -11.10%
      inv_txfm_add_16x16_adst_adst_1_8bpc_rvv          1363.6  1212.2  -11.10%
      inv_txfm_add_16x16_adst_adst_2_8bpc_rvv          1813.7  1601.4  -11.71%
      inv_txfm_add_16x16_adst_dct_0_8bpc_rvv           1185.9  1074.6   -9.39%
      inv_txfm_add_16x16_adst_dct_1_8bpc_rvv           1186.0  1074.7   -9.38%
      inv_txfm_add_16x16_adst_dct_2_8bpc_rvv           1639.5  1468.9  -10.41%
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv      1374.8  1214.8  -11.64%
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv      1374.7  1214.6  -11.65%
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv      1819.3  1610.9  -11.45%
      inv_txfm_add_16x16_dct_adst_0_8bpc_rvv           1283.3  1139.1  -11.24%
      inv_txfm_add_16x16_dct_adst_1_8bpc_rvv           1283.2  1139.2  -11.22%
      inv_txfm_add_16x16_dct_adst_2_8bpc_rvv           1632.4  1471.9   -9.83%
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv             160.9   158.7   -1.37%
      inv_txfm_add_16x16_dct_dct_1_8bpc_rvv            1099.5   997.1   -9.31%
      inv_txfm_add_16x16_dct_dct_2_8bpc_rvv            1465.3  1335.2   -8.88%
      inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv       1286.8  1143.2  -11.16%
      inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv       1286.8  1143.3  -11.15%
      inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv       1638.6  1473.5  -10.08%
      inv_txfm_add_16x16_dct_identity_0_8bpc_rvv        806.6   783.3   -2.89%
      inv_txfm_add_16x16_dct_identity_1_8bpc_rvv        806.7   783.4   -2.89%
      inv_txfm_add_16x16_dct_identity_2_8bpc_rvv       1163.1  1105.3   -4.97%
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv      1374.3  1216.0  -11.52%
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv      1374.3  1216.2  -11.50%
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv      1817.5  1609.7  -11.43%
      inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv       1190.4  1073.8   -9.80%
      inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv       1190.4  1073.9   -9.79%
      inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv       1640.4  1472.6  -10.23%
      inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1376.0  1224.2  -11.03%
      inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1376.0  1224.1  -11.04%
      inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1829.3  1616.6  -11.63%
      inv_txfm_add_16x16_identity_dct_0_8bpc_rvv        952.9   882.0   -7.44%
      inv_txfm_add_16x16_identity_dct_1_8bpc_rvv        952.8   881.9   -7.44%
      inv_txfm_add_16x16_identity_dct_2_8bpc_rvv       1172.0  1100.1   -6.13%
      inv_txfm_add_16x16_identity_identity_0_8bpc_rvv   657.6   659.8    0.33%
      inv_txfm_add_16x16_identity_identity_1_8bpc_rvv   657.6   659.7    0.32%
      inv_txfm_add_16x16_identity_identity_2_8bpc_rvv   876.2   878.1    0.22%
      
      inv_txfm_add_4x8_adst_adst_0_8bpc_rvv             197.3   178.0   -9.78%
      inv_txfm_add_4x8_adst_adst_1_8bpc_rvv             197.4   178.0   -9.83%
      inv_txfm_add_4x8_adst_dct_0_8bpc_rvv              174.9   159.9   -8.58%
      inv_txfm_add_4x8_adst_dct_1_8bpc_rvv              174.9   159.9   -8.58%
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv         199.2   180.2   -9.54%
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv         199.2   180.2   -9.54%
      inv_txfm_add_4x8_adst_identity_0_8bpc_rvv         123.3   118.0   -4.30%
      inv_txfm_add_4x8_adst_identity_1_8bpc_rvv         123.3   118.0   -4.30%
      inv_txfm_add_4x8_dct_adst_0_8bpc_rvv              191.1   171.8  -10.10%
      inv_txfm_add_4x8_dct_adst_1_8bpc_rvv              191.1   171.7  -10.15%
      inv_txfm_add_4x8_dct_dct_0_8bpc_rvv               168.9   153.6   -9.06%
      inv_txfm_add_4x8_dct_dct_1_8bpc_rvv               169.0   153.6   -9.11%
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv          193.0   173.9   -9.90%
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv          193.0   173.9   -9.90%
      inv_txfm_add_4x8_dct_identity_0_8bpc_rvv          117.0   111.7   -4.53%
      inv_txfm_add_4x8_dct_identity_1_8bpc_rvv          117.0   111.7   -4.53%
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv         198.0   178.6   -9.80%
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv         198.0   178.6   -9.80%
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv          175.8   160.5   -8.70%
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv          175.8   160.5   -8.70%
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv     199.9   180.5   -9.70%
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv     199.9   180.5   -9.70%
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv     123.6   118.6   -4.05%
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv     123.6   118.6   -4.05%
      inv_txfm_add_4x8_identity_adst_0_8bpc_rvv         171.3   154.2   -9.98%
      inv_txfm_add_4x8_identity_adst_1_8bpc_rvv         171.3   154.2   -9.98%
      inv_txfm_add_4x8_identity_dct_0_8bpc_rvv          148.6   136.5   -8.14%
      inv_txfm_add_4x8_identity_dct_1_8bpc_rvv          148.6   136.5   -8.14%
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv     173.1   156.4   -9.65%
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv     173.2   156.4   -9.70%
      inv_txfm_add_4x8_identity_identity_0_8bpc_rvv      94.3    94.2   -0.11%
      inv_txfm_add_4x8_identity_identity_1_8bpc_rvv      94.2    94.2    0.00%
      
      inv_txfm_add_8x4_adst_adst_0_8bpc_rvv             201.2   188.4   -6.36%
      inv_txfm_add_8x4_adst_adst_1_8bpc_rvv             201.2   188.4   -6.36%
      inv_txfm_add_8x4_adst_dct_0_8bpc_rvv              194.9   175.7   -9.85%
      inv_txfm_add_8x4_adst_dct_1_8bpc_rvv              194.9   175.7   -9.85%
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_adst_identity_0_8bpc_rvv         170.1   155.7   -8.47%
      inv_txfm_add_8x4_adst_identity_1_8bpc_rvv         170.1   155.7   -8.47%
      inv_txfm_add_8x4_dct_adst_0_8bpc_rvv              178.0   162.1   -8.93%
      inv_txfm_add_8x4_dct_adst_1_8bpc_rvv              178.0   162.1   -8.93%
      inv_txfm_add_8x4_dct_dct_0_8bpc_rvv               172.8   157.0   -9.14%
      inv_txfm_add_8x4_dct_dct_1_8bpc_rvv               172.9   157.0   -9.20%
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv          180.3   163.7   -9.21%
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv          180.3   163.7   -9.21%
      inv_txfm_add_8x4_dct_identity_0_8bpc_rvv          147.9   137.9   -6.76%
      inv_txfm_add_8x4_dct_identity_1_8bpc_rvv          147.9   137.9   -6.76%
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv          196.3   175.9  -10.39%
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv          196.3   175.9  -10.39%
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv     203.7   183.4   -9.97%
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv     203.7   183.4   -9.97%
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv     171.1   155.9   -8.88%
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv     171.1   155.9   -8.88%
      inv_txfm_add_8x4_identity_adst_0_8bpc_rvv         126.8   120.9   -4.65%
      inv_txfm_add_8x4_identity_adst_1_8bpc_rvv         126.8   120.9   -4.65%
      inv_txfm_add_8x4_identity_dct_0_8bpc_rvv          121.5   117.0   -3.70%
      inv_txfm_add_8x4_identity_dct_1_8bpc_rvv          121.6   117.0   -3.78%
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv     129.1   122.3   -5.27%
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv     129.1   122.3   -5.27%
      inv_txfm_add_8x4_identity_identity_0_8bpc_rvv      98.5    95.7   -2.84%
      inv_txfm_add_8x4_identity_identity_1_8bpc_rvv      98.5    95.7   -2.84%
      
      inv_txfm_add_4x16_adst_adst_0_8bpc_rvv            384.4   344.6  -10.35%
      inv_txfm_add_4x16_adst_adst_1_8bpc_rvv            384.5   344.6  -10.38%
      inv_txfm_add_4x16_adst_adst_2_8bpc_rvv            429.3   387.3   -9.78%
      inv_txfm_add_4x16_adst_dct_0_8bpc_rvv             333.7   304.3   -8.81%
      inv_txfm_add_4x16_adst_dct_1_8bpc_rvv             333.7   304.2   -8.84%
      inv_txfm_add_4x16_adst_dct_2_8bpc_rvv             381.2   354.2   -7.08%
      inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv        385.7   349.1   -9.49%
      inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv        385.7   349.1   -9.49%
      inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv        433.0   389.3  -10.09%
      inv_txfm_add_4x16_adst_identity_0_8bpc_rvv        251.6   244.2   -2.94%
      inv_txfm_add_4x16_adst_identity_1_8bpc_rvv        251.5   244.1   -2.94%
      inv_txfm_add_4x16_adst_identity_2_8bpc_rvv        300.4   289.6   -3.60%
      inv_txfm_add_4x16_dct_adst_0_8bpc_rvv             378.5   335.6  -11.33%
      inv_txfm_add_4x16_dct_adst_1_8bpc_rvv             378.5   335.5  -11.36%
      inv_txfm_add_4x16_dct_adst_2_8bpc_rvv             420.6   369.5  -12.15%
      inv_txfm_add_4x16_dct_dct_0_8bpc_rvv              323.5   295.3   -8.72%
      inv_txfm_add_4x16_dct_dct_1_8bpc_rvv              323.2   295.2   -8.66%
      inv_txfm_add_4x16_dct_dct_2_8bpc_rvv              362.9   333.0   -8.24%
      inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv         375.3   339.4   -9.57%
      inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv         375.4   339.0   -9.70%
      inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv         414.8   372.2  -10.27%
      inv_txfm_add_4x16_dct_identity_0_8bpc_rvv         240.8   234.7   -2.53%
      inv_txfm_add_4x16_dct_identity_1_8bpc_rvv         240.7   234.7   -2.49%
      inv_txfm_add_4x16_dct_identity_2_8bpc_rvv         283.2   268.0   -5.37%
      inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv        384.2   345.8   -9.99%
      inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv        384.1   345.8   -9.97%
      inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv        432.5   387.7  -10.36%
      inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv         334.9   307.0   -8.33%
      inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv         335.0   307.1   -8.33%
      inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv         386.1   347.2  -10.08%
      inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv    386.7   349.4   -9.65%
      inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv    386.8   349.5   -9.64%
      inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv    436.6   392.9  -10.01%
      inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv    252.4   247.4   -1.98%
      inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv    252.4   247.5   -1.94%
      inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv    302.1   286.7   -5.10%
      inv_txfm_add_4x16_identity_adst_0_8bpc_rvv        348.3   317.4   -8.87%
      inv_txfm_add_4x16_identity_adst_1_8bpc_rvv        348.4   317.5   -8.87%
      inv_txfm_add_4x16_identity_adst_2_8bpc_rvv        361.4   329.0   -8.97%
      inv_txfm_add_4x16_identity_dct_0_8bpc_rvv         301.8   275.8   -8.61%
      inv_txfm_add_4x16_identity_dct_1_8bpc_rvv         301.8   275.8   -8.61%
      inv_txfm_add_4x16_identity_dct_2_8bpc_rvv         312.0   287.4   -7.88%
      inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv    352.2   321.9   -8.60%
      inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv    352.2   322.0   -8.57%
      inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv    363.7   332.5   -8.58%
      inv_txfm_add_4x16_identity_identity_0_8bpc_rvv    215.8   215.0   -0.37%
      inv_txfm_add_4x16_identity_identity_1_8bpc_rvv    215.8   215.1   -0.32%
      inv_txfm_add_4x16_identity_identity_2_8bpc_rvv    228.0   227.0   -0.44%
      
      inv_txfm_add_16x4_adst_adst_0_8bpc_rvv            430.3   388.5   -9.71%
      inv_txfm_add_16x4_adst_adst_1_8bpc_rvv            430.3   388.5   -9.71%
      inv_txfm_add_16x4_adst_adst_2_8bpc_rvv            430.2   388.5   -9.69%
      inv_txfm_add_16x4_adst_dct_0_8bpc_rvv             412.1   374.1   -9.22%
      inv_txfm_add_16x4_adst_dct_1_8bpc_rvv             412.0   374.3   -9.15%
      inv_txfm_add_16x4_adst_dct_2_8bpc_rvv             412.1   374.2   -9.20%
      inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv        432.9   391.0   -9.68%
      inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv        432.8   391.1   -9.63%
      inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv        432.4   391.0   -9.57%
      inv_txfm_add_16x4_adst_identity_0_8bpc_rvv        358.4   332.1   -7.34%
      inv_txfm_add_16x4_adst_identity_1_8bpc_rvv        358.4   332.3   -7.28%
      inv_txfm_add_16x4_adst_identity_2_8bpc_rvv        358.5   332.5   -7.25%
      inv_txfm_add_16x4_dct_adst_0_8bpc_rvv             386.9   347.1  -10.29%
      inv_txfm_add_16x4_dct_adst_1_8bpc_rvv             386.8   347.1  -10.26%
      inv_txfm_add_16x4_dct_adst_2_8bpc_rvv             387.0   346.8  -10.39%
      inv_txfm_add_16x4_dct_dct_0_8bpc_rvv              363.3   330.9   -8.92%
      inv_txfm_add_16x4_dct_dct_1_8bpc_rvv              363.3   330.9   -8.92%
      inv_txfm_add_16x4_dct_dct_2_8bpc_rvv              363.2   331.0   -8.87%
      inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv         383.7   349.8   -8.84%
      inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv         384.3   349.8   -8.98%
      inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv         384.3   349.7   -9.00%
      inv_txfm_add_16x4_dct_identity_0_8bpc_rvv         310.2   288.4   -7.03%
      inv_txfm_add_16x4_dct_identity_1_8bpc_rvv         310.2   288.4   -7.03%
      inv_txfm_add_16x4_dct_identity_2_8bpc_rvv         310.3   288.5   -7.03%
      inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv        434.1   391.5   -9.81%
      inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv        434.1   392.0   -9.70%
      inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv        434.1   392.0   -9.70%
      inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv         423.5   375.5  -11.33%
      inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv         423.5   375.4  -11.36%
      inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv         423.5   375.5  -11.33%
      inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv    438.0   396.1   -9.57%
      inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv    438.1   396.0   -9.61%
      inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv    438.0   395.8   -9.63%
      inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv    361.9   333.0   -7.99%
      inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv    362.4   333.0   -8.11%
      inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv    362.4   333.0   -8.11%
      inv_txfm_add_16x4_identity_adst_0_8bpc_rvv        308.3   296.3   -3.89%
      inv_txfm_add_16x4_identity_adst_1_8bpc_rvv        308.4   296.4   -3.89%
      inv_txfm_add_16x4_identity_adst_2_8bpc_rvv        308.4   296.4   -3.89%
      inv_txfm_add_16x4_identity_dct_0_8bpc_rvv         289.9   279.9   -3.45%
      inv_txfm_add_16x4_identity_dct_1_8bpc_rvv         289.9   280.0   -3.41%
      inv_txfm_add_16x4_identity_dct_2_8bpc_rvv         290.0   279.9   -3.48%
      inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv    311.2   298.9   -3.95%
      inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv    311.1   298.9   -3.92%
      inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv    310.9   298.9   -3.86%
      inv_txfm_add_16x4_identity_identity_0_8bpc_rvv    238.4   243.2    2.01%
      inv_txfm_add_16x4_identity_identity_1_8bpc_rvv    238.4   243.2    2.01%
      inv_txfm_add_16x4_identity_identity_2_8bpc_rvv    238.5   243.2    1.97%
      
      inv_txfm_add_8x16_adst_adst_0_8bpc_rvv            701.5   624.2  -11.02%
      inv_txfm_add_8x16_adst_adst_1_8bpc_rvv            701.6   624.2  -11.03%
      inv_txfm_add_8x16_adst_adst_2_8bpc_rvv            853.5   755.2  -11.52%
      inv_txfm_add_8x16_adst_dct_0_8bpc_rvv             611.1   551.6   -9.74%
      inv_txfm_add_8x16_adst_dct_1_8bpc_rvv             611.2   551.7   -9.73%
      inv_txfm_add_8x16_adst_dct_2_8bpc_rvv             765.0   682.8  -10.75%
      inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv        703.4   629.3  -10.53%
      inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv        703.4   629.5  -10.51%
      inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv        858.1   763.9  -10.98%
      inv_txfm_add_8x16_adst_identity_0_8bpc_rvv        463.7   440.2   -5.07%
      inv_txfm_add_8x16_adst_identity_1_8bpc_rvv        464.3   440.2   -5.19%
      inv_txfm_add_8x16_adst_identity_2_8bpc_rvv        618.6   571.7   -7.58%
      inv_txfm_add_8x16_dct_adst_0_8bpc_rvv             660.3   590.5  -10.57%
      inv_txfm_add_8x16_dct_adst_1_8bpc_rvv             660.2   590.3  -10.59%
      inv_txfm_add_8x16_dct_adst_2_8bpc_rvv             776.2   687.9  -11.38%
      inv_txfm_add_8x16_dct_dct_0_8bpc_rvv              566.9   516.3   -8.93%
      inv_txfm_add_8x16_dct_dct_1_8bpc_rvv              567.1   516.4   -8.94%
      inv_txfm_add_8x16_dct_dct_2_8bpc_rvv              685.9   616.6  -10.10%
      inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv         663.3   593.5  -10.52%
      inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv         663.2   593.5  -10.51%
      inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv         771.7   690.5  -10.52%
      inv_txfm_add_8x16_dct_identity_0_8bpc_rvv         421.3   406.1   -3.61%
      inv_txfm_add_8x16_dct_identity_1_8bpc_rvv         421.3   406.1   -3.61%
      inv_txfm_add_8x16_dct_identity_2_8bpc_rvv         536.6   503.6   -6.15%
      inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv        703.3   627.1  -10.83%
      inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv        703.4   627.2  -10.83%
      inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv        857.7   763.7  -10.96%
      inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv         613.5   552.8   -9.89%
      inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv         613.4   552.7   -9.90%
      inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv         771.0   693.1  -10.10%
      inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv    706.3   631.4  -10.60%
      inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    706.5   631.7  -10.59%
      inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv    861.1   764.9  -11.17%
      inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv    467.0   443.0   -5.14%
      inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv    467.0   443.0   -5.14%
      inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv    623.7   575.1   -7.79%
      inv_txfm_add_8x16_identity_adst_0_8bpc_rvv        565.6   512.0   -9.48%
      inv_txfm_add_8x16_identity_adst_1_8bpc_rvv        565.6   512.9   -9.32%
      inv_txfm_add_8x16_identity_adst_2_8bpc_rvv        585.6   532.8   -9.02%
      inv_txfm_add_8x16_identity_dct_0_8bpc_rvv         476.4   439.9   -7.66%
      inv_txfm_add_8x16_identity_dct_1_8bpc_rvv         476.4   440.0   -7.64%
      inv_txfm_add_8x16_identity_dct_2_8bpc_rvv         496.3   459.5   -7.41%
      inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv    570.7   516.4   -9.51%
      inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv    570.6   516.3   -9.52%
      inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv    590.2   540.0   -8.51%
      inv_txfm_add_8x16_identity_identity_0_8bpc_rvv    330.9   329.9   -0.30%
      inv_txfm_add_8x16_identity_identity_1_8bpc_rvv    330.9   329.9   -0.30%
      inv_txfm_add_8x16_identity_identity_2_8bpc_rvv    350.8   349.7   -0.31%
      
      inv_txfm_add_16x8_adst_adst_0_8bpc_rvv            855.5   752.1  -12.09%
      inv_txfm_add_16x8_adst_adst_1_8bpc_rvv            855.5   751.9  -12.11%
      inv_txfm_add_16x8_adst_adst_2_8bpc_rvv            855.4   752.1  -12.08%
      inv_txfm_add_16x8_adst_dct_0_8bpc_rvv             765.4   685.5  -10.44%
      inv_txfm_add_16x8_adst_dct_1_8bpc_rvv             765.5   685.3  -10.48%
      inv_txfm_add_16x8_adst_dct_2_8bpc_rvv             765.5   685.5  -10.45%
      inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv        859.2   755.8  -12.03%
      inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv        859.1   756.0  -12.00%
      inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv        859.1   755.9  -12.01%
      inv_txfm_add_16x8_adst_identity_0_8bpc_rvv        612.8   561.9   -8.31%
      inv_txfm_add_16x8_adst_identity_1_8bpc_rvv        612.9   561.9   -8.32%
      inv_txfm_add_16x8_adst_identity_2_8bpc_rvv        612.8   561.9   -8.31%
      inv_txfm_add_16x8_dct_adst_0_8bpc_rvv             765.1   676.0  -11.65%
      inv_txfm_add_16x8_dct_adst_1_8bpc_rvv             765.0   676.2  -11.61%
      inv_txfm_add_16x8_dct_adst_2_8bpc_rvv             765.0   676.2  -11.61%
      inv_txfm_add_16x8_dct_dct_0_8bpc_rvv              674.5   612.0   -9.27%
      inv_txfm_add_16x8_dct_dct_1_8bpc_rvv              674.5   612.1   -9.25%
      inv_txfm_add_16x8_dct_dct_2_8bpc_rvv              674.6   612.0   -9.28%
      inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv         777.2   679.9  -12.52%
      inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv         777.1   680.1  -12.48%
      inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv         777.1   680.0  -12.50%
      inv_txfm_add_16x8_dct_identity_0_8bpc_rvv         522.2   488.2   -6.51%
      inv_txfm_add_16x8_dct_identity_1_8bpc_rvv         522.1   488.2   -6.49%
      inv_txfm_add_16x8_dct_identity_2_8bpc_rvv         522.1   487.5   -6.63%
      inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv        859.2   753.5  -12.30%
      inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv        859.2   753.6  -12.29%
      inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv        859.2   753.5  -12.30%
      inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv         768.9   689.0  -10.39%
      inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv         768.9   689.2  -10.37%
      inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv         768.8   689.2  -10.35%
      inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv    863.0   758.7  -12.09%
      inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv    862.9   758.7  -12.08%
      inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv    863.0   758.6  -12.10%
      inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv    616.5   566.7   -8.08%
      inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv    616.6   566.6   -8.11%
      inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv    616.3   567.0   -8.00%
      inv_txfm_add_16x8_identity_adst_0_8bpc_rvv        618.1   564.5   -8.67%
      inv_txfm_add_16x8_identity_adst_1_8bpc_rvv        618.0   564.5   -8.66%
      inv_txfm_add_16x8_identity_adst_2_8bpc_rvv        617.7   564.6   -8.60%
      inv_txfm_add_16x8_identity_dct_0_8bpc_rvv         527.9   500.6   -5.17%
      inv_txfm_add_16x8_identity_dct_1_8bpc_rvv         527.8   500.7   -5.13%
      inv_txfm_add_16x8_identity_dct_2_8bpc_rvv         527.7   500.7   -5.12%
      inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv    622.3   568.5   -8.65%
      inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv    622.2   568.5   -8.63%
      inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv    622.3   568.4   -8.66%
      inv_txfm_add_16x8_identity_identity_0_8bpc_rvv    373.4   374.4    0.27%
      inv_txfm_add_16x8_identity_identity_1_8bpc_rvv    373.4   374.5    0.29%
      inv_txfm_add_16x8_identity_identity_2_8bpc_rvv    373.4   374.4    0.27%
      
      SpacemiT K1                                        Old     New     Delta
      
      inv_txfm_add_4x4_adst_adst_0_8bpc_rvv             101.0    96.8   -4.16%
      inv_txfm_add_4x4_adst_adst_1_8bpc_rvv             101.1    96.8   -4.25%
      inv_txfm_add_4x4_adst_dct_0_8bpc_rvv               96.8    91.7   -5.27%
      inv_txfm_add_4x4_adst_dct_1_8bpc_rvv               95.9    91.8   -4.28%
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv         102.2    97.9   -4.21%
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv         102.2    97.9   -4.21%
      inv_txfm_add_4x4_adst_identity_0_8bpc_rvv          82.4    80.4   -2.43%
      inv_txfm_add_4x4_adst_identity_1_8bpc_rvv          82.4    80.5   -2.31%
      inv_txfm_add_4x4_dct_adst_0_8bpc_rvv               97.3    92.6   -4.83%
      inv_txfm_add_4x4_dct_adst_1_8bpc_rvv               97.2    92.3   -5.04%
      inv_txfm_add_4x4_dct_dct_0_8bpc_rvv                41.2    41.3    0.24%
      inv_txfm_add_4x4_dct_dct_1_8bpc_rvv                96.0    87.5   -8.85%
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv           98.5    94.5   -4.06%
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv           98.6    94.7   -3.96%
      inv_txfm_add_4x4_dct_identity_0_8bpc_rvv           78.6    76.1   -3.18%
      inv_txfm_add_4x4_dct_identity_1_8bpc_rvv           78.6    76.0   -3.31%
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv         104.3    99.1   -4.99%
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv         104.4    99.1   -5.08%
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv           98.0    94.6   -3.47%
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv           98.1    94.4   -3.77%
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv     104.2    99.2   -4.80%
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv     104.3    99.2   -4.89%
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv      86.9    81.8   -5.87%
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv      87.0    81.9   -5.86%
      inv_txfm_add_4x4_identity_adst_0_8bpc_rvv          86.0    80.8   -6.05%
      inv_txfm_add_4x4_identity_adst_1_8bpc_rvv          85.9    81.4   -5.24%
      inv_txfm_add_4x4_identity_dct_0_8bpc_rvv           78.5    76.1   -3.06%
      inv_txfm_add_4x4_identity_dct_1_8bpc_rvv           78.6    76.1   -3.18%
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv      85.9    82.5   -3.96%
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv      85.9    82.3   -4.19%
      inv_txfm_add_4x4_identity_identity_0_8bpc_rvv      65.9    64.9   -1.52%
      inv_txfm_add_4x4_identity_identity_1_8bpc_rvv      65.9    64.8   -1.67%
      inv_txfm_add_4x4_wht_wht_0_8bpc_rvv                71.2    71.3    0.14%
      inv_txfm_add_4x4_wht_wht_1_8bpc_rvv                71.2    71.3    0.14%
      
      inv_txfm_add_8x8_adst_adst_0_8bpc_rvv             440.6   399.3   -9.37%
      inv_txfm_add_8x8_adst_adst_1_8bpc_rvv             440.6   399.3   -9.37%
      inv_txfm_add_8x8_adst_dct_0_8bpc_rvv              401.7   368.4   -8.29%
      inv_txfm_add_8x8_adst_dct_1_8bpc_rvv              401.8   368.4   -8.31%
      inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv         442.4   401.2   -9.31%
      inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv         442.4   401.1   -9.34%
      inv_txfm_add_8x8_adst_identity_0_8bpc_rvv         329.7   310.1   -5.94%
      inv_txfm_add_8x8_adst_identity_1_8bpc_rvv         329.7   310.1   -5.94%
      inv_txfm_add_8x8_dct_adst_0_8bpc_rvv              401.8   367.4   -8.56%
      inv_txfm_add_8x8_dct_adst_1_8bpc_rvv              401.7   367.3   -8.56%
      inv_txfm_add_8x8_dct_dct_0_8bpc_rvv                79.5    80.2    0.88%
      inv_txfm_add_8x8_dct_dct_1_8bpc_rvv               362.1   335.8   -7.26%
      inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv          405.0   369.2   -8.84%
      inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv          405.1   369.2   -8.86%
      inv_txfm_add_8x8_dct_identity_0_8bpc_rvv          290.9   278.2   -4.37%
      inv_txfm_add_8x8_dct_identity_1_8bpc_rvv          290.8   278.2   -4.33%
      inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv         442.5   401.1   -9.36%
      inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv         442.5   401.2   -9.33%
      inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv          405.8   369.2   -9.02%
      inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv          405.8   369.1   -9.04%
      inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv     444.3   403.0   -9.30%
      inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv     444.3   403.1   -9.27%
      inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv     331.6   310.9   -6.24%
      inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv     331.6   310.9   -6.24%
      inv_txfm_add_8x8_identity_adst_0_8bpc_rvv         313.3   292.6   -6.61%
      inv_txfm_add_8x8_identity_adst_1_8bpc_rvv         313.1   292.6   -6.55%
      inv_txfm_add_8x8_identity_dct_0_8bpc_rvv          274.5   260.6   -5.06%
      inv_txfm_add_8x8_identity_dct_1_8bpc_rvv          274.4   260.7   -4.99%
      inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv     315.3   294.4   -6.63%
      inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv     315.3   294.4   -6.63%
      inv_txfm_add_8x8_identity_identity_0_8bpc_rvv     202.5   202.5    0.00%
      inv_txfm_add_8x8_identity_identity_1_8bpc_rvv     202.6   202.5   -0.05%
      
      inv_txfm_add_16x16_adst_adst_0_8bpc_rvv          1418.8  1268.2  -10.61%
      inv_txfm_add_16x16_adst_adst_1_8bpc_rvv          1418.9  1268.3  -10.61%
      inv_txfm_add_16x16_adst_adst_2_8bpc_rvv          1943.3  1733.6  -10.79%
      inv_txfm_add_16x16_adst_dct_0_8bpc_rvv           1241.7  1134.6   -8.63%
      inv_txfm_add_16x16_adst_dct_1_8bpc_rvv           1241.5  1134.5   -8.62%
      inv_txfm_add_16x16_adst_dct_2_8bpc_rvv           1772.5  1599.8   -9.74%
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv      1429.8  1270.3  -11.16%
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv      1429.7  1270.1  -11.16%
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv      1951.1  1741.4  -10.75%
      inv_txfm_add_16x16_dct_adst_0_8bpc_rvv           1337.8  1195.8  -10.61%
      inv_txfm_add_16x16_dct_adst_1_8bpc_rvv           1337.5  1196.0  -10.58%
      inv_txfm_add_16x16_dct_adst_2_8bpc_rvv           1763.2  1604.6   -9.00%
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv             179.3   181.1    1.00%
      inv_txfm_add_16x16_dct_dct_1_8bpc_rvv            1153.8  1060.7   -8.07%
      inv_txfm_add_16x16_dct_dct_2_8bpc_rvv            1601.6  1470.6   -8.18%
      inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv       1340.7  1199.8  -10.51%
      inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv       1340.4  1199.8  -10.49%
      inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv       1771.2  1606.6   -9.29%
      inv_txfm_add_16x16_dct_identity_0_8bpc_rvv        877.9   854.9   -2.62%
      inv_txfm_add_16x16_dct_identity_1_8bpc_rvv        877.7   855.2   -2.56%
      inv_txfm_add_16x16_dct_identity_2_8bpc_rvv       1311.6  1254.1   -4.38%
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv      1428.2  1270.5  -11.04%
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv      1428.3  1270.6  -11.04%
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv      1947.3  1737.3  -10.78%
      inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv       1245.8  1133.5   -9.01%
      inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv       1246.0  1133.7   -9.01%
      inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv       1769.9  1603.9   -9.38%
      inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1428.7  1279.7  -10.43%
      inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1428.8  1279.5  -10.45%
      inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1960.8  1745.8  -10.96%
      inv_txfm_add_16x16_identity_dct_0_8bpc_rvv       1016.6   948.8   -6.67%
      inv_txfm_add_16x16_identity_dct_1_8bpc_rvv       1016.7   948.8   -6.68%
      inv_txfm_add_16x16_identity_dct_2_8bpc_rvv       1319.8  1247.7   -5.46%
      inv_txfm_add_16x16_identity_identity_0_8bpc_rvv   735.4   736.6    0.16%
      inv_txfm_add_16x16_identity_identity_1_8bpc_rvv   735.3   736.4    0.15%
      inv_txfm_add_16x16_identity_identity_2_8bpc_rvv  1037.8  1036.7   -0.11%
      
      inv_txfm_add_4x8_adst_adst_0_8bpc_rvv             197.2   179.9   -8.77%
      inv_txfm_add_4x8_adst_adst_1_8bpc_rvv             197.1   180.0   -8.68%
      inv_txfm_add_4x8_adst_dct_0_8bpc_rvv              177.5   164.2   -7.49%
      inv_txfm_add_4x8_adst_dct_1_8bpc_rvv              177.5   164.3   -7.44%
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv         199.3   181.8   -8.78%
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv         199.0   181.8   -8.64%
      inv_txfm_add_4x8_adst_identity_0_8bpc_rvv         126.7   121.8   -3.87%
      inv_txfm_add_4x8_adst_identity_1_8bpc_rvv         126.7   121.9   -3.79%
      inv_txfm_add_4x8_dct_adst_0_8bpc_rvv              189.8   172.4   -9.17%
      inv_txfm_add_4x8_dct_adst_1_8bpc_rvv              189.8   172.4   -9.17%
      inv_txfm_add_4x8_dct_dct_0_8bpc_rvv               170.2   156.8   -7.87%
      inv_txfm_add_4x8_dct_dct_1_8bpc_rvv               170.2   156.9   -7.81%
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv          192.6   174.2   -9.55%
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv          192.6   174.2   -9.55%
      inv_txfm_add_4x8_dct_identity_0_8bpc_rvv          119.4   114.3   -4.27%
      inv_txfm_add_4x8_dct_identity_1_8bpc_rvv          119.6   114.2   -4.52%
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv         197.7   180.5   -8.70%
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv         197.8   180.6   -8.70%
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv          178.3   165.0   -7.46%
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv          178.3   164.9   -7.52%
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv     199.7   182.5   -8.61%
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv     200.0   182.4   -8.80%
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv     127.2   122.3   -3.85%
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv     127.3   122.5   -3.77%
      inv_txfm_add_4x8_identity_adst_0_8bpc_rvv         172.1   155.0   -9.94%
      inv_txfm_add_4x8_identity_adst_1_8bpc_rvv         172.1   155.0   -9.94%
      inv_txfm_add_4x8_identity_dct_0_8bpc_rvv          148.7   139.4   -6.25%
      inv_txfm_add_4x8_identity_dct_1_8bpc_rvv          148.7   139.5   -6.19%
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv     171.7   156.8   -8.68%
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv     171.6   156.9   -8.57%
      inv_txfm_add_4x8_identity_identity_0_8bpc_rvv      96.8    96.8    0.00%
      inv_txfm_add_4x8_identity_identity_1_8bpc_rvv      96.7    96.7    0.00%
      
      inv_txfm_add_8x4_adst_adst_0_8bpc_rvv             228.1   220.0   -3.55%
      inv_txfm_add_8x4_adst_adst_1_8bpc_rvv             227.9   219.9   -3.51%
      inv_txfm_add_8x4_adst_dct_0_8bpc_rvv              219.4   206.4   -5.93%
      inv_txfm_add_8x4_adst_dct_1_8bpc_rvv              219.4   206.4   -5.93%
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv         229.4   214.7   -6.41%
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv         229.4   214.8   -6.36%
      inv_txfm_add_8x4_adst_identity_0_8bpc_rvv         195.6   187.6   -4.09%
      inv_txfm_add_8x4_adst_identity_1_8bpc_rvv         195.8   187.6   -4.19%
      inv_txfm_add_8x4_dct_adst_0_8bpc_rvv              207.0   195.2   -5.70%
      inv_txfm_add_8x4_dct_adst_1_8bpc_rvv              206.9   195.2   -5.65%
      inv_txfm_add_8x4_dct_dct_0_8bpc_rvv               199.4   188.2   -5.62%
      inv_txfm_add_8x4_dct_dct_1_8bpc_rvv               199.4   188.5   -5.47%
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv          209.5   196.5   -6.21%
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv          209.7   196.6   -6.25%
      inv_txfm_add_8x4_dct_identity_0_8bpc_rvv          175.7   169.5   -3.53%
      inv_txfm_add_8x4_dct_identity_1_8bpc_rvv          175.9   169.6   -3.58%
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv         229.0   214.7   -6.24%
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv         229.3   214.5   -6.45%
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv          220.9   206.7   -6.43%
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv          220.6   206.5   -6.39%
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv     230.6   215.9   -6.37%
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv     230.7   215.9   -6.42%
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv     196.9   188.9   -4.06%
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv     196.9   188.9   -4.06%
      inv_txfm_add_8x4_identity_adst_0_8bpc_rvv         157.6   154.7   -1.84%
      inv_txfm_add_8x4_identity_adst_1_8bpc_rvv         157.5   154.9   -1.65%
      inv_txfm_add_8x4_identity_dct_0_8bpc_rvv          150.0   147.9   -1.40%
      inv_txfm_add_8x4_identity_dct_1_8bpc_rvv          150.0   147.7   -1.53%
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv     159.6   155.9   -2.32%
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv     159.8   155.6   -2.63%
      inv_txfm_add_8x4_identity_identity_0_8bpc_rvv     128.6   128.8    0.16%
      inv_txfm_add_8x4_identity_identity_1_8bpc_rvv     128.4   129.3    0.70%
      
      inv_txfm_add_4x16_adst_adst_0_8bpc_rvv            373.8   335.9  -10.14%
      inv_txfm_add_4x16_adst_adst_1_8bpc_rvv            373.8   335.7  -10.19%
      inv_txfm_add_4x16_adst_adst_2_8bpc_rvv            417.4   380.0   -8.96%
      inv_txfm_add_4x16_adst_dct_0_8bpc_rvv             328.3   301.7   -8.10%
      inv_txfm_add_4x16_adst_dct_1_8bpc_rvv             328.0   302.0   -7.93%
      inv_txfm_add_4x16_adst_dct_2_8bpc_rvv             374.3   351.3   -6.14%
      inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv        374.5   339.8   -9.27%
      inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv        374.3   339.4   -9.32%
      inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv        422.0   383.8   -9.05%
      inv_txfm_add_4x16_adst_identity_0_8bpc_rvv        248.0   242.9   -2.06%
      inv_txfm_add_4x16_adst_identity_1_8bpc_rvv        248.0   242.2   -2.34%
      inv_txfm_add_4x16_adst_identity_2_8bpc_rvv        298.6   290.3   -2.78%
      inv_txfm_add_4x16_dct_adst_0_8bpc_rvv             370.5   329.4  -11.09%
      inv_txfm_add_4x16_dct_adst_1_8bpc_rvv             370.8   329.0  -11.27%
      inv_txfm_add_4x16_dct_adst_2_8bpc_rvv             409.1   360.9  -11.78%
      inv_txfm_add_4x16_dct_dct_0_8bpc_rvv              321.1   293.7   -8.53%
      inv_txfm_add_4x16_dct_dct_1_8bpc_rvv              321.0   294.3   -8.32%
      inv_txfm_add_4x16_dct_dct_2_8bpc_rvv              357.8   329.8   -7.83%
      inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv         369.7   332.9   -9.95%
      inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv         370.4   333.0  -10.10%
      inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv         405.5   364.9  -10.01%
      inv_txfm_add_4x16_dct_identity_0_8bpc_rvv         241.6   236.6   -2.07%
      inv_txfm_add_4x16_dct_identity_1_8bpc_rvv         241.8   235.6   -2.56%
      inv_txfm_add_4x16_dct_identity_2_8bpc_rvv         281.9   266.9   -5.32%
      inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv        371.9   337.3   -9.30%
      inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv        372.2   337.1   -9.43%
      inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv        419.8   381.5   -9.12%
      inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv         328.3   302.9   -7.74%
      inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv         328.4   303.3   -7.64%
      inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv         380.6   343.7   -9.70%
      inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv    377.7   341.1   -9.69%
      inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv    377.6   341.5   -9.56%
      inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv    423.6   386.7   -8.71%
      inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv    250.0   245.7   -1.72%
      inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv    249.3   246.0   -1.32%
      inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv    296.4   284.7   -3.95%
      inv_txfm_add_4x16_identity_adst_0_8bpc_rvv        343.0   311.2   -9.27%
      inv_txfm_add_4x16_identity_adst_1_8bpc_rvv        342.9   311.0   -9.30%
      inv_txfm_add_4x16_identity_adst_2_8bpc_rvv        354.8   325.0   -8.40%
      inv_txfm_add_4x16_identity_dct_0_8bpc_rvv         298.9   274.9   -8.03%
      inv_txfm_add_4x16_identity_dct_1_8bpc_rvv         298.8   275.0   -7.97%
      inv_txfm_add_4x16_identity_dct_2_8bpc_rvv         310.3   289.1   -6.83%
      inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv    344.7   314.9   -8.65%
      inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv    344.5   314.8   -8.62%
      inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv    358.3   328.6   -8.29%
      inv_txfm_add_4x16_identity_identity_0_8bpc_rvv    219.6   216.1   -1.59%
      inv_txfm_add_4x16_identity_identity_1_8bpc_rvv    218.3   216.3   -0.92%
      inv_txfm_add_4x16_identity_identity_2_8bpc_rvv    231.3   229.6   -0.73%
      
      inv_txfm_add_16x4_adst_adst_0_8bpc_rvv            468.5   428.8   -8.47%
      inv_txfm_add_16x4_adst_adst_1_8bpc_rvv            468.5   428.9   -8.45%
      inv_txfm_add_16x4_adst_adst_2_8bpc_rvv            468.5   428.9   -8.45%
      inv_txfm_add_16x4_adst_dct_0_8bpc_rvv             453.8   414.5   -8.66%
      inv_txfm_add_16x4_adst_dct_1_8bpc_rvv             453.8   414.5   -8.66%
      inv_txfm_add_16x4_adst_dct_2_8bpc_rvv             453.9   414.4   -8.70%
      inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv        471.0   431.5   -8.39%
      inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv        471.0   431.3   -8.43%
      inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv        471.0   431.5   -8.39%
      inv_txfm_add_16x4_adst_identity_0_8bpc_rvv        402.2   375.0   -6.76%
      inv_txfm_add_16x4_adst_identity_1_8bpc_rvv        402.1   375.0   -6.74%
      inv_txfm_add_16x4_adst_identity_2_8bpc_rvv        402.0   375.3   -6.64%
      inv_txfm_add_16x4_dct_adst_0_8bpc_rvv             432.8   392.5   -9.31%
      inv_txfm_add_16x4_dct_adst_1_8bpc_rvv             432.8   392.5   -9.31%
      inv_txfm_add_16x4_dct_adst_2_8bpc_rvv             432.8   392.5   -9.31%
      inv_txfm_add_16x4_dct_dct_0_8bpc_rvv              407.9   378.3   -7.26%
      inv_txfm_add_16x4_dct_dct_1_8bpc_rvv              407.8   378.1   -7.28%
      inv_txfm_add_16x4_dct_dct_2_8bpc_rvv              407.8   378.1   -7.28%
      inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv         426.0   395.1   -7.25%
      inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv         425.9   395.0   -7.26%
      inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv         426.0   395.1   -7.25%
      inv_txfm_add_16x4_dct_identity_0_8bpc_rvv         357.1   338.7   -5.15%
      inv_txfm_add_16x4_dct_identity_1_8bpc_rvv         357.1   338.7   -5.15%
      inv_txfm_add_16x4_dct_identity_2_8bpc_rvv         357.2   338.7   -5.18%
      inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv        472.4   432.6   -8.43%
      inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv        472.2   432.6   -8.39%
      inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv        472.3   432.7   -8.38%
      inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv         464.3   418.2   -9.93%
      inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv         464.2   418.2   -9.91%
      inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv         464.2   418.2   -9.91%
      inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv    474.7   435.1   -8.34%
      inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv    474.8   435.1   -8.36%
      inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv    474.7   435.1   -8.34%
      inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv    405.9   378.8   -6.68%
      inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv    406.0   378.8   -6.70%
      inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv    406.0   378.8   -6.70%
      inv_txfm_add_16x4_identity_adst_0_8bpc_rvv        353.7   342.2   -3.25%
      inv_txfm_add_16x4_identity_adst_1_8bpc_rvv        353.8   342.3   -3.25%
      inv_txfm_add_16x4_identity_adst_2_8bpc_rvv        353.7   342.4   -3.19%
      inv_txfm_add_16x4_identity_dct_0_8bpc_rvv         338.1   327.9   -3.02%
      inv_txfm_add_16x4_identity_dct_1_8bpc_rvv         338.1   327.9   -3.02%
      inv_txfm_add_16x4_identity_dct_2_8bpc_rvv         338.2   327.9   -3.05%
      inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv    357.5   344.8   -3.55%
      inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv    357.5   344.9   -3.52%
      inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv    357.5   344.7   -3.58%
      inv_txfm_add_16x4_identity_identity_0_8bpc_rvv    287.1   297.0    3.45%
      inv_txfm_add_16x4_identity_identity_1_8bpc_rvv    287.2   297.0    3.41%
      inv_txfm_add_16x4_identity_identity_2_8bpc_rvv    287.2   297.0    3.41%
      
      inv_txfm_add_8x16_adst_adst_0_8bpc_rvv            774.3   704.8   -8.98%
      inv_txfm_add_8x16_adst_adst_1_8bpc_rvv            774.4   704.8   -8.99%
      inv_txfm_add_8x16_adst_adst_2_8bpc_rvv            929.5   839.9   -9.64%
      inv_txfm_add_8x16_adst_dct_0_8bpc_rvv             687.9   634.9   -7.70%
      inv_txfm_add_8x16_adst_dct_1_8bpc_rvv             688.0   634.8   -7.73%
      inv_txfm_add_8x16_adst_dct_2_8bpc_rvv             845.5   768.4   -9.12%
      inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv        779.5   708.5   -9.11%
      inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv        779.5   708.5   -9.11%
      inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv        933.3   849.9   -8.94%
      inv_txfm_add_8x16_adst_identity_0_8bpc_rvv        546.5   529.0   -3.20%
      inv_txfm_add_8x16_adst_identity_1_8bpc_rvv        546.5   529.0   -3.20%
      inv_txfm_add_8x16_adst_identity_2_8bpc_rvv        702.5   664.1   -5.47%
      inv_txfm_add_8x16_dct_adst_0_8bpc_rvv             739.9   672.7   -9.08%
      inv_txfm_add_8x16_dct_adst_1_8bpc_rvv             739.9   672.7   -9.08%
      inv_txfm_add_8x16_dct_adst_2_8bpc_rvv             863.1   776.1  -10.08%
      inv_txfm_add_8x16_dct_dct_0_8bpc_rvv              651.2   601.9   -7.57%
      inv_txfm_add_8x16_dct_dct_1_8bpc_rvv              651.2   601.8   -7.59%
      inv_txfm_add_8x16_dct_dct_2_8bpc_rvv              777.6   706.5   -9.14%
      inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv         742.4   678.9   -8.55%
      inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv         742.5   678.9   -8.57%
      inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv         858.8   779.3   -9.26%
      inv_txfm_add_8x16_dct_identity_0_8bpc_rvv         510.8   496.4   -2.82%
      inv_txfm_add_8x16_dct_identity_1_8bpc_rvv         510.6   496.5   -2.76%
      inv_txfm_add_8x16_dct_identity_2_8bpc_rvv         630.0   599.7   -4.81%
      inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv        778.3   707.2   -9.14%
      inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv        778.3   707.1   -9.15%
      inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv        934.4   843.5   -9.73%
      inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv         689.3   634.7   -7.92%
      inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv         689.2   634.8   -7.89%
      inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv         845.8   774.4   -8.44%
      inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv    779.9   710.5   -8.90%
      inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    780.0   710.4   -8.92%
      inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv    936.4   848.1   -9.43%
      inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv    550.4   531.3   -3.47%
      inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv    550.4   531.3   -3.47%
      inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv    705.3   669.4   -5.09%
      inv_txfm_add_8x16_identity_adst_0_8bpc_rvv        649.0   599.7   -7.60%
      inv_txfm_add_8x16_identity_adst_1_8bpc_rvv        649.0   599.7   -7.60%
      inv_txfm_add_8x16_identity_adst_2_8bpc_rvv        682.8   633.4   -7.23%
      inv_txfm_add_8x16_identity_dct_0_8bpc_rvv         562.1   527.9   -6.08%
      inv_txfm_add_8x16_identity_dct_1_8bpc_rvv         562.0   527.9   -6.07%
      inv_txfm_add_8x16_identity_dct_2_8bpc_rvv         597.4   561.5   -6.01%
      inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv    652.7   603.6   -7.52%
      inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv    652.8   603.6   -7.54%
      inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv    686.6   640.5   -6.71%
      inv_txfm_add_8x16_identity_identity_0_8bpc_rvv    421.6   424.4    0.66%
      inv_txfm_add_8x16_identity_identity_1_8bpc_rvv    421.7   424.4    0.64%
      inv_txfm_add_8x16_identity_identity_2_8bpc_rvv    455.5   458.1    0.57%
      
      inv_txfm_add_16x8_adst_adst_0_8bpc_rvv            935.2   843.2   -9.84%
      inv_txfm_add_16x8_adst_adst_1_8bpc_rvv            935.2   843.3   -9.83%
      inv_txfm_add_16x8_adst_adst_2_8bpc_rvv            935.2   843.1   -9.85%
      inv_txfm_add_16x8_adst_dct_0_8bpc_rvv             857.0   781.1   -8.86%
      inv_txfm_add_16x8_adst_dct_1_8bpc_rvv             856.9   781.1   -8.85%
      inv_txfm_add_16x8_adst_dct_2_8bpc_rvv             856.9   781.0   -8.86%
      inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv        938.9   846.8   -9.81%
      inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv        938.8   847.0   -9.78%
      inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv        938.9   847.0   -9.79%
      inv_txfm_add_16x8_adst_identity_0_8bpc_rvv        711.2   661.6   -6.97%
      inv_txfm_add_16x8_adst_identity_1_8bpc_rvv        711.2   661.6   -6.97%
      inv_txfm_add_16x8_adst_identity_2_8bpc_rvv        711.2   661.6   -6.97%
      inv_txfm_add_16x8_dct_adst_0_8bpc_rvv             846.1   771.5   -8.82%
      inv_txfm_add_16x8_dct_adst_1_8bpc_rvv             845.9   771.5   -8.80%
      inv_txfm_add_16x8_dct_adst_2_8bpc_rvv             846.2   772.1   -8.76%
      inv_txfm_add_16x8_dct_dct_0_8bpc_rvv              767.8   710.3   -7.49%
      inv_txfm_add_16x8_dct_dct_1_8bpc_rvv              767.8   710.4   -7.48%
      inv_txfm_add_16x8_dct_dct_2_8bpc_rvv              767.4   710.4   -7.43%
      inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv         856.6   775.6   -9.46%
      inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv         856.5   775.1   -9.50%
      inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv         856.6   775.2   -9.50%
      inv_txfm_add_16x8_dct_identity_0_8bpc_rvv         623.3   589.9   -5.36%
      inv_txfm_add_16x8_dct_identity_1_8bpc_rvv         623.3   590.0   -5.34%
      inv_txfm_add_16x8_dct_identity_2_8bpc_rvv         623.3   589.7   -5.39%
      inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv        939.8   846.9   -9.89%
      inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv        939.8   847.0   -9.87%
      inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv        939.9   846.9   -9.89%
      inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv         860.8   784.9   -8.82%
      inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv         860.7   784.8   -8.82%
      inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv         860.8   784.9   -8.82%
      inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv    942.7   852.2   -9.60%
      inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv    942.7   852.1   -9.61%
      inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv    942.8   852.1   -9.62%
      inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv    714.9   667.0   -6.70%
      inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv    715.0   666.9   -6.73%
      inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv    715.0   666.9   -6.73%
      inv_txfm_add_16x8_identity_adst_0_8bpc_rvv        707.9   667.2   -5.75%
      inv_txfm_add_16x8_identity_adst_1_8bpc_rvv        707.9   667.3   -5.74%
      inv_txfm_add_16x8_identity_adst_2_8bpc_rvv        707.9   667.2   -5.75%
      inv_txfm_add_16x8_identity_dct_0_8bpc_rvv         630.6   604.8   -4.09%
      inv_txfm_add_16x8_identity_dct_1_8bpc_rvv         630.7   604.9   -4.09%
      inv_txfm_add_16x8_identity_dct_2_8bpc_rvv         630.6   604.8   -4.09%
      inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv    711.7   671.1   -5.70%
      inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv    711.9   671.1   -5.73%
      inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv    711.8   671.2   -5.70%
      inv_txfm_add_16x8_identity_identity_0_8bpc_rvv    485.2   486.2    0.21%
      inv_txfm_add_16x8_identity_identity_1_8bpc_rvv    485.2   486.3    0.23%
      inv_txfm_add_16x8_identity_identity_2_8bpc_rvv    485.2   486.3    0.23%
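
The Delta column above appears to be the relative change from Old to New, so negative values mean the new code is faster. A minimal sketch of that arithmetic, assuming Delta = (New - Old) / Old; the helper name `delta_percent` is hypothetical and not part of dav1d or checkasm:

```python
def delta_percent(old: float, new: float) -> float:
    """Relative change from old to new in percent (negative means faster).

    Assumption: this is how the Delta column is derived; the commit
    message does not state the formula explicitly.
    """
    return (new - old) / old * 100.0


# Two rows copied from the SpacemiT K1 table above.
rows = [
    ("inv_txfm_add_4x4_dct_dct_1_8bpc_rvv", 96.0, 87.5),
    ("inv_txfm_add_8x8_adst_adst_0_8bpc_rvv", 440.6, 399.3),
]

for name, old, new in rows:
    print(f"{name}  {old:7.1f} {new:7.1f} {delta_percent(old, new):7.2f}%")
```

Because Old and New are themselves rounded to one decimal place, a recomputed delta can differ from the listed value in the last digit.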
  3. Oct 14, 2024
  4. Oct 13, 2024
  5. Oct 12, 2024
  6. Oct 09, 2024
    • riscv64/mc: warp_8x8 and warp_8x8t 8bpc · b2e7f06c
      Bogdan Gligorijević authored
      Benchmarks:
      - Kendryte K230:
      warp_8x8_8bpc_c:      4549.7 ( 1.00x)
      warp_8x8_8bpc_rvv:    2504.7 ( 1.82x)
      warp_8x8t_8bpc_c:     4414.7 ( 1.00x)
      warp_8x8t_8bpc_rvv:   2465.7 ( 1.79x)
      
      - Banana Pi BPI-F3:
      warp_8x8_8bpc_c:      4431.2 ( 1.00x)
      warp_8x8_8bpc_rvv:    3297.4 ( 1.34x)
      warp_8x8t_8bpc_c:     4299.3 ( 1.00x)
      warp_8x8t_8bpc_rvv:   3255.7 ( 1.32x)
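
The figures in parentheses read as the speedup of each implementation relative to the C reference. A small sketch of that ratio, assuming speedup = C time / RVV time; the helper name `speedup` is hypothetical:

```python
def speedup(c_time: float, rvv_time: float) -> float:
    """Ratio of the C reference time to the RVV time (higher is better).

    Assumption: this matches the "(N.NNx)" figures in the benchmark
    output; the commit message does not define them.
    """
    return c_time / rvv_time


# warp_8x8 rows copied from the two boards above.
print(f"K230:   {speedup(4549.7, 2504.7):.2f}x")
print(f"BPI-F3: {speedup(4431.2, 3297.4):.2f}x")
```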
    • riscv64/mc: Re-order instructions · 56f6d166
      Niklas Haas authored
      Re-ordered to avoid read-after-write hazards. Speedup is about 1% for
      width=4 on a K230.
    • riscv64/mc: Add bidir functions · 3d12677c
      Niklas Haas authored
      This code compromises between the performance of a dedicated kernel per
      VLEN/width pair and the flexibility of a fully VLEN-dynamic loop. It
      uses a single special case for w=4 and splits the remaining widths
      between an unrolled four-line fast path and a general-purpose slow path
      (for large widths on small VLEN).
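
The three-way split described above can be pictured with a rough model. This is only a sketch: the function name `select_path`, the 16-bit element size, the LMUL=8 limit, and the "one row fits in one register group" threshold are all assumptions for illustration, not taken from the actual assembly.

```python
def select_path(width: int, vlen_bits: int,
                elem_bits: int = 16, max_lmul: int = 8) -> str:
    """Pick a code path for a bidir kernel (illustrative model only).

    HYPOTHETICAL threshold: a row is assumed to qualify for the fast
    path when it fits in a single vector register group.
    """
    if width == 4:
        return "w4"  # single dedicated special case
    # Elements that fit in one register group at the assumed maximum LMUL.
    capacity = vlen_bits * max_lmul // elem_bits
    return "fast" if width <= capacity else "slow"


# Example VLEN values only; widths match the benchmarked sizes.
for vlen in (128, 256):
    paths = {w: select_path(w, vlen) for w in (4, 8, 16, 32, 64, 128)}
    print(f"VLEN={vlen}: {paths}")
```

Under these assumed parameters only w=128 at VLEN=128 falls to the slow path, which is the "large width on small VLEN" case; the real cut-off in the code may differ.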
      
      Kendryte K230
      
      avg_w4_8bpc_c:          346.8 ( 1.00x)
      avg_w4_8bpc_rvv:         50.3 ( 6.90x)
      avg_w8_8bpc_c:         1054.9 ( 1.00x)
      avg_w8_8bpc_rvv:        139.1 ( 7.58x)
      avg_w16_8bpc_c:        3396.3 ( 1.00x)
      avg_w16_8bpc_rvv:       350.6 ( 9.69x)
      avg_w32_8bpc_c:       13734.3 ( 1.00x)
      avg_w32_8bpc_rvv:      1226.3 (11.20x)
      avg_w64_8bpc_c:       33260.9 ( 1.00x)
      avg_w64_8bpc_rvv:      3869.4 ( 8.60x)
      avg_w128_8bpc_c:      83441.3 ( 1.00x)
      avg_w128_8bpc_rvv:     9765.1 ( 8.54x)
      
      w_avg_w4_8bpc_c:        444.3 ( 1.00x)
      w_avg_w4_8bpc_rvv:       75.8 ( 5.86x)
      w_avg_w8_8bpc_c:       1365.6 ( 1.00x)
      w_avg_w8_8bpc_rvv:      208.8 ( 6.54x)
      w_avg_w16_8bpc_c:      4420.8 ( 1.00x)
      w_avg_w16_8bpc_rvv:     570.7 ( 7.75x)
      w_avg_w32_8bpc_c:     18010.9 ( 1.00x)
      w_avg_w32_8bpc_rvv:    2074.4 ( 8.68x)
      w_avg_w64_8bpc_c:     43050.4 ( 1.00x)
      w_avg_w64_8bpc_rvv:    5799.5 ( 7.42x)
      w_avg_w128_8bpc_c:   107153.6 ( 1.00x)
      w_avg_w128_8bpc_rvv:  14272.0 ( 7.51x)
      
      mask_w4_8bpc_c:        497.6 ( 1.00x)
      mask_w4_8bpc_rvv:       88.5 ( 5.63x)
      mask_w8_8bpc_c:       1528.5 ( 1.00x)
      mask_w8_8bpc_rvv:      253.1 ( 6.04x)
      mask_w16_8bpc_c:      4953.8 ( 1.00x)
      mask_w16_8bpc_rvv:     679.0 ( 7.30x)
      mask_w32_8bpc_c:     20298.3 ( 1.00x)
      mask_w32_8bpc_rvv:    3012.9 ( 6.74x)
      mask_w64_8bpc_c:     49718.8 ( 1.00x)
      mask_w64_8bpc_rvv:    7291.7 ( 6.82x)
      mask_w128_8bpc_c:   126740.3 ( 1.00x)
      mask_w128_8bpc_rvv:  18351.1 ( 6.91x)
    • riscv: Add $vtype helper definitions · 50ac8260
      Niklas Haas authored
    • riscv64/mc: Branchless vsetvl in blend_v function · cc7d8773
      Nathan E. Egge authored
      Kendryte K230
      
      blend_v_w2_8bpc_c:       221.4 ( 1.00x)
      blend_v_w2_8bpc_rvv:     147.7 ( 1.50x)
      blend_v_w4_8bpc_c:       945.3 ( 1.00x)
      blend_v_w4_8bpc_rvv:     243.3 ( 3.89x)
      blend_v_w8_8bpc_c:      1786.9 ( 1.00x)
      blend_v_w8_8bpc_rvv:     256.1 ( 6.98x)
      blend_v_w16_8bpc_c:     3472.1 ( 1.00x)
      blend_v_w16_8bpc_rvv:    351.1 ( 9.89x)
      blend_v_w32_8bpc_c:     6832.1 ( 1.00x)
      blend_v_w32_8bpc_rvv:    635.4 (10.75x)
      
      SpacemiT K1
      
      blend_v_w2_8bpc_c:       218.0 ( 1.00x)
      blend_v_w2_8bpc_rvv:     144.3 ( 1.51x)
      blend_v_w4_8bpc_c:       921.7 ( 1.00x)
      blend_v_w4_8bpc_rvv:     237.1 ( 3.89x)
      blend_v_w8_8bpc_c:      1739.8 ( 1.00x)
      blend_v_w8_8bpc_rvv:     237.4 ( 7.33x)
      blend_v_w16_8bpc_c:     3376.6 ( 1.00x)
      blend_v_w16_8bpc_rvv:    296.3 (11.40x)
      blend_v_w32_8bpc_c:     6647.2 ( 1.00x)
      blend_v_w32_8bpc_rvv:    408.1 (16.29x)
    • riscv64/mc: Branchless vsetvl in blend_h function · 2da8107e
      Nathan E. Egge authored
      Kendryte K230
      
      blend_h_w2_8bpc_c:        165.9 ( 1.00x)
      blend_h_w2_8bpc_rvv:       83.8 ( 1.98x)
      blend_h_w4_8bpc_c:        295.2 ( 1.00x)
      blend_h_w4_8bpc_rvv:       83.8 ( 3.52x)
      blend_h_w8_8bpc_c:        557.9 ( 1.00x)
      blend_h_w8_8bpc_rvv:       92.5 ( 6.03x)
      blend_h_w16_8bpc_c:      1078.8 ( 1.00x)
      blend_h_w16_8bpc_rvv:     117.3 ( 9.19x)
      blend_h_w32_8bpc_c:      2117.8 ( 1.00x)
      blend_h_w32_8bpc_rvv:     200.5 (10.57x)
      blend_h_w64_8bpc_c:      4194.7 ( 1.00x)
      blend_h_w64_8bpc_rvv:     363.2 (11.55x)
      blend_h_w128_8bpc_c:    10271.4 ( 1.00x)
      blend_h_w128_8bpc_rvv:    844.5 (12.16x)
      
      SpacemiT K1
      
      blend_h_w2_8bpc_c:        162.5 ( 1.00x)
      blend_h_w2_8bpc_rvv:       83.9 ( 1.94x)
      blend_h_w4_8bpc_c:        288.6 ( 1.00x)
      blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
      blend_h_w8_8bpc_c:        544.7 ( 1.00x)
      blend_h_w8_8bpc_rvv:       84.0 ( 6.48x)
      blend_h_w16_8bpc_c:      1052.8 ( 1.00x)
      blend_h_w16_8bpc_rvv:     102.9 (10.23x)
      blend_h_w32_8bpc_c:      2068.0 ( 1.00x)
      blend_h_w32_8bpc_rvv:     131.4 (15.73x)
      blend_h_w64_8bpc_c:      4093.7 ( 1.00x)
      blend_h_w64_8bpc_rvv:     220.3 (18.58x)
      blend_h_w128_8bpc_c:    10023.1 ( 1.00x)
      blend_h_w128_8bpc_rvv:    467.3 (21.45x)
    • riscv64/mc: Branchless vsetvl in blend function · b374b24c
      Nathan E. Egge authored
      Kendryte K230
      
      blend_w4_8bpc_c:       204.8 ( 1.00x)
      blend_w4_8bpc_rvv:      59.8 ( 3.42x)
      blend_w8_8bpc_c:       608.9 ( 1.00x)
      blend_w8_8bpc_rvv:      87.2 ( 6.98x)
      blend_w16_8bpc_c:     2362.4 ( 1.00x)
      blend_w16_8bpc_rvv:    225.2 (10.49x)
      blend_w32_8bpc_c:     5990.4 ( 1.00x)
      blend_w32_8bpc_rvv:    518.3 (11.56x)
      
      SpacemiT K1
      
      blend_w4_8bpc_c:       201.6 ( 1.00x)
      blend_w4_8bpc_rvv:      58.0 ( 3.48x)
      blend_w8_8bpc_c:       595.1 ( 1.00x)
      blend_w8_8bpc_rvv:      82.1 ( 7.25x)
      blend_w16_8bpc_c:     2308.8 ( 1.00x)
      blend_w16_8bpc_rvv:    189.0 (12.22x)
      blend_w32_8bpc_c:     5853.1 ( 1.00x)
      blend_w32_8bpc_rvv:    339.5 (17.24x)
      b374b24c
    • Nathan E. Egge's avatar
      riscv64/mc: Add VLEN=256 8bpc RVV blend_v function · 0e3f70e8
      Nathan E. Egge authored
      SpacemiT K1
      
      blend_v_w2_8bpc_c:       217.0 ( 1.00x)
      blend_v_w2_8bpc_rvv:     143.3 ( 1.51x)
      blend_v_w4_8bpc_c:       921.6 ( 1.00x)
      blend_v_w4_8bpc_rvv:     236.3 ( 3.90x)
      blend_v_w8_8bpc_c:      1738.2 ( 1.00x)
      blend_v_w8_8bpc_rvv:     238.1 ( 7.30x)
      blend_v_w16_8bpc_c:     3376.1 ( 1.00x)
      blend_v_w16_8bpc_rvv:    298.0 (11.33x)
      blend_v_w32_8bpc_c:     6648.0 ( 1.00x)
      blend_v_w32_8bpc_rvv:    409.5 (16.24x)
      0e3f70e8
    • Nathan E. Egge's avatar
      riscv64/mc: Add VLEN=256 8bpc RVV blend_h function · a5b95448
      Nathan E. Egge authored
      SpacemiT K1
      
      blend_h_w2_8bpc_c:        161.8 ( 1.00x)
      blend_h_w2_8bpc_rvv:       83.5 ( 1.94x)
      blend_h_w4_8bpc_c:        288.4 ( 1.00x)
      blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
      blend_h_w8_8bpc_c:        543.9 ( 1.00x)
      blend_h_w8_8bpc_rvv:       84.5 ( 6.44x)
      blend_h_w16_8bpc_c:      1051.6 ( 1.00x)
      blend_h_w16_8bpc_rvv:     103.8 (10.13x)
      blend_h_w32_8bpc_c:      2066.0 ( 1.00x)
      blend_h_w32_8bpc_rvv:     133.8 (15.44x)
      blend_h_w64_8bpc_c:      4092.7 ( 1.00x)
      blend_h_w64_8bpc_rvv:     225.2 (18.18x)
      blend_h_w128_8bpc_c:    10011.3 ( 1.00x)
      blend_h_w128_8bpc_rvv:    474.7 (21.09x)
      a5b95448
    • Nathan E. Egge's avatar
      riscv64/mc: Add VLEN=256 8bpc RVV blend function · 83485c50
      Nathan E. Egge authored
      SpacemiT K1
      
      blend_w4_8bpc_c:       201.3 ( 1.00x)
      blend_w4_8bpc_rvv:      59.3 ( 3.40x)
      blend_w8_8bpc_c:       595.1 ( 1.00x)
      blend_w8_8bpc_rvv:      84.1 ( 7.07x)
      blend_w16_8bpc_c:     2309.0 ( 1.00x)
      blend_w16_8bpc_rvv:    190.5 (12.12x)
      blend_w32_8bpc_c:     5854.7 ( 1.00x)
      blend_w32_8bpc_rvv:    341.6 (17.14x)
      83485c50
    • Nathan E. Egge's avatar
      7f2bb2fb
    • Nathan E. Egge's avatar
      riscv64/mc: Add 8bpc RVV blend_v function · 01da36eb
      Nathan E. Egge authored
      Kendryte K230
      
      blend_v_w2_8bpc_c:       219.6 ( 1.00x)
      blend_v_w2_8bpc_rvv:     141.8 ( 1.55x)
      blend_v_w4_8bpc_c:       942.9 ( 1.00x)
      blend_v_w4_8bpc_rvv:     240.9 ( 3.91x)
      blend_v_w8_8bpc_c:      1783.5 ( 1.00x)
      blend_v_w8_8bpc_rvv:     254.7 ( 7.00x)
      blend_v_w16_8bpc_c:     3466.5 ( 1.00x)
      blend_v_w16_8bpc_rvv:    350.5 ( 9.89x)
      blend_v_w32_8bpc_c:     6825.2 ( 1.00x)
      blend_v_w32_8bpc_rvv:    635.1 (10.75x)
      01da36eb
    • Nathan E. Egge's avatar
      riscv64/mc: Add 8bpc RVV blend_h function · d3a94f11
      Nathan E. Egge authored
      Kendryte K230
      
      blend_h_w2_8bpc_c:        165.4 ( 1.00x)
      blend_h_w2_8bpc_rvv:       79.4 ( 2.08x)
      blend_h_w4_8bpc_c:        294.6 ( 1.00x)
      blend_h_w4_8bpc_rvv:       81.5 ( 3.61x)
      blend_h_w8_8bpc_c:        556.9 ( 1.00x)
      blend_h_w8_8bpc_rvv:       90.2 ( 6.17x)
      blend_h_w16_8bpc_c:      1077.6 ( 1.00x)
      blend_h_w16_8bpc_rvv:     116.1 ( 9.29x)
      blend_h_w32_8bpc_c:      2116.2 ( 1.00x)
      blend_h_w32_8bpc_rvv:     200.5 (10.55x)
      blend_h_w64_8bpc_c:      4191.8 ( 1.00x)
      blend_h_w64_8bpc_rvv:     363.3 (11.54x)
      blend_h_w128_8bpc_c:    10264.6 ( 1.00x)
      blend_h_w128_8bpc_rvv:    844.1 (12.16x)
      d3a94f11
    • Nathan E. Egge's avatar
      riscv64/mc: Add 8bpc RVV blend function · f851fcd0
      Nathan E. Egge authored
      Kendryte K230
      
      blend_w4_8bpc_c:       204.5 ( 1.00x)
      blend_w4_8bpc_rvv:      56.4 ( 3.62x)
      blend_w8_8bpc_c:       608.6 ( 1.00x)
      blend_w8_8bpc_rvv:      87.3 ( 6.97x)
      blend_w16_8bpc_c:     2363.8 ( 1.00x)
      blend_w16_8bpc_rvv:    225.1 (10.50x)
      blend_w32_8bpc_c:     5990.3 ( 1.00x)
      blend_w32_8bpc_rvv:    518.8 (11.55x)
      f851fcd0
    • Bogdan Gligorijević's avatar
      Tone down loop to only 2 iterations · 848c5a2d
      Bogdan Gligorijević authored
      Benchmark pending
      848c5a2d
    • Bogdan Gligorijević's avatar
      Scalar dc calculation · a0a08d85
      Bogdan Gligorijević authored
      Current benchmark:
      
      - Kendryte K230:
      inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1729.4 ( 1.00x)
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:    153.2 (11.29x)
      
      - SpacemiT K1:
      inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1533.4 ( 1.00x)
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:    176.8 ( 8.67x)
      a0a08d85
    • Bogdan Gligorijević's avatar
      riscv64/itx: Special case 16x16 8bpc dct_dct eob=0 · c8749f06
      Bogdan Gligorijević authored
      Performance comparison:
      
      - SpacemiT K1:                             Master branch:       itx_16x16:
        inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1534.1 ( 1.00x)      1534.9 ( 1.00x)
        inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:   1173.6 ( 1.31x)       173.1 ( 8.87x)
      
      - Kendryte K230:                           Master branch:       itx_16x16:
        inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1576.0 ( 1.00x)      1579.1 ( 1.00x)
        inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:   1095.5 ( 1.44x)       146.8 (10.75x)
      c8749f06
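The gain from the eob=0 special case can be read off the RVV rows above by dividing the master-branch timing by the itx_16x16 timing. The snippet below is only a worked illustration of that arithmetic; the `runs` dictionary and the `speedup` helper are names invented here, and the numbers are copied from the commit message.

```python
# RVV timings for inv_txfm_add_16x16_dct_dct_0_8bpc_rvv, copied from the
# commit message above as (master branch, itx_16x16 branch).
runs = {
    "SpacemiT K1": (1173.6, 173.1),
    "Kendryte K230": (1095.5, 146.8),
}


def speedup(before: float, after: float) -> float:
    """Ratio of the old timing to the new one (higher is better)."""
    return before / after


for board, (master, branch) in runs.items():
    print(f"{board}: {speedup(master, branch):.2f}x faster than the previous RVV path")
```

Both boards come out at roughly a 7x improvement over the earlier RVV code, which is consistent with the jump in the reported ratio against C.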
    • Bogdan Gligorijević's avatar
      ipred_paeth · 0cdf1b4b
      Bogdan Gligorijević authored
      Benchmarks:
      - Kendryte K230:
      intra_pred_paeth_w4_8bpc_c:       412.9 ( 1.00x)
      intra_pred_paeth_w4_8bpc_rvv:     688.0 ( 0.60x)
      intra_pred_paeth_w8_8bpc_c:      1206.6 ( 1.00x)
      intra_pred_paeth_w8_8bpc_rvv:    1094.3 ( 1.10x)
      intra_pred_paeth_w16_8bpc_c:     3889.7 ( 1.00x)
      intra_pred_paeth_w16_8bpc_rvv:   1796.7 ( 2.16x)
      intra_pred_paeth_w32_8bpc_c:     9797.2 ( 1.00x)
      intra_pred_paeth_w32_8bpc_rvv:   4323.9 ( 2.27x)
      intra_pred_paeth_w64_8bpc_c:    24242.5 ( 1.00x)
      intra_pred_paeth_w64_8bpc_rvv:  10739.8 ( 2.26x)
      
      - Banana Pi BPI-F3:
      intra_pred_paeth_w4_8bpc_c:       395.1 ( 1.00x)
      intra_pred_paeth_w4_8bpc_rvv:     705.4 ( 0.56x)
      intra_pred_paeth_w8_8bpc_c:      1184.9 ( 1.00x)
      intra_pred_paeth_w8_8bpc_rvv:    1125.3 ( 1.05x)
      intra_pred_paeth_w16_8bpc_c:     3807.8 ( 1.00x)
      intra_pred_paeth_w16_8bpc_rvv:   1850.8 ( 2.06x)
      intra_pred_paeth_w32_8bpc_c:     9985.1 ( 1.00x)
      intra_pred_paeth_w32_8bpc_rvv:   2235.5 ( 4.47x)
      intra_pred_paeth_w64_8bpc_c:    24040.4 ( 1.00x)
      intra_pred_paeth_w64_8bpc_rvv:   5450.0 ( 4.41x)
      0cdf1b4b
    • Bogdan Gligorijević's avatar
      pal_pred · b830ac82
      Bogdan Gligorijević authored
      Benchmarks:
      
      - Kendryte K230:
      pal_pred_w4_8bpc_c:        115.6 ( 1.00x)
      pal_pred_w4_8bpc_rvv:      331.4 ( 0.35x)
      pal_pred_w4_16bpc_c:       140.8 ( 1.00x)
      pal_pred_w4_16bpc_rvv:     247.9 ( 0.57x)
      pal_pred_w8_8bpc_c:        334.9 ( 1.00x)
      pal_pred_w8_8bpc_rvv:      520.8 ( 0.64x)
      pal_pred_w8_16bpc_c:       412.7 ( 1.00x)
      pal_pred_w8_16bpc_rvv:     386.2 ( 1.07x)
      pal_pred_w16_8bpc_c:      1044.4 ( 1.00x)
      pal_pred_w16_8bpc_rvv:     842.8 ( 1.24x)
      pal_pred_w16_16bpc_c:     1300.3 ( 1.00x)
      pal_pred_w16_16bpc_rvv:    619.9 ( 2.10x)
      pal_pred_w32_8bpc_c:      2452.8 ( 1.00x)
      pal_pred_w32_8bpc_rvv:    1016.1 ( 2.41x)
      pal_pred_w32_16bpc_c:     3072.1 ( 1.00x)
      pal_pred_w32_16bpc_rvv:   1440.5 ( 2.13x)
      pal_pred_w64_8bpc_c:      6015.8 ( 1.00x)
      pal_pred_w64_8bpc_rvv:    2505.5 ( 2.40x)
      pal_pred_w64_16bpc_c:     7552.4 ( 1.00x)
      pal_pred_w64_16bpc_rvv:   3512.7 ( 2.15x)
      
      - Banana Pi BPI-F3:
      pal_pred_w4_8bpc_c:        102.2 ( 1.00x)
      pal_pred_w4_8bpc_rvv:      511.2 ( 0.20x)
      pal_pred_w4_16bpc_c:       137.7 ( 1.00x)
      pal_pred_w4_16bpc_rvv:     330.9 ( 0.42x)
      pal_pred_w8_8bpc_c:        289.2 ( 1.00x)
      pal_pred_w8_8bpc_rvv:      819.6 ( 0.35x)
      pal_pred_w8_16bpc_c:       402.6 ( 1.00x)
      pal_pred_w8_16bpc_rvv:     520.7 ( 0.77x)
      pal_pred_w16_8bpc_c:       894.5 ( 1.00x)
      pal_pred_w16_8bpc_rvv:    1326.6 ( 0.67x)
      pal_pred_w16_16bpc_c:     1268.6 ( 1.00x)
      pal_pred_w16_16bpc_rvv:    845.8 ( 1.50x)
      pal_pred_w32_8bpc_c:      2094.5 ( 1.00x)
      pal_pred_w32_8bpc_rvv:    1610.9 ( 1.30x)
      pal_pred_w32_16bpc_c:     2999.4 ( 1.00x)
      pal_pred_w32_16bpc_rvv:   1029.8 ( 2.91x)
      pal_pred_w64_8bpc_c:      5128.0 ( 1.00x)
      pal_pred_w64_8bpc_rvv:    2000.8 ( 2.56x)
      pal_pred_w64_16bpc_c:     7375.0 ( 1.00x)
      pal_pred_w64_16bpc_rvv:   2518.2 ( 2.93x)
      b830ac82
    • Bogdan Gligorijević's avatar
      ipred_smooth · 44541dfa
      Bogdan Gligorijević authored
      Benchmarks:
      - Kendryte K230:
      intra_pred_smooth_w4_8bpc_c:        392.6 ( 1.00x)
      intra_pred_smooth_w4_8bpc_rvv:      311.3 ( 1.26x)
      intra_pred_smooth_w8_8bpc_c:       1204.1 ( 1.00x)
      intra_pred_smooth_w8_8bpc_rvv:      488.9 ( 2.46x)
      intra_pred_smooth_w16_8bpc_c:      3885.9 ( 1.00x)
      intra_pred_smooth_w16_8bpc_rvv:     796.6 ( 4.88x)
      intra_pred_smooth_w32_8bpc_c:      9305.7 ( 1.00x)
      intra_pred_smooth_w32_8bpc_rvv:    1806.7 ( 5.15x)
      intra_pred_smooth_w64_8bpc_c:     23043.0 ( 1.00x)
      intra_pred_smooth_w64_8bpc_rvv:    4344.3 ( 5.30x)
      
      - SpacemiT K1:
      intra_pred_smooth_w4_8bpc_c:        384.1 ( 1.00x)
      intra_pred_smooth_w4_8bpc_rvv:      322.2 ( 1.19x)
      intra_pred_smooth_w8_8bpc_c:       1177.6 ( 1.00x)
      intra_pred_smooth_w8_8bpc_rvv:      507.1 ( 2.32x)
      intra_pred_smooth_w16_8bpc_c:      3801.2 ( 1.00x)
      intra_pred_smooth_w16_8bpc_rvv:     814.4 ( 4.67x)
      intra_pred_smooth_w32_8bpc_c:      9103.1 ( 1.00x)
      intra_pred_smooth_w32_8bpc_rvv:     980.8 ( 9.28x)
      intra_pred_smooth_w64_8bpc_c:     22540.1 ( 1.00x)
      intra_pred_smooth_w64_8bpc_rvv:    2319.3 ( 9.72x)
      44541dfa
    • Bogdan Gligorijević's avatar
      ipred cfl functions · d711f974
      Bogdan Gligorijević authored
      Benchmarks:
      
      - Kendryte K230:
      cfl_pred_cfl_128_w4_8bpc_c:         497.3 ( 1.00x)
      cfl_pred_cfl_128_w4_8bpc_rvv:       369.6 ( 1.35x)
      cfl_pred_cfl_128_w4_16bpc_c:        425.2 ( 1.00x)
      cfl_pred_cfl_128_w4_16bpc_rvv:      385.5 ( 1.10x)
      cfl_pred_cfl_128_w8_8bpc_c:        1544.2 ( 1.00x)
      cfl_pred_cfl_128_w8_8bpc_rvv:       584.2 ( 2.64x)
      cfl_pred_cfl_128_w8_16bpc_c:       1306.2 ( 1.00x)
      cfl_pred_cfl_128_w8_16bpc_rvv:      608.8 ( 2.15x)
      cfl_pred_cfl_128_w16_8bpc_c:       3085.6 ( 1.00x)
      cfl_pred_cfl_128_w16_8bpc_rvv:      584.2 ( 5.28x)
      cfl_pred_cfl_128_w16_16bpc_c:      2657.1 ( 1.00x)
      cfl_pred_cfl_128_w16_16bpc_rvv:     608.9 ( 4.36x)
      cfl_pred_cfl_128_w32_8bpc_c:       8405.6 ( 1.00x)
      cfl_pred_cfl_128_w32_8bpc_rvv:     1416.1 ( 5.94x)
      cfl_pred_cfl_128_w32_16bpc_c:      7199.9 ( 1.00x)
      cfl_pred_cfl_128_w32_16bpc_rvv:    1479.8 ( 4.87x)
      cfl_pred_cfl_left_w4_8bpc_c:        553.1 ( 1.00x)
      cfl_pred_cfl_left_w4_8bpc_rvv:      395.6 ( 1.40x)
      cfl_pred_cfl_left_w4_16bpc_c:       486.7 ( 1.00x)
      cfl_pred_cfl_left_w4_16bpc_rvv:     409.1 ( 1.19x)
      cfl_pred_cfl_left_w8_8bpc_c:       1610.8 ( 1.00x)
      cfl_pred_cfl_left_w8_8bpc_rvv:      610.4 ( 2.64x)
      cfl_pred_cfl_left_w8_16bpc_c:      1378.0 ( 1.00x)
      cfl_pred_cfl_left_w8_16bpc_rvv:     636.2 ( 2.17x)
      cfl_pred_cfl_left_w16_8bpc_c:      3154.4 ( 1.00x)
      cfl_pred_cfl_left_w16_8bpc_rvv:     610.4 ( 5.17x)
      cfl_pred_cfl_left_w16_16bpc_c:     2733.2 ( 1.00x)
      cfl_pred_cfl_left_w16_16bpc_rvv:    636.3 ( 4.30x)
      cfl_pred_cfl_left_w32_8bpc_c:      8451.7 ( 1.00x)
      cfl_pred_cfl_left_w32_8bpc_rvv:    1442.5 ( 5.86x)
      cfl_pred_cfl_left_w32_16bpc_c:     7267.2 ( 1.00x)
      cfl_pred_cfl_left_w32_16bpc_rvv:   1509.4 ( 4.81x)
      cfl_pred_cfl_top_w4_8bpc_c:         544.7 ( 1.00x)
      cfl_pred_cfl_top_w4_8bpc_rvv:       395.8 ( 1.38x)
      cfl_pred_cfl_top_w4_16bpc_c:        475.1 ( 1.00x)
      cfl_pred_cfl_top_w4_16bpc_rvv:      406.7 ( 1.17x)
      cfl_pred_cfl_top_w8_8bpc_c:        1599.3 ( 1.00x)
      cfl_pred_cfl_top_w8_8bpc_rvv:       610.4 ( 2.62x)
      cfl_pred_cfl_top_w8_16bpc_c:       1363.8 ( 1.00x)
      cfl_pred_cfl_top_w8_16bpc_rvv:      630.3 ( 2.16x)
      cfl_pred_cfl_top_w16_8bpc_c:       3161.0 ( 1.00x)
      cfl_pred_cfl_top_w16_8bpc_rvv:      610.5 ( 5.18x)
      cfl_pred_cfl_top_w16_16bpc_c:      2735.9 ( 1.00x)
      cfl_pred_cfl_top_w16_16bpc_rvv:     634.3 ( 4.31x)
      cfl_pred_cfl_top_w32_8bpc_c:       8564.4 ( 1.00x)
      cfl_pred_cfl_top_w32_8bpc_rvv:     1442.8 ( 5.94x)
      cfl_pred_cfl_top_w32_16bpc_c:      7294.9 ( 1.00x)
      cfl_pred_cfl_top_w32_16bpc_rvv:    1511.5 ( 4.83x)
      cfl_pred_cfl_w4_8bpc_c:             571.5 ( 1.00x)
      cfl_pred_cfl_w4_8bpc_rvv:           421.0 ( 1.36x)
      cfl_pred_cfl_w4_16bpc_c:            499.1 ( 1.00x)
      cfl_pred_cfl_w4_16bpc_rvv:          462.8 ( 1.08x)
      cfl_pred_cfl_w8_8bpc_c:            1642.0 ( 1.00x)
      cfl_pred_cfl_w8_8bpc_rvv:           635.8 ( 2.58x)
      cfl_pred_cfl_w8_16bpc_c:           1401.4 ( 1.00x)
      cfl_pred_cfl_w8_16bpc_rvv:          686.1 ( 2.04x)
      cfl_pred_cfl_w16_8bpc_c:           3204.3 ( 1.00x)
      cfl_pred_cfl_w16_8bpc_rvv:          635.8 ( 5.04x)
      cfl_pred_cfl_w16_16bpc_c:          2784.8 ( 1.00x)
      cfl_pred_cfl_w16_16bpc_rvv:         686.1 ( 4.06x)
      cfl_pred_cfl_w32_8bpc_c:           8623.9 ( 1.00x)
      cfl_pred_cfl_w32_8bpc_rvv:         1465.9 ( 5.88x)
      cfl_pred_cfl_w32_16bpc_c:          7357.8 ( 1.00x)
      cfl_pred_cfl_w32_16bpc_rvv:        1556.3 ( 4.73x)
      
      - Banana Pi BPI-F3:
      cfl_pred_cfl_128_w4_8bpc_c:         485.5 ( 1.00x)
      cfl_pred_cfl_128_w4_8bpc_rvv:       366.4 ( 1.33x)
      cfl_pred_cfl_128_w4_16bpc_c:        393.5 ( 1.00x)
      cfl_pred_cfl_128_w4_16bpc_rvv:      378.7 ( 1.04x)
      cfl_pred_cfl_128_w8_8bpc_c:        1507.9 ( 1.00x)
      cfl_pred_cfl_128_w8_8bpc_rvv:       577.4 ( 2.61x)
      cfl_pred_cfl_128_w8_16bpc_c:       1205.7 ( 1.00x)
      cfl_pred_cfl_128_w8_16bpc_rvv:      605.1 ( 1.99x)
      cfl_pred_cfl_128_w16_8bpc_c:       3019.3 ( 1.00x)
      cfl_pred_cfl_128_w16_8bpc_rvv:      577.4 ( 5.23x)
      cfl_pred_cfl_128_w16_16bpc_c:      2506.5 ( 1.00x)
      cfl_pred_cfl_128_w16_16bpc_rvv:     605.1 ( 4.14x)
      cfl_pred_cfl_128_w32_8bpc_c:       8170.0 ( 1.00x)
      cfl_pred_cfl_128_w32_8bpc_rvv:      715.6 (11.42x)
      cfl_pred_cfl_128_w32_16bpc_c:      6686.7 ( 1.00x)
      cfl_pred_cfl_128_w32_16bpc_rvv:     749.7 ( 8.92x)
      cfl_pred_cfl_left_w4_8bpc_c:        539.4 ( 1.00x)
      cfl_pred_cfl_left_w4_8bpc_rvv:      393.2 ( 1.37x)
      cfl_pred_cfl_left_w4_16bpc_c:       452.0 ( 1.00x)
      cfl_pred_cfl_left_w4_16bpc_rvv:     401.2 ( 1.13x)
      cfl_pred_cfl_left_w8_8bpc_c:       1572.4 ( 1.00x)
      cfl_pred_cfl_left_w8_8bpc_rvv:      604.1 ( 2.60x)
      cfl_pred_cfl_left_w8_16bpc_c:      1274.5 ( 1.00x)
      cfl_pred_cfl_left_w8_16bpc_rvv:     629.0 ( 2.03x)
      cfl_pred_cfl_left_w16_8bpc_c:      3096.0 ( 1.00x)
      cfl_pred_cfl_left_w16_8bpc_rvv:     604.1 ( 5.13x)
      cfl_pred_cfl_left_w16_16bpc_c:     2591.4 ( 1.00x)
      cfl_pred_cfl_left_w16_16bpc_rvv:    629.0 ( 4.12x)
      cfl_pred_cfl_left_w32_8bpc_c:      8266.0 ( 1.00x)
      cfl_pred_cfl_left_w32_8bpc_rvv:     742.4 (11.13x)
      cfl_pred_cfl_left_w32_16bpc_c:     6758.0 ( 1.00x)
      cfl_pred_cfl_left_w32_16bpc_rvv:    773.9 ( 8.73x)
      cfl_pred_cfl_top_w4_8bpc_c:         532.3 ( 1.00x)
      cfl_pred_cfl_top_w4_8bpc_rvv:       392.6 ( 1.36x)
      cfl_pred_cfl_top_w4_16bpc_c:        440.4 ( 1.00x)
      cfl_pred_cfl_top_w4_16bpc_rvv:      399.6 ( 1.10x)
      cfl_pred_cfl_top_w8_8bpc_c:        1563.3 ( 1.00x)
      cfl_pred_cfl_top_w8_8bpc_rvv:       603.6 ( 2.59x)
      cfl_pred_cfl_top_w8_16bpc_c:       1271.6 ( 1.00x)
      cfl_pred_cfl_top_w8_16bpc_rvv:      626.1 ( 2.03x)
      cfl_pred_cfl_top_w16_8bpc_c:       3098.6 ( 1.00x)
      cfl_pred_cfl_top_w16_8bpc_rvv:      603.6 ( 5.13x)
      cfl_pred_cfl_top_w16_16bpc_c:      2562.8 ( 1.00x)
      cfl_pred_cfl_top_w16_16bpc_rvv:     626.0 ( 4.09x)
      cfl_pred_cfl_top_w32_8bpc_c:       8278.1 ( 1.00x)
      cfl_pred_cfl_top_w32_8bpc_rvv:      741.8 (11.16x)
      cfl_pred_cfl_top_w32_16bpc_c:      6799.1 ( 1.00x)
      cfl_pred_cfl_top_w32_16bpc_rvv:     775.0 ( 8.77x)
      cfl_pred_cfl_w4_8bpc_c:             559.8 ( 1.00x)
      cfl_pred_cfl_w4_8bpc_rvv:           421.7 ( 1.33x)
      cfl_pred_cfl_w4_16bpc_c:            470.2 ( 1.00x)
      cfl_pred_cfl_w4_16bpc_rvv:          451.3 ( 1.04x)
      cfl_pred_cfl_w8_8bpc_c:            1605.5 ( 1.00x)
      cfl_pred_cfl_w8_8bpc_rvv:           632.8 ( 2.54x)
      cfl_pred_cfl_w8_16bpc_c:           1308.5 ( 1.00x)
      cfl_pred_cfl_w8_16bpc_rvv:          677.9 ( 1.93x)
      cfl_pred_cfl_w16_8bpc_c:           3135.0 ( 1.00x)
      cfl_pred_cfl_w16_8bpc_rvv:          632.9 ( 4.95x)
      cfl_pred_cfl_w16_16bpc_c:          2625.9 ( 1.00x)
      cfl_pred_cfl_w16_16bpc_rvv:         677.9 ( 3.87x)
      cfl_pred_cfl_w32_8bpc_c:           8376.6 ( 1.00x)
      cfl_pred_cfl_w32_8bpc_rvv:          770.4 (10.87x)
      cfl_pred_cfl_w32_16bpc_c:          6866.4 ( 1.00x)
      cfl_pred_cfl_w32_16bpc_rvv:         822.7 ( 8.35x)
      d711f974
    • Bogdan Gligorijević's avatar
      riscv64/cdef: filter functions · 2f5bfc37
      Bogdan Gligorijević authored
      Benchmarks:
      - Kendryte K230:
      cdef_filter_4x4_01_8bpc_c:       1339.4 ( 1.00x)
      cdef_filter_4x4_01_8bpc_rvv:      836.2 ( 1.60x)
      cdef_filter_4x4_01_16bpc_c:      1369.1 ( 1.00x)
      cdef_filter_4x4_01_16bpc_rvv:     824.7 ( 1.66x)
      cdef_filter_4x4_10_8bpc_c:        872.8 ( 1.00x)
      cdef_filter_4x4_10_8bpc_rvv:      523.9 ( 1.67x)
      cdef_filter_4x4_10_16bpc_c:       938.2 ( 1.00x)
      cdef_filter_4x4_10_16bpc_rvv:     517.1 ( 1.81x)
      cdef_filter_4x4_11_8bpc_c:       2668.3 ( 1.00x)
      cdef_filter_4x4_11_8bpc_rvv:     1285.0 ( 2.08x)
      cdef_filter_4x4_11_16bpc_c:      2922.1 ( 1.00x)
      cdef_filter_4x4_11_16bpc_rvv:    1291.0 ( 2.26x)
      cdef_filter_4x8_01_8bpc_c:       2489.1 ( 1.00x)
      cdef_filter_4x8_01_8bpc_rvv:     1594.3 ( 1.56x)
      cdef_filter_4x8_01_16bpc_c:      2528.1 ( 1.00x)
      cdef_filter_4x8_01_16bpc_rvv:    1566.6 ( 1.61x)
      cdef_filter_4x8_10_8bpc_c:       1576.9 ( 1.00x)
      cdef_filter_4x8_10_8bpc_rvv:      967.1 ( 1.63x)
      cdef_filter_4x8_10_16bpc_c:      1641.3 ( 1.00x)
      cdef_filter_4x8_10_16bpc_rvv:     947.1 ( 1.73x)
      cdef_filter_4x8_11_8bpc_c:       5164.0 ( 1.00x)
      cdef_filter_4x8_11_8bpc_rvv:     2490.7 ( 2.07x)
      cdef_filter_4x8_11_16bpc_c:      5732.3 ( 1.00x)
      cdef_filter_4x8_11_16bpc_rvv:    2499.2 ( 2.29x)
      cdef_filter_8x8_01_8bpc_c:       4742.3 ( 1.00x)
      cdef_filter_8x8_01_8bpc_rvv:     1628.6 ( 2.91x)
      cdef_filter_8x8_01_16bpc_c:      4785.0 ( 1.00x)
      cdef_filter_8x8_01_16bpc_rvv:    1595.5 ( 3.00x)
      cdef_filter_8x8_10_8bpc_c:       2962.4 ( 1.00x)
      cdef_filter_8x8_10_8bpc_rvv:     1000.8 ( 2.96x)
      cdef_filter_8x8_10_16bpc_c:      3022.4 ( 1.00x)
      cdef_filter_8x8_10_16bpc_rvv:     975.7 ( 3.10x)
      cdef_filter_8x8_11_8bpc_c:      12623.9 ( 1.00x)
      cdef_filter_8x8_11_8bpc_rvv:     2525.4 ( 5.00x)
      cdef_filter_8x8_11_16bpc_c:     12470.7 ( 1.00x)
      cdef_filter_8x8_11_16bpc_rvv:    2528.2 ( 4.93x)
      
      - Banana Pi BPI-F3:
      cdef_filter_4x4_01_8bpc_c:       1281.2 ( 1.00x)
      cdef_filter_4x4_01_8bpc_rvv:      813.0 ( 1.58x)
      cdef_filter_4x4_01_16bpc_c:      1300.8 ( 1.00x)
      cdef_filter_4x4_01_16bpc_rvv:     808.9 ( 1.61x)
      cdef_filter_4x4_10_8bpc_c:        843.0 ( 1.00x)
      cdef_filter_4x4_10_8bpc_rvv:      498.4 ( 1.69x)
      cdef_filter_4x4_10_16bpc_c:       903.6 ( 1.00x)
      cdef_filter_4x4_10_16bpc_rvv:     497.9 ( 1.81x)
      cdef_filter_4x4_11_8bpc_c:       2614.1 ( 1.00x)
      cdef_filter_4x4_11_8bpc_rvv:     1219.6 ( 2.14x)
      cdef_filter_4x4_11_16bpc_c:      2795.6 ( 1.00x)
      cdef_filter_4x4_11_16bpc_rvv:    1243.1 ( 2.25x)
      cdef_filter_4x8_01_8bpc_c:       2405.4 ( 1.00x)
      cdef_filter_4x8_01_8bpc_rvv:     1548.5 ( 1.55x)
      cdef_filter_4x8_01_16bpc_c:      2402.7 ( 1.00x)
      cdef_filter_4x8_01_16bpc_rvv:    1542.7 ( 1.56x)
      cdef_filter_4x8_10_8bpc_c:       1522.0 ( 1.00x)
      cdef_filter_4x8_10_8bpc_rvv:      917.4 ( 1.66x)
      cdef_filter_4x8_10_16bpc_c:      1589.2 ( 1.00x)
      cdef_filter_4x8_10_16bpc_rvv:     915.9 ( 1.74x)
      cdef_filter_4x8_11_8bpc_c:       5050.7 ( 1.00x)
      cdef_filter_4x8_11_8bpc_rvv:     2358.7 ( 2.14x)
      cdef_filter_4x8_11_16bpc_c:      5510.5 ( 1.00x)
      cdef_filter_4x8_11_16bpc_rvv:    2411.6 ( 2.28x)
      cdef_filter_8x8_01_8bpc_c:       4558.3 ( 1.00x)
      cdef_filter_8x8_01_8bpc_rvv:     1579.7 ( 2.89x)
      cdef_filter_8x8_01_16bpc_c:      4551.1 ( 1.00x)
      cdef_filter_8x8_01_16bpc_rvv:    1571.1 ( 2.90x)
      cdef_filter_8x8_10_8bpc_c:       2869.3 ( 1.00x)
      cdef_filter_8x8_10_8bpc_rvv:      948.4 ( 3.03x)
      cdef_filter_8x8_10_16bpc_c:      2928.6 ( 1.00x)
      cdef_filter_8x8_10_16bpc_rvv:     944.2 ( 3.10x)
      cdef_filter_8x8_11_8bpc_c:      12317.5 ( 1.00x)
      cdef_filter_8x8_11_8bpc_rvv:     2389.7 ( 5.15x)
      cdef_filter_8x8_11_16bpc_c:     11950.6 ( 1.00x)
      cdef_filter_8x8_11_16bpc_rvv:    2440.1 ( 4.90x)
      2f5bfc37
    • Bogdan Gligorijević's avatar
      pal_idx_finish · f223436b
      Bogdan Gligorijević authored
      Benchmarks:
      
      - Kendryte K230:
      pal_idx_finish_w4_c:       122.5 ( 1.00x)
      pal_idx_finish_w4_rvv:     107.2 ( 1.14x)
      pal_idx_finish_w8_c:       302.8 ( 1.00x)
      pal_idx_finish_w8_rvv:     197.9 ( 1.53x)
      pal_idx_finish_w16_c:      868.2 ( 1.00x)
      pal_idx_finish_w16_rvv:    438.5 ( 1.98x)
      pal_idx_finish_w32_c:     1966.5 ( 1.00x)
      pal_idx_finish_w32_rvv:    833.0 ( 2.36x)
      pal_idx_finish_w64_c:     4737.5 ( 1.00x)
      pal_idx_finish_w64_rvv:   1818.3 ( 2.61x)
      
      - Banana Pi BPI-F3:
      pal_idx_finish_w4_c:       122.4 ( 1.00x)
      pal_idx_finish_w4_rvv:     132.0 ( 0.93x)
      pal_idx_finish_w8_c:       289.4 ( 1.00x)
      pal_idx_finish_w8_rvv:     195.8 ( 1.48x)
      pal_idx_finish_w16_c:      788.0 ( 1.00x)
      pal_idx_finish_w16_rvv:    430.6 ( 1.83x)
      pal_idx_finish_w32_c:     1699.2 ( 1.00x)
      pal_idx_finish_w32_rvv:    816.3 ( 2.08x)
      pal_idx_finish_w64_c:     3977.7 ( 1.00x)
      pal_idx_finish_w64_rvv:   1779.4 ( 2.24x)
      f223436b
    • Nathan E. Egge's avatar
      38f74bdc
  7. Oct 07, 2024
    • Henrik Gramner's avatar
      x86: Make AVX2 SGR gatherless · 7072e79f
      Henrik Gramner authored
      Instead of using gathers we can calculate the value of
      sgr_x_by_x[min(z, 255)] by doing 256 / (z + 1) in floating-point
      with some clipping for z == 0 and z >= 255.
      
      As the required precision of the division is fairly small it can be
      performed using an approximate reciprocal, which is significantly
      faster than a regular division.
      
      Gather instructions are slow on all AMD CPUs, and on most Intel
      CPUs ever since µcode updates were issued as a workaround for
      the Gather Data Sampling side channel vulnerability.
      7072e79f
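The idea in the commit message above can be sketched in scalar form: replace the table lookup sgr_x_by_x[min(z, 255)] with 256 / (z + 1) computed in floating point. This is a minimal sketch, not the AVX2 code. The function name `sgr_x_by_x_approx`, the round-to-nearest step, and the exact clip values (255 for z == 0, 0 for z >= 255) are assumptions made for illustration. The sketch also uses an exact division, where the commit uses an approximate reciprocal.

```python
def sgr_x_by_x_approx(z: int) -> int:
    """Scalar stand-in for sgr_x_by_x[min(z, 255)] without a table lookup."""
    if z <= 0:
        # Assumed clip: 256 / 1 = 256 would not fit in an 8-bit entry.
        return 255
    if z >= 255:
        # Assumed clip for the saturated end of the index range.
        return 0
    # Assumed rounding: nearest integer of 256 / (z + 1).
    return int(256.0 / (z + 1) + 0.5)


for z in (0, 1, 2, 3, 254, 255, 1000):
    print(z, sgr_x_by_x_approx(z))
```

Because every lane evaluates the same arithmetic, a vector version needs no per-lane memory access, which is what lets the gather instructions be dropped.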
  8. Oct 02, 2024
  9. Sep 30, 2024
    • jinbo's avatar
      loongarch: minor improvement on decode_symbol_adapt · ed004fe9
      jinbo authored and Hecai Yuan committed
      Change-Id: I78fe788113ff2487ba1ce2e7d0c7d7c78c5a8c58
      ed004fe9
    • Hecai Yuan's avatar
      loongarch: rewrite optimization functions in loongarch/itx.S · 62a51df1
      Hecai Yuan authored and Hecai Yuan committed
      Change-Id: I1566e8145d36296f2c76107cf15fc2cc7ac0ecc7
      62a51df1
    • guxiwei's avatar
      LoongArch: Add save_tmvs_lsx · 757f294a
      guxiwei authored and Hecai Yuan committed
      The performance data is as follows:
      save_tmvs_c:        3938.6 ( 1.00x)
      save_tmvs_lsx:      1355.3 ( 2.91x)
      757f294a
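The ratio in parentheses on each benchmark row is the timing of the `_c` reference divided by the timing of the optimized variant. The following sketch parses rows in the layout used throughout this log and recomputes that ratio. The helper names `parse_bench` and `recomputed_ratios` are hypothetical, and the sketch assumes each optimized row has a matching `_c` row that shares its name stem.

```python
import re

# One benchmark row: "<name>:  <time> ( <ratio>x)"
LINE = re.compile(r"^\s*(\w+):\s+([\d.]+)\s+\(\s*([\d.]+)x\)\s*$")


def parse_bench(text):
    """Return {function_name: (time, reported_ratio)} for each benchmark row."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows[m.group(1)] = (float(m.group(2)), float(m.group(3)))
    return rows


def recomputed_ratios(rows):
    """Recompute the speedup of every non-C row against its "_c" baseline."""
    out = {}
    for name, (time, _reported) in rows.items():
        stem, _sep, impl = name.rpartition("_")
        base = rows.get(stem + "_c")
        if base is not None and impl != "c":
            out[name] = round(base[0] / time, 2)
    return out


sample = """\
save_tmvs_c:        3938.6 ( 1.00x)
save_tmvs_lsx:      1355.3 ( 2.91x)
"""
rows = parse_bench(sample)
print(recomputed_ratios(rows))
```

A recomputed ratio below 1.00 would mean the optimized variant is slower than the C reference for that block size.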
    • jinbo's avatar
      loongarch: refactor loopfilter · 3d96175d
      jinbo authored and Hecai Yuan committed
      bench performance before:
      lpf_h_sb_y_w16_8bpc_c:      117.0 ( 1.00x)
      lpf_h_sb_y_w16_8bpc_lsx:     33.9 ( 3.46x)
      lpf_v_sb_y_w16_8bpc_c:      132.1 ( 1.00x)
      lpf_v_sb_y_w16_8bpc_lsx:     59.7 ( 2.21x)
      
      bench performance after:
      lpf_h_sb_y_w16_8bpc_c:      114.9 ( 1.00x)
      lpf_h_sb_y_w16_8bpc_lsx:     32.0 ( 3.59x)
      lpf_v_sb_y_w16_8bpc_c:      132.5 ( 1.00x)
      lpf_v_sb_y_w16_8bpc_lsx:     28.1 ( 4.72x)
      
      Change-Id: Ie64e164a9416c438f6b3881ce18fb42e2ddd073d
      3d96175d
    • Hecai Yuan's avatar
      loongarch: add lasx implementation of sgr_3x3 for 8 bpc · 70582027
      Hecai Yuan authored and Hecai Yuan committed
      sgr_3x3_8bpc_c:                                   27233.1 ( 1.00x)
      sgr_3x3_8bpc_lsx:                                 12874.7 ( 2.12x)
      sgr_3x3_8bpc_lasx:                                10183.7 ( 2.67x)
      
      Change-Id: I2aa469e8560733d6191396186bf776a12ad6e4a3
      70582027
    • Hecai Yuan's avatar
      loongarch: rewrite warp_8x8/8x8t_lsx for 8 bpc · 96d6e472
      Hecai Yuan authored and Hecai Yuan committed
      before:
      warp_8x8_8bpc_c:                                    109.8 ( 1.00x)
      warp_8x8_8bpc_lsx:                                   44.6 ( 2.46x)
      warp_8x8t_8bpc_c:                                    97.5 ( 1.00x)
      warp_8x8t_8bpc_lsx:                                  43.7 ( 2.23x)
      
      after:
      warp_8x8_8bpc_c:                                    109.8 ( 1.00x)
      warp_8x8_8bpc_lsx:                                   39.2 ( 2.80x)
      warp_8x8t_8bpc_c:                                    97.5 ( 1.00x)
      warp_8x8t_8bpc_lsx:                                  37.9 ( 2.57x)
      
      Change-Id: I11728c2c30821b8e2b1c85208710dfe5d1c1269c
      96d6e472
    • jinbo's avatar
      loongarch: Refine prep_8tap_8bpc_lasx · b9e9a0ef
      jinbo authored and Hecai Yuan committed
      mct_8tap_regular_w8_h_8bpc_c:                  47.1 ( 1.00x)
      mct_8tap_regular_w8_h_8bpc_lsx:                 6.3 ( 7.46x)
      mct_8tap_regular_w8_h_8bpc_lasx:                4.4 (10.80x)
      mct_8tap_regular_w8_hv_8bpc_c:                118.9 ( 1.00x)
      mct_8tap_regular_w8_hv_8bpc_lsx:               19.2 ( 6.20x)
      mct_8tap_regular_w8_hv_8bpc_lasx:              13.7 ( 8.69x)
      mct_8tap_regular_w8_v_8bpc_c:                  60.3 ( 1.00x)
      mct_8tap_regular_w8_v_8bpc_lsx:                 5.4 (11.08x)
      mct_8tap_regular_w8_v_8bpc_lasx:                3.3 (18.33x)
      
      Change-Id: I1140f6ffbd738166f2581bc9111ebbdf6f9fa72c
      b9e9a0ef