- Oct 17, 2024
Jean-Baptiste Kempf authored
- Oct 16, 2024
Nathan E. Egge authored
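The commit message that follows swaps a vwadd plus vnsra pair for a single vnclip. As a rough scalar model of why that substitution is safe, here is a minimal Python sketch. The function names, the 32-bit to 16-bit narrowing, the shift amount of 12 and the round-to-nearest-up rounding mode are all assumptions made for illustration; none of them is stated in the commit message.

```python
# Scalar model only: the names, widths and rounding mode are illustrative assumptions.
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_int16(value: int) -> int:
    """Keep the low 16 bits and reinterpret them as a signed 16-bit value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add_then_narrowing_shift(wide: int, shift: int) -> int:
    """Two-step form: add the rounding constant, then shift right and truncate."""
    rounding = 1 << (shift - 1)
    return wrap_int16((wide + rounding) >> shift)


def rounding_saturating_shift(wide: int, shift: int) -> int:
    """One-step form: rounding right shift (nearest, ties up), then saturate."""
    rounded = (wide + (1 << (shift - 1))) >> shift
    return max(INT16_MIN, min(INT16_MAX, rounded))


if __name__ == "__main__":
    for sample in (22528, -2049, 1 << 27):
        print(sample,
              add_then_narrowing_shift(sample, 12),
              rounding_saturating_shift(sample, 12))
```

Under these assumptions the two forms agree whenever the shifted result fits in 16 bits, and differ only on overflow, where the two-step form wraps and the one-step form clamps.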
The vnclip instruction performs a fixed-point narrowing right shift with rounding and saturation, so it can replace the vwadd followed by vnsra sequence in idct_4, idct_8, idct_16, iadst_8 and iadst_16.

Including 572c5a66 (which applies the same change to iadst_4), these commits give the following average improvements across all modified 2D transform functions:

         Kendryte K230  SpacemiT K1
4x4         -5.50%        -4.44%
8x8         -9.78%        -7.62%
16x16       -9.70%        -9.04%
4x8         -8.39%        -7.54%
8x4         -8.10%        -4.66%
4x16        -8.16%        -7.74%
16x4        -8.07%        -6.96%
8x16        -9.11%        -7.43%
16x8        -9.87%        -7.81%

Kendryte K230  Old  New  Delta
inv_txfm_add_4x4_adst_adst_0_8bpc_rvv  99.0  93.4  -5.66%
inv_txfm_add_4x4_adst_adst_1_8bpc_rvv  99.0  93.4  -5.66%
inv_txfm_add_4x4_adst_dct_0_8bpc_rvv  93.4  87.2  -6.64%
inv_txfm_add_4x4_adst_dct_1_8bpc_rvv  93.5  87.2  -6.74%
inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv  100.3  94.9  -5.38%
inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv  100.3  94.9  -5.38%
inv_txfm_add_4x4_adst_identity_0_8bpc_rvv  80.5  77.2  -4.10%
inv_txfm_add_4x4_adst_identity_1_8bpc_rvv  80.5  77.2  -4.10%
inv_txfm_add_4x4_dct_adst_0_8bpc_rvv  94.1  88.5  -5.95%
inv_txfm_add_4x4_dct_adst_1_8bpc_rvv  94.1  88.5  -5.95%
inv_txfm_add_4x4_dct_dct_0_8bpc_rvv  40.3  40.3  0.00%
inv_txfm_add_4x4_dct_dct_1_8bpc_rvv  92.2  82.1  -10.95%
inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv  95.3  89.9  -5.67%
inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv  95.3  89.9  -5.67%
inv_txfm_add_4x4_dct_identity_0_8bpc_rvv  75.5  73.3  -2.91%
inv_txfm_add_4x4_dct_identity_1_8bpc_rvv  75.5  73.3  -2.91%
inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv  100.3  94.7  -5.58%
inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv  100.3  94.7  -5.58%
inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv  94.8  88.4  -6.75%
inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv  94.8  88.5  -6.65%
inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv  105.0  96.0  -8.57%
inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv  105.0  95.9  -8.67%
inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv  81.6  78.5  -3.80%
inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv  81.6  78.4  -3.92%
inv_txfm_add_4x4_identity_adst_0_8bpc_rvv  80.3  77.8  -3.11%
inv_txfm_add_4x4_identity_adst_1_8bpc_rvv  80.3  77.8  -3.11%
inv_txfm_add_4x4_identity_dct_0_8bpc_rvv  77.2  71.7  -7.12%
inv_txfm_add_4x4_identity_dct_1_8bpc_rvv  77.2  71.7  -7.12%
inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv  81.5  79.2  -2.82%
inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv  81.6  79.2  -2.94%
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv  62.8  61.6  -1.91%
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv  62.8  61.6  -1.91%
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv  67.8  67.8  0.00%
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv  67.8  67.8  0.00%
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv  403.1  356.1  -11.66%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv  403.1  356.0  -11.68%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv  360.2  323.2  -10.27%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv  360.2  323.2  -10.27%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv  405.2  358.4  -11.55%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv  405.2  358.4  -11.55%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv  284.3  261.0  -8.20%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv  284.4  260.9  -8.26%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv  360.2  322.0  -10.61%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv  360.0  321.9  -10.58%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv  76.6  77.0  0.52%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv  317.2  289.0  -8.89%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv  363.7  324.3  -10.83%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv  363.8  324.3  -10.86%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv  241.2  226.9  -5.93%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv  241.3  227.0  -5.93%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv  404.9  358.0  -11.58%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv  405.0  358.1  -11.58%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv  365.1  323.8  -11.31%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv  365.2  323.9  -11.31%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv  407.2  359.6  -11.69%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv  406.4  359.5  -11.54%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv  285.8  261.9  -8.36%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv  285.9  261.8  -8.43%
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv  269.9  244.5  -9.41%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv  269.8  244.5  -9.38%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv  225.5  209.6  -7.05%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv  225.6  209.5  -7.14%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv  270.5  246.5  -8.87%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv  270.5  246.5  -8.87%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv  146.5  145.4  -0.75%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv  146.4  145.4  -0.68%
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv  1363.4  1212.0  -11.10%
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv  1363.6  1212.2  -11.10%
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv  1813.7  1601.4  -11.71%
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv  1185.9  1074.6  -9.39%
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv  1186.0  1074.7  -9.38%
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv  1639.5  1468.9  -10.41%
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv  1374.8  1214.8  -11.64%
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv  1374.7  1214.6  -11.65%
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv  1819.3  1610.9  -11.45%
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv  1283.3  1139.1  -11.24%
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv  1283.2  1139.2  -11.22%
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv  1632.4  1471.9  -9.83%
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv  160.9  158.7  -1.37%
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv  1099.5  997.1  -9.31%
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv  1465.3  1335.2  -8.88%
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv  1286.8  1143.2  -11.16%
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv  1286.8  1143.3  -11.15%
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv  1638.6  1473.5  -10.08%
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv  806.6  783.3  -2.89%
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv  806.7  783.4  -2.89%
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv  1163.1  1105.3  -4.97%
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv  1374.3  1216.0  -11.52%
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv  1374.3  1216.2  -11.50%
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv  1817.5  1609.7  -11.43%
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv  1190.4  1073.8  -9.80%
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv  1190.4  1073.9  -9.79%
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv  1640.4  1472.6  -10.23%
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1376.0  1224.2  -11.03%
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1376.0  1224.1  -11.04%
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1829.3  1616.6  -11.63%
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv  952.9  882.0  -7.44%
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv  952.8  881.9  -7.44%
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv  1172.0  1100.1  -6.13%
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv  657.6  659.8  0.33%
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv  657.6  659.7  0.32%
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv  876.2  878.1  0.22%
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv  197.3  178.0  -9.78%
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv  197.4  178.0  -9.83%
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv  174.9  159.9  -8.58%
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv  174.9  159.9  -8.58%
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv  199.2  180.2  -9.54%
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv  199.2  180.2  -9.54%
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv  123.3  118.0  -4.30%
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv  123.3  118.0  -4.30%
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv  191.1  171.8  -10.10%
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv  191.1  171.7  -10.15%
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv  168.9  153.6  -9.06%
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv  169.0  153.6  -9.11%
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv  193.0  173.9  -9.90%
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv  193.0  173.9  -9.90%
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv  117.0  111.7  -4.53%
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv  117.0  111.7  -4.53%
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv  198.0  178.6  -9.80%
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv  198.0  178.6  -9.80%
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv  175.8  160.5  -8.70%
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv  175.8  160.5  -8.70%
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv  199.9  180.5  -9.70%
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv  199.9  180.5  -9.70%
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv  123.6  118.6  -4.05%
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv  123.6  118.6  -4.05%
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv  171.3  154.2  -9.98%
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv  171.3  154.2  -9.98%
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv  148.6  136.5  -8.14%
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv  148.6  136.5  -8.14%
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv  173.1  156.4  -9.65%
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv  173.2  156.4  -9.70%
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv  94.3  94.2  -0.11%
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv  94.2  94.2  0.00%
inv_txfm_add_8x4_adst_adst_0_8bpc_rvv  201.2  188.4  -6.36%
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv  201.2  188.4  -6.36%
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv  194.9  175.7  -9.85%
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv  194.9  175.7  -9.85%
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv  202.4  182.3  -9.93%
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv  202.4  182.3  -9.93%
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv  170.1  155.7  -8.47%
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv  170.1  155.7  -8.47%
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv  178.0  162.1  -8.93%
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv  178.0  162.1  -8.93%
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv  172.8  157.0  -9.14%
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv  172.9  157.0  -9.20%
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv  180.3  163.7  -9.21%
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv  180.3  163.7  -9.21%
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv  147.9  137.9  -6.76%
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv  147.9  137.9  -6.76%
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv  202.4  182.3  -9.93%
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv  202.4  182.3  -9.93%
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv  196.3  175.9  -10.39%
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv  196.3  175.9  -10.39%
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv  203.7  183.4  -9.97%
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv  203.7  183.4  -9.97%
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv  171.1  155.9  -8.88%
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv  171.1  155.9  -8.88%
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv  126.8  120.9  -4.65%
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv  126.8  120.9  -4.65%
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv  121.5  117.0  -3.70%
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv  121.6  117.0  -3.78%
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv  129.1  122.3  -5.27%
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv  129.1  122.3  -5.27%
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv  98.5  95.7  -2.84%
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv  98.5  95.7  -2.84%
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv  384.4  344.6  -10.35%
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv  384.5  344.6  -10.38%
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv  429.3  387.3  -9.78%
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv  333.7  304.3  -8.81%
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv  333.7  304.2  -8.84%
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv  381.2  354.2  -7.08%
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv  385.7  349.1  -9.49%
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv  385.7  349.1  -9.49%
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv  433.0  389.3  -10.09%
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv  251.6  244.2  -2.94%
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv  251.5  244.1  -2.94%
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv  300.4  289.6  -3.60%
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv  378.5  335.6  -11.33%
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv  378.5  335.5  -11.36%
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv  420.6  369.5  -12.15%
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv  323.5  295.3  -8.72%
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv  323.2  295.2  -8.66%
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv  362.9  333.0  -8.24%
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv  375.3  339.4  -9.57%
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv  375.4  339.0  -9.70%
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv  414.8  372.2  -10.27%
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv  240.8  234.7  -2.53%
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv  240.7  234.7  -2.49%
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv  283.2  268.0  -5.37%
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv  384.2  345.8  -9.99%
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv  384.1  345.8  -9.97%
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv  432.5  387.7  -10.36%
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv  334.9  307.0  -8.33%
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv  335.0  307.1  -8.33%
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv  386.1  347.2  -10.08%
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv  386.7  349.4  -9.65%
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv  386.8  349.5  -9.64%
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv  436.6  392.9  -10.01%
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv  252.4  247.4  -1.98%
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv  252.4  247.5  -1.94%
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv  302.1  286.7  -5.10%
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv  348.3  317.4  -8.87%
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv  348.4  317.5  -8.87%
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv  361.4  329.0  -8.97%
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv  301.8  275.8  -8.61%
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv  301.8  275.8  -8.61%
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv  312.0  287.4  -7.88%
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv  352.2  321.9  -8.60%
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv  352.2  322.0  -8.57%
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv  363.7  332.5  -8.58%
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv  215.8  215.0  -0.37%
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv  215.8  215.1  -0.32%
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv  228.0  227.0  -0.44%
inv_txfm_add_16x4_adst_adst_0_8bpc_rvv  430.3  388.5  -9.71%
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv  430.3  388.5  -9.71%
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv  430.2  388.5  -9.69%
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv  412.1  374.1  -9.22%
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv  412.0  374.3  -9.15%
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv  412.1  374.2  -9.20%
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv  432.9  391.0  -9.68%
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv  432.8  391.1  -9.63%
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv  432.4  391.0  -9.57%
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv  358.4  332.1  -7.34%
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv  358.4  332.3  -7.28%
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv  358.5  332.5  -7.25%
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv  386.9  347.1  -10.29%
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv  386.8  347.1  -10.26%
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv  387.0  346.8  -10.39%
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv  363.3  330.9  -8.92%
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv  363.3  330.9  -8.92%
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv  363.2  331.0  -8.87%
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv  383.7  349.8  -8.84%
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv  384.3  349.8  -8.98%
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv  384.3  349.7  -9.00%
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv  310.2  288.4  -7.03%
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv  310.2  288.4  -7.03%
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv  310.3  288.5  -7.03%
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv  434.1  391.5  -9.81%
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv  434.1  392.0  -9.70%
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv  434.1  392.0  -9.70%
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv  423.5  375.5  -11.33%
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv  423.5  375.4  -11.36%
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv  423.5  375.5  -11.33%
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv  438.0  396.1  -9.57%
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv  438.1  396.0  -9.61%
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv  438.0  395.8  -9.63%
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv  361.9  333.0  -7.99%
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv  362.4  333.0  -8.11%
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv  362.4  333.0  -8.11%
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv  308.3  296.3  -3.89%
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv  308.4  296.4  -3.89%
inv_txfm_add_16x4_identity_adst_2_8bpc_rvv  308.4  296.4  -3.89%
inv_txfm_add_16x4_identity_dct_0_8bpc_rvv  289.9  279.9  -3.45%
inv_txfm_add_16x4_identity_dct_1_8bpc_rvv  289.9  280.0  -3.41%
inv_txfm_add_16x4_identity_dct_2_8bpc_rvv  290.0  279.9  -3.48%
inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv  311.2  298.9  -3.95%
inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv  311.1  298.9  -3.92%
inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv  310.9  298.9  -3.86%
inv_txfm_add_16x4_identity_identity_0_8bpc_rvv  238.4  243.2  2.01%
inv_txfm_add_16x4_identity_identity_1_8bpc_rvv  238.4  243.2  2.01%
inv_txfm_add_16x4_identity_identity_2_8bpc_rvv  238.5  243.2  1.97%
inv_txfm_add_8x16_adst_adst_0_8bpc_rvv  701.5  624.2  -11.02%
inv_txfm_add_8x16_adst_adst_1_8bpc_rvv  701.6  624.2  -11.03%
inv_txfm_add_8x16_adst_adst_2_8bpc_rvv  853.5  755.2  -11.52%
inv_txfm_add_8x16_adst_dct_0_8bpc_rvv  611.1  551.6  -9.74%
inv_txfm_add_8x16_adst_dct_1_8bpc_rvv  611.2  551.7  -9.73%
inv_txfm_add_8x16_adst_dct_2_8bpc_rvv  765.0  682.8  -10.75%
inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv  703.4  629.3  -10.53%
inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv  703.4  629.5  -10.51%
inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv  858.1  763.9  -10.98%
inv_txfm_add_8x16_adst_identity_0_8bpc_rvv  463.7  440.2  -5.07%
inv_txfm_add_8x16_adst_identity_1_8bpc_rvv  464.3  440.2  -5.19%
inv_txfm_add_8x16_adst_identity_2_8bpc_rvv  618.6  571.7  -7.58%
inv_txfm_add_8x16_dct_adst_0_8bpc_rvv  660.3  590.5  -10.57%
inv_txfm_add_8x16_dct_adst_1_8bpc_rvv  660.2  590.3  -10.59%
inv_txfm_add_8x16_dct_adst_2_8bpc_rvv  776.2  687.9  -11.38%
inv_txfm_add_8x16_dct_dct_0_8bpc_rvv  566.9  516.3  -8.93%
inv_txfm_add_8x16_dct_dct_1_8bpc_rvv  567.1  516.4  -8.94%
inv_txfm_add_8x16_dct_dct_2_8bpc_rvv  685.9  616.6  -10.10%
inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv  663.3  593.5  -10.52%
inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv  663.2  593.5  -10.51%
inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv  771.7  690.5  -10.52%
inv_txfm_add_8x16_dct_identity_0_8bpc_rvv  421.3  406.1  -3.61%
inv_txfm_add_8x16_dct_identity_1_8bpc_rvv  421.3  406.1  -3.61%
inv_txfm_add_8x16_dct_identity_2_8bpc_rvv  536.6  503.6  -6.15%
inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv  703.3  627.1  -10.83%
inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv  703.4  627.2  -10.83%
inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv  857.7  763.7  -10.96%
inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv  613.5  552.8  -9.89%
inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv  613.4  552.7  -9.90%
inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv  771.0  693.1  -10.10%
inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv  706.3  631.4  -10.60%
inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv  706.5  631.7  -10.59%
inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv  861.1  764.9  -11.17%
inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv  467.0  443.0  -5.14%
inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv  467.0  443.0  -5.14%
inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv  623.7  575.1  -7.79%
inv_txfm_add_8x16_identity_adst_0_8bpc_rvv  565.6  512.0  -9.48%
inv_txfm_add_8x16_identity_adst_1_8bpc_rvv  565.6  512.9  -9.32%
inv_txfm_add_8x16_identity_adst_2_8bpc_rvv  585.6  532.8  -9.02%
inv_txfm_add_8x16_identity_dct_0_8bpc_rvv  476.4  439.9  -7.66%
inv_txfm_add_8x16_identity_dct_1_8bpc_rvv  476.4  440.0  -7.64%
inv_txfm_add_8x16_identity_dct_2_8bpc_rvv  496.3  459.5  -7.41%
inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv  570.7  516.4  -9.51%
inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv  570.6  516.3  -9.52%
inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv  590.2  540.0  -8.51%
inv_txfm_add_8x16_identity_identity_0_8bpc_rvv  330.9  329.9  -0.30%
inv_txfm_add_8x16_identity_identity_1_8bpc_rvv  330.9  329.9  -0.30%
inv_txfm_add_8x16_identity_identity_2_8bpc_rvv  350.8  349.7  -0.31%
inv_txfm_add_16x8_adst_adst_0_8bpc_rvv  855.5  752.1  -12.09%
inv_txfm_add_16x8_adst_adst_1_8bpc_rvv  855.5  751.9  -12.11%
inv_txfm_add_16x8_adst_adst_2_8bpc_rvv  855.4  752.1  -12.08%
inv_txfm_add_16x8_adst_dct_0_8bpc_rvv  765.4  685.5  -10.44%
inv_txfm_add_16x8_adst_dct_1_8bpc_rvv  765.5  685.3  -10.48%
inv_txfm_add_16x8_adst_dct_2_8bpc_rvv  765.5  685.5  -10.45%
inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv  859.2  755.8  -12.03%
inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv  859.1  756.0  -12.00%
inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv  859.1  755.9  -12.01%
inv_txfm_add_16x8_adst_identity_0_8bpc_rvv  612.8  561.9  -8.31%
inv_txfm_add_16x8_adst_identity_1_8bpc_rvv  612.9  561.9  -8.32%
inv_txfm_add_16x8_adst_identity_2_8bpc_rvv  612.8  561.9  -8.31%
inv_txfm_add_16x8_dct_adst_0_8bpc_rvv  765.1  676.0  -11.65%
inv_txfm_add_16x8_dct_adst_1_8bpc_rvv  765.0  676.2  -11.61%
inv_txfm_add_16x8_dct_adst_2_8bpc_rvv  765.0  676.2  -11.61%
inv_txfm_add_16x8_dct_dct_0_8bpc_rvv  674.5  612.0  -9.27%
inv_txfm_add_16x8_dct_dct_1_8bpc_rvv  674.5  612.1  -9.25%
inv_txfm_add_16x8_dct_dct_2_8bpc_rvv  674.6  612.0  -9.28%
inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv  777.2  679.9  -12.52%
inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv  777.1  680.1  -12.48%
inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv  777.1  680.0  -12.50%
inv_txfm_add_16x8_dct_identity_0_8bpc_rvv  522.2  488.2  -6.51%
inv_txfm_add_16x8_dct_identity_1_8bpc_rvv  522.1  488.2  -6.49%
inv_txfm_add_16x8_dct_identity_2_8bpc_rvv  522.1  487.5  -6.63%
inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv  859.2  753.5  -12.30%
inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv  859.2  753.6  -12.29%
inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv  859.2  753.5  -12.30%
inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv  768.9  689.0  -10.39%
inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv  768.9  689.2  -10.37%
inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv  768.8  689.2  -10.35%
inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv  863.0  758.7  -12.09%
inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv  862.9  758.7  -12.08%
inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv  863.0  758.6  -12.10%
inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv  616.5  566.7  -8.08%
inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv  616.6  566.6  -8.11%
inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv  616.3  567.0  -8.00%
inv_txfm_add_16x8_identity_adst_0_8bpc_rvv  618.1  564.5  -8.67%
inv_txfm_add_16x8_identity_adst_1_8bpc_rvv  618.0  564.5  -8.66%
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv  617.7  564.6  -8.60%
inv_txfm_add_16x8_identity_dct_0_8bpc_rvv  527.9  500.6  -5.17%
inv_txfm_add_16x8_identity_dct_1_8bpc_rvv  527.8  500.7  -5.13%
inv_txfm_add_16x8_identity_dct_2_8bpc_rvv  527.7  500.7  -5.12%
inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv  622.3  568.5  -8.65%
inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv  622.2  568.5  -8.63%
inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv  622.3  568.4  -8.66%
inv_txfm_add_16x8_identity_identity_0_8bpc_rvv  373.4  374.4  0.27%
inv_txfm_add_16x8_identity_identity_1_8bpc_rvv  373.4  374.5  0.29%
inv_txfm_add_16x8_identity_identity_2_8bpc_rvv  373.4  374.4  0.27%

SpacemiT K1  Old  New  Delta
inv_txfm_add_4x4_adst_adst_0_8bpc_rvv  101.0  96.8  -4.16%
inv_txfm_add_4x4_adst_adst_1_8bpc_rvv  101.1  96.8  -4.25%
inv_txfm_add_4x4_adst_dct_0_8bpc_rvv  96.8  91.7  -5.27%
inv_txfm_add_4x4_adst_dct_1_8bpc_rvv  95.9  91.8  -4.28%
inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv  102.2  97.9  -4.21%
inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv  102.2  97.9  -4.21%
inv_txfm_add_4x4_adst_identity_0_8bpc_rvv  82.4  80.4  -2.43%
inv_txfm_add_4x4_adst_identity_1_8bpc_rvv  82.4  80.5  -2.31%
inv_txfm_add_4x4_dct_adst_0_8bpc_rvv  97.3  92.6  -4.83%
inv_txfm_add_4x4_dct_adst_1_8bpc_rvv  97.2  92.3  -5.04%
inv_txfm_add_4x4_dct_dct_0_8bpc_rvv  41.2  41.3  0.24%
inv_txfm_add_4x4_dct_dct_1_8bpc_rvv  96.0  87.5  -8.85%
inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv  98.5  94.5  -4.06%
inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv  98.6  94.7  -3.96%
inv_txfm_add_4x4_dct_identity_0_8bpc_rvv  78.6  76.1  -3.18%
inv_txfm_add_4x4_dct_identity_1_8bpc_rvv  78.6  76.0  -3.31%
inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv  104.3  99.1  -4.99%
inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv  104.4  99.1  -5.08%
inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv  98.0  94.6  -3.47%
inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv  98.1  94.4  -3.77%
inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv  104.2  99.2  -4.80%
inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv  104.3  99.2  -4.89%
inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv  86.9  81.8  -5.87%
inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv  87.0  81.9  -5.86%
inv_txfm_add_4x4_identity_adst_0_8bpc_rvv  86.0  80.8  -6.05%
inv_txfm_add_4x4_identity_adst_1_8bpc_rvv  85.9  81.4  -5.24%
inv_txfm_add_4x4_identity_dct_0_8bpc_rvv  78.5  76.1  -3.06%
inv_txfm_add_4x4_identity_dct_1_8bpc_rvv  78.6  76.1  -3.18%
inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv  85.9  82.5  -3.96%
inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv  85.9  82.3  -4.19%
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv  65.9  64.9  -1.52%
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv  65.9  64.8  -1.67%
inv_txfm_add_4x4_wht_wht_0_8bpc_rvv  71.2  71.3  0.14%
inv_txfm_add_4x4_wht_wht_1_8bpc_rvv  71.2  71.3  0.14%
inv_txfm_add_8x8_adst_adst_0_8bpc_rvv  440.6  399.3  -9.37%
inv_txfm_add_8x8_adst_adst_1_8bpc_rvv  440.6  399.3  -9.37%
inv_txfm_add_8x8_adst_dct_0_8bpc_rvv  401.7  368.4  -8.29%
inv_txfm_add_8x8_adst_dct_1_8bpc_rvv  401.8  368.4  -8.31%
inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv  442.4  401.2  -9.31%
inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv  442.4  401.1  -9.34%
inv_txfm_add_8x8_adst_identity_0_8bpc_rvv  329.7  310.1  -5.94%
inv_txfm_add_8x8_adst_identity_1_8bpc_rvv  329.7  310.1  -5.94%
inv_txfm_add_8x8_dct_adst_0_8bpc_rvv  401.8  367.4  -8.56%
inv_txfm_add_8x8_dct_adst_1_8bpc_rvv  401.7  367.3  -8.56%
inv_txfm_add_8x8_dct_dct_0_8bpc_rvv  79.5  80.2  0.88%
inv_txfm_add_8x8_dct_dct_1_8bpc_rvv  362.1  335.8  -7.26%
inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv  405.0  369.2  -8.84%
inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv  405.1  369.2  -8.86%
inv_txfm_add_8x8_dct_identity_0_8bpc_rvv  290.9  278.2  -4.37%
inv_txfm_add_8x8_dct_identity_1_8bpc_rvv  290.8  278.2  -4.33%
inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv  442.5  401.1  -9.36%
inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv  442.5  401.2  -9.33%
inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv  405.8  369.2  -9.02%
inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv  405.8  369.1  -9.04%
inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv  444.3  403.0  -9.30%
inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv  444.3  403.1  -9.27%
inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv  331.6  310.9  -6.24%
inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv  331.6  310.9  -6.24%
inv_txfm_add_8x8_identity_adst_0_8bpc_rvv  313.3  292.6  -6.61%
inv_txfm_add_8x8_identity_adst_1_8bpc_rvv  313.1  292.6  -6.55%
inv_txfm_add_8x8_identity_dct_0_8bpc_rvv  274.5  260.6  -5.06%
inv_txfm_add_8x8_identity_dct_1_8bpc_rvv  274.4  260.7  -4.99%
inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv  315.3  294.4  -6.63%
inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv  315.3  294.4  -6.63%
inv_txfm_add_8x8_identity_identity_0_8bpc_rvv  202.5  202.5  0.00%
inv_txfm_add_8x8_identity_identity_1_8bpc_rvv  202.6  202.5  -0.05%
inv_txfm_add_16x16_adst_adst_0_8bpc_rvv  1418.8  1268.2  -10.61%
inv_txfm_add_16x16_adst_adst_1_8bpc_rvv  1418.9  1268.3  -10.61%
inv_txfm_add_16x16_adst_adst_2_8bpc_rvv  1943.3  1733.6  -10.79%
inv_txfm_add_16x16_adst_dct_0_8bpc_rvv  1241.7  1134.6  -8.63%
inv_txfm_add_16x16_adst_dct_1_8bpc_rvv  1241.5  1134.5  -8.62%
inv_txfm_add_16x16_adst_dct_2_8bpc_rvv  1772.5  1599.8  -9.74%
inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv  1429.8  1270.3  -11.16%
inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv  1429.7  1270.1  -11.16%
inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv  1951.1  1741.4  -10.75%
inv_txfm_add_16x16_dct_adst_0_8bpc_rvv  1337.8  1195.8  -10.61%
inv_txfm_add_16x16_dct_adst_1_8bpc_rvv  1337.5  1196.0  -10.58%
inv_txfm_add_16x16_dct_adst_2_8bpc_rvv  1763.2  1604.6  -9.00%
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv  179.3  181.1  1.00%
inv_txfm_add_16x16_dct_dct_1_8bpc_rvv  1153.8  1060.7  -8.07%
inv_txfm_add_16x16_dct_dct_2_8bpc_rvv  1601.6  1470.6  -8.18%
inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv  1340.7  1199.8  -10.51%
inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv  1340.4  1199.8  -10.49%
inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv  1771.2  1606.6  -9.29%
inv_txfm_add_16x16_dct_identity_0_8bpc_rvv  877.9  854.9  -2.62%
inv_txfm_add_16x16_dct_identity_1_8bpc_rvv  877.7  855.2  -2.56%
inv_txfm_add_16x16_dct_identity_2_8bpc_rvv  1311.6  1254.1  -4.38%
inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv  1428.2  1270.5  -11.04%
inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv  1428.3  1270.6  -11.04%
inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv  1947.3  1737.3  -10.78%
inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv  1245.8  1133.5  -9.01%
inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv  1246.0  1133.7  -9.01%
inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv  1769.9  1603.9  -9.38%
inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1428.7  1279.7  -10.43%
inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1428.8  1279.5  -10.45%
inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1960.8  1745.8  -10.96%
inv_txfm_add_16x16_identity_dct_0_8bpc_rvv  1016.6  948.8  -6.67%
inv_txfm_add_16x16_identity_dct_1_8bpc_rvv  1016.7  948.8  -6.68%
inv_txfm_add_16x16_identity_dct_2_8bpc_rvv  1319.8  1247.7  -5.46%
inv_txfm_add_16x16_identity_identity_0_8bpc_rvv  735.4  736.6  0.16%
inv_txfm_add_16x16_identity_identity_1_8bpc_rvv  735.3  736.4  0.15%
inv_txfm_add_16x16_identity_identity_2_8bpc_rvv  1037.8  1036.7  -0.11%
inv_txfm_add_4x8_adst_adst_0_8bpc_rvv  197.2  179.9  -8.77%
inv_txfm_add_4x8_adst_adst_1_8bpc_rvv  197.1  180.0  -8.68%
inv_txfm_add_4x8_adst_dct_0_8bpc_rvv  177.5  164.2  -7.49%
inv_txfm_add_4x8_adst_dct_1_8bpc_rvv  177.5  164.3  -7.44%
inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv  199.3  181.8  -8.78%
inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv  199.0  181.8  -8.64%
inv_txfm_add_4x8_adst_identity_0_8bpc_rvv  126.7  121.8  -3.87%
inv_txfm_add_4x8_adst_identity_1_8bpc_rvv  126.7  121.9  -3.79%
inv_txfm_add_4x8_dct_adst_0_8bpc_rvv  189.8  172.4  -9.17%
inv_txfm_add_4x8_dct_adst_1_8bpc_rvv  189.8  172.4  -9.17%
inv_txfm_add_4x8_dct_dct_0_8bpc_rvv  170.2  156.8  -7.87%
inv_txfm_add_4x8_dct_dct_1_8bpc_rvv  170.2  156.9  -7.81%
inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv  192.6  174.2  -9.55%
inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv  192.6  174.2  -9.55%
inv_txfm_add_4x8_dct_identity_0_8bpc_rvv  119.4  114.3  -4.27%
inv_txfm_add_4x8_dct_identity_1_8bpc_rvv  119.6  114.2  -4.52%
inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv  197.7  180.5  -8.70%
inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv  197.8  180.6  -8.70%
inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv  178.3  165.0  -7.46%
inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv  178.3  164.9  -7.52%
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv  199.7  182.5  -8.61%
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv  200.0  182.4  -8.80%
inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv  127.2  122.3  -3.85%
inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv  127.3  122.5  -3.77%
inv_txfm_add_4x8_identity_adst_0_8bpc_rvv  172.1  155.0  -9.94%
inv_txfm_add_4x8_identity_adst_1_8bpc_rvv  172.1  155.0  -9.94%
inv_txfm_add_4x8_identity_dct_0_8bpc_rvv  148.7  139.4  -6.25%
inv_txfm_add_4x8_identity_dct_1_8bpc_rvv  148.7  139.5  -6.19%
inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv  171.7  156.8  -8.68%
inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv  171.6  156.9  -8.57%
inv_txfm_add_4x8_identity_identity_0_8bpc_rvv  96.8  96.8  0.00%
inv_txfm_add_4x8_identity_identity_1_8bpc_rvv  96.7  96.7  0.00%
inv_txfm_add_8x4_adst_adst_0_8bpc_rvv  228.1  220.0  -3.55%
inv_txfm_add_8x4_adst_adst_1_8bpc_rvv  227.9  219.9  -3.51%
inv_txfm_add_8x4_adst_dct_0_8bpc_rvv  219.4  206.4  -5.93%
inv_txfm_add_8x4_adst_dct_1_8bpc_rvv  219.4  206.4  -5.93%
inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv  229.4  214.7  -6.41%
inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv  229.4  214.8  -6.36%
inv_txfm_add_8x4_adst_identity_0_8bpc_rvv  195.6  187.6  -4.09%
inv_txfm_add_8x4_adst_identity_1_8bpc_rvv  195.8  187.6  -4.19%
inv_txfm_add_8x4_dct_adst_0_8bpc_rvv  207.0  195.2  -5.70%
inv_txfm_add_8x4_dct_adst_1_8bpc_rvv  206.9  195.2  -5.65%
inv_txfm_add_8x4_dct_dct_0_8bpc_rvv  199.4  188.2  -5.62%
inv_txfm_add_8x4_dct_dct_1_8bpc_rvv  199.4  188.5  -5.47%
inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv  209.5  196.5  -6.21%
inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv  209.7  196.6  -6.25%
inv_txfm_add_8x4_dct_identity_0_8bpc_rvv  175.7  169.5  -3.53%
inv_txfm_add_8x4_dct_identity_1_8bpc_rvv  175.9  169.6  -3.58%
inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv  229.0  214.7  -6.24%
inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv  229.3  214.5  -6.45%
inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv  220.9  206.7  -6.43%
inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv  220.6  206.5  -6.39%
inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv  230.6  215.9  -6.37%
inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv  230.7  215.9  -6.42%
inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv  196.9  188.9  -4.06%
inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv  196.9  188.9  -4.06%
inv_txfm_add_8x4_identity_adst_0_8bpc_rvv  157.6  154.7  -1.84%
inv_txfm_add_8x4_identity_adst_1_8bpc_rvv  157.5  154.9  -1.65%
inv_txfm_add_8x4_identity_dct_0_8bpc_rvv  150.0  147.9  -1.40%
inv_txfm_add_8x4_identity_dct_1_8bpc_rvv  150.0  147.7  -1.53%
inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv  159.6  155.9  -2.32%
inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv  159.8  155.6  -2.63%
inv_txfm_add_8x4_identity_identity_0_8bpc_rvv  128.6  128.8  0.16%
inv_txfm_add_8x4_identity_identity_1_8bpc_rvv  128.4  129.3  0.70%
inv_txfm_add_4x16_adst_adst_0_8bpc_rvv  373.8  335.9  -10.14%
inv_txfm_add_4x16_adst_adst_1_8bpc_rvv  373.8  335.7  -10.19%
inv_txfm_add_4x16_adst_adst_2_8bpc_rvv  417.4  380.0  -8.96%
inv_txfm_add_4x16_adst_dct_0_8bpc_rvv  328.3  301.7  -8.10%
inv_txfm_add_4x16_adst_dct_1_8bpc_rvv  328.0  302.0  -7.93%
inv_txfm_add_4x16_adst_dct_2_8bpc_rvv  374.3  351.3  -6.14%
inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv  374.5  339.8  -9.27%
inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv  374.3  339.4  -9.32%
inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv  422.0  383.8  -9.05%
inv_txfm_add_4x16_adst_identity_0_8bpc_rvv  248.0  242.9  -2.06%
inv_txfm_add_4x16_adst_identity_1_8bpc_rvv  248.0  242.2  -2.34%
inv_txfm_add_4x16_adst_identity_2_8bpc_rvv  298.6  290.3  -2.78%
inv_txfm_add_4x16_dct_adst_0_8bpc_rvv  370.5  329.4  -11.09%
inv_txfm_add_4x16_dct_adst_1_8bpc_rvv  370.8  329.0  -11.27%
inv_txfm_add_4x16_dct_adst_2_8bpc_rvv  409.1  360.9  -11.78%
inv_txfm_add_4x16_dct_dct_0_8bpc_rvv  321.1  293.7  -8.53%
inv_txfm_add_4x16_dct_dct_1_8bpc_rvv  321.0  294.3  -8.32%
inv_txfm_add_4x16_dct_dct_2_8bpc_rvv  357.8  329.8  -7.83%
inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv  369.7  332.9  -9.95%
inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv  370.4  333.0  -10.10%
inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv  405.5  364.9  -10.01%
inv_txfm_add_4x16_dct_identity_0_8bpc_rvv  241.6  236.6  -2.07%
inv_txfm_add_4x16_dct_identity_1_8bpc_rvv  241.8  235.6  -2.56%
inv_txfm_add_4x16_dct_identity_2_8bpc_rvv  281.9  266.9  -5.32%
inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv  371.9  337.3  -9.30%
inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv  372.2  337.1  -9.43%
inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv  419.8  381.5  -9.12%
inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv  328.3  302.9  -7.74%
inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv  328.4  303.3  -7.64%
inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv  380.6  343.7  -9.70%
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv  377.7  341.1  -9.69%
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv  377.6  341.5  -9.56%
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv  423.6  386.7  -8.71%
inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv  250.0  245.7  -1.72%
inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv  249.3  246.0  -1.32%
inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv  296.4  284.7  -3.95%
inv_txfm_add_4x16_identity_adst_0_8bpc_rvv  343.0  311.2  -9.27%
inv_txfm_add_4x16_identity_adst_1_8bpc_rvv  342.9  311.0  -9.30%
inv_txfm_add_4x16_identity_adst_2_8bpc_rvv  354.8  325.0  -8.40%
inv_txfm_add_4x16_identity_dct_0_8bpc_rvv  298.9  274.9  -8.03%
inv_txfm_add_4x16_identity_dct_1_8bpc_rvv  298.8  275.0  -7.97%
inv_txfm_add_4x16_identity_dct_2_8bpc_rvv  310.3  289.1  -6.83%
inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv  344.7  314.9  -8.65%
inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv  344.5  314.8  -8.62%
inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv  358.3  328.6  -8.29%
inv_txfm_add_4x16_identity_identity_0_8bpc_rvv  219.6  216.1  -1.59%
inv_txfm_add_4x16_identity_identity_1_8bpc_rvv  218.3  216.3  -0.92%
inv_txfm_add_4x16_identity_identity_2_8bpc_rvv  231.3  229.6  -0.73%
inv_txfm_add_16x4_adst_adst_0_8bpc_rvv  468.5  428.8  -8.47%
inv_txfm_add_16x4_adst_adst_1_8bpc_rvv  468.5  428.9  -8.45%
inv_txfm_add_16x4_adst_adst_2_8bpc_rvv  468.5  428.9  -8.45%
inv_txfm_add_16x4_adst_dct_0_8bpc_rvv  453.8  414.5  -8.66%
inv_txfm_add_16x4_adst_dct_1_8bpc_rvv  453.8  414.5  -8.66%
inv_txfm_add_16x4_adst_dct_2_8bpc_rvv  453.9  414.4  -8.70%
inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv  471.0  431.5  -8.39%
inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv  471.0  431.3  -8.43%
inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv  471.0  431.5  -8.39%
inv_txfm_add_16x4_adst_identity_0_8bpc_rvv  402.2  375.0  -6.76%
inv_txfm_add_16x4_adst_identity_1_8bpc_rvv  402.1  375.0  -6.74%
inv_txfm_add_16x4_adst_identity_2_8bpc_rvv  402.0  375.3  -6.64%
inv_txfm_add_16x4_dct_adst_0_8bpc_rvv  432.8  392.5  -9.31%
inv_txfm_add_16x4_dct_adst_1_8bpc_rvv  432.8  392.5  -9.31%
inv_txfm_add_16x4_dct_adst_2_8bpc_rvv  432.8  392.5  -9.31%
inv_txfm_add_16x4_dct_dct_0_8bpc_rvv  407.9  378.3  -7.26%
inv_txfm_add_16x4_dct_dct_1_8bpc_rvv  407.8  378.1  -7.28%
inv_txfm_add_16x4_dct_dct_2_8bpc_rvv  407.8  378.1  -7.28%
inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv  426.0  395.1  -7.25%
inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv  425.9  395.0  -7.26%
inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv  426.0  395.1  -7.25%
inv_txfm_add_16x4_dct_identity_0_8bpc_rvv  357.1  338.7  -5.15%
inv_txfm_add_16x4_dct_identity_1_8bpc_rvv  357.1  338.7  -5.15%
inv_txfm_add_16x4_dct_identity_2_8bpc_rvv  357.2  338.7  -5.18%
inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv  472.4  432.6  -8.43%
inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv  472.2  432.6  -8.39%
inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv  472.3  432.7  -8.38%
inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv  464.3  418.2  -9.93%
inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv  464.2  418.2  -9.91%
inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv  464.2  418.2  -9.91%
inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv  474.7  435.1  -8.34%
inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv  474.8  435.1  -8.36%
inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv  474.7  435.1  -8.34%
inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv  405.9  378.8  -6.68%
inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv  406.0  378.8  -6.70%
inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv  406.0  378.8  -6.70%
inv_txfm_add_16x4_identity_adst_0_8bpc_rvv  353.7  342.2  -3.25%
inv_txfm_add_16x4_identity_adst_1_8bpc_rvv 353.8 342.3 -3.25% inv_txfm_add_16x4_identity_adst_2_8bpc_rvv 353.7 342.4 -3.19% inv_txfm_add_16x4_identity_dct_0_8bpc_rvv 338.1 327.9 -3.02% inv_txfm_add_16x4_identity_dct_1_8bpc_rvv 338.1 327.9 -3.02% inv_txfm_add_16x4_identity_dct_2_8bpc_rvv 338.2 327.9 -3.05% inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv 357.5 344.8 -3.55% inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv 357.5 344.9 -3.52% inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv 357.5 344.7 -3.58% inv_txfm_add_16x4_identity_identity_0_8bpc_rvv 287.1 297.0 3.45% inv_txfm_add_16x4_identity_identity_1_8bpc_rvv 287.2 297.0 3.41% inv_txfm_add_16x4_identity_identity_2_8bpc_rvv 287.2 297.0 3.41% inv_txfm_add_8x16_adst_adst_0_8bpc_rvv 774.3 704.8 -8.98% inv_txfm_add_8x16_adst_adst_1_8bpc_rvv 774.4 704.8 -8.99% inv_txfm_add_8x16_adst_adst_2_8bpc_rvv 929.5 839.9 -9.64% inv_txfm_add_8x16_adst_dct_0_8bpc_rvv 687.9 634.9 -7.70% inv_txfm_add_8x16_adst_dct_1_8bpc_rvv 688.0 634.8 -7.73% inv_txfm_add_8x16_adst_dct_2_8bpc_rvv 845.5 768.4 -9.12% inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv 779.5 708.5 -9.11% inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv 779.5 708.5 -9.11% inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv 933.3 849.9 -8.94% inv_txfm_add_8x16_adst_identity_0_8bpc_rvv 546.5 529.0 -3.20% inv_txfm_add_8x16_adst_identity_1_8bpc_rvv 546.5 529.0 -3.20% inv_txfm_add_8x16_adst_identity_2_8bpc_rvv 702.5 664.1 -5.47% inv_txfm_add_8x16_dct_adst_0_8bpc_rvv 739.9 672.7 -9.08% inv_txfm_add_8x16_dct_adst_1_8bpc_rvv 739.9 672.7 -9.08% inv_txfm_add_8x16_dct_adst_2_8bpc_rvv 863.1 776.1 -10.08% inv_txfm_add_8x16_dct_dct_0_8bpc_rvv 651.2 601.9 -7.57% inv_txfm_add_8x16_dct_dct_1_8bpc_rvv 651.2 601.8 -7.59% inv_txfm_add_8x16_dct_dct_2_8bpc_rvv 777.6 706.5 -9.14% inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv 742.4 678.9 -8.55% inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv 742.5 678.9 -8.57% inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv 858.8 779.3 -9.26% inv_txfm_add_8x16_dct_identity_0_8bpc_rvv 510.8 496.4 
-2.82% inv_txfm_add_8x16_dct_identity_1_8bpc_rvv 510.6 496.5 -2.76% inv_txfm_add_8x16_dct_identity_2_8bpc_rvv 630.0 599.7 -4.81% inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv 778.3 707.2 -9.14% inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv 778.3 707.1 -9.15% inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv 934.4 843.5 -9.73% inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv 689.3 634.7 -7.92% inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv 689.2 634.8 -7.89% inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv 845.8 774.4 -8.44% inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv 779.9 710.5 -8.90% inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv 780.0 710.4 -8.92% inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv 936.4 848.1 -9.43% inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv 550.4 531.3 -3.47% inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv 550.4 531.3 -3.47% inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv 705.3 669.4 -5.09% inv_txfm_add_8x16_identity_adst_0_8bpc_rvv 649.0 599.7 -7.60% inv_txfm_add_8x16_identity_adst_1_8bpc_rvv 649.0 599.7 -7.60% inv_txfm_add_8x16_identity_adst_2_8bpc_rvv 682.8 633.4 -7.23% inv_txfm_add_8x16_identity_dct_0_8bpc_rvv 562.1 527.9 -6.08% inv_txfm_add_8x16_identity_dct_1_8bpc_rvv 562.0 527.9 -6.07% inv_txfm_add_8x16_identity_dct_2_8bpc_rvv 597.4 561.5 -6.01% inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv 652.7 603.6 -7.52% inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv 652.8 603.6 -7.54% inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv 686.6 640.5 -6.71% inv_txfm_add_8x16_identity_identity_0_8bpc_rvv 421.6 424.4 0.66% inv_txfm_add_8x16_identity_identity_1_8bpc_rvv 421.7 424.4 0.64% inv_txfm_add_8x16_identity_identity_2_8bpc_rvv 455.5 458.1 0.57% inv_txfm_add_16x8_adst_adst_0_8bpc_rvv 935.2 843.2 -9.84% inv_txfm_add_16x8_adst_adst_1_8bpc_rvv 935.2 843.3 -9.83% inv_txfm_add_16x8_adst_adst_2_8bpc_rvv 935.2 843.1 -9.85% inv_txfm_add_16x8_adst_dct_0_8bpc_rvv 857.0 781.1 -8.86% inv_txfm_add_16x8_adst_dct_1_8bpc_rvv 856.9 781.1 -8.85% inv_txfm_add_16x8_adst_dct_2_8bpc_rvv 856.9 781.0 
-8.86% inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv 938.9 846.8 -9.81% inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv 938.8 847.0 -9.78% inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv 938.9 847.0 -9.79% inv_txfm_add_16x8_adst_identity_0_8bpc_rvv 711.2 661.6 -6.97% inv_txfm_add_16x8_adst_identity_1_8bpc_rvv 711.2 661.6 -6.97% inv_txfm_add_16x8_adst_identity_2_8bpc_rvv 711.2 661.6 -6.97% inv_txfm_add_16x8_dct_adst_0_8bpc_rvv 846.1 771.5 -8.82% inv_txfm_add_16x8_dct_adst_1_8bpc_rvv 845.9 771.5 -8.80% inv_txfm_add_16x8_dct_adst_2_8bpc_rvv 846.2 772.1 -8.76% inv_txfm_add_16x8_dct_dct_0_8bpc_rvv 767.8 710.3 -7.49% inv_txfm_add_16x8_dct_dct_1_8bpc_rvv 767.8 710.4 -7.48% inv_txfm_add_16x8_dct_dct_2_8bpc_rvv 767.4 710.4 -7.43% inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv 856.6 775.6 -9.46% inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv 856.5 775.1 -9.50% inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv 856.6 775.2 -9.50% inv_txfm_add_16x8_dct_identity_0_8bpc_rvv 623.3 589.9 -5.36% inv_txfm_add_16x8_dct_identity_1_8bpc_rvv 623.3 590.0 -5.34% inv_txfm_add_16x8_dct_identity_2_8bpc_rvv 623.3 589.7 -5.39% inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv 939.8 846.9 -9.89% inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv 939.8 847.0 -9.87% inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv 939.9 846.9 -9.89% inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv 860.8 784.9 -8.82% inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv 860.7 784.8 -8.82% inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv 860.8 784.9 -8.82% inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv 942.7 852.2 -9.60% inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv 942.7 852.1 -9.61% inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv 942.8 852.1 -9.62% inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv 714.9 667.0 -6.70% inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv 715.0 666.9 -6.73% inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv 715.0 666.9 -6.73% inv_txfm_add_16x8_identity_adst_0_8bpc_rvv 707.9 667.2 -5.75% inv_txfm_add_16x8_identity_adst_1_8bpc_rvv 707.9 667.3 -5.74% 
inv_txfm_add_16x8_identity_adst_2_8bpc_rvv 707.9 667.2 -5.75% inv_txfm_add_16x8_identity_dct_0_8bpc_rvv 630.6 604.8 -4.09% inv_txfm_add_16x8_identity_dct_1_8bpc_rvv 630.7 604.9 -4.09% inv_txfm_add_16x8_identity_dct_2_8bpc_rvv 630.6 604.8 -4.09% inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv 711.7 671.1 -5.70% inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv 711.9 671.1 -5.73% inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv 711.8 671.2 -5.70% inv_txfm_add_16x8_identity_identity_0_8bpc_rvv 485.2 486.2 0.21% inv_txfm_add_16x8_identity_identity_1_8bpc_rvv 485.2 486.3 0.23% inv_txfm_add_16x8_identity_identity_2_8bpc_rvv 485.2 486.3 0.23%
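The Delta column in the table above is not defined in the commit message. A minimal sketch, assuming it is the relative change `(new - old) / old` in percent, reproduces the listed values for the two rows checked here (the helper name `delta_percent` is made up for illustration):

```python
def delta_percent(old: float, new: float) -> float:
    """Relative change from old to new in percent; negative means the new code is faster."""
    return (new - old) / old * 100.0


# Two rows copied from the Kendryte K230 table: (name, Old, New).
rows = [
    ("inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv", 178.3, 164.9),
    ("inv_txfm_add_16x8_identity_identity_0_8bpc_rvv", 485.2, 486.2),
]

for name, old, new in rows:
    print(f"{name} {old:.1f} {new:.1f} {delta_percent(old, new):.2f}%")
```

Under this assumption the identity_identity rows, which the vnclip change does not touch, show deltas near zero, while the dct and adst variants show negative deltas.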
-
- Oct 14, 2024
-
-
Jean-Baptiste Kempf authored
-
- Oct 13, 2024
-
-
Luca Barbato authored
-
Nathan E. Egge authored
This fixes md5sum mismatch in profile0_core/streams/test11168_11073.obu.
-
- Oct 12, 2024
-
-
Hecai Yuan authored
-
- Oct 09, 2024
-
-
Bogdan Gligorijević authored
Benchmarks:
- Kendryte K230:
warp_8x8_8bpc_c: 4549.7 ( 1.00x)
warp_8x8_8bpc_rvv: 2504.7 ( 1.82x)
warp_8x8t_8bpc_c: 4414.7 ( 1.00x)
warp_8x8t_8bpc_rvv: 2465.7 ( 1.79x)
- Banana Pi BPI-F3:
warp_8x8_8bpc_c: 4431.2 ( 1.00x)
warp_8x8_8bpc_rvv: 3297.4 ( 1.34x)
warp_8x8t_8bpc_c: 4299.3 ( 1.00x)
warp_8x8t_8bpc_rvv: 3255.7 ( 1.32x)
-
Niklas Haas authored
To avoid read-after-write. Speedup is about 1% for width=4 on a K230.
-
Niklas Haas authored
This code compromises between the performance of a dedicated kernel per VLEN/width pair, and the flexibility of a fully VLEN-dynamic loop, by using a single special case for w=4, and subdividing the rest into the unrolled four line fast path, and the general-purpose slow path (for large width on small VLEN).
Kendryte K230
avg_w4_8bpc_c: 346.8 ( 1.00x)
avg_w4_8bpc_rvv: 50.3 ( 6.90x)
avg_w8_8bpc_c: 1054.9 ( 1.00x)
avg_w8_8bpc_rvv: 139.1 ( 7.58x)
avg_w16_8bpc_c: 3396.3 ( 1.00x)
avg_w16_8bpc_rvv: 350.6 ( 9.69x)
avg_w32_8bpc_c: 13734.3 ( 1.00x)
avg_w32_8bpc_rvv: 1226.3 (11.20x)
avg_w64_8bpc_c: 33260.9 ( 1.00x)
avg_w64_8bpc_rvv: 3869.4 ( 8.60x)
avg_w128_8bpc_c: 83441.3 ( 1.00x)
avg_w128_8bpc_rvv: 9765.1 ( 8.54x)
w_avg_w4_8bpc_c: 444.3 ( 1.00x)
w_avg_w4_8bpc_rvv: 75.8 ( 5.86x)
w_avg_w8_8bpc_c: 1365.6 ( 1.00x)
w_avg_w8_8bpc_rvv: 208.8 ( 6.54x)
w_avg_w16_8bpc_c: 4420.8 ( 1.00x)
w_avg_w16_8bpc_rvv: 570.7 ( 7.75x)
w_avg_w32_8bpc_c: 18010.9 ( 1.00x)
w_avg_w32_8bpc_rvv: 2074.4 ( 8.68x)
w_avg_w64_8bpc_c: 43050.4 ( 1.00x)
w_avg_w64_8bpc_rvv: 5799.5 ( 7.42x)
w_avg_w128_8bpc_c: 107153.6 ( 1.00x)
w_avg_w128_8bpc_rvv: 14272.0 ( 7.51x)
mask_w4_8bpc_c: 497.6 ( 1.00x)
mask_w4_8bpc_rvv: 88.5 ( 5.63x)
mask_w8_8bpc_c: 1528.5 ( 1.00x)
mask_w8_8bpc_rvv: 253.1 ( 6.04x)
mask_w16_8bpc_c: 4953.8 ( 1.00x)
mask_w16_8bpc_rvv: 679.0 ( 7.30x)
mask_w32_8bpc_c: 20298.3 ( 1.00x)
mask_w32_8bpc_rvv: 3012.9 ( 6.74x)
mask_w64_8bpc_c: 49718.8 ( 1.00x)
mask_w64_8bpc_rvv: 7291.7 ( 6.82x)
mask_w128_8bpc_c: 126740.3 ( 1.00x)
mask_w128_8bpc_rvv: 18351.1 ( 6.91x)
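The three-way split described in this commit message (a w=4 special case, an unrolled four-line fast path, and a general slow path for large widths on small VLEN) can be pictured as a simple selection function. The sketch below is only an illustration: the function name, the path labels, the `lmul` and `elem_bits` parameters, and the "one row must fit in one register group" criterion are all assumptions, not taken from the actual assembly.

```python
def pick_path(width: int, vlen_bits: int, lmul: int = 4, elem_bits: int = 16) -> str:
    """Choose a kernel path for a block of the given width (hypothetical criterion)."""
    if width == 4:
        # Single dedicated special case named in the commit message.
        return "w4"
    # Assumed rule: the fast path needs one full row to fit in one register group.
    elems_per_group = vlen_bits * lmul // elem_bits
    if width <= elems_per_group:
        return "fast_4_lines"
    # Large width on small VLEN falls back to the general-purpose loop.
    return "slow_generic"


for w in (4, 8, 16, 32, 64, 128):
    print(w, pick_path(w, vlen_bits=128))
```

With these assumed parameters and VLEN=128, widths up to 32 take the fast path and widths 64 and 128 take the slow path; the real thresholds in the commit may differ.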
-
Niklas Haas authored
-
Nathan E. Egge authored
Kendryte K230
blend_v_w2_8bpc_c: 221.4 ( 1.00x)
blend_v_w2_8bpc_rvv: 147.7 ( 1.50x)
blend_v_w4_8bpc_c: 945.3 ( 1.00x)
blend_v_w4_8bpc_rvv: 243.3 ( 3.89x)
blend_v_w8_8bpc_c: 1786.9 ( 1.00x)
blend_v_w8_8bpc_rvv: 256.1 ( 6.98x)
blend_v_w16_8bpc_c: 3472.1 ( 1.00x)
blend_v_w16_8bpc_rvv: 351.1 ( 9.89x)
blend_v_w32_8bpc_c: 6832.1 ( 1.00x)
blend_v_w32_8bpc_rvv: 635.4 (10.75x)
SpacemiT K1
blend_v_w2_8bpc_c: 218.0 ( 1.00x)
blend_v_w2_8bpc_rvv: 144.3 ( 1.51x)
blend_v_w4_8bpc_c: 921.7 ( 1.00x)
blend_v_w4_8bpc_rvv: 237.1 ( 3.89x)
blend_v_w8_8bpc_c: 1739.8 ( 1.00x)
blend_v_w8_8bpc_rvv: 237.4 ( 7.33x)
blend_v_w16_8bpc_c: 3376.6 ( 1.00x)
blend_v_w16_8bpc_rvv: 296.3 (11.40x)
blend_v_w32_8bpc_c: 6647.2 ( 1.00x)
blend_v_w32_8bpc_rvv: 408.1 (16.29x)
-
Nathan E. Egge authored
Kendryte K230
blend_h_w2_8bpc_c: 165.9 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.8 ( 1.98x)
blend_h_w4_8bpc_c: 295.2 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.8 ( 3.52x)
blend_h_w8_8bpc_c: 557.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 92.5 ( 6.03x)
blend_h_w16_8bpc_c: 1078.8 ( 1.00x)
blend_h_w16_8bpc_rvv: 117.3 ( 9.19x)
blend_h_w32_8bpc_c: 2117.8 ( 1.00x)
blend_h_w32_8bpc_rvv: 200.5 (10.57x)
blend_h_w64_8bpc_c: 4194.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 363.2 (11.55x)
blend_h_w128_8bpc_c: 10271.4 ( 1.00x)
blend_h_w128_8bpc_rvv: 844.5 (12.16x)
SpacemiT K1
blend_h_w2_8bpc_c: 162.5 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.9 ( 1.94x)
blend_h_w4_8bpc_c: 288.6 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.7 ( 3.45x)
blend_h_w8_8bpc_c: 544.7 ( 1.00x)
blend_h_w8_8bpc_rvv: 84.0 ( 6.48x)
blend_h_w16_8bpc_c: 1052.8 ( 1.00x)
blend_h_w16_8bpc_rvv: 102.9 (10.23x)
blend_h_w32_8bpc_c: 2068.0 ( 1.00x)
blend_h_w32_8bpc_rvv: 131.4 (15.73x)
blend_h_w64_8bpc_c: 4093.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 220.3 (18.58x)
blend_h_w128_8bpc_c: 10023.1 ( 1.00x)
blend_h_w128_8bpc_rvv: 467.3 (21.45x)
-
Nathan E. Egge authored
Kendryte K230
blend_w4_8bpc_c: 204.8 ( 1.00x)
blend_w4_8bpc_rvv: 59.8 ( 3.42x)
blend_w8_8bpc_c: 608.9 ( 1.00x)
blend_w8_8bpc_rvv: 87.2 ( 6.98x)
blend_w16_8bpc_c: 2362.4 ( 1.00x)
blend_w16_8bpc_rvv: 225.2 (10.49x)
blend_w32_8bpc_c: 5990.4 ( 1.00x)
blend_w32_8bpc_rvv: 518.3 (11.56x)
SpacemiT K1
blend_w4_8bpc_c: 201.6 ( 1.00x)
blend_w4_8bpc_rvv: 58.0 ( 3.48x)
blend_w8_8bpc_c: 595.1 ( 1.00x)
blend_w8_8bpc_rvv: 82.1 ( 7.25x)
blend_w16_8bpc_c: 2308.8 ( 1.00x)
blend_w16_8bpc_rvv: 189.0 (12.22x)
blend_w32_8bpc_c: 5853.1 ( 1.00x)
blend_w32_8bpc_rvv: 339.5 (17.24x)
-
Nathan E. Egge authored
SpacemiT K1
blend_v_w2_8bpc_c: 217.0 ( 1.00x)
blend_v_w2_8bpc_rvv: 143.3 ( 1.51x)
blend_v_w4_8bpc_c: 921.6 ( 1.00x)
blend_v_w4_8bpc_rvv: 236.3 ( 3.90x)
blend_v_w8_8bpc_c: 1738.2 ( 1.00x)
blend_v_w8_8bpc_rvv: 238.1 ( 7.30x)
blend_v_w16_8bpc_c: 3376.1 ( 1.00x)
blend_v_w16_8bpc_rvv: 298.0 (11.33x)
blend_v_w32_8bpc_c: 6648.0 ( 1.00x)
blend_v_w32_8bpc_rvv: 409.5 (16.24x)
-
Nathan E. Egge authored
SpacemiT K1
blend_h_w2_8bpc_c: 161.8 ( 1.00x)
blend_h_w2_8bpc_rvv: 83.5 ( 1.94x)
blend_h_w4_8bpc_c: 288.4 ( 1.00x)
blend_h_w4_8bpc_rvv: 83.7 ( 3.45x)
blend_h_w8_8bpc_c: 543.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 84.5 ( 6.44x)
blend_h_w16_8bpc_c: 1051.6 ( 1.00x)
blend_h_w16_8bpc_rvv: 103.8 (10.13x)
blend_h_w32_8bpc_c: 2066.0 ( 1.00x)
blend_h_w32_8bpc_rvv: 133.8 (15.44x)
blend_h_w64_8bpc_c: 4092.7 ( 1.00x)
blend_h_w64_8bpc_rvv: 225.2 (18.18x)
blend_h_w128_8bpc_c: 10011.3 ( 1.00x)
blend_h_w128_8bpc_rvv: 474.7 (21.09x)
-
Nathan E. Egge authored
SpacemiT K1
blend_w4_8bpc_c: 201.3 ( 1.00x)
blend_w4_8bpc_rvv: 59.3 ( 3.40x)
blend_w8_8bpc_c: 595.1 ( 1.00x)
blend_w8_8bpc_rvv: 84.1 ( 7.07x)
blend_w16_8bpc_c: 2309.0 ( 1.00x)
blend_w16_8bpc_rvv: 190.5 (12.12x)
blend_w32_8bpc_c: 5854.7 ( 1.00x)
blend_w32_8bpc_rvv: 341.6 (17.14x)
-
Nathan E. Egge authored
-
Nathan E. Egge authored
Kendryte K230
blend_v_w2_8bpc_c: 219.6 ( 1.00x)
blend_v_w2_8bpc_rvv: 141.8 ( 1.55x)
blend_v_w4_8bpc_c: 942.9 ( 1.00x)
blend_v_w4_8bpc_rvv: 240.9 ( 3.91x)
blend_v_w8_8bpc_c: 1783.5 ( 1.00x)
blend_v_w8_8bpc_rvv: 254.7 ( 7.00x)
blend_v_w16_8bpc_c: 3466.5 ( 1.00x)
blend_v_w16_8bpc_rvv: 350.5 ( 9.89x)
blend_v_w32_8bpc_c: 6825.2 ( 1.00x)
blend_v_w32_8bpc_rvv: 635.1 (10.75x)
-
Nathan E. Egge authored
Kendryte K230
blend_h_w2_8bpc_c: 165.4 ( 1.00x)
blend_h_w2_8bpc_rvv: 79.4 ( 2.08x)
blend_h_w4_8bpc_c: 294.6 ( 1.00x)
blend_h_w4_8bpc_rvv: 81.5 ( 3.61x)
blend_h_w8_8bpc_c: 556.9 ( 1.00x)
blend_h_w8_8bpc_rvv: 90.2 ( 6.17x)
blend_h_w16_8bpc_c: 1077.6 ( 1.00x)
blend_h_w16_8bpc_rvv: 116.1 ( 9.29x)
blend_h_w32_8bpc_c: 2116.2 ( 1.00x)
blend_h_w32_8bpc_rvv: 200.5 (10.55x)
blend_h_w64_8bpc_c: 4191.8 ( 1.00x)
blend_h_w64_8bpc_rvv: 363.3 (11.54x)
blend_h_w128_8bpc_c: 10264.6 ( 1.00x)
blend_h_w128_8bpc_rvv: 844.1 (12.16x)
-
Nathan E. Egge authored
Kendryte K230
blend_w4_8bpc_c: 204.5 ( 1.00x)
blend_w4_8bpc_rvv: 56.4 ( 3.62x)
blend_w8_8bpc_c: 608.6 ( 1.00x)
blend_w8_8bpc_rvv: 87.3 ( 6.97x)
blend_w16_8bpc_c: 2363.8 ( 1.00x)
blend_w16_8bpc_rvv: 225.1 (10.50x)
blend_w32_8bpc_c: 5990.3 ( 1.00x)
blend_w32_8bpc_rvv: 518.8 (11.55x)
-
Bogdan Gligorijević authored
Benchmark pending
-
Bogdan Gligorijević authored
Current benchmark:
- Kendryte K230:
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1729.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 153.2 (11.29x)
- SpacemiT K1:
inv_txfm_add_16x16_dct_dct_0_8bpc_c: 1533.4 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv: 176.8 ( 8.67x)
-
Bogdan Gligorijević authored
Performance comparison:
- SpacemiT K1:
                                        Master branch:    itx_16x16:
inv_txfm_add_16x16_dct_dct_0_8bpc_c:    1534.1 ( 1.00x)   1534.9 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:  1173.6 ( 1.31x)    173.1 ( 8.87x)
- Kendryte K230:
                                        Master branch:    itx_16x16:
inv_txfm_add_16x16_dct_dct_0_8bpc_c:    1576.0 ( 1.00x)   1579.1 ( 1.00x)
inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:  1095.5 ( 1.44x)    146.8 (10.75x)
-
Bogdan Gligorijević authored
Benchmarks:
- Kendryte K230:
intra_pred_paeth_w4_8bpc_c: 412.9 ( 1.00x)
intra_pred_paeth_w4_8bpc_rvv: 688.0 ( 0.60x)
intra_pred_paeth_w8_8bpc_c: 1206.6 ( 1.00x)
intra_pred_paeth_w8_8bpc_rvv: 1094.3 ( 1.10x)
intra_pred_paeth_w16_8bpc_c: 3889.7 ( 1.00x)
intra_pred_paeth_w16_8bpc_rvv: 1796.7 ( 2.16x)
intra_pred_paeth_w32_8bpc_c: 9797.2 ( 1.00x)
intra_pred_paeth_w32_8bpc_rvv: 4323.9 ( 2.27x)
intra_pred_paeth_w64_8bpc_c: 24242.5 ( 1.00x)
intra_pred_paeth_w64_8bpc_rvv: 10739.8 ( 2.26x)
- Banana Pi BPI-F3:
intra_pred_paeth_w4_8bpc_c: 395.1 ( 1.00x)
intra_pred_paeth_w4_8bpc_rvv: 705.4 ( 0.56x)
intra_pred_paeth_w8_8bpc_c: 1184.9 ( 1.00x)
intra_pred_paeth_w8_8bpc_rvv: 1125.3 ( 1.05x)
intra_pred_paeth_w16_8bpc_c: 3807.8 ( 1.00x)
intra_pred_paeth_w16_8bpc_rvv: 1850.8 ( 2.06x)
intra_pred_paeth_w32_8bpc_c: 9985.1 ( 1.00x)
intra_pred_paeth_w32_8bpc_rvv: 2235.5 ( 4.47x)
intra_pred_paeth_w64_8bpc_c: 24040.4 ( 1.00x)
intra_pred_paeth_w64_8bpc_rvv: 5450.0 ( 4.41x)
-
Bogdan Gligorijević authored
Benchmarks: - Kendryte K230: pal_pred_w4_8bpc_c: 115.6 ( 1.00x) pal_pred_w4_8bpc_rvv: 331.4 ( 0.35x) pal_pred_w4_16bpc_c: 140.8 ( 1.00x) pal_pred_w4_16bpc_rvv: 247.9 ( 0.57x) pal_pred_w8_8bpc_c: 334.9 ( 1.00x) pal_pred_w8_8bpc_rvv: 520.8 ( 0.64x) pal_pred_w8_16bpc_c: 412.7 ( 1.00x) pal_pred_w8_16bpc_rvv: 386.2 ( 1.07x) pal_pred_w16_8bpc_c: 1044.4 ( 1.00x) pal_pred_w16_8bpc_rvv: 842.8 ( 1.24x) pal_pred_w16_16bpc_c: 1300.3 ( 1.00x) pal_pred_w16_16bpc_rvv: 619.9 ( 2.10x) pal_pred_w32_8bpc_c: 2452.8 ( 1.00x) pal_pred_w32_8bpc_rvv: 1016.1 ( 2.41x) pal_pred_w32_16bpc_c: 3072.1 ( 1.00x) pal_pred_w32_16bpc_rvv: 1440.5 ( 2.13x) pal_pred_w64_8bpc_c: 6015.8 ( 1.00x) pal_pred_w64_8bpc_rvv: 2505.5 ( 2.40x) pal_pred_w64_16bpc_c: 7552.4 ( 1.00x) pal_pred_w64_16bpc_rvv: 3512.7 ( 2.15x) - Banana Pi BPI-F3: pal_pred_w4_8bpc_c: 102.2 ( 1.00x) pal_pred_w4_8bpc_rvv: 511.2 ( 0.20x) pal_pred_w4_16bpc_c: 137.7 ( 1.00x) pal_pred_w4_16bpc_rvv: 330.9 ( 0.42x) pal_pred_w8_8bpc_c: 289.2 ( 1.00x) pal_pred_w8_8bpc_rvv: 819.6 ( 0.35x) pal_pred_w8_16bpc_c: 402.6 ( 1.00x) pal_pred_w8_16bpc_rvv: 520.7 ( 0.77x) pal_pred_w16_8bpc_c: 894.5 ( 1.00x) pal_pred_w16_8bpc_rvv: 1326.6 ( 0.67x) pal_pred_w16_16bpc_c: 1268.6 ( 1.00x) pal_pred_w16_16bpc_rvv: 845.8 ( 1.50x) pal_pred_w32_8bpc_c: 2094.5 ( 1.00x) pal_pred_w32_8bpc_rvv: 1610.9 ( 1.30x) pal_pred_w32_16bpc_c: 2999.4 ( 1.00x) pal_pred_w32_16bpc_rvv: 1029.8 ( 2.91x) pal_pred_w64_8bpc_c: 5128.0 ( 1.00x) pal_pred_w64_8bpc_rvv: 2000.8 ( 2.56x) pal_pred_w64_16bpc_c: 7375.0 ( 1.00x) pal_pred_w64_16bpc_rvv: 2518.2 ( 2.93x)
-
Bogdan Gligorijević authored
Benchmarks:
- Kendryte K230:
intra_pred_smooth_w4_8bpc_c: 392.6 ( 1.00x)
intra_pred_smooth_w4_8bpc_rvv: 311.3 ( 1.26x)
intra_pred_smooth_w8_8bpc_c: 1204.1 ( 1.00x)
intra_pred_smooth_w8_8bpc_rvv: 488.9 ( 2.46x)
intra_pred_smooth_w16_8bpc_c: 3885.9 ( 1.00x)
intra_pred_smooth_w16_8bpc_rvv: 796.6 ( 4.88x)
intra_pred_smooth_w32_8bpc_c: 9305.7 ( 1.00x)
intra_pred_smooth_w32_8bpc_rvv: 1806.7 ( 5.15x)
intra_pred_smooth_w64_8bpc_c: 23043.0 ( 1.00x)
intra_pred_smooth_w64_8bpc_rvv: 4344.3 ( 5.30x)
- SpacemiT K1:
intra_pred_smooth_w4_8bpc_c: 384.1 ( 1.00x)
intra_pred_smooth_w4_8bpc_rvv: 322.2 ( 1.19x)
intra_pred_smooth_w8_8bpc_c: 1177.6 ( 1.00x)
intra_pred_smooth_w8_8bpc_rvv: 507.1 ( 2.32x)
intra_pred_smooth_w16_8bpc_c: 3801.2 ( 1.00x)
intra_pred_smooth_w16_8bpc_rvv: 814.4 ( 4.67x)
intra_pred_smooth_w32_8bpc_c: 9103.1 ( 1.00x)
intra_pred_smooth_w32_8bpc_rvv: 980.8 ( 9.28x)
intra_pred_smooth_w64_8bpc_c: 22540.1 ( 1.00x)
intra_pred_smooth_w64_8bpc_rvv: 2319.3 ( 9.72x)
-
Bogdan Gligorijević authored
Benchmarks: - Kendryte K230: cfl_pred_cfl_128_w4_8bpc_c: 497.3 ( 1.00x) cfl_pred_cfl_128_w4_8bpc_rvv: 369.6 ( 1.35x) cfl_pred_cfl_128_w4_16bpc_c: 425.2 ( 1.00x) cfl_pred_cfl_128_w4_16bpc_rvv: 385.5 ( 1.10x) cfl_pred_cfl_128_w8_8bpc_c: 1544.2 ( 1.00x) cfl_pred_cfl_128_w8_8bpc_rvv: 584.2 ( 2.64x) cfl_pred_cfl_128_w8_16bpc_c: 1306.2 ( 1.00x) cfl_pred_cfl_128_w8_16bpc_rvv: 608.8 ( 2.15x) cfl_pred_cfl_128_w16_8bpc_c: 3085.6 ( 1.00x) cfl_pred_cfl_128_w16_8bpc_rvv: 584.2 ( 5.28x) cfl_pred_cfl_128_w16_16bpc_c: 2657.1 ( 1.00x) cfl_pred_cfl_128_w16_16bpc_rvv: 608.9 ( 4.36x) cfl_pred_cfl_128_w32_8bpc_c: 8405.6 ( 1.00x) cfl_pred_cfl_128_w32_8bpc_rvv: 1416.1 ( 5.94x) cfl_pred_cfl_128_w32_16bpc_c: 7199.9 ( 1.00x) cfl_pred_cfl_128_w32_16bpc_rvv: 1479.8 ( 4.87x) cfl_pred_cfl_left_w4_8bpc_c: 553.1 ( 1.00x) cfl_pred_cfl_left_w4_8bpc_rvv: 395.6 ( 1.40x) cfl_pred_cfl_left_w4_16bpc_c: 486.7 ( 1.00x) cfl_pred_cfl_left_w4_16bpc_rvv: 409.1 ( 1.19x) cfl_pred_cfl_left_w8_8bpc_c: 1610.8 ( 1.00x) cfl_pred_cfl_left_w8_8bpc_rvv: 610.4 ( 2.64x) cfl_pred_cfl_left_w8_16bpc_c: 1378.0 ( 1.00x) cfl_pred_cfl_left_w8_16bpc_rvv: 636.2 ( 2.17x) cfl_pred_cfl_left_w16_8bpc_c: 3154.4 ( 1.00x) cfl_pred_cfl_left_w16_8bpc_rvv: 610.4 ( 5.17x) cfl_pred_cfl_left_w16_16bpc_c: 2733.2 ( 1.00x) cfl_pred_cfl_left_w16_16bpc_rvv: 636.3 ( 4.30x) cfl_pred_cfl_left_w32_8bpc_c: 8451.7 ( 1.00x) cfl_pred_cfl_left_w32_8bpc_rvv: 1442.5 ( 5.86x) cfl_pred_cfl_left_w32_16bpc_c: 7267.2 ( 1.00x) cfl_pred_cfl_left_w32_16bpc_rvv: 1509.4 ( 4.81x) cfl_pred_cfl_top_w4_8bpc_c: 544.7 ( 1.00x) cfl_pred_cfl_top_w4_8bpc_rvv: 395.8 ( 1.38x) cfl_pred_cfl_top_w4_16bpc_c: 475.1 ( 1.00x) cfl_pred_cfl_top_w4_16bpc_rvv: 406.7 ( 1.17x) cfl_pred_cfl_top_w8_8bpc_c: 1599.3 ( 1.00x) cfl_pred_cfl_top_w8_8bpc_rvv: 610.4 ( 2.62x) cfl_pred_cfl_top_w8_16bpc_c: 1363.8 ( 1.00x) cfl_pred_cfl_top_w8_16bpc_rvv: 630.3 ( 2.16x) cfl_pred_cfl_top_w16_8bpc_c: 3161.0 ( 1.00x) cfl_pred_cfl_top_w16_8bpc_rvv: 610.5 ( 5.18x) cfl_pred_cfl_top_w16_16bpc_c: 2735.9 ( 1.00x) 
cfl_pred_cfl_top_w16_16bpc_rvv: 634.3 ( 4.31x) cfl_pred_cfl_top_w32_8bpc_c: 8564.4 ( 1.00x) cfl_pred_cfl_top_w32_8bpc_rvv: 1442.8 ( 5.94x) cfl_pred_cfl_top_w32_16bpc_c: 7294.9 ( 1.00x) cfl_pred_cfl_top_w32_16bpc_rvv: 1511.5 ( 4.83x) cfl_pred_cfl_w4_8bpc_c: 571.5 ( 1.00x) cfl_pred_cfl_w4_8bpc_rvv: 421.0 ( 1.36x) cfl_pred_cfl_w4_16bpc_c: 499.1 ( 1.00x) cfl_pred_cfl_w4_16bpc_rvv: 462.8 ( 1.08x) cfl_pred_cfl_w8_8bpc_c: 1642.0 ( 1.00x) cfl_pred_cfl_w8_8bpc_rvv: 635.8 ( 2.58x) cfl_pred_cfl_w8_16bpc_c: 1401.4 ( 1.00x) cfl_pred_cfl_w8_16bpc_rvv: 686.1 ( 2.04x) cfl_pred_cfl_w16_8bpc_c: 3204.3 ( 1.00x) cfl_pred_cfl_w16_8bpc_rvv: 635.8 ( 5.04x) cfl_pred_cfl_w16_16bpc_c: 2784.8 ( 1.00x) cfl_pred_cfl_w16_16bpc_rvv: 686.1 ( 4.06x) cfl_pred_cfl_w32_8bpc_c: 8623.9 ( 1.00x) cfl_pred_cfl_w32_8bpc_rvv: 1465.9 ( 5.88x) cfl_pred_cfl_w32_16bpc_c: 7357.8 ( 1.00x) cfl_pred_cfl_w32_16bpc_rvv: 1556.3 ( 4.73x) - Banana Pi BPI-F3: cfl_pred_cfl_128_w4_8bpc_c: 485.5 ( 1.00x) cfl_pred_cfl_128_w4_8bpc_rvv: 366.4 ( 1.33x) cfl_pred_cfl_128_w4_16bpc_c: 393.5 ( 1.00x) cfl_pred_cfl_128_w4_16bpc_rvv: 378.7 ( 1.04x) cfl_pred_cfl_128_w8_8bpc_c: 1507.9 ( 1.00x) cfl_pred_cfl_128_w8_8bpc_rvv: 577.4 ( 2.61x) cfl_pred_cfl_128_w8_16bpc_c: 1205.7 ( 1.00x) cfl_pred_cfl_128_w8_16bpc_rvv: 605.1 ( 1.99x) cfl_pred_cfl_128_w16_8bpc_c: 3019.3 ( 1.00x) cfl_pred_cfl_128_w16_8bpc_rvv: 577.4 ( 5.23x) cfl_pred_cfl_128_w16_16bpc_c: 2506.5 ( 1.00x) cfl_pred_cfl_128_w16_16bpc_rvv: 605.1 ( 4.14x) cfl_pred_cfl_128_w32_8bpc_c: 8170.0 ( 1.00x) cfl_pred_cfl_128_w32_8bpc_rvv: 715.6 (11.42x) cfl_pred_cfl_128_w32_16bpc_c: 6686.7 ( 1.00x) cfl_pred_cfl_128_w32_16bpc_rvv: 749.7 ( 8.92x) cfl_pred_cfl_left_w4_8bpc_c: 539.4 ( 1.00x) cfl_pred_cfl_left_w4_8bpc_rvv: 393.2 ( 1.37x) cfl_pred_cfl_left_w4_16bpc_c: 452.0 ( 1.00x) cfl_pred_cfl_left_w4_16bpc_rvv: 401.2 ( 1.13x) cfl_pred_cfl_left_w8_8bpc_c: 1572.4 ( 1.00x) cfl_pred_cfl_left_w8_8bpc_rvv: 604.1 ( 2.60x) cfl_pred_cfl_left_w8_16bpc_c: 1274.5 ( 1.00x) cfl_pred_cfl_left_w8_16bpc_rvv: 629.0 
( 2.03x) cfl_pred_cfl_left_w16_8bpc_c: 3096.0 ( 1.00x) cfl_pred_cfl_left_w16_8bpc_rvv: 604.1 ( 5.13x) cfl_pred_cfl_left_w16_16bpc_c: 2591.4 ( 1.00x) cfl_pred_cfl_left_w16_16bpc_rvv: 629.0 ( 4.12x) cfl_pred_cfl_left_w32_8bpc_c: 8266.0 ( 1.00x) cfl_pred_cfl_left_w32_8bpc_rvv: 742.4 (11.13x) cfl_pred_cfl_left_w32_16bpc_c: 6758.0 ( 1.00x) cfl_pred_cfl_left_w32_16bpc_rvv: 773.9 ( 8.73x) cfl_pred_cfl_top_w4_8bpc_c: 532.3 ( 1.00x) cfl_pred_cfl_top_w4_8bpc_rvv: 392.6 ( 1.36x) cfl_pred_cfl_top_w4_16bpc_c: 440.4 ( 1.00x) cfl_pred_cfl_top_w4_16bpc_rvv: 399.6 ( 1.10x) cfl_pred_cfl_top_w8_8bpc_c: 1563.3 ( 1.00x) cfl_pred_cfl_top_w8_8bpc_rvv: 603.6 ( 2.59x) cfl_pred_cfl_top_w8_16bpc_c: 1271.6 ( 1.00x) cfl_pred_cfl_top_w8_16bpc_rvv: 626.1 ( 2.03x) cfl_pred_cfl_top_w16_8bpc_c: 3098.6 ( 1.00x) cfl_pred_cfl_top_w16_8bpc_rvv: 603.6 ( 5.13x) cfl_pred_cfl_top_w16_16bpc_c: 2562.8 ( 1.00x) cfl_pred_cfl_top_w16_16bpc_rvv: 626.0 ( 4.09x) cfl_pred_cfl_top_w32_8bpc_c: 8278.1 ( 1.00x) cfl_pred_cfl_top_w32_8bpc_rvv: 741.8 (11.16x) cfl_pred_cfl_top_w32_16bpc_c: 6799.1 ( 1.00x) cfl_pred_cfl_top_w32_16bpc_rvv: 775.0 ( 8.77x) cfl_pred_cfl_w4_8bpc_c: 559.8 ( 1.00x) cfl_pred_cfl_w4_8bpc_rvv: 421.7 ( 1.33x) cfl_pred_cfl_w4_16bpc_c: 470.2 ( 1.00x) cfl_pred_cfl_w4_16bpc_rvv: 451.3 ( 1.04x) cfl_pred_cfl_w8_8bpc_c: 1605.5 ( 1.00x) cfl_pred_cfl_w8_8bpc_rvv: 632.8 ( 2.54x) cfl_pred_cfl_w8_16bpc_c: 1308.5 ( 1.00x) cfl_pred_cfl_w8_16bpc_rvv: 677.9 ( 1.93x) cfl_pred_cfl_w16_8bpc_c: 3135.0 ( 1.00x) cfl_pred_cfl_w16_8bpc_rvv: 632.9 ( 4.95x) cfl_pred_cfl_w16_16bpc_c: 2625.9 ( 1.00x) cfl_pred_cfl_w16_16bpc_rvv: 677.9 ( 3.87x) cfl_pred_cfl_w32_8bpc_c: 8376.6 ( 1.00x) cfl_pred_cfl_w32_8bpc_rvv: 770.4 (10.87x) cfl_pred_cfl_w32_16bpc_c: 6866.4 ( 1.00x) cfl_pred_cfl_w32_16bpc_rvv: 822.7 ( 8.35x)
-
Bogdan Gligorijević authored
Benchmarks: - Kendryte K230: cdef_filter_4x4_01_8bpc_c: 1339.4 ( 1.00x) cdef_filter_4x4_01_8bpc_rvv: 836.2 ( 1.60x) cdef_filter_4x4_01_16bpc_c: 1369.1 ( 1.00x) cdef_filter_4x4_01_16bpc_rvv: 824.7 ( 1.66x) cdef_filter_4x4_10_8bpc_c: 872.8 ( 1.00x) cdef_filter_4x4_10_8bpc_rvv: 523.9 ( 1.67x) cdef_filter_4x4_10_16bpc_c: 938.2 ( 1.00x) cdef_filter_4x4_10_16bpc_rvv: 517.1 ( 1.81x) cdef_filter_4x4_11_8bpc_c: 2668.3 ( 1.00x) cdef_filter_4x4_11_8bpc_rvv: 1285.0 ( 2.08x) cdef_filter_4x4_11_16bpc_c: 2922.1 ( 1.00x) cdef_filter_4x4_11_16bpc_rvv: 1291.0 ( 2.26x) cdef_filter_4x8_01_8bpc_c: 2489.1 ( 1.00x) cdef_filter_4x8_01_8bpc_rvv: 1594.3 ( 1.56x) cdef_filter_4x8_01_16bpc_c: 2528.1 ( 1.00x) cdef_filter_4x8_01_16bpc_rvv: 1566.6 ( 1.61x) cdef_filter_4x8_10_8bpc_c: 1576.9 ( 1.00x) cdef_filter_4x8_10_8bpc_rvv: 967.1 ( 1.63x) cdef_filter_4x8_10_16bpc_c: 1641.3 ( 1.00x) cdef_filter_4x8_10_16bpc_rvv: 947.1 ( 1.73x) cdef_filter_4x8_11_8bpc_c: 5164.0 ( 1.00x) cdef_filter_4x8_11_8bpc_rvv: 2490.7 ( 2.07x) cdef_filter_4x8_11_16bpc_c: 5732.3 ( 1.00x) cdef_filter_4x8_11_16bpc_rvv: 2499.2 ( 2.29x) cdef_filter_8x8_01_8bpc_c: 4742.3 ( 1.00x) cdef_filter_8x8_01_8bpc_rvv: 1628.6 ( 2.91x) cdef_filter_8x8_01_16bpc_c: 4785.0 ( 1.00x) cdef_filter_8x8_01_16bpc_rvv: 1595.5 ( 3.00x) cdef_filter_8x8_10_8bpc_c: 2962.4 ( 1.00x) cdef_filter_8x8_10_8bpc_rvv: 1000.8 ( 2.96x) cdef_filter_8x8_10_16bpc_c: 3022.4 ( 1.00x) cdef_filter_8x8_10_16bpc_rvv: 975.7 ( 3.10x) cdef_filter_8x8_11_8bpc_c: 12623.9 ( 1.00x) cdef_filter_8x8_11_8bpc_rvv: 2525.4 ( 5.00x) cdef_filter_8x8_11_16bpc_c: 12470.7 ( 1.00x) cdef_filter_8x8_11_16bpc_rvv: 2528.2 ( 4.93x) - Banana Pi BPI-F3: cdef_filter_4x4_01_8bpc_c: 1281.2 ( 1.00x) cdef_filter_4x4_01_8bpc_rvv: 813.0 ( 1.58x) cdef_filter_4x4_01_16bpc_c: 1300.8 ( 1.00x) cdef_filter_4x4_01_16bpc_rvv: 808.9 ( 1.61x) cdef_filter_4x4_10_8bpc_c: 843.0 ( 1.00x) cdef_filter_4x4_10_8bpc_rvv: 498.4 ( 1.69x) cdef_filter_4x4_10_16bpc_c: 903.6 ( 1.00x) cdef_filter_4x4_10_16bpc_rvv: 497.9 ( 1.81x) 
cdef_filter_4x4_11_8bpc_c: 2614.1 ( 1.00x) cdef_filter_4x4_11_8bpc_rvv: 1219.6 ( 2.14x) cdef_filter_4x4_11_16bpc_c: 2795.6 ( 1.00x) cdef_filter_4x4_11_16bpc_rvv: 1243.1 ( 2.25x) cdef_filter_4x8_01_8bpc_c: 2405.4 ( 1.00x) cdef_filter_4x8_01_8bpc_rvv: 1548.5 ( 1.55x) cdef_filter_4x8_01_16bpc_c: 2402.7 ( 1.00x) cdef_filter_4x8_01_16bpc_rvv: 1542.7 ( 1.56x) cdef_filter_4x8_10_8bpc_c: 1522.0 ( 1.00x) cdef_filter_4x8_10_8bpc_rvv: 917.4 ( 1.66x) cdef_filter_4x8_10_16bpc_c: 1589.2 ( 1.00x) cdef_filter_4x8_10_16bpc_rvv: 915.9 ( 1.74x) cdef_filter_4x8_11_8bpc_c: 5050.7 ( 1.00x) cdef_filter_4x8_11_8bpc_rvv: 2358.7 ( 2.14x) cdef_filter_4x8_11_16bpc_c: 5510.5 ( 1.00x) cdef_filter_4x8_11_16bpc_rvv: 2411.6 ( 2.28x) cdef_filter_8x8_01_8bpc_c: 4558.3 ( 1.00x) cdef_filter_8x8_01_8bpc_rvv: 1579.7 ( 2.89x) cdef_filter_8x8_01_16bpc_c: 4551.1 ( 1.00x) cdef_filter_8x8_01_16bpc_rvv: 1571.1 ( 2.90x) cdef_filter_8x8_10_8bpc_c: 2869.3 ( 1.00x) cdef_filter_8x8_10_8bpc_rvv: 948.4 ( 3.03x) cdef_filter_8x8_10_16bpc_c: 2928.6 ( 1.00x) cdef_filter_8x8_10_16bpc_rvv: 944.2 ( 3.10x) cdef_filter_8x8_11_8bpc_c: 12317.5 ( 1.00x) cdef_filter_8x8_11_8bpc_rvv: 2389.7 ( 5.15x) cdef_filter_8x8_11_16bpc_c: 11950.6 ( 1.00x) cdef_filter_8x8_11_16bpc_rvv: 2440.1 ( 4.90x)
-
Bogdan Gligorijević authored
Benchmarks:
- Kendryte K230:
pal_idx_finish_w4_c: 122.5 ( 1.00x)
pal_idx_finish_w4_rvv: 107.2 ( 1.14x)
pal_idx_finish_w8_c: 302.8 ( 1.00x)
pal_idx_finish_w8_rvv: 197.9 ( 1.53x)
pal_idx_finish_w16_c: 868.2 ( 1.00x)
pal_idx_finish_w16_rvv: 438.5 ( 1.98x)
pal_idx_finish_w32_c: 1966.5 ( 1.00x)
pal_idx_finish_w32_rvv: 833.0 ( 2.36x)
pal_idx_finish_w64_c: 4737.5 ( 1.00x)
pal_idx_finish_w64_rvv: 1818.3 ( 2.61x)
- Banana Pi BPI-F3:
pal_idx_finish_w4_c: 122.4 ( 1.00x)
pal_idx_finish_w4_rvv: 132.0 ( 0.93x)
pal_idx_finish_w8_c: 289.4 ( 1.00x)
pal_idx_finish_w8_rvv: 195.8 ( 1.48x)
pal_idx_finish_w16_c: 788.0 ( 1.00x)
pal_idx_finish_w16_rvv: 430.6 ( 1.83x)
pal_idx_finish_w32_c: 1699.2 ( 1.00x)
pal_idx_finish_w32_rvv: 816.3 ( 2.08x)
pal_idx_finish_w64_c: 3977.7 ( 1.00x)
pal_idx_finish_w64_rvv: 1779.4 ( 2.24x)
-
Nathan E. Egge authored
-
- Oct 07, 2024
-
-
Henrik Gramner authored
Instead of using gathers, we can calculate the value of sgr_x_by_x[min(z, 255)] by computing 256 / (z + 1) in floating point, with some clipping for z == 0 and z >= 255. Because the division needs only modest precision, it can be performed with an approximate reciprocal, which is significantly faster than a regular division.

Gather instructions are slow on all AMD CPUs, and on most Intel CPUs ever since µcode updates were issued as a workaround for the Gather Data Sampling side-channel vulnerability.
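The idea of replacing the table lookup with arithmetic can be sketched in scalar C. This is only an illustrative model, not the code from the commit: the helper name sgr_x_from_z is hypothetical, the round-to-nearest step and the clip values (255 for z == 0, 0 for z >= 255) are assumptions, and a plain float division stands in for the approximate-reciprocal instruction that the SIMD code would use.

```c
#include <stdint.h>

/* Hypothetical helper: scalar model of sgr_x_by_x[min(z, 255)]
 * computed as 256 / (z + 1) instead of loaded from a table. */
static inline uint32_t sgr_x_from_z(uint32_t z)
{
    /* Assumed clip: 256 / 1 would not fit in 8 bits. */
    if (z == 0)
        return 255;
    /* Assumed clip: the saturated index maps to 0. */
    if (z >= 255)
        return 0;
    /* Round to nearest; an exact division is used here where the
     * vectorized code would use an approximate reciprocal. */
    return (uint32_t)(256.0f / (float)(z + 1) + 0.5f);
}
```

Under these assumptions, sgr_x_from_z(1) gives 128 and sgr_x_from_z(2) gives 85. Since every lane performs the same arithmetic, a vectorized version needs no per-lane memory access, which is what removes the dependence on gather instructions.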
-
- Oct 02, 2024
-
-
Luca Barbato authored
-
- Sep 30, 2024
-
-
Change-Id: I78fe788113ff2487ba1ce2e7d0c7d7c78c5a8c58
-
Change-Id: I1566e8145d36296f2c76107cf15fc2cc7ac0ecc7
-
The performance data is as follows:
save_tmvs_c: 3938.6 ( 1.00x)
save_tmvs_lsx: 1355.3 ( 2.91x)
-
bench performance before:
lpf_h_sb_y_w16_8bpc_c: 117.0 ( 1.00x)
lpf_h_sb_y_w16_8bpc_lsx: 33.9 ( 3.46x)
lpf_v_sb_y_w16_8bpc_c: 132.1 ( 1.00x)
lpf_v_sb_y_w16_8bpc_lsx: 59.7 ( 2.21x)

bench performance after:
lpf_h_sb_y_w16_8bpc_c: 114.9 ( 1.00x)
lpf_h_sb_y_w16_8bpc_lsx: 32.0 ( 3.59x)
lpf_v_sb_y_w16_8bpc_c: 132.5 ( 1.00x)
lpf_v_sb_y_w16_8bpc_lsx: 28.1 ( 4.72x)

Change-Id: Ie64e164a9416c438f6b3881ce18fb42e2ddd073d
-
sgr_3x3_8bpc_c: 27233.1 ( 1.00x)
sgr_3x3_8bpc_lsx: 12874.7 ( 2.12x)
sgr_3x3_8bpc_lasx: 10183.7 ( 2.67x)

Change-Id: I2aa469e8560733d6191396186bf776a12ad6e4a3
-
before:
warp_8x8_8bpc_c: 109.8 ( 1.00x)
warp_8x8_8bpc_lsx: 44.6 ( 2.46x)
warp_8x8t_8bpc_c: 97.5 ( 1.00x)
warp_8x8t_8bpc_lsx: 43.7 ( 2.23x)

after:
warp_8x8_8bpc_c: 109.8 ( 1.00x)
warp_8x8_8bpc_lsx: 39.2 ( 2.80x)
warp_8x8t_8bpc_c: 97.5 ( 1.00x)
warp_8x8t_8bpc_lsx: 37.9 ( 2.57x)

Change-Id: I11728c2c30821b8e2b1c85208710dfe5d1c1269c
-
mct_8tap_regular_w8_h_8bpc_c: 47.1 ( 1.00x)
mct_8tap_regular_w8_h_8bpc_lsx: 6.3 ( 7.46x)
mct_8tap_regular_w8_h_8bpc_lasx: 4.4 (10.80x)
mct_8tap_regular_w8_hv_8bpc_c: 118.9 ( 1.00x)
mct_8tap_regular_w8_hv_8bpc_lsx: 19.2 ( 6.20x)
mct_8tap_regular_w8_hv_8bpc_lasx: 13.7 ( 8.69x)
mct_8tap_regular_w8_v_8bpc_c: 60.3 ( 1.00x)
mct_8tap_regular_w8_v_8bpc_lsx: 5.4 (11.08x)
mct_8tap_regular_w8_v_8bpc_lasx: 3.3 (18.33x)

Change-Id: I1140f6ffbd738166f2581bc9111ebbdf6f9fa72c
-