riscv64/mc: Only process w*3/4 elements in blend_v
Setting VL to w*3/4 in this function only affects 16bpc performance, and
only on the SpacemiT K1, which has two vector units of 128 bits each.
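As a rough illustration of what "only process w*3/4 elements" means per row, here is a minimal scalar sketch in Python. It is not the RVV code from this commit. The 6-bit rounding blend and the mask values used in the usage note are assumptions made for the example, and the names `blend_px` and `blend_v_row` are hypothetical.

```python
def blend_px(a, b, m):
    """Blend pixels a and b with a 6-bit weight m in [0, 64] (assumed formula)."""
    return (a * (64 - m) + b * m + 32) >> 6


def blend_v_row(dst, tmp, mask, w):
    """Blend one row in place, touching only the first w*3/4 columns.

    Returns the number of columns processed, which is the element count
    a vector implementation would request as its VL for the row.
    """
    active = (w * 3) >> 2  # 2 -> 1, 4 -> 3, 8 -> 6, 16 -> 12, 32 -> 24
    for x in range(active):
        dst[x] = blend_px(dst[x], tmp[x], mask[x])
    return active
```

For example, `blend_v_row([100] * 8, [200] * 8, [64, 48, 32, 16, 8, 4, 0, 0], 8)` processes 6 columns and leaves the last two untouched. At w=32 and 16bpc this corresponds to 24 16-bit elements (384 bits) per row instead of 32 (512 bits).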
Kendryte K230           Before            After              Delta
blend_v_w2_8bpc_c:       220.0 ( 1.00x)    221.3 ( 1.00x)    0.59%
blend_v_w2_8bpc_rvv:     145.7 ( 1.51x)    148.2 ( 1.49x)    1.72%
blend_v_w4_8bpc_c:       942.1 ( 1.00x)    943.7 ( 1.00x)    0.17%
blend_v_w4_8bpc_rvv:     240.4 ( 3.92x)    242.9 ( 3.89x)    1.04%
blend_v_w8_8bpc_c:      1782.3 ( 1.00x)   1783.8 ( 1.00x)    0.08%
blend_v_w8_8bpc_rvv:     252.6 ( 7.06x)    254.9 ( 7.00x)    0.91%
blend_v_w16_8bpc_c:     3650.9 ( 1.00x)   3647.0 ( 1.00x)   -0.11%
blend_v_w16_8bpc_rvv:    495.5 ( 7.37x)    494.4 ( 7.38x)   -0.22%
blend_v_w32_8bpc_c:     7013.0 ( 1.00x)   7018.2 ( 1.00x)    0.07%
blend_v_w32_8bpc_rvv:    807.9 ( 8.68x)    802.0 ( 8.75x)   -0.73%
blend_v_w2_16bpc_c:      226.1 ( 1.00x)    225.5 ( 1.00x)   -0.27%
blend_v_w2_16bpc_rvv:    148.6 ( 1.52x)    148.9 ( 1.51x)    0.20%
blend_v_w4_16bpc_c:     1010.7 ( 1.00x)   1006.7 ( 1.00x)   -0.40%
blend_v_w4_16bpc_rvv:    306.7 ( 3.30x)    307.4 ( 3.27x)    0.23%
blend_v_w8_16bpc_c:     1990.2 ( 1.00x)   1996.1 ( 1.00x)    0.30%
blend_v_w8_16bpc_rvv:    519.5 ( 3.83x)    523.4 ( 3.81x)    0.75%
blend_v_w16_16bpc_c:    3744.5 ( 1.00x)   3742.4 ( 1.00x)   -0.06%
blend_v_w16_16bpc_rvv:   899.6 ( 4.16x)    906.4 ( 4.13x)    0.76%
blend_v_w32_16bpc_c:    7047.5 ( 1.00x)   7079.3 ( 1.00x)    0.45%
blend_v_w32_16bpc_rvv:  1475.5 ( 4.78x)   1483.3 ( 4.77x)    0.53%
SpacemiT K1             Before            After              Delta
blend_v_w2_8bpc_c:       216.3 ( 1.00x)    214.4 ( 1.00x)   -0.88%
blend_v_w2_8bpc_rvv:     144.0 ( 1.50x)    143.6 ( 1.49x)   -0.28%
blend_v_w4_8bpc_c:       919.8 ( 1.00x)    918.1 ( 1.00x)   -0.18%
blend_v_w4_8bpc_rvv:     236.6 ( 3.89x)    236.4 ( 3.88x)   -0.08%
blend_v_w8_8bpc_c:      1739.3 ( 1.00x)   1736.8 ( 1.00x)   -0.14%
blend_v_w8_8bpc_rvv:     236.8 ( 7.34x)    236.3 ( 7.35x)   -0.21%
blend_v_w16_8bpc_c:     3374.7 ( 1.00x)   3374.9 ( 1.00x)    0.01%
blend_v_w16_8bpc_rvv:    297.0 (11.36x)    296.8 (11.37x)   -0.07%
blend_v_w32_8bpc_c:     6647.5 ( 1.00x)   6645.5 ( 1.00x)   -0.03%
blend_v_w32_8bpc_rvv:    403.3 (16.48x)    402.4 (16.51x)   -0.22%
blend_v_w2_16bpc_c:      221.4 ( 1.00x)    220.1 ( 1.00x)   -0.59%
blend_v_w2_16bpc_rvv:    146.3 ( 1.51x)    147.3 ( 1.49x)    0.68%
blend_v_w4_16bpc_c:      973.3 ( 1.00x)    972.7 ( 1.00x)   -0.06%
blend_v_w4_16bpc_rvv:    280.3 ( 3.47x)    282.1 ( 3.45x)    0.64%
blend_v_w8_16bpc_c:     1814.8 ( 1.00x)   1816.2 ( 1.00x)    0.08%
blend_v_w8_16bpc_rvv:    376.6 ( 4.82x)    376.9 ( 4.82x)    0.08%
blend_v_w16_16bpc_c:    3485.5 ( 1.00x)   3485.5 ( 1.00x)    0.00%
blend_v_w16_16bpc_rvv:   531.1 ( 6.56x)    525.6 ( 6.63x)   -1.04%
blend_v_w32_16bpc_c:    6788.3 ( 1.00x)   6778.8 ( 1.00x)   -0.14%
blend_v_w32_16bpc_rvv:   904.5 ( 7.51x)    854.6 ( 7.93x)   -5.52%
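The columns above appear to follow the usual conventions: Delta is the relative change from Before to After (negative means faster), and the factor in parentheses is the C time divided by the RVV time from the same run. A small sketch to re-derive them, with hypothetical helper names:

```python
def delta_percent(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


def speedup(c_time, rvv_time):
    """How many times faster the RVV version is than the C reference."""
    return c_time / rvv_time


# blend_v_w32_16bpc on the SpacemiT K1, values taken from the table above
print(f"{delta_percent(904.5, 854.6):.2f}%")  # Delta for the rvv row
print(f"{speedup(6778.8, 854.6):.2f}x")       # After-column speedup
```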
Edited by Nathan E. Egge
changed milestone to %1.5.1
added RISC-V performance labels
added 1 commit
- a17c8625 - riscv64/mc: Only process w*3/4 elements in blend_v
enabled an automatic merge when the pipeline for a17c8625 succeeds