arm32: filmgrain: Add NEON implementation of gen_grain for 8 bpc

Relative speedup over C code:
                              Cortex A7      A8      A9     A53     A72     A73
gen_grain_uv_ar0_8bpc_420_neon:    6.13    7.81    8.17    6.78    6.62   11.13
gen_grain_uv_ar0_8bpc_422_neon:    6.34    7.64    8.00    6.83    6.93   10.31
gen_grain_uv_ar0_8bpc_444_neon:    7.09    8.29    8.55    7.95    7.89   11.05
gen_grain_uv_ar1_8bpc_420_neon:    3.39    2.26    3.06    4.13    3.41    4.95
gen_grain_uv_ar1_8bpc_422_neon:    3.40    2.23    3.02    4.18    3.36    4.73
gen_grain_uv_ar1_8bpc_444_neon:    3.46    2.18    2.95    4.46    3.57    4.91
gen_grain_uv_ar2_8bpc_420_neon:    3.88    3.00    3.32    4.74    3.57    5.31
gen_grain_uv_ar2_8bpc_422_neon:    3.92    3.04    3.36    4.82    3.57    5.06
gen_grain_uv_ar2_8bpc_444_neon:    4.32    3.14    3.62    5.56    3.90    5.43
gen_grain_uv_ar3_8bpc_420_neon:    4.35    3.53    4.05    5.35    4.44    5.56
gen_grain_uv_ar3_8bpc_422_neon:    4.38    3.49    4.17    5.41    4.48    5.36
gen_grain_uv_ar3_8bpc_444_neon:    4.84    3.70    4.36    5.95    4.87    5.82
gen_grain_y_ar0_8bpc_neon:         5.18    5.57    7.65    5.93    7.13    9.01
gen_grain_y_ar1_8bpc_neon:         2.64    1.66    2.48    3.32    3.15    3.77
gen_grain_y_ar2_8bpc_neon:         3.57    2.64    3.21    4.59    3.68    4.64
gen_grain_y_ar3_8bpc_neon:         4.27    3.93    4.12    5.41    4.63    5.17

(A73 is benched against C code compiled with a different C compiler,
which can explain the slightly differing numbers there.)

Absolute numbers:
                                Cortex A7        A8        A9       A53       A72       A73
gen_grain_uv_ar0_8bpc_420_neon:   19614.6   13396.4   12320.4   15030.7    8288.1    8754.4
gen_grain_uv_ar0_8bpc_422_neon:   34660.9   24315.5   22225.3   26809.2   14549.8   15804.6
gen_grain_uv_ar0_8bpc_444_neon:   55625.6   39914.5   37100.2   44658.3   22917.3   27369.6
gen_grain_uv_ar1_8bpc_420_neon:   50049.5   63179.4   44793.1   36406.7   22690.3   25401.9
gen_grain_uv_ar1_8bpc_422_neon:   93289.5  117755.0   82815.4   67081.4   43133.1   46698.0
gen_grain_uv_ar1_8bpc_444_neon:  170880.0  223259.2  156241.5  122760.0   78655.6   85604.9
gen_grain_uv_ar2_8bpc_420_neon:   68185.5   78123.2   61457.3   47886.7   31526.2   36519.6
gen_grain_uv_ar2_8bpc_422_neon:  129195.2  148653.9  114133.2   89822.7   60242.6   70160.1
gen_grain_uv_ar2_8bpc_444_neon:  233133.7  272277.4  214108.7  161589.5  109069.3  127763.7
gen_grain_uv_ar3_8bpc_420_neon:   96374.4   94372.2   79663.8   70832.0   43065.3   50593.9
gen_grain_uv_ar3_8bpc_422_neon:  186324.8  184321.8  151490.1  136200.1   83758.0   98378.7
gen_grain_uv_ar3_8bpc_444_neon:  335596.6  336811.6  279755.5  247251.5  151657.2  178906.0
gen_grain_y_ar0_8bpc_neon:        46109.3   36022.2   28476.2   36478.5   18740.1   20660.4
gen_grain_y_ar1_8bpc_neon:       165054.2  217090.4  152578.9  118409.4   74357.2   83794.5
gen_grain_y_ar2_8bpc_neon:       226576.9  268320.3  210924.6  157829.4  105956.5  124293.2
gen_grain_y_ar3_8bpc_neon:       328337.2  330421.3  275110.1  242097.3  148538.7  177270.8

Corresponding numbers for the original arm64 version:
                               Cortex A53       A72       A73
gen_grain_uv_ar0_8bpc_420_neon:   14874.7    7765.5    8536.0
gen_grain_uv_ar0_8bpc_422_neon:   26510.9   13685.3   15308.2
gen_grain_uv_ar0_8bpc_444_neon:   43189.6   21565.3   24312.0
gen_grain_uv_ar1_8bpc_420_neon:   33715.7   21669.8   22758.3
gen_grain_uv_ar1_8bpc_422_neon:   63955.3   41581.4   42852.5
gen_grain_uv_ar1_8bpc_444_neon:  117390.1   76503.5   78446.4
gen_grain_uv_ar2_8bpc_420_neon:   42779.0   27794.3   29677.9
gen_grain_uv_ar2_8bpc_422_neon:   82283.8   53446.7   58232.2
gen_grain_uv_ar2_8bpc_444_neon:  147773.8   98492.7  103754.1
gen_grain_uv_ar3_8bpc_420_neon:   56698.8   35697.1   40695.9
gen_grain_uv_ar3_8bpc_422_neon:  110132.4   69829.1   79196.8
gen_grain_uv_ar3_8bpc_444_neon:  196642.7  124174.9  141812.5
gen_grain_y_ar0_8bpc_neon:        36461.0   17782.0   19827.0
gen_grain_y_ar1_8bpc_neon:       113202.7   72457.7   75995.8
gen_grain_y_ar2_8bpc_neon:       142894.0   94450.9  100304.5
gen_grain_y_ar3_8bpc_neon:       191697.7  120674.9  137223.8