- Jan 03, 2022
-
-
- Jan 01, 2022
-
-
Niklas Haas authored
This change is motivated by a desire to be able to toggle between CPU and GPU film grain synthesis in players such as VLC. Because VLC initializes the codec before the vout (and, indeed, the active vout module may change in the middle of decoding), it cannot decide whether to apply film grain in libdav1d as part of codec initialization. The decision has to be made on a frame-by-frame basis, depending on whether the currently active vout supports film grain synthesis.
Using the new API, users like VLC can simply set `apply_grain` to 0 and then manually call `dav1d_apply_grain` whenever the vout does not support GPU film grain synthesis. As a side note, `dav1d_apply_grain` could also technically be called from dedicated worker threads, something that libdav1d does not currently do internally.
The alternative to this solution would have been to allow changing Dav1dSettings at runtime, but that would be more invasive, and a proper API would also need to take other settings into consideration, some of which can't be changed as easily as `apply_grain`. This commit represents a stop-gap solution.
Bump the minor version to allow clients to depend on this API.
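The per-frame decision described in this message can be sketched as follows. This is a self-contained model, not libdav1d code: `Picture`, `Vout`, `cpu_apply_grain` and `present_frame` are hypothetical stand-ins. In a real client the CPU path would be a call to `dav1d_apply_grain` on a decoder opened with `apply_grain` set to 0; the exact signature is not given in this message, so it should be checked against the public header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not libdav1d types. */
typedef struct { int id; bool has_grain_params; bool grain_applied; } Picture;
typedef struct { bool supports_gpu_grain; } Vout;

/* Stand-in for the CPU path; a real client would call dav1d_apply_grain here. */
static void cpu_apply_grain(Picture *out, const Picture *in) {
    *out = *in;
    out->grain_applied = true;
}

/* Decide per frame, because the active vout may change in the middle of decoding.
 * Returns true if grain was synthesized on the CPU for this frame. */
static bool present_frame(const Vout *vout, const Picture *decoded, Picture *shown) {
    assert(vout != NULL && decoded != NULL && shown != NULL);
    if (decoded->has_grain_params && !vout->supports_gpu_grain) {
        cpu_apply_grain(shown, decoded);
        return true;
    }
    *shown = *decoded;  /* pass through: the vout adds grain itself, or none is needed */
    return false;
}
```

Calling `present_frame` once per decoded picture, with whatever vout is active at that moment, keeps the choice out of codec initialization, which is the constraint the message describes.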
-
- Dec 29, 2021
-
-
Matthias Dressel authored
-
Matthias Dressel authored
Co-authored-by: Rudi Heitbaum <rudi@heitbaum.com>
-
- Dec 28, 2021
-
-
Matthias Dressel authored
Changed in d85fdf52
-
- Dec 13, 2021
-
-
-
Victorien Le Couviour--Tuffet authored
mc_scaled_8tap_regular_w2_8bpc_c: 1070.7 mc_scaled_8tap_regular_w2_8bpc_ssse3: 253.0 mc_scaled_8tap_regular_w2_dy1_8bpc_c: 1079.9 mc_scaled_8tap_regular_w2_dy1_8bpc_ssse3: 114.8 mc_scaled_8tap_regular_w2_dy2_8bpc_c: 1466.1 mc_scaled_8tap_regular_w2_dy2_8bpc_ssse3: 145.7 mc_scaled_8tap_regular_w4_8bpc_c: 1965.4 mc_scaled_8tap_regular_w4_8bpc_ssse3: 251.4 mc_scaled_8tap_regular_w4_dy1_8bpc_c: 1989.4 mc_scaled_8tap_regular_w4_dy1_8bpc_ssse3: 166.1 mc_scaled_8tap_regular_w4_dy2_8bpc_c: 2728.8 mc_scaled_8tap_regular_w4_dy2_8bpc_ssse3: 163.4 mc_scaled_8tap_regular_w8_8bpc_c: 3670.1 mc_scaled_8tap_regular_w8_8bpc_ssse3: 477.0 mc_scaled_8tap_regular_w8_dy1_8bpc_c: 3651.1 mc_scaled_8tap_regular_w8_dy1_8bpc_ssse3: 464.8 mc_scaled_8tap_regular_w8_dy2_8bpc_c: 5079.6 mc_scaled_8tap_regular_w8_dy2_8bpc_ssse3: 494.0 mc_scaled_8tap_regular_w16_8bpc_c: 8366.9 mc_scaled_8tap_regular_w16_8bpc_ssse3: 1197.4 mc_scaled_8tap_regular_w16_dy1_8bpc_c: 9088.5 mc_scaled_8tap_regular_w16_dy1_8bpc_ssse3: 1212.6 mc_scaled_8tap_regular_w16_dy2_8bpc_c: 13166.1 mc_scaled_8tap_regular_w16_dy2_8bpc_ssse3: 1301.4 mc_scaled_8tap_regular_w32_8bpc_c: 29883.7 mc_scaled_8tap_regular_w32_8bpc_ssse3: 3990.3 mc_scaled_8tap_regular_w32_dy1_8bpc_c: 23404.1 mc_scaled_8tap_regular_w32_dy1_8bpc_ssse3: 3617.4 mc_scaled_8tap_regular_w32_dy2_8bpc_c: 36248.3 mc_scaled_8tap_regular_w32_dy2_8bpc_ssse3: 3949.3 mc_scaled_8tap_regular_w64_8bpc_c: 57228.6 mc_scaled_8tap_regular_w64_8bpc_ssse3: 9359.4 mc_scaled_8tap_regular_w64_dy1_8bpc_c: 87271.8 mc_scaled_8tap_regular_w64_dy1_8bpc_ssse3: 12472.7 mc_scaled_8tap_regular_w64_dy2_8bpc_c: 135050.9 mc_scaled_8tap_regular_w64_dy2_8bpc_ssse3: 13585.4 mc_scaled_8tap_regular_w128_8bpc_c: 219123.0 mc_scaled_8tap_regular_w128_8bpc_ssse3: 31867.7 mc_scaled_8tap_regular_w128_dy1_8bpc_c: 240143.3 mc_scaled_8tap_regular_w128_dy1_8bpc_ssse3: 35275.7 mc_scaled_8tap_regular_w128_dy2_8bpc_c: 376357.7 mc_scaled_8tap_regular_w128_dy2_8bpc_ssse3: 39411.4 mct_scaled_8tap_regular_w4_8bpc_c: 1178.7 
mct_scaled_8tap_regular_w4_8bpc_ssse3: 176.8 mct_scaled_8tap_regular_w4_dy1_8bpc_c: 1354.8 mct_scaled_8tap_regular_w4_dy1_8bpc_ssse3: 131.5 mct_scaled_8tap_regular_w4_dy2_8bpc_c: 1832.2 mct_scaled_8tap_regular_w4_dy2_8bpc_ssse3: 123.0 mct_scaled_8tap_regular_w8_8bpc_c: 3547.6 mct_scaled_8tap_regular_w8_8bpc_ssse3: 526.0 mct_scaled_8tap_regular_w8_dy1_8bpc_c: 3683.8 mct_scaled_8tap_regular_w8_dy1_8bpc_ssse3: 513.8 mct_scaled_8tap_regular_w8_dy2_8bpc_c: 5260.7 mct_scaled_8tap_regular_w8_dy2_8bpc_ssse3: 566.1 mct_scaled_8tap_regular_w16_8bpc_c: 8424.5 mct_scaled_8tap_regular_w16_8bpc_ssse3: 1340.0 mct_scaled_8tap_regular_w16_dy1_8bpc_c: 9515.8 mct_scaled_8tap_regular_w16_dy1_8bpc_ssse3: 1337.0 mct_scaled_8tap_regular_w16_dy2_8bpc_c: 14247.3 mct_scaled_8tap_regular_w16_dy2_8bpc_ssse3: 1492.7 mct_scaled_8tap_regular_w32_8bpc_c: 32059.9 mct_scaled_8tap_regular_w32_8bpc_ssse3: 5177.5 mct_scaled_8tap_regular_w32_dy1_8bpc_c: 32557.6 mct_scaled_8tap_regular_w32_dy1_8bpc_ssse3: 4889.9 mct_scaled_8tap_regular_w32_dy2_8bpc_c: 50844.2 mct_scaled_8tap_regular_w32_dy2_8bpc_ssse3: 5667.1 mct_scaled_8tap_regular_w64_8bpc_c: 59903.1 mct_scaled_8tap_regular_w64_8bpc_ssse3: 10453.6 mct_scaled_8tap_regular_w64_dy1_8bpc_c: 80298.8 mct_scaled_8tap_regular_w64_dy1_8bpc_ssse3: 12597.8 mct_scaled_8tap_regular_w64_dy2_8bpc_c: 127244.8 mct_scaled_8tap_regular_w64_dy2_8bpc_ssse3: 14677.9 mct_scaled_8tap_regular_w128_8bpc_c: 280097.0 mct_scaled_8tap_regular_w128_8bpc_ssse3: 41989.3 mct_scaled_8tap_regular_w128_dy1_8bpc_c: 208913.2 mct_scaled_8tap_regular_w128_dy1_8bpc_ssse3: 35525.2 mct_scaled_8tap_regular_w128_dy2_8bpc_c: 341367.6 mct_scaled_8tap_regular_w128_dy2_8bpc_ssse3: 41449.0
-
- Dec 04, 2021
-
-
Matthias Dressel authored
inv_txfm_add_16x8_adst_adst_0_12bpc_c: 4517.9 inv_txfm_add_16x8_adst_adst_0_12bpc_avx2: 432.4 inv_txfm_add_16x8_adst_adst_1_12bpc_c: 4510.9 inv_txfm_add_16x8_adst_adst_1_12bpc_avx2: 432.4 inv_txfm_add_16x8_adst_adst_2_12bpc_c: 4498.6 inv_txfm_add_16x8_adst_adst_2_12bpc_avx2: 432.4 inv_txfm_add_16x8_adst_dct_0_12bpc_c: 4553.8 inv_txfm_add_16x8_adst_dct_0_12bpc_avx2: 389.1 inv_txfm_add_16x8_adst_dct_1_12bpc_c: 4543.3 inv_txfm_add_16x8_adst_dct_1_12bpc_avx2: 389.1 inv_txfm_add_16x8_adst_dct_2_12bpc_c: 4538.4 inv_txfm_add_16x8_adst_dct_2_12bpc_avx2: 389.1 inv_txfm_add_16x8_adst_flipadst_0_12bpc_c: 4532.6 inv_txfm_add_16x8_adst_flipadst_0_12bpc_avx2: 435.4 inv_txfm_add_16x8_adst_flipadst_1_12bpc_c: 4520.4 inv_txfm_add_16x8_adst_flipadst_1_12bpc_avx2: 435.4 inv_txfm_add_16x8_adst_flipadst_2_12bpc_c: 4516.2 inv_txfm_add_16x8_adst_flipadst_2_12bpc_avx2: 435.4 inv_txfm_add_16x8_adst_identity_0_12bpc_c: 3502.3 inv_txfm_add_16x8_adst_identity_0_12bpc_avx2: 255.9 inv_txfm_add_16x8_adst_identity_1_12bpc_c: 3492.9 inv_txfm_add_16x8_adst_identity_1_12bpc_avx2: 256.3 inv_txfm_add_16x8_adst_identity_2_12bpc_c: 3471.4 inv_txfm_add_16x8_adst_identity_2_12bpc_avx2: 256.7 inv_txfm_add_16x8_dct_adst_0_12bpc_c: 4563.2 inv_txfm_add_16x8_dct_adst_0_12bpc_avx2: 383.6 inv_txfm_add_16x8_dct_adst_1_12bpc_c: 4573.1 inv_txfm_add_16x8_dct_adst_1_12bpc_avx2: 383.9 inv_txfm_add_16x8_dct_adst_2_12bpc_c: 4562.2 inv_txfm_add_16x8_dct_adst_2_12bpc_avx2: 383.7 inv_txfm_add_16x8_dct_dct_0_12bpc_c: 514.0 inv_txfm_add_16x8_dct_dct_0_12bpc_avx2: 25.0 inv_txfm_add_16x8_dct_dct_1_12bpc_c: 4540.5 inv_txfm_add_16x8_dct_dct_1_12bpc_avx2: 340.4 inv_txfm_add_16x8_dct_dct_2_12bpc_c: 4563.0 inv_txfm_add_16x8_dct_dct_2_12bpc_avx2: 339.3 inv_txfm_add_16x8_dct_flipadst_0_12bpc_c: 4568.0 inv_txfm_add_16x8_dct_flipadst_0_12bpc_avx2: 385.9 inv_txfm_add_16x8_dct_flipadst_1_12bpc_c: 4577.5 inv_txfm_add_16x8_dct_flipadst_1_12bpc_avx2: 385.8 inv_txfm_add_16x8_dct_flipadst_2_12bpc_c: 4573.8 
inv_txfm_add_16x8_dct_flipadst_2_12bpc_avx2: 385.8 inv_txfm_add_16x8_dct_identity_0_12bpc_c: 3549.9 inv_txfm_add_16x8_dct_identity_0_12bpc_avx2: 212.1 inv_txfm_add_16x8_dct_identity_1_12bpc_c: 3538.7 inv_txfm_add_16x8_dct_identity_1_12bpc_avx2: 212.1 inv_txfm_add_16x8_dct_identity_2_12bpc_c: 3539.7 inv_txfm_add_16x8_dct_identity_2_12bpc_avx2: 212.1 inv_txfm_add_16x8_flipadst_adst_0_12bpc_c: 4495.3 inv_txfm_add_16x8_flipadst_adst_0_12bpc_avx2: 431.4 inv_txfm_add_16x8_flipadst_adst_1_12bpc_c: 4496.3 inv_txfm_add_16x8_flipadst_adst_1_12bpc_avx2: 431.4 inv_txfm_add_16x8_flipadst_adst_2_12bpc_c: 4499.2 inv_txfm_add_16x8_flipadst_adst_2_12bpc_avx2: 431.3 inv_txfm_add_16x8_flipadst_dct_0_12bpc_c: 4506.9 inv_txfm_add_16x8_flipadst_dct_0_12bpc_avx2: 386.3 inv_txfm_add_16x8_flipadst_dct_1_12bpc_c: 4512.9 inv_txfm_add_16x8_flipadst_dct_1_12bpc_avx2: 386.0 inv_txfm_add_16x8_flipadst_dct_2_12bpc_c: 4503.2 inv_txfm_add_16x8_flipadst_dct_2_12bpc_avx2: 386.0 inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_c: 4509.1 inv_txfm_add_16x8_flipadst_flipadst_0_12bpc_avx2: 432.2 inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_c: 4519.0 inv_txfm_add_16x8_flipadst_flipadst_1_12bpc_avx2: 432.1 inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_c: 4518.3 inv_txfm_add_16x8_flipadst_flipadst_2_12bpc_avx2: 432.1 inv_txfm_add_16x8_flipadst_identity_0_12bpc_c: 3511.0 inv_txfm_add_16x8_flipadst_identity_0_12bpc_avx2: 257.1 inv_txfm_add_16x8_flipadst_identity_1_12bpc_c: 3518.5 inv_txfm_add_16x8_flipadst_identity_1_12bpc_avx2: 257.2 inv_txfm_add_16x8_flipadst_identity_2_12bpc_c: 3521.7 inv_txfm_add_16x8_flipadst_identity_2_12bpc_avx2: 257.1 inv_txfm_add_16x8_identity_adst_0_12bpc_c: 3166.8 inv_txfm_add_16x8_identity_adst_0_12bpc_avx2: 268.6 inv_txfm_add_16x8_identity_adst_1_12bpc_c: 3157.9 inv_txfm_add_16x8_identity_adst_1_12bpc_avx2: 268.6 inv_txfm_add_16x8_identity_adst_2_12bpc_c: 3156.5 inv_txfm_add_16x8_identity_adst_2_12bpc_avx2: 268.6 inv_txfm_add_16x8_identity_dct_0_12bpc_c: 3187.4 
inv_txfm_add_16x8_identity_dct_0_12bpc_avx2: 224.4 inv_txfm_add_16x8_identity_dct_1_12bpc_c: 3185.8 inv_txfm_add_16x8_identity_dct_1_12bpc_avx2: 224.4 inv_txfm_add_16x8_identity_dct_2_12bpc_c: 3190.8 inv_txfm_add_16x8_identity_dct_2_12bpc_avx2: 224.4 inv_txfm_add_16x8_identity_flipadst_0_12bpc_c: 3167.7 inv_txfm_add_16x8_identity_flipadst_0_12bpc_avx2: 269.7 inv_txfm_add_16x8_identity_flipadst_1_12bpc_c: 3174.1 inv_txfm_add_16x8_identity_flipadst_1_12bpc_avx2: 269.8 inv_txfm_add_16x8_identity_flipadst_2_12bpc_c: 3174.7 inv_txfm_add_16x8_identity_flipadst_2_12bpc_avx2: 269.7 inv_txfm_add_16x8_identity_identity_0_12bpc_c: 2153.3 inv_txfm_add_16x8_identity_identity_0_12bpc_avx2: 99.1 inv_txfm_add_16x8_identity_identity_1_12bpc_c: 2143.6 inv_txfm_add_16x8_identity_identity_1_12bpc_avx2: 99.3 inv_txfm_add_16x8_identity_identity_2_12bpc_c: 2145.9 inv_txfm_add_16x8_identity_identity_2_12bpc_avx2: 98.6
-
Matthias Dressel authored
inv_txfm_add_8x16_adst_adst_0_12bpc_c: 4440.4 inv_txfm_add_8x16_adst_adst_0_12bpc_avx2: 354.3 inv_txfm_add_8x16_adst_adst_1_12bpc_c: 4437.3 inv_txfm_add_8x16_adst_adst_1_12bpc_avx2: 354.3 inv_txfm_add_8x16_adst_adst_2_12bpc_c: 4438.8 inv_txfm_add_8x16_adst_adst_2_12bpc_avx2: 442.6 inv_txfm_add_8x16_adst_dct_0_12bpc_c: 4507.3 inv_txfm_add_8x16_adst_dct_0_12bpc_avx2: 310.0 inv_txfm_add_8x16_adst_dct_1_12bpc_c: 4500.3 inv_txfm_add_8x16_adst_dct_1_12bpc_avx2: 310.0 inv_txfm_add_8x16_adst_dct_2_12bpc_c: 4516.1 inv_txfm_add_8x16_adst_dct_2_12bpc_avx2: 399.5 inv_txfm_add_8x16_adst_flipadst_0_12bpc_c: 4457.3 inv_txfm_add_8x16_adst_flipadst_0_12bpc_avx2: 355.6 inv_txfm_add_8x16_adst_flipadst_1_12bpc_c: 4441.3 inv_txfm_add_8x16_adst_flipadst_1_12bpc_avx2: 355.6 inv_txfm_add_8x16_adst_flipadst_2_12bpc_c: 4448.9 inv_txfm_add_8x16_adst_flipadst_2_12bpc_avx2: 445.5 inv_txfm_add_8x16_adst_identity_0_12bpc_c: 3204.0 inv_txfm_add_8x16_adst_identity_0_12bpc_avx2: 173.1 inv_txfm_add_8x16_adst_identity_1_12bpc_c: 3207.1 inv_txfm_add_8x16_adst_identity_1_12bpc_avx2: 173.6 inv_txfm_add_8x16_adst_identity_2_12bpc_c: 3210.4 inv_txfm_add_8x16_adst_identity_2_12bpc_avx2: 261.2 inv_txfm_add_8x16_dct_adst_0_12bpc_c: 4484.2 inv_txfm_add_8x16_dct_adst_0_12bpc_avx2: 334.0 inv_txfm_add_8x16_dct_adst_1_12bpc_c: 4503.8 inv_txfm_add_8x16_dct_adst_1_12bpc_avx2: 334.6 inv_txfm_add_8x16_dct_adst_2_12bpc_c: 4490.7 inv_txfm_add_8x16_dct_adst_2_12bpc_avx2: 395.6 inv_txfm_add_8x16_dct_dct_0_12bpc_c: 419.9 inv_txfm_add_8x16_dct_dct_0_12bpc_avx2: 37.6 inv_txfm_add_8x16_dct_dct_1_12bpc_c: 4482.6 inv_txfm_add_8x16_dct_dct_1_12bpc_avx2: 284.6 inv_txfm_add_8x16_dct_dct_2_12bpc_c: 4468.7 inv_txfm_add_8x16_dct_dct_2_12bpc_avx2: 348.3 inv_txfm_add_8x16_dct_flipadst_0_12bpc_c: 4468.4 inv_txfm_add_8x16_dct_flipadst_0_12bpc_avx2: 333.6 inv_txfm_add_8x16_dct_flipadst_1_12bpc_c: 4463.5 inv_txfm_add_8x16_dct_flipadst_1_12bpc_avx2: 333.5 inv_txfm_add_8x16_dct_flipadst_2_12bpc_c: 4459.4 
inv_txfm_add_8x16_dct_flipadst_2_12bpc_avx2: 397.4 inv_txfm_add_8x16_dct_identity_0_12bpc_c: 3237.1 inv_txfm_add_8x16_dct_identity_0_12bpc_avx2: 149.6 inv_txfm_add_8x16_dct_identity_1_12bpc_c: 3229.9 inv_txfm_add_8x16_dct_identity_1_12bpc_avx2: 148.6 inv_txfm_add_8x16_dct_identity_2_12bpc_c: 3225.6 inv_txfm_add_8x16_dct_identity_2_12bpc_avx2: 211.3 inv_txfm_add_8x16_flipadst_adst_0_12bpc_c: 4532.1 inv_txfm_add_8x16_flipadst_adst_0_12bpc_avx2: 356.2 inv_txfm_add_8x16_flipadst_adst_1_12bpc_c: 4527.6 inv_txfm_add_8x16_flipadst_adst_1_12bpc_avx2: 356.1 inv_txfm_add_8x16_flipadst_adst_2_12bpc_c: 4532.5 inv_txfm_add_8x16_flipadst_adst_2_12bpc_avx2: 440.0 inv_txfm_add_8x16_flipadst_dct_0_12bpc_c: 4571.6 inv_txfm_add_8x16_flipadst_dct_0_12bpc_avx2: 310.3 inv_txfm_add_8x16_flipadst_dct_1_12bpc_c: 4554.5 inv_txfm_add_8x16_flipadst_dct_1_12bpc_avx2: 309.7 inv_txfm_add_8x16_flipadst_dct_2_12bpc_c: 4554.3 inv_txfm_add_8x16_flipadst_dct_2_12bpc_avx2: 399.9 inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_c: 4497.2 inv_txfm_add_8x16_flipadst_flipadst_0_12bpc_avx2: 355.9 inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_c: 4486.2 inv_txfm_add_8x16_flipadst_flipadst_1_12bpc_avx2: 355.6 inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_c: 4493.4 inv_txfm_add_8x16_flipadst_flipadst_2_12bpc_avx2: 446.0 inv_txfm_add_8x16_flipadst_identity_0_12bpc_c: 3265.7 inv_txfm_add_8x16_flipadst_identity_0_12bpc_avx2: 173.8 inv_txfm_add_8x16_flipadst_identity_1_12bpc_c: 3270.8 inv_txfm_add_8x16_flipadst_identity_1_12bpc_avx2: 173.5 inv_txfm_add_8x16_flipadst_identity_2_12bpc_c: 3271.8 inv_txfm_add_8x16_flipadst_identity_2_12bpc_avx2: 261.6 inv_txfm_add_8x16_identity_adst_0_12bpc_c: 3295.3 inv_txfm_add_8x16_identity_adst_0_12bpc_avx2: 302.5 inv_txfm_add_8x16_identity_adst_1_12bpc_c: 3303.1 inv_txfm_add_8x16_identity_adst_1_12bpc_avx2: 303.0 inv_txfm_add_8x16_identity_adst_2_12bpc_c: 3304.6 inv_txfm_add_8x16_identity_adst_2_12bpc_avx2: 303.1 inv_txfm_add_8x16_identity_dct_0_12bpc_c: 3298.9 
inv_txfm_add_8x16_identity_dct_0_12bpc_avx2: 257.8 inv_txfm_add_8x16_identity_dct_1_12bpc_c: 3308.1 inv_txfm_add_8x16_identity_dct_1_12bpc_avx2: 259.2 inv_txfm_add_8x16_identity_dct_2_12bpc_c: 3306.6 inv_txfm_add_8x16_identity_dct_2_12bpc_avx2: 259.2 inv_txfm_add_8x16_identity_flipadst_0_12bpc_c: 3294.7 inv_txfm_add_8x16_identity_flipadst_0_12bpc_avx2: 302.2 inv_txfm_add_8x16_identity_flipadst_1_12bpc_c: 3292.5 inv_txfm_add_8x16_identity_flipadst_1_12bpc_avx2: 302.2 inv_txfm_add_8x16_identity_flipadst_2_12bpc_c: 3275.4 inv_txfm_add_8x16_identity_flipadst_2_12bpc_avx2: 303.3 inv_txfm_add_8x16_identity_identity_0_12bpc_c: 2044.6 inv_txfm_add_8x16_identity_identity_0_12bpc_avx2: 116.2 inv_txfm_add_8x16_identity_identity_1_12bpc_c: 2059.9 inv_txfm_add_8x16_identity_identity_1_12bpc_avx2: 117.0 inv_txfm_add_8x16_identity_identity_2_12bpc_c: 2048.4 inv_txfm_add_8x16_identity_identity_2_12bpc_avx2: 116.2
-
- Dec 03, 2021
-
-
Some cdef asm functions access memory before the start of the buffer. Two lr line buffers are allocated, but only one of them had the correct padding applied.
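As a rough illustration of the kind of fix described, the sketch below gives both line buffers the same leading and trailing padding. It is not dav1d's allocation code: `LineBufs`, `LINE_PAD` and the helper names are hypothetical, and the padding size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { LINE_PAD = 32 };  /* hypothetical padding, in bytes, on each side */

typedef struct {
    uint8_t *base;     /* pointer returned by calloc; this is what gets freed */
    uint8_t *line[2];  /* usable start of each line buffer */
} LineBufs;

/* One allocation holding two lines, each with LINE_PAD bytes before and after,
 * so reads at small negative offsets from either line[i] stay in bounds. */
static int linebufs_alloc(LineBufs *lb, size_t line_size) {
    assert(lb != NULL);
    const size_t stride = LINE_PAD + line_size + LINE_PAD;
    lb->base = calloc(2, stride);
    if (!lb->base) return -1;
    lb->line[0] = lb->base + LINE_PAD;           /* padding before the first line ... */
    lb->line[1] = lb->base + stride + LINE_PAD;  /* ... and before the second one too */
    return 0;
}

static void linebufs_free(LineBufs *lb) {
    free(lb->base);
    lb->base = NULL;
    lb->line[0] = lb->line[1] = NULL;
}
```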
-
It is often necessary to narrow the elements in a pair of Neon vectors to half the current width, before combining the results. This is usually achieved with a pair of XTN/XTN2 instructions. However, it is possible to achieve the same outcome with a single 'unzip' (UZP1) instruction. This patch changes all sequential AArch64 Neon XTN, XTN2 instruction pairs to use a single UZP1 instruction.
Change-Id: I2a9fad3082d2cf363b1edce9ef0b8d547ec6c41a
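The equivalence can be checked with a scalar model in C. The sketch below assumes the usual little-endian lane layout and models the 8h to 16b case only; `Vec128` and the function names are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit Neon register (little-endian lane layout assumed). */
typedef struct { uint8_t b[16]; } Vec128;

/* Build a register from eight 16-bit lanes: lane i occupies bytes 2i (low) and 2i+1 (high). */
static Vec128 from_u16(const uint16_t h[8]) {
    Vec128 v;
    for (int i = 0; i < 8; i++) {
        v.b[2 * i]     = (uint8_t)(h[i] & 0xff);
        v.b[2 * i + 1] = (uint8_t)(h[i] >> 8);
    }
    return v;
}

/* XTN d.8b, a.8h followed by XTN2 d.16b, b.8h:
 * keep the low byte of every 16-bit lane, a's lanes first, then b's. */
static Vec128 narrow_xtn_xtn2(Vec128 a, Vec128 b) {
    Vec128 d;
    for (int i = 0; i < 8; i++) {
        d.b[i]     = a.b[2 * i];
        d.b[8 + i] = b.b[2 * i];
    }
    return d;
}

/* UZP1 d.16b, a.16b, b.16b: even-indexed bytes of the concatenation of a and b. */
static Vec128 narrow_uzp1(Vec128 a, Vec128 b) {
    uint8_t cat[32];
    Vec128 d;
    memcpy(cat, a.b, 16);
    memcpy(cat + 16, b.b, 16);
    for (int i = 0; i < 16; i++)
        d.b[i] = cat[2 * i];
    return d;
}

/* Returns 1 if both instruction sequences produce the same register contents. */
static int narrowing_matches(const uint16_t x[8], const uint16_t y[8]) {
    assert(x != NULL && y != NULL);
    Vec128 a = from_u16(x), b = from_u16(y);
    Vec128 p = narrow_xtn_xtn2(a, b), q = narrow_uzp1(a, b);
    return memcmp(p.b, q.b, 16) == 0;
}
```

The even-indexed bytes of a little-endian vector of 16-bit lanes are exactly the low bytes of those lanes, which is why one UZP1 over both inputs matches the two truncating narrows.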
-
The CMLT instruction has twice the throughput of SSHR on all modern out-of-order Arm cores. The Software Optimization Guides (SWOG) for the Cortex-A76, Cortex-A77 and Neoverse-N1 cores are being updated to reflect this. (The current version of the SWOG for these cores states that CMLT and SSHR both have the same execution throughput.) This patch changes all instances of sign computation to use CMLT instead of SSHR.
Change-Id: Ice5747fee4e3bdd98ae8fbc036d735f55e492249
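Both instructions yield the same sign mask, which is what makes the substitution safe. The scalar C model below illustrates this for 16-bit lanes; the function names are illustrative, and it says nothing about throughput, which only the hardware documentation can establish.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SSHR #15 on a 16-bit lane: an arithmetic shift that replicates the sign bit.
 * Written so that no negative value is shifted, since that is implementation-defined in C. */
static int16_t sign_sshr(int16_t x) {
    return (int16_t)(x < 0 ? ~((~x) >> 15) : (x >> 15));
}

/* Model of CMLT #0 on a 16-bit lane: all ones if the lane is negative, otherwise zero. */
static int16_t sign_cmlt(int16_t x) {
    return (int16_t)(x < 0 ? -1 : 0);
}

/* Exhaustively compare the two masks over every 16-bit value; returns 1 if they always agree. */
static int sign_masks_agree(void) {
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        if (sign_sshr((int16_t)v) != sign_cmlt((int16_t)v))
            return 0;
    return 1;
}
```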
-
- Dec 02, 2021
-
-
Writing to a ymm register is not necessary to trigger state transitions; it turns out that a read is sufficient. Work around the issue by using EVEX-only registers, which are not affected by state transitions, for those reads.
-
- Nov 29, 2021
-
-
Henrik Gramner authored
-
Matthias Dressel authored
inv_txfm_add_16x4_adst_adst_0_12bpc_c: 1756.6 inv_txfm_add_16x4_adst_adst_0_12bpc_avx2: 182.4 inv_txfm_add_16x4_adst_adst_1_12bpc_c: 1756.0 inv_txfm_add_16x4_adst_adst_1_12bpc_avx2: 182.5 inv_txfm_add_16x4_adst_adst_2_12bpc_c: 1763.2 inv_txfm_add_16x4_adst_adst_2_12bpc_avx2: 182.4 inv_txfm_add_16x4_adst_dct_0_12bpc_c: 1863.6 inv_txfm_add_16x4_adst_dct_0_12bpc_avx2: 176.0 inv_txfm_add_16x4_adst_dct_1_12bpc_c: 1864.1 inv_txfm_add_16x4_adst_dct_1_12bpc_avx2: 176.0 inv_txfm_add_16x4_adst_dct_2_12bpc_c: 1861.3 inv_txfm_add_16x4_adst_dct_2_12bpc_avx2: 176.0 inv_txfm_add_16x4_adst_flipadst_0_12bpc_c: 1768.6 inv_txfm_add_16x4_adst_flipadst_0_12bpc_avx2: 184.1 inv_txfm_add_16x4_adst_flipadst_1_12bpc_c: 1768.8 inv_txfm_add_16x4_adst_flipadst_1_12bpc_avx2: 184.5 inv_txfm_add_16x4_adst_flipadst_2_12bpc_c: 1769.3 inv_txfm_add_16x4_adst_flipadst_2_12bpc_avx2: 184.7 inv_txfm_add_16x4_adst_identity_0_12bpc_c: 1686.6 inv_txfm_add_16x4_adst_identity_0_12bpc_avx2: 145.4 inv_txfm_add_16x4_adst_identity_1_12bpc_c: 1685.8 inv_txfm_add_16x4_adst_identity_1_12bpc_avx2: 145.8 inv_txfm_add_16x4_adst_identity_2_12bpc_c: 1681.7 inv_txfm_add_16x4_adst_identity_2_12bpc_avx2: 145.8 inv_txfm_add_16x4_dct_adst_0_12bpc_c: 1783.4 inv_txfm_add_16x4_dct_adst_0_12bpc_avx2: 167.7 inv_txfm_add_16x4_dct_adst_1_12bpc_c: 1789.1 inv_txfm_add_16x4_dct_adst_1_12bpc_avx2: 167.9 inv_txfm_add_16x4_dct_adst_2_12bpc_c: 1788.0 inv_txfm_add_16x4_dct_adst_2_12bpc_avx2: 169.8 inv_txfm_add_16x4_dct_dct_0_12bpc_c: 209.5 inv_txfm_add_16x4_dct_dct_0_12bpc_avx2: 21.6 inv_txfm_add_16x4_dct_dct_1_12bpc_c: 1894.3 inv_txfm_add_16x4_dct_dct_1_12bpc_avx2: 156.8 inv_txfm_add_16x4_dct_dct_2_12bpc_c: 1892.0 inv_txfm_add_16x4_dct_dct_2_12bpc_avx2: 156.8 inv_txfm_add_16x4_dct_flipadst_0_12bpc_c: 1784.7 inv_txfm_add_16x4_dct_flipadst_0_12bpc_avx2: 167.2 inv_txfm_add_16x4_dct_flipadst_1_12bpc_c: 1796.7 inv_txfm_add_16x4_dct_flipadst_1_12bpc_avx2: 168.6 inv_txfm_add_16x4_dct_flipadst_2_12bpc_c: 1788.9 
inv_txfm_add_16x4_dct_flipadst_2_12bpc_avx2: 168.9 inv_txfm_add_16x4_dct_identity_0_12bpc_c: 1712.7 inv_txfm_add_16x4_dct_identity_0_12bpc_avx2: 128.8 inv_txfm_add_16x4_dct_identity_1_12bpc_c: 1714.8 inv_txfm_add_16x4_dct_identity_1_12bpc_avx2: 128.8 inv_txfm_add_16x4_dct_identity_2_12bpc_c: 1710.2 inv_txfm_add_16x4_dct_identity_2_12bpc_avx2: 128.8 inv_txfm_add_16x4_flipadst_adst_0_12bpc_c: 1763.6 inv_txfm_add_16x4_flipadst_adst_0_12bpc_avx2: 186.6 inv_txfm_add_16x4_flipadst_adst_1_12bpc_c: 1761.1 inv_txfm_add_16x4_flipadst_adst_1_12bpc_avx2: 185.6 inv_txfm_add_16x4_flipadst_adst_2_12bpc_c: 1761.8 inv_txfm_add_16x4_flipadst_adst_2_12bpc_avx2: 187.0 inv_txfm_add_16x4_flipadst_dct_0_12bpc_c: 1864.4 inv_txfm_add_16x4_flipadst_dct_0_12bpc_avx2: 176.8 inv_txfm_add_16x4_flipadst_dct_1_12bpc_c: 1862.7 inv_txfm_add_16x4_flipadst_dct_1_12bpc_avx2: 176.8 inv_txfm_add_16x4_flipadst_dct_2_12bpc_c: 1860.2 inv_txfm_add_16x4_flipadst_dct_2_12bpc_avx2: 176.8 inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_c: 1760.4 inv_txfm_add_16x4_flipadst_flipadst_0_12bpc_avx2: 185.3 inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_c: 1761.8 inv_txfm_add_16x4_flipadst_flipadst_1_12bpc_avx2: 185.3 inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_c: 1766.5 inv_txfm_add_16x4_flipadst_flipadst_2_12bpc_avx2: 184.9 inv_txfm_add_16x4_flipadst_identity_0_12bpc_c: 1673.0 inv_txfm_add_16x4_flipadst_identity_0_12bpc_avx2: 143.1 inv_txfm_add_16x4_flipadst_identity_1_12bpc_c: 1673.2 inv_txfm_add_16x4_flipadst_identity_1_12bpc_avx2: 143.1 inv_txfm_add_16x4_flipadst_identity_2_12bpc_c: 1681.6 inv_txfm_add_16x4_flipadst_identity_2_12bpc_avx2: 143.2 inv_txfm_add_16x4_identity_adst_0_12bpc_c: 1128.7 inv_txfm_add_16x4_identity_adst_0_12bpc_avx2: 102.8 inv_txfm_add_16x4_identity_adst_1_12bpc_c: 1131.3 inv_txfm_add_16x4_identity_adst_1_12bpc_avx2: 101.3 inv_txfm_add_16x4_identity_adst_2_12bpc_c: 1127.5 inv_txfm_add_16x4_identity_adst_2_12bpc_avx2: 99.1 inv_txfm_add_16x4_identity_dct_0_12bpc_c: 1228.3 
inv_txfm_add_16x4_identity_dct_0_12bpc_avx2: 88.3 inv_txfm_add_16x4_identity_dct_1_12bpc_c: 1220.5 inv_txfm_add_16x4_identity_dct_1_12bpc_avx2: 88.0 inv_txfm_add_16x4_identity_dct_2_12bpc_c: 1227.3 inv_txfm_add_16x4_identity_dct_2_12bpc_avx2: 88.1 inv_txfm_add_16x4_identity_flipadst_0_12bpc_c: 1142.4 inv_txfm_add_16x4_identity_flipadst_0_12bpc_avx2: 100.3 inv_txfm_add_16x4_identity_flipadst_1_12bpc_c: 1134.1 inv_txfm_add_16x4_identity_flipadst_1_12bpc_avx2: 100.3 inv_txfm_add_16x4_identity_flipadst_2_12bpc_c: 1136.4 inv_txfm_add_16x4_identity_flipadst_2_12bpc_avx2: 100.3 inv_txfm_add_16x4_identity_identity_0_12bpc_c: 1056.1 inv_txfm_add_16x4_identity_identity_0_12bpc_avx2: 61.6 inv_txfm_add_16x4_identity_identity_1_12bpc_c: 1064.6 inv_txfm_add_16x4_identity_identity_1_12bpc_avx2: 62.9 inv_txfm_add_16x4_identity_identity_2_12bpc_c: 1067.5 inv_txfm_add_16x4_identity_identity_2_12bpc_avx2: 63.5
-
Matthias Dressel authored
inv_txfm_add_4x16_adst_adst_0_12bpc_c: 1799.1 inv_txfm_add_4x16_adst_adst_0_12bpc_avx2: 178.8 inv_txfm_add_4x16_adst_adst_1_12bpc_c: 1795.0 inv_txfm_add_4x16_adst_adst_1_12bpc_avx2: 179.1 inv_txfm_add_4x16_adst_adst_2_12bpc_c: 1806.6 inv_txfm_add_4x16_adst_adst_2_12bpc_avx2: 179.3 inv_txfm_add_4x16_adst_dct_0_12bpc_c: 1824.8 inv_txfm_add_4x16_adst_dct_0_12bpc_avx2: 166.8 inv_txfm_add_4x16_adst_dct_1_12bpc_c: 1828.2 inv_txfm_add_4x16_adst_dct_1_12bpc_avx2: 166.7 inv_txfm_add_4x16_adst_dct_2_12bpc_c: 1830.9 inv_txfm_add_4x16_adst_dct_2_12bpc_avx2: 165.6 inv_txfm_add_4x16_adst_flipadst_0_12bpc_c: 1797.9 inv_txfm_add_4x16_adst_flipadst_0_12bpc_avx2: 179.6 inv_txfm_add_4x16_adst_flipadst_1_12bpc_c: 1795.9 inv_txfm_add_4x16_adst_flipadst_1_12bpc_avx2: 180.6 inv_txfm_add_4x16_adst_flipadst_2_12bpc_c: 1791.6 inv_txfm_add_4x16_adst_flipadst_2_12bpc_avx2: 180.1 inv_txfm_add_4x16_adst_identity_0_12bpc_c: 1163.7 inv_txfm_add_4x16_adst_identity_0_12bpc_avx2: 78.6 inv_txfm_add_4x16_adst_identity_1_12bpc_c: 1163.4 inv_txfm_add_4x16_adst_identity_1_12bpc_avx2: 78.9 inv_txfm_add_4x16_adst_identity_2_12bpc_c: 1164.3 inv_txfm_add_4x16_adst_identity_2_12bpc_avx2: 78.8 inv_txfm_add_4x16_dct_adst_0_12bpc_c: 1914.8 inv_txfm_add_4x16_dct_adst_0_12bpc_avx2: 177.0 inv_txfm_add_4x16_dct_adst_1_12bpc_c: 1904.8 inv_txfm_add_4x16_dct_adst_1_12bpc_avx2: 177.3 inv_txfm_add_4x16_dct_adst_2_12bpc_c: 1905.4 inv_txfm_add_4x16_dct_adst_2_12bpc_avx2: 176.4 inv_txfm_add_4x16_dct_dct_0_12bpc_c: 217.1 inv_txfm_add_4x16_dct_dct_0_12bpc_avx2: 26.6 inv_txfm_add_4x16_dct_dct_1_12bpc_c: 1955.1 inv_txfm_add_4x16_dct_dct_1_12bpc_avx2: 162.3 inv_txfm_add_4x16_dct_dct_2_12bpc_c: 1948.9 inv_txfm_add_4x16_dct_dct_2_12bpc_avx2: 162.2 inv_txfm_add_4x16_dct_flipadst_0_12bpc_c: 1922.8 inv_txfm_add_4x16_dct_flipadst_0_12bpc_avx2: 180.6 inv_txfm_add_4x16_dct_flipadst_1_12bpc_c: 1919.7 inv_txfm_add_4x16_dct_flipadst_1_12bpc_avx2: 180.1 inv_txfm_add_4x16_dct_flipadst_2_12bpc_c: 1912.0 
inv_txfm_add_4x16_dct_flipadst_2_12bpc_avx2: 180.1 inv_txfm_add_4x16_dct_identity_0_12bpc_c: 1276.4 inv_txfm_add_4x16_dct_identity_0_12bpc_avx2: 75.4 inv_txfm_add_4x16_dct_identity_1_12bpc_c: 1277.5 inv_txfm_add_4x16_dct_identity_1_12bpc_avx2: 75.4 inv_txfm_add_4x16_dct_identity_2_12bpc_c: 1270.1 inv_txfm_add_4x16_dct_identity_2_12bpc_avx2: 75.3 inv_txfm_add_4x16_flipadst_adst_0_12bpc_c: 1802.8 inv_txfm_add_4x16_flipadst_adst_0_12bpc_avx2: 180.8 inv_txfm_add_4x16_flipadst_adst_1_12bpc_c: 1804.8 inv_txfm_add_4x16_flipadst_adst_1_12bpc_avx2: 180.7 inv_txfm_add_4x16_flipadst_adst_2_12bpc_c: 1800.6 inv_txfm_add_4x16_flipadst_adst_2_12bpc_avx2: 181.2 inv_txfm_add_4x16_flipadst_dct_0_12bpc_c: 1842.5 inv_txfm_add_4x16_flipadst_dct_0_12bpc_avx2: 165.1 inv_txfm_add_4x16_flipadst_dct_1_12bpc_c: 1837.8 inv_txfm_add_4x16_flipadst_dct_1_12bpc_avx2: 164.4 inv_txfm_add_4x16_flipadst_dct_2_12bpc_c: 1841.6 inv_txfm_add_4x16_flipadst_dct_2_12bpc_avx2: 166.1 inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_c: 1812.4 inv_txfm_add_4x16_flipadst_flipadst_0_12bpc_avx2: 182.0 inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_c: 1803.9 inv_txfm_add_4x16_flipadst_flipadst_1_12bpc_avx2: 181.2 inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_c: 1809.9 inv_txfm_add_4x16_flipadst_flipadst_2_12bpc_avx2: 183.2 inv_txfm_add_4x16_flipadst_identity_0_12bpc_c: 1170.5 inv_txfm_add_4x16_flipadst_identity_0_12bpc_avx2: 78.4 inv_txfm_add_4x16_flipadst_identity_1_12bpc_c: 1172.1 inv_txfm_add_4x16_flipadst_identity_1_12bpc_avx2: 80.0 inv_txfm_add_4x16_flipadst_identity_2_12bpc_c: 1170.9 inv_txfm_add_4x16_flipadst_identity_2_12bpc_avx2: 78.6 inv_txfm_add_4x16_identity_adst_0_12bpc_c: 1705.4 inv_txfm_add_4x16_identity_adst_0_12bpc_avx2: 162.6 inv_txfm_add_4x16_identity_adst_1_12bpc_c: 1714.5 inv_txfm_add_4x16_identity_adst_1_12bpc_avx2: 162.6 inv_txfm_add_4x16_identity_adst_2_12bpc_c: 1703.1 inv_txfm_add_4x16_identity_adst_2_12bpc_avx2: 162.5 inv_txfm_add_4x16_identity_dct_0_12bpc_c: 1775.0 
inv_txfm_add_4x16_identity_dct_0_12bpc_avx2: 150.5 inv_txfm_add_4x16_identity_dct_1_12bpc_c: 1753.0 inv_txfm_add_4x16_identity_dct_1_12bpc_avx2: 150.6 inv_txfm_add_4x16_identity_dct_2_12bpc_c: 1759.6 inv_txfm_add_4x16_identity_dct_2_12bpc_avx2: 149.8 inv_txfm_add_4x16_identity_flipadst_0_12bpc_c: 1727.5 inv_txfm_add_4x16_identity_flipadst_0_12bpc_avx2: 160.3 inv_txfm_add_4x16_identity_flipadst_1_12bpc_c: 1739.8 inv_txfm_add_4x16_identity_flipadst_1_12bpc_avx2: 160.9 inv_txfm_add_4x16_identity_flipadst_2_12bpc_c: 1728.3 inv_txfm_add_4x16_identity_flipadst_2_12bpc_avx2: 159.9 inv_txfm_add_4x16_identity_identity_0_12bpc_c: 1098.6 inv_txfm_add_4x16_identity_identity_0_12bpc_avx2: 60.4 inv_txfm_add_4x16_identity_identity_1_12bpc_c: 1095.4 inv_txfm_add_4x16_identity_identity_1_12bpc_avx2: 61.3 inv_txfm_add_4x16_identity_identity_2_12bpc_c: 1111.6 inv_txfm_add_4x16_identity_identity_2_12bpc_avx2: 60.6
-
Matthias Dressel authored
WHT uses no SSSE3 instructions. The 16bpc variant is already SSE2.
-
- Nov 18, 2021
-
-
The previous code could cause padded pixels along the right edge to be slightly off in some obscure cases.
-
- Nov 15, 2021
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
- Nov 13, 2021
-
-
Matthias Dressel authored
inv_txfm_add_8x8_adst_adst_0_12bpc_c: 1997.9 inv_txfm_add_8x8_adst_adst_0_12bpc_avx2: 185.7 inv_txfm_add_8x8_adst_adst_1_12bpc_c: 2009.8 inv_txfm_add_8x8_adst_adst_1_12bpc_avx2: 185.7 inv_txfm_add_8x8_adst_dct_0_12bpc_c: 1991.0 inv_txfm_add_8x8_adst_dct_0_12bpc_avx2: 161.3 inv_txfm_add_8x8_adst_dct_1_12bpc_c: 1977.0 inv_txfm_add_8x8_adst_dct_1_12bpc_avx2: 161.4 inv_txfm_add_8x8_adst_flipadst_0_12bpc_c: 2017.6 inv_txfm_add_8x8_adst_flipadst_0_12bpc_avx2: 184.2 inv_txfm_add_8x8_adst_flipadst_1_12bpc_c: 2018.9 inv_txfm_add_8x8_adst_flipadst_1_12bpc_avx2: 184.2 inv_txfm_add_8x8_adst_identity_0_12bpc_c: 1407.2 inv_txfm_add_8x8_adst_identity_0_12bpc_avx2: 95.7 inv_txfm_add_8x8_adst_identity_1_12bpc_c: 1405.9 inv_txfm_add_8x8_adst_identity_1_12bpc_avx2: 95.8 inv_txfm_add_8x8_dct_adst_0_12bpc_c: 2024.2 inv_txfm_add_8x8_dct_adst_0_12bpc_avx2: 156.9 inv_txfm_add_8x8_dct_adst_1_12bpc_c: 2018.8 inv_txfm_add_8x8_dct_adst_1_12bpc_avx2: 160.1 inv_txfm_add_8x8_dct_dct_0_12bpc_c: 213.0 inv_txfm_add_8x8_dct_dct_0_12bpc_avx2: 24.8 inv_txfm_add_8x8_dct_dct_1_12bpc_c: 2008.6 inv_txfm_add_8x8_dct_dct_1_12bpc_avx2: 139.0 inv_txfm_add_8x8_dct_flipadst_0_12bpc_c: 2012.3 inv_txfm_add_8x8_dct_flipadst_0_12bpc_avx2: 159.2 inv_txfm_add_8x8_dct_flipadst_1_12bpc_c: 2005.1 inv_txfm_add_8x8_dct_flipadst_1_12bpc_avx2: 158.7 inv_txfm_add_8x8_dct_identity_0_12bpc_c: 1470.4 inv_txfm_add_8x8_dct_identity_0_12bpc_avx2: 71.7 inv_txfm_add_8x8_dct_identity_1_12bpc_c: 1477.8 inv_txfm_add_8x8_dct_identity_1_12bpc_avx2: 70.7 inv_txfm_add_8x8_flipadst_adst_0_12bpc_c: 2006.1 inv_txfm_add_8x8_flipadst_adst_0_12bpc_avx2: 183.6 inv_txfm_add_8x8_flipadst_adst_1_12bpc_c: 1987.6 inv_txfm_add_8x8_flipadst_adst_1_12bpc_avx2: 183.6 inv_txfm_add_8x8_flipadst_dct_0_12bpc_c: 1986.6 inv_txfm_add_8x8_flipadst_dct_0_12bpc_avx2: 163.0 inv_txfm_add_8x8_flipadst_dct_1_12bpc_c: 1979.3 inv_txfm_add_8x8_flipadst_dct_1_12bpc_avx2: 163.1 inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_c: 2004.0 
inv_txfm_add_8x8_flipadst_flipadst_0_12bpc_avx2: 184.3 inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_c: 2003.9 inv_txfm_add_8x8_flipadst_flipadst_1_12bpc_avx2: 184.3 inv_txfm_add_8x8_flipadst_identity_0_12bpc_c: 1433.5 inv_txfm_add_8x8_flipadst_identity_0_12bpc_avx2: 95.3 inv_txfm_add_8x8_flipadst_identity_1_12bpc_c: 1425.4 inv_txfm_add_8x8_flipadst_identity_1_12bpc_avx2: 96.3 inv_txfm_add_8x8_identity_adst_0_12bpc_c: 1456.5 inv_txfm_add_8x8_identity_adst_0_12bpc_avx2: 115.8 inv_txfm_add_8x8_identity_adst_1_12bpc_c: 1453.5 inv_txfm_add_8x8_identity_adst_1_12bpc_avx2: 115.8 inv_txfm_add_8x8_identity_dct_0_12bpc_c: 1450.0 inv_txfm_add_8x8_identity_dct_0_12bpc_avx2: 93.5 inv_txfm_add_8x8_identity_dct_1_12bpc_c: 1447.5 inv_txfm_add_8x8_identity_dct_1_12bpc_avx2: 94.3 inv_txfm_add_8x8_identity_flipadst_0_12bpc_c: 1451.7 inv_txfm_add_8x8_identity_flipadst_0_12bpc_avx2: 114.0 inv_txfm_add_8x8_identity_flipadst_1_12bpc_c: 1456.4 inv_txfm_add_8x8_identity_flipadst_1_12bpc_avx2: 114.0 inv_txfm_add_8x8_identity_identity_0_12bpc_c: 892.3 inv_txfm_add_8x8_identity_identity_0_12bpc_avx2: 33.7 inv_txfm_add_8x8_identity_identity_1_12bpc_c: 897.2 inv_txfm_add_8x8_identity_identity_1_12bpc_avx2: 33.1
-
Matthias Dressel authored
inv_txfm_add_8x4_adst_adst_0_12bpc_c: 882.1 inv_txfm_add_8x4_adst_adst_0_12bpc_avx2: 113.7 inv_txfm_add_8x4_adst_adst_1_12bpc_c: 882.5 inv_txfm_add_8x4_adst_adst_1_12bpc_avx2: 113.8 inv_txfm_add_8x4_adst_dct_0_12bpc_c: 928.0 inv_txfm_add_8x4_adst_dct_0_12bpc_avx2: 109.2 inv_txfm_add_8x4_adst_dct_1_12bpc_c: 924.9 inv_txfm_add_8x4_adst_dct_1_12bpc_avx2: 109.2 inv_txfm_add_8x4_adst_flipadst_0_12bpc_c: 889.9 inv_txfm_add_8x4_adst_flipadst_0_12bpc_avx2: 114.3 inv_txfm_add_8x4_adst_flipadst_1_12bpc_c: 886.0 inv_txfm_add_8x4_adst_flipadst_1_12bpc_avx2: 114.8 inv_txfm_add_8x4_adst_identity_0_12bpc_c: 832.2 inv_txfm_add_8x4_adst_identity_0_12bpc_avx2: 88.8 inv_txfm_add_8x4_adst_identity_1_12bpc_c: 834.6 inv_txfm_add_8x4_adst_identity_1_12bpc_avx2: 89.0 inv_txfm_add_8x4_dct_adst_0_12bpc_c: 870.3 inv_txfm_add_8x4_dct_adst_0_12bpc_avx2: 96.3 inv_txfm_add_8x4_dct_adst_1_12bpc_c: 884.6 inv_txfm_add_8x4_dct_adst_1_12bpc_avx2: 96.3 inv_txfm_add_8x4_dct_dct_0_12bpc_c: 116.1 inv_txfm_add_8x4_dct_dct_0_12bpc_avx2: 24.5 inv_txfm_add_8x4_dct_dct_1_12bpc_c: 925.1 inv_txfm_add_8x4_dct_dct_1_12bpc_avx2: 92.3 inv_txfm_add_8x4_dct_flipadst_0_12bpc_c: 882.7 inv_txfm_add_8x4_dct_flipadst_0_12bpc_avx2: 97.0 inv_txfm_add_8x4_dct_flipadst_1_12bpc_c: 882.1 inv_txfm_add_8x4_dct_flipadst_1_12bpc_avx2: 97.0 inv_txfm_add_8x4_dct_identity_0_12bpc_c: 827.5 inv_txfm_add_8x4_dct_identity_0_12bpc_avx2: 72.4 inv_txfm_add_8x4_dct_identity_1_12bpc_c: 827.8 inv_txfm_add_8x4_dct_identity_1_12bpc_avx2: 73.8 inv_txfm_add_8x4_flipadst_adst_0_12bpc_c: 899.5 inv_txfm_add_8x4_flipadst_adst_0_12bpc_avx2: 113.2 inv_txfm_add_8x4_flipadst_adst_1_12bpc_c: 898.8 inv_txfm_add_8x4_flipadst_adst_1_12bpc_avx2: 113.3 inv_txfm_add_8x4_flipadst_dct_0_12bpc_c: 945.7 inv_txfm_add_8x4_flipadst_dct_0_12bpc_avx2: 108.3 inv_txfm_add_8x4_flipadst_dct_1_12bpc_c: 945.6 inv_txfm_add_8x4_flipadst_dct_1_12bpc_avx2: 108.3 inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_c: 903.6 inv_txfm_add_8x4_flipadst_flipadst_0_12bpc_avx2: 113.9 
inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_c: 902.8 inv_txfm_add_8x4_flipadst_flipadst_1_12bpc_avx2: 114.2 inv_txfm_add_8x4_flipadst_identity_0_12bpc_c: 856.6 inv_txfm_add_8x4_flipadst_identity_0_12bpc_avx2: 88.3 inv_txfm_add_8x4_flipadst_identity_1_12bpc_c: 848.8 inv_txfm_add_8x4_flipadst_identity_1_12bpc_avx2: 87.4 inv_txfm_add_8x4_identity_adst_0_12bpc_c: 583.2 inv_txfm_add_8x4_identity_adst_0_12bpc_avx2: 69.6 inv_txfm_add_8x4_identity_adst_1_12bpc_c: 584.3 inv_txfm_add_8x4_identity_adst_1_12bpc_avx2: 69.6 inv_txfm_add_8x4_identity_dct_0_12bpc_c: 632.9 inv_txfm_add_8x4_identity_dct_0_12bpc_avx2: 65.3 inv_txfm_add_8x4_identity_dct_1_12bpc_c: 629.6 inv_txfm_add_8x4_identity_dct_1_12bpc_avx2: 65.8 inv_txfm_add_8x4_identity_flipadst_0_12bpc_c: 587.0 inv_txfm_add_8x4_identity_flipadst_0_12bpc_avx2: 71.0 inv_txfm_add_8x4_identity_flipadst_1_12bpc_c: 586.9 inv_txfm_add_8x4_identity_flipadst_1_12bpc_avx2: 71.0 inv_txfm_add_8x4_identity_identity_0_12bpc_c: 533.0 inv_txfm_add_8x4_identity_identity_0_12bpc_avx2: 45.3 inv_txfm_add_8x4_identity_identity_1_12bpc_c: 539.7 inv_txfm_add_8x4_identity_identity_1_12bpc_avx2: 45.9
-
Matthias Dressel authored
inv_txfm_add_4x8_adst_adst_0_12bpc_c: 900.8
inv_txfm_add_4x8_adst_adst_0_12bpc_avx2: 118.8
inv_txfm_add_4x8_adst_adst_1_12bpc_c: 893.7
inv_txfm_add_4x8_adst_adst_1_12bpc_avx2: 118.8
inv_txfm_add_4x8_adst_dct_0_12bpc_c: 890.2
inv_txfm_add_4x8_adst_dct_0_12bpc_avx2: 104.8
inv_txfm_add_4x8_adst_dct_1_12bpc_c: 887.4
inv_txfm_add_4x8_adst_dct_1_12bpc_avx2: 104.8
inv_txfm_add_4x8_adst_flipadst_0_12bpc_c: 919.6
inv_txfm_add_4x8_adst_flipadst_0_12bpc_avx2: 116.6
inv_txfm_add_4x8_adst_flipadst_1_12bpc_c: 912.1
inv_txfm_add_4x8_adst_flipadst_1_12bpc_avx2: 116.6
inv_txfm_add_4x8_adst_identity_0_12bpc_c: 613.5
inv_txfm_add_4x8_adst_identity_0_12bpc_avx2: 42.8
inv_txfm_add_4x8_adst_identity_1_12bpc_c: 608.7
inv_txfm_add_4x8_adst_identity_1_12bpc_avx2: 43.3
inv_txfm_add_4x8_dct_adst_0_12bpc_c: 951.7
inv_txfm_add_4x8_dct_adst_0_12bpc_avx2: 113.8
inv_txfm_add_4x8_dct_adst_1_12bpc_c: 949.0
inv_txfm_add_4x8_dct_adst_1_12bpc_avx2: 113.1
inv_txfm_add_4x8_dct_dct_0_12bpc_c: 118.6
inv_txfm_add_4x8_dct_dct_0_12bpc_avx2: 24.5
inv_txfm_add_4x8_dct_dct_1_12bpc_c: 942.4
inv_txfm_add_4x8_dct_dct_1_12bpc_avx2: 99.2
inv_txfm_add_4x8_dct_flipadst_0_12bpc_c: 959.3
inv_txfm_add_4x8_dct_flipadst_0_12bpc_avx2: 113.9
inv_txfm_add_4x8_dct_flipadst_1_12bpc_c: 964.1
inv_txfm_add_4x8_dct_flipadst_1_12bpc_avx2: 114.3
inv_txfm_add_4x8_dct_identity_0_12bpc_c: 659.9
inv_txfm_add_4x8_dct_identity_0_12bpc_avx2: 41.9
inv_txfm_add_4x8_dct_identity_1_12bpc_c: 658.6
inv_txfm_add_4x8_dct_identity_1_12bpc_avx2: 41.6
inv_txfm_add_4x8_flipadst_adst_0_12bpc_c: 906.6
inv_txfm_add_4x8_flipadst_adst_0_12bpc_avx2: 117.3
inv_txfm_add_4x8_flipadst_adst_1_12bpc_c: 907.7
inv_txfm_add_4x8_flipadst_adst_1_12bpc_avx2: 117.3
inv_txfm_add_4x8_flipadst_dct_0_12bpc_c: 890.3
inv_txfm_add_4x8_flipadst_dct_0_12bpc_avx2: 104.6
inv_txfm_add_4x8_flipadst_dct_1_12bpc_c: 895.6
inv_txfm_add_4x8_flipadst_dct_1_12bpc_avx2: 104.6
inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_c: 902.9
inv_txfm_add_4x8_flipadst_flipadst_0_12bpc_avx2: 116.5
inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_c: 915.0
inv_txfm_add_4x8_flipadst_flipadst_1_12bpc_avx2: 116.4
inv_txfm_add_4x8_flipadst_identity_0_12bpc_c: 618.6
inv_txfm_add_4x8_flipadst_identity_0_12bpc_avx2: 45.3
inv_txfm_add_4x8_flipadst_identity_1_12bpc_c: 618.1
inv_txfm_add_4x8_flipadst_identity_1_12bpc_avx2: 44.0
inv_txfm_add_4x8_identity_adst_0_12bpc_c: 829.7
inv_txfm_add_4x8_identity_adst_0_12bpc_avx2: 107.4
inv_txfm_add_4x8_identity_adst_1_12bpc_c: 831.7
inv_txfm_add_4x8_identity_adst_1_12bpc_avx2: 107.8
inv_txfm_add_4x8_identity_dct_0_12bpc_c: 823.2
inv_txfm_add_4x8_identity_dct_0_12bpc_avx2: 90.7
inv_txfm_add_4x8_identity_dct_1_12bpc_c: 824.1
inv_txfm_add_4x8_identity_dct_1_12bpc_avx2: 90.7
inv_txfm_add_4x8_identity_flipadst_0_12bpc_c: 853.4
inv_txfm_add_4x8_identity_flipadst_0_12bpc_avx2: 106.8
inv_txfm_add_4x8_identity_flipadst_1_12bpc_c: 852.2
inv_txfm_add_4x8_identity_flipadst_1_12bpc_avx2: 106.8
inv_txfm_add_4x8_identity_identity_0_12bpc_c: 543.2
inv_txfm_add_4x8_identity_identity_0_12bpc_avx2: 36.4
inv_txfm_add_4x8_identity_identity_1_12bpc_c: 544.8
inv_txfm_add_4x8_identity_identity_1_12bpc_avx2: 36.6
-
- Nov 12, 2021
-
-
Ronald S. Bultje authored
Credit to OSS-Fuzz.
-
- Nov 11, 2021
-
-
Ronald S. Bultje authored
Credit to OSS-Fuzz.
-
- Nov 10, 2021
-
-
Henrik Gramner authored
Also fix some incorrect comments.
-
- Nov 05, 2021
-
-
Matthias Dressel authored
Bidirectional control and invisible characters can be used to hide malicious code. Ref: CVE-2021-42574, CVE-2021-42694
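As a rough illustration of the kind of check this implies, the sketch below scans a UTF-8 buffer for the bidirectional control characters associated with CVE-2021-42574 (U+202A to U+202E and U+2066 to U+2069). The function name `has_bidi_control` is hypothetical and is not taken from the commit; other invisible characters and homoglyphs are deliberately not covered here.

```c
#include <stddef.h>

/* Hypothetical helper (not from the commit): returns 1 if buf contains a
 * UTF-8 encoded bidirectional control character, 0 otherwise. */
static int has_bidi_control(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 2 < len; i++) {
        if (buf[i] != 0xE2)
            continue;
        /* U+202A..U+202E are encoded as E2 80 AA..AE */
        if (buf[i + 1] == 0x80 && buf[i + 2] >= 0xAA && buf[i + 2] <= 0xAE)
            return 1;
        /* U+2066..U+2069 are encoded as E2 81 A6..A9 */
        if (buf[i + 1] == 0x81 && buf[i + 2] >= 0xA6 && buf[i + 2] <= 0xA9)
            return 1;
    }
    return 0;
}
```

A caller could read each source file into memory and reject it when the function returns 1.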
-
- Nov 02, 2021
-
-
Matthias Dressel authored
Values need to be clipped after Hadamard rotations.
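A minimal C sketch of the idea, treating a Hadamard rotation as a sum/difference butterfly whose outputs are clamped to the intermediate range. The helper names and the way the range is passed in are assumptions for illustration; this is not the assembly changed by the commit.

```c
/* Clamp v to the inclusive range [min, max]. */
static inline int clip(const int v, const int min, const int max)
{
    return v < min ? min : v > max ? max : v;
}

/* Hypothetical helper: sum/difference butterfly on (*a, *b) with both
 * outputs clipped, so later stages never see out-of-range values. */
static inline void hadamard_rotate_clipped(int *const a, int *const b,
                                           const int min, const int max)
{
    const int x = *a, y = *b;
    *a = clip(x + y, min, max);
    *b = clip(x - y, min, max);
}
```

Without the clip, the sum of two in-range inputs can exceed the range that the following stages assume.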
-
- Nov 01, 2021
-
-
Victorien Le Couviour--Tuffet authored
Credit to OSS-Fuzz.
-
- Oct 31, 2021
-
-
Niklas Haas authored
The signature of pl_allocate/release_dav1dpic takes a void *cookie, which the compiler warns about if we don't explicitly cast.
-
- Oct 29, 2021
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
-
- Oct 28, 2021
-
-
Martin Storsjö authored
Use the check result instead of hardcoding which OSes provide the function. This also requires checking for the pthread_np.h header and including it when testing for functions in meson, but it allows getting rid of the hardcoded OS conditions in the source. This fixes building for Android if _GNU_SOURCE happens to be defined. (It gets defined when building with a slightly nonstandard cross file that sets "system = 'linux'", but it could also have been set by the caller.)
-
- Oct 27, 2021
-
-
Salome Thirot authored
Add Branch Target Identifiers (BTIs) to all functions defined in AArch64 assembly files. BTI support is turned on or off at compile time based on the presence of the __ARM_FEATURE_BTI_DEFAULT feature macro. A binary compiled with BTI support can be executed on an Armv8-A processor without BTI support because the instructions are defined in NOP space.
Signed-off-by: Jonathan Wright <jonathan.wright@arm.com>
Signed-off-by: Matthew Dalzell <matthew.dalzell@arm.com>
Signed-off-by: Salome Thirot <salome.thirot@arm.com>
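The compile-time switch described above can be observed from C as well. The sketch below only tests the `__ARM_FEATURE_BTI_DEFAULT` macro named in the commit message; the function name `built_with_bti` is hypothetical, and the assumption is that the toolchain defines the macro when branch protection is enabled (for example through an option such as `-mbranch-protection`).

```c
/* Hypothetical helper: returns 1 when this translation unit was compiled
 * with BTI enabled by default, 0 otherwise (including on non-Arm targets). */
static int built_with_bti(void)
{
#if defined(__ARM_FEATURE_BTI_DEFAULT)
    return 1;
#else
    return 0;
#endif
}
```

On targets or toolchains that do not define the macro, the function simply returns 0, which mirrors the commit's approach of emitting no BTI instructions in that case.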
-
Salome Thirot authored
Using ret x<n> instead of br x<n> removes the need for a BTI landing pad at the target address in x<n>. Using 'ret' instead of 'br' does not have any performance implications.
Signed-off-by: Jonathan Wright <jonathan.wright@arm.com>
Signed-off-by: Matthew Dalzell <matthew.dalzell@arm.com>
Signed-off-by: Salome Thirot <salome.thirot@arm.com>
-
- Oct 18, 2021
-
-
Matthias Dressel authored
Refactors itx into separate 10-bit and 12-bit functions to prevent conditional jumps.
inv_txfm_add_4x4_adst_adst_0_12bpc_c: 370.9
inv_txfm_add_4x4_adst_adst_0_12bpc_avx2: 68.6
inv_txfm_add_4x4_adst_adst_1_12bpc_c: 371.0
inv_txfm_add_4x4_adst_adst_1_12bpc_avx2: 68.7
inv_txfm_add_4x4_adst_dct_0_12bpc_c: 413.1
inv_txfm_add_4x4_adst_dct_0_12bpc_avx2: 69.2
inv_txfm_add_4x4_adst_dct_1_12bpc_c: 412.7
inv_txfm_add_4x4_adst_dct_1_12bpc_avx2: 68.8
inv_txfm_add_4x4_adst_flipadst_0_12bpc_c: 378.5
inv_txfm_add_4x4_adst_flipadst_0_12bpc_avx2: 74.9
inv_txfm_add_4x4_adst_flipadst_1_12bpc_c: 378.1
inv_txfm_add_4x4_adst_flipadst_1_12bpc_avx2: 74.6
inv_txfm_add_4x4_adst_identity_0_12bpc_c: 347.8
inv_txfm_add_4x4_adst_identity_0_12bpc_avx2: 48.8
inv_txfm_add_4x4_adst_identity_1_12bpc_c: 342.7
inv_txfm_add_4x4_adst_identity_1_12bpc_avx2: 49.0
inv_txfm_add_4x4_dct_adst_0_12bpc_c: 399.2
inv_txfm_add_4x4_dct_adst_0_12bpc_avx2: 73.1
inv_txfm_add_4x4_dct_adst_1_12bpc_c: 398.7
inv_txfm_add_4x4_dct_adst_1_12bpc_avx2: 72.2
inv_txfm_add_4x4_dct_dct_0_12bpc_c: 69.6
inv_txfm_add_4x4_dct_dct_0_12bpc_avx2: 32.9
inv_txfm_add_4x4_dct_dct_1_12bpc_c: 420.5
inv_txfm_add_4x4_dct_dct_1_12bpc_avx2: 72.2
inv_txfm_add_4x4_dct_flipadst_0_12bpc_c: 405.5
inv_txfm_add_4x4_dct_flipadst_0_12bpc_avx2: 75.9
inv_txfm_add_4x4_dct_flipadst_1_12bpc_c: 404.2
inv_txfm_add_4x4_dct_flipadst_1_12bpc_avx2: 75.6
inv_txfm_add_4x4_dct_identity_0_12bpc_c: 374.1
inv_txfm_add_4x4_dct_identity_0_12bpc_avx2: 51.6
inv_txfm_add_4x4_dct_identity_1_12bpc_c: 368.0
inv_txfm_add_4x4_dct_identity_1_12bpc_avx2: 51.8
inv_txfm_add_4x4_flipadst_adst_0_12bpc_c: 368.0
inv_txfm_add_4x4_flipadst_adst_0_12bpc_avx2: 69.2
inv_txfm_add_4x4_flipadst_adst_1_12bpc_c: 370.7
inv_txfm_add_4x4_flipadst_adst_1_12bpc_avx2: 70.4
inv_txfm_add_4x4_flipadst_dct_0_12bpc_c: 393.7
inv_txfm_add_4x4_flipadst_dct_0_12bpc_avx2: 70.1
inv_txfm_add_4x4_flipadst_dct_1_12bpc_c: 392.9
inv_txfm_add_4x4_flipadst_dct_1_12bpc_avx2: 69.6
inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_c: 382.2
inv_txfm_add_4x4_flipadst_flipadst_0_12bpc_avx2: 74.6
inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_c: 381.3
inv_txfm_add_4x4_flipadst_flipadst_1_12bpc_avx2: 74.9
inv_txfm_add_4x4_flipadst_identity_0_12bpc_c: 346.7
inv_txfm_add_4x4_flipadst_identity_0_12bpc_avx2: 48.2
inv_txfm_add_4x4_flipadst_identity_1_12bpc_c: 347.9
inv_txfm_add_4x4_flipadst_identity_1_12bpc_avx2: 48.7
inv_txfm_add_4x4_identity_adst_0_12bpc_c: 344.7
inv_txfm_add_4x4_identity_adst_0_12bpc_avx2: 59.8
inv_txfm_add_4x4_identity_adst_1_12bpc_c: 340.5
inv_txfm_add_4x4_identity_adst_1_12bpc_avx2: 59.2
inv_txfm_add_4x4_identity_dct_0_12bpc_c: 369.8
inv_txfm_add_4x4_identity_dct_0_12bpc_avx2: 59.3
inv_txfm_add_4x4_identity_dct_1_12bpc_c: 369.5
inv_txfm_add_4x4_identity_dct_1_12bpc_avx2: 59.2
inv_txfm_add_4x4_identity_flipadst_0_12bpc_c: 353.4
inv_txfm_add_4x4_identity_flipadst_0_12bpc_avx2: 65.6
inv_txfm_add_4x4_identity_flipadst_1_12bpc_c: 350.9
inv_txfm_add_4x4_identity_flipadst_1_12bpc_avx2: 65.9
inv_txfm_add_4x4_identity_identity_0_12bpc_c: 326.1
inv_txfm_add_4x4_identity_identity_0_12bpc_avx2: 39.5
inv_txfm_add_4x4_identity_identity_1_12bpc_c: 321.6
inv_txfm_add_4x4_identity_identity_1_12bpc_avx2: 39.5
-
Matthias Dressel authored
Use numerical GPR references everywhere for consistency.
-