- Feb 07, 2022
-
-
Ronald S. Bultje authored
Fixes inconsistent output frame count depending on --threads=X value for the sample in #244.
-
- Feb 05, 2022
-
-
James Almer authored
Prevents dav1d_get_picture() from returning EAGAIN when no frame delay was requested (threads=1, or threads > 1 and max_frame_delay=1).
Signed-off-by: James Almer <jamrial@gmail.com>
-
- Feb 03, 2022
-
-
Right now, --alllayers 0 will only output operating points that exactly match the largest one in the sequence header. However, in certain cases, the largest one might not be available, and a smaller one should be returned to the user instead. This matches update_frame_buffers() in aomdec to output only the latest frame if --alllayers 0 is specified.
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
-
- Feb 02, 2022
-
-
Joe Drago authored
-
-
- Jan 31, 2022
-
-
Avoids issues when dynamically linking with third party libraries in tools/examples.
-
- Jan 30, 2022
-
-
Martin Storsjö authored
-
- Jan 25, 2022
-
-
Marvin Scholz authored
-
- Jan 24, 2022
-
-
Matthias Dressel authored
inv_txfm_add_16x16_adst_adst_0_12bpc_c: 8990.0
inv_txfm_add_16x16_adst_adst_0_12bpc_avx2: 646.1
inv_txfm_add_16x16_adst_adst_1_12bpc_c: 8965.3
inv_txfm_add_16x16_adst_adst_1_12bpc_avx2: 646.9
inv_txfm_add_16x16_adst_adst_2_12bpc_c: 8983.2
inv_txfm_add_16x16_adst_adst_2_12bpc_avx2: 870.1
inv_txfm_add_16x16_adst_dct_0_12bpc_c: 9058.2
inv_txfm_add_16x16_adst_dct_0_12bpc_avx2: 548.8
inv_txfm_add_16x16_adst_dct_1_12bpc_c: 9092.7
inv_txfm_add_16x16_adst_dct_1_12bpc_avx2: 549.3
inv_txfm_add_16x16_adst_dct_2_12bpc_c: 9086.7
inv_txfm_add_16x16_adst_dct_2_12bpc_avx2: 775.5
inv_txfm_add_16x16_adst_flipadst_0_12bpc_c: 9083.4
inv_txfm_add_16x16_adst_flipadst_0_12bpc_avx2: 645.6
inv_txfm_add_16x16_adst_flipadst_1_12bpc_c: 8998.3
inv_txfm_add_16x16_adst_flipadst_1_12bpc_avx2: 646.2
inv_txfm_add_16x16_adst_flipadst_2_12bpc_c: 9014.7
inv_txfm_add_16x16_adst_flipadst_2_12bpc_avx2: 873.8
inv_txfm_add_16x16_dct_adst_0_12bpc_c: 9080.1
inv_txfm_add_16x16_dct_adst_0_12bpc_avx2: 598.2
inv_txfm_add_16x16_dct_adst_1_12bpc_c: 9103.3
inv_txfm_add_16x16_dct_adst_1_12bpc_avx2: 598.1
inv_txfm_add_16x16_dct_adst_2_12bpc_c: 9089.5
inv_txfm_add_16x16_dct_adst_2_12bpc_avx2: 764.4
inv_txfm_add_16x16_dct_dct_0_12bpc_c: 1042.1
inv_txfm_add_16x16_dct_dct_0_12bpc_avx2: 28.6
inv_txfm_add_16x16_dct_dct_1_12bpc_c: 9164.6
inv_txfm_add_16x16_dct_dct_1_12bpc_avx2: 500.8
inv_txfm_add_16x16_dct_dct_2_12bpc_c: 9161.9
inv_txfm_add_16x16_dct_dct_2_12bpc_avx2: 678.2
inv_txfm_add_16x16_dct_flipadst_0_12bpc_c: 9104.9
inv_txfm_add_16x16_dct_flipadst_0_12bpc_avx2: 601.8
inv_txfm_add_16x16_dct_flipadst_1_12bpc_c: 9248.6
inv_txfm_add_16x16_dct_flipadst_1_12bpc_avx2: 599.2
inv_txfm_add_16x16_dct_flipadst_2_12bpc_c: 9087.4
inv_txfm_add_16x16_dct_flipadst_2_12bpc_avx2: 770.1
inv_txfm_add_16x16_dct_identity_0_12bpc_c: 6570.4
inv_txfm_add_16x16_dct_identity_0_12bpc_avx2: 243.9
inv_txfm_add_16x16_dct_identity_1_12bpc_c: 6615.4
inv_txfm_add_16x16_dct_identity_1_12bpc_avx2: 246.0
inv_txfm_add_16x16_dct_identity_2_12bpc_c: 6553.4
inv_txfm_add_16x16_dct_identity_2_12bpc_avx2: 435.0
inv_txfm_add_16x16_flipadst_adst_0_12bpc_c: 8982.1
inv_txfm_add_16x16_flipadst_adst_0_12bpc_avx2: 647.2
inv_txfm_add_16x16_flipadst_adst_1_12bpc_c: 8978.9
inv_txfm_add_16x16_flipadst_adst_1_12bpc_avx2: 647.2
inv_txfm_add_16x16_flipadst_adst_2_12bpc_c: 8964.0
inv_txfm_add_16x16_flipadst_adst_2_12bpc_avx2: 868.4
inv_txfm_add_16x16_flipadst_dct_0_12bpc_c: 9083.5
inv_txfm_add_16x16_flipadst_dct_0_12bpc_avx2: 550.0
inv_txfm_add_16x16_flipadst_dct_1_12bpc_c: 9070.4
inv_txfm_add_16x16_flipadst_dct_1_12bpc_avx2: 550.2
inv_txfm_add_16x16_flipadst_dct_2_12bpc_c: 9085.8
inv_txfm_add_16x16_flipadst_dct_2_12bpc_avx2: 779.7
inv_txfm_add_16x16_flipadst_flipadst_0_12bpc_c: 8977.1
inv_txfm_add_16x16_flipadst_flipadst_0_12bpc_avx2: 657.3
inv_txfm_add_16x16_flipadst_flipadst_1_12bpc_c: 9002.0
inv_txfm_add_16x16_flipadst_flipadst_1_12bpc_avx2: 657.3
inv_txfm_add_16x16_flipadst_flipadst_2_12bpc_c: 9008.4
inv_txfm_add_16x16_flipadst_flipadst_2_12bpc_avx2: 872.0
inv_txfm_add_16x16_identity_dct_0_12bpc_c: 6504.7
inv_txfm_add_16x16_identity_dct_0_12bpc_avx2: 387.5
inv_txfm_add_16x16_identity_dct_1_12bpc_c: 6548.3
inv_txfm_add_16x16_identity_dct_1_12bpc_avx2: 387.5
inv_txfm_add_16x16_identity_dct_2_12bpc_c: 6512.4
inv_txfm_add_16x16_identity_dct_2_12bpc_avx2: 387.5
inv_txfm_add_16x16_identity_identity_0_12bpc_c: 3926.2
inv_txfm_add_16x16_identity_identity_0_12bpc_avx2: 135.0
inv_txfm_add_16x16_identity_identity_1_12bpc_c: 3896.7
inv_txfm_add_16x16_identity_identity_1_12bpc_avx2: 134.5
inv_txfm_add_16x16_identity_identity_2_12bpc_c: 3888.0
inv_txfm_add_16x16_identity_identity_2_12bpc_avx2: 230.3
-
Victorien Le Couviour--Tuffet authored
resize_8bpc_c: 542599.0
resize_8bpc_ssse3: 87635.4
resize_8bpc_avx2: 67401.1
resize_8bpc_avx512icl: 50263.6
resize_16bpc_c: 573438.9
resize_16bpc_ssse3: 121505.2
resize_16bpc_avx2: 83293.4
resize_16bpc_avx512icl: 77974.8
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
Allows most of dav1d_decode_frame_init to run unconditionally by putting the CDF and subsequent initializations in a separate task.
-
- Jan 19, 2022
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
This could cause a desync between first and cur, which results in skipping a frame and halting the decoding. In the current state of the project, this desync typically doesn't last "long enough" to trigger the bug, as some frames would move cur back again. To trigger it, reset_task_cur() would have to be called on the last frame; this would be the call made after insertion of the INIT task (during dav1d_task_frame_init). This doesn't happen, as we would normally pick a task from a previous frame already in the queue.
-
- Jan 18, 2022
-
-
Ronald S. Bultje authored
-
- Jan 17, 2022
-
-
- Jan 14, 2022
-
-
Ronald S. Bultje authored
(To be used alongside --filmgrain.) Addresses part of #310.
-
- Jan 13, 2022
-
-
-
-
This can't catch out of bounds reads (which is what caused the crash in #380), but as long as reads and writes are properly matched, it should catch the corresponding issues.
-
This is necessary if the dimensions set aren't properly aligned.
-
Before:                   Cortex     A7     A8     A9    A53    A72    A73
mc_8tap_regular_w2_v_16bpc_neon:  384.4  194.0  242.9  193.2  134.1  140.0
mc_8tap_regular_w4_v_16bpc_neon:  578.2  242.2  282.7  263.1  171.2  168.9
After:
mc_8tap_regular_w2_v_16bpc_neon:  397.1  207.7  250.6  212.9  136.9  140.8
mc_8tap_regular_w4_v_16bpc_neon:  575.2  240.4  277.9  263.0  171.9  167.4
-
For 8tap, unroll the vertical filters slightly less (by 4 instead of 8 elements) and add a special case trailer that handles only 2 elements (for 2x6 and 4x6). By unrolling less, performance on in-order cores is somewhat impacted.

Before:                  Cortex     A7     A8     A9    A53    A72    A73
mc_8tap_regular_w2_v_8bpc_neon:  340.0  305.4  336.5  196.5  160.5  167.8
mc_8tap_regular_w4_v_8bpc_neon:  400.4  319.5  391.5  210.3  189.7  188.8
After:
mc_8tap_regular_w2_v_8bpc_neon:  364.6  268.5  340.1  223.7  161.7  175.2
mc_8tap_regular_w4_v_8bpc_neon:  408.7  328.4  380.4  219.8  190.7  183.8
-
Before:                   Cortex    A53    A72    A73
mc_8tap_regular_w2_v_16bpc_neon:  164.0  125.3  122.6
mc_8tap_regular_w4_v_16bpc_neon:  232.5  164.0  166.6
After:
mc_8tap_regular_w2_v_16bpc_neon:  192.4  131.0  121.4
mc_8tap_regular_w4_v_16bpc_neon:  235.6  162.9  163.7
-
For 8tap, unroll the vertical filters slightly less (by 4 instead of 8 elements) and add a special case trailer that handles only 2 elements (for 2x6 and 4x6). By unrolling less, performance on in-order cores is somewhat impacted.

Before:                  Cortex    A53    A72    A73
mc_8tap_regular_w2_v_8bpc_neon:  146.5  141.3  145.6
mc_8tap_regular_w4_v_8bpc_neon:  175.2  180.3  162.4
After:
mc_8tap_regular_w2_v_8bpc_neon:  175.7  142.7  150.5
mc_8tap_regular_w4_v_8bpc_neon:  183.3  176.0  154.6
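
The loop structure described here (groups of 4 rows, plus a trailer for the last 2) can be illustrated with a scalar C sketch. This is not the NEON code from the commit: the names filter_row and put_8tap_v, the rounding, and the 8-bit clamping are assumptions made for illustration only, and the sketch assumes an even height. A scalar version also cannot show the register reuse that makes unrolling worthwhile in assembly; it only shows how a height of 6 is split into 4 + 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one output row of a vertical 8-tap filter.
 * src points at the row aligned with tap 3, so rows -3..+4 must be readable. */
static void filter_row(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                       int w, const int8_t taps[8])
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += taps[k] * src[x + (k - 3) * stride];
        if (sum < 0) sum = 0;          /* clamp low before shifting */
        sum = (sum + 32) >> 6;         /* taps are assumed to sum to 64 */
        dst[x] = sum > 255 ? 255 : (uint8_t)sum;
    }
}

/* Hypothetical driver: main loop unrolled by 4 rows, then a 2-row trailer.
 * Assumes h is even and at least 2, so the remainder is either 0 or 2. */
static void put_8tap_v(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int w, int h, const int8_t taps[8])
{
    int y = 0;
    assert(h >= 2 && (h & 1) == 0);

    for (; y + 4 <= h; y += 4) {
        filter_row(dst + (y + 0) * dst_stride, src + (y + 0) * src_stride, src_stride, w, taps);
        filter_row(dst + (y + 1) * dst_stride, src + (y + 1) * src_stride, src_stride, w, taps);
        filter_row(dst + (y + 2) * dst_stride, src + (y + 2) * src_stride, src_stride, w, taps);
        filter_row(dst + (y + 3) * dst_stride, src + (y + 3) * src_stride, src_stride, w, taps);
    }
    if (y < h) {
        /* Trailer: exactly 2 rows remain (e.g. h == 6 for 2x6 and 4x6). */
        filter_row(dst + (y + 0) * dst_stride, src + (y + 0) * src_stride, src_stride, w, taps);
        filter_row(dst + (y + 1) * dst_stride, src + (y + 1) * src_stride, src_stride, w, taps);
    }
}
```

With h = 6, the main loop runs once (rows 0 to 3) and the trailer handles rows 4 and 5; with h = 8 the trailer is skipped.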
-
Henrik Gramner authored
-
- Jan 12, 2022
-
-
Victorien Le Couviour--Tuffet authored
mc_scaled_8tap_regular_w2_16bpc_c: 737.7
mc_scaled_8tap_regular_w2_16bpc_ssse3: 151.7
mc_scaled_8tap_regular_w2_16bpc_avx2: 141.2
mc_scaled_8tap_regular_w2_dy1_16bpc_c: 660.3
mc_scaled_8tap_regular_w2_dy1_16bpc_ssse3: 80.8
mc_scaled_8tap_regular_w2_dy1_16bpc_avx2: 73.2
mc_scaled_8tap_regular_w2_dy2_16bpc_c: 884.9
mc_scaled_8tap_regular_w2_dy2_16bpc_ssse3: 101.6
mc_scaled_8tap_regular_w2_dy2_16bpc_avx2: 87.2
mc_scaled_8tap_regular_w4_16bpc_c: 1356.3
mc_scaled_8tap_regular_w4_16bpc_ssse3: 172.3
mc_scaled_8tap_regular_w4_16bpc_avx2: 172.5
mc_scaled_8tap_regular_w4_dy1_16bpc_c: 1244.9
mc_scaled_8tap_regular_w4_dy1_16bpc_ssse3: 125.7
mc_scaled_8tap_regular_w4_dy1_16bpc_avx2: 96.1
mc_scaled_8tap_regular_w4_dy2_16bpc_c: 1665.6
mc_scaled_8tap_regular_w4_dy2_16bpc_ssse3: 150.2
mc_scaled_8tap_regular_w4_dy2_16bpc_avx2: 112.8
mc_scaled_8tap_regular_w8_16bpc_c: 2536.5
mc_scaled_8tap_regular_w8_16bpc_ssse3: 383.4
mc_scaled_8tap_regular_w8_16bpc_avx2: 256.2
mc_scaled_8tap_regular_w8_dy1_16bpc_c: 2331.8
mc_scaled_8tap_regular_w8_dy1_16bpc_ssse3: 350.0
mc_scaled_8tap_regular_w8_dy1_16bpc_avx2: 214.0
mc_scaled_8tap_regular_w8_dy2_16bpc_c: 3169.6
mc_scaled_8tap_regular_w8_dy2_16bpc_ssse3: 395.7
mc_scaled_8tap_regular_w8_dy2_16bpc_avx2: 265.7
mc_scaled_8tap_regular_w16_16bpc_c: 6384.6
mc_scaled_8tap_regular_w16_16bpc_ssse3: 1004.4
mc_scaled_8tap_regular_w16_16bpc_avx2: 665.0
mc_scaled_8tap_regular_w16_dy1_16bpc_c: 6103.4
mc_scaled_8tap_regular_w16_dy1_16bpc_ssse3: 896.3
mc_scaled_8tap_regular_w16_dy1_16bpc_avx2: 544.2
mc_scaled_8tap_regular_w16_dy2_16bpc_c: 8584.5
mc_scaled_8tap_regular_w16_dy2_16bpc_ssse3: 1049.0
mc_scaled_8tap_regular_w16_dy2_16bpc_avx2: 695.1
mc_scaled_8tap_regular_w32_16bpc_c: 19672.8
mc_scaled_8tap_regular_w32_16bpc_ssse3: 3204.3
mc_scaled_8tap_regular_w32_16bpc_avx2: 2109.6
mc_scaled_8tap_regular_w32_dy1_16bpc_c: 15964.6
mc_scaled_8tap_regular_w32_dy1_16bpc_ssse3: 2634.5
mc_scaled_8tap_regular_w32_dy1_16bpc_avx2: 1555.8
mc_scaled_8tap_regular_w32_dy2_16bpc_c: 24156.9
mc_scaled_8tap_regular_w32_dy2_16bpc_ssse3: 3217.3
mc_scaled_8tap_regular_w32_dy2_16bpc_avx2: 2088.8
mc_scaled_8tap_regular_w64_16bpc_c: 74356.3
mc_scaled_8tap_regular_w64_16bpc_ssse3: 11225.9
mc_scaled_8tap_regular_w64_16bpc_avx2: 7434.7
mc_scaled_8tap_regular_w64_dy1_16bpc_c: 60080.9
mc_scaled_8tap_regular_w64_dy1_16bpc_ssse3: 8912.8
mc_scaled_8tap_regular_w64_dy1_16bpc_avx2: 5222.2
mc_scaled_8tap_regular_w64_dy2_16bpc_c: 88891.4
mc_scaled_8tap_regular_w64_dy2_16bpc_ssse3: 10824.8
mc_scaled_8tap_regular_w64_dy2_16bpc_avx2: 7086.3
mc_scaled_8tap_regular_w128_16bpc_c: 171633.3
mc_scaled_8tap_regular_w128_16bpc_ssse3: 27089.3
mc_scaled_8tap_regular_w128_16bpc_avx2: 17998.2
mc_scaled_8tap_regular_w128_dy1_16bpc_c: 164399.9
mc_scaled_8tap_regular_w128_dy1_16bpc_ssse3: 24694.1
mc_scaled_8tap_regular_w128_dy1_16bpc_avx2: 14711.2
mc_scaled_8tap_regular_w128_dy2_16bpc_c: 244865.3
mc_scaled_8tap_regular_w128_dy2_16bpc_ssse3: 30599.1
mc_scaled_8tap_regular_w128_dy2_16bpc_avx2: 20341.1
mct_scaled_8tap_regular_w4_16bpc_c: 946.2
mct_scaled_8tap_regular_w4_16bpc_ssse3: 117.5
mct_scaled_8tap_regular_w4_16bpc_avx2: 112.5
mct_scaled_8tap_regular_w4_dy1_16bpc_c: 886.1
mct_scaled_8tap_regular_w4_dy1_16bpc_ssse3: 100.5
mct_scaled_8tap_regular_w4_dy1_16bpc_avx2: 76.8
mct_scaled_8tap_regular_w4_dy2_16bpc_c: 1170.1
mct_scaled_8tap_regular_w4_dy2_16bpc_ssse3: 117.6
mct_scaled_8tap_regular_w4_dy2_16bpc_avx2: 87.9
mct_scaled_8tap_regular_w8_16bpc_c: 2784.2
mct_scaled_8tap_regular_w8_16bpc_ssse3: 408.5
mct_scaled_8tap_regular_w8_16bpc_avx2: 280.3
mct_scaled_8tap_regular_w8_dy1_16bpc_c: 2530.5
mct_scaled_8tap_regular_w8_dy1_16bpc_ssse3: 358.2
mct_scaled_8tap_regular_w8_dy1_16bpc_avx2: 227.1
mct_scaled_8tap_regular_w8_dy2_16bpc_c: 3525.0
mct_scaled_8tap_regular_w8_dy2_16bpc_ssse3: 425.6
mct_scaled_8tap_regular_w8_dy2_16bpc_avx2: 283.6
mct_scaled_8tap_regular_w16_16bpc_c: 6773.8
mct_scaled_8tap_regular_w16_16bpc_ssse3: 1054.6
mct_scaled_8tap_regular_w16_16bpc_avx2: 696.4
mct_scaled_8tap_regular_w16_dy1_16bpc_c: 6418.0
mct_scaled_8tap_regular_w16_dy1_16bpc_ssse3: 938.7
mct_scaled_8tap_regular_w16_dy1_16bpc_avx2: 584.5
mct_scaled_8tap_regular_w16_dy2_16bpc_c: 9432.4
mct_scaled_8tap_regular_w16_dy2_16bpc_ssse3: 1125.3
mct_scaled_8tap_regular_w16_dy2_16bpc_avx2: 753.1
mct_scaled_8tap_regular_w32_16bpc_c: 26028.8
mct_scaled_8tap_regular_w32_16bpc_ssse3: 4128.4
mct_scaled_8tap_regular_w32_16bpc_avx2: 2748.4
mct_scaled_8tap_regular_w32_dy1_16bpc_c: 21604.3
mct_scaled_8tap_regular_w32_dy1_16bpc_ssse3: 3312.4
mct_scaled_8tap_regular_w32_dy1_16bpc_avx2: 2051.1
mct_scaled_8tap_regular_w32_dy2_16bpc_c: 32844.3
mct_scaled_8tap_regular_w32_dy2_16bpc_ssse3: 4102.9
mct_scaled_8tap_regular_w32_dy2_16bpc_avx2: 2741.6
mct_scaled_8tap_regular_w64_16bpc_c: 49101.8
mct_scaled_8tap_regular_w64_16bpc_ssse3: 8758.9
mct_scaled_8tap_regular_w64_16bpc_avx2: 5822.2
mct_scaled_8tap_regular_w64_dy1_16bpc_c: 53557.7
mct_scaled_8tap_regular_w64_dy1_16bpc_ssse3: 8469.7
mct_scaled_8tap_regular_w64_dy1_16bpc_avx2: 5264.3
mct_scaled_8tap_regular_w64_dy2_16bpc_c: 83379.7
mct_scaled_8tap_regular_w64_dy2_16bpc_ssse3: 10623.7
mct_scaled_8tap_regular_w64_dy2_16bpc_avx2: 7164.0
mct_scaled_8tap_regular_w128_16bpc_c: 163182.2
mct_scaled_8tap_regular_w128_16bpc_ssse3: 26452.9
mct_scaled_8tap_regular_w128_16bpc_avx2: 18402.2
mct_scaled_8tap_regular_w128_dy1_16bpc_c: 148199.8
mct_scaled_8tap_regular_w128_dy1_16bpc_ssse3: 23584.9
mct_scaled_8tap_regular_w128_dy1_16bpc_avx2: 14808.1
mct_scaled_8tap_regular_w128_dy2_16bpc_c: 234702.2
mct_scaled_8tap_regular_w128_dy2_16bpc_ssse3: 29653.8
mct_scaled_8tap_regular_w128_dy2_16bpc_avx2: 20042.4
-
- Jan 11, 2022
-
-
Matthias Dressel authored
Filmgrain uses a lot of `vpgatherdd` instructions, which are rather slow on certain chips, making the SSSE3 version faster. Fixes #377.
-
Matthias Dressel authored
`vpgather*` instructions seem to be relatively slow on current AMD chips. Intel Haswell is slow as well, but just (barely) fast enough to not cause regressions in our current use cases.
Co-authored-by: Henrik Gramner <gramner@twoorioles.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
-
- Jan 10, 2022
-
-
Ronald S. Bultje authored
This reduces memory usage significantly. Fixes #375.
-
- Jan 09, 2022
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
- Jan 07, 2022
-
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
Ronald S. Bultje authored
Addresses part of #310.
-
- Jan 06, 2022
-
-
Ronald S. Bultje authored
For per-file yuv/y4m writes, this can be automatically specified using e.g. -o file_%w_%h_%5n.yuv/y4m. --muxer=framemd5 -o - --quiet will accomplish the same for per-frame md5sums. Addresses part of #310.
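
As a rough illustration of how an output pattern such as file_%w_%h_%5n.yuv could be expanded per frame, here is a minimal C sketch. It is not the dav1d CLI implementation: the function name expand_pattern is hypothetical, and the interpretation of the specifiers (%w as width, %h as height, %Nn as the frame number zero-padded to N digits) is an assumption inferred from the example in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not dav1d's code): expands %w, %h and %<digits>n in
 * pattern into dst. Assumed meaning: %w = width, %h = height,
 * %Nn = frame number zero-padded to N digits. Unknown sequences are copied
 * literally. Returns 0 on success, -1 if dst is too small. */
static int expand_pattern(char *dst, size_t dst_size, const char *pattern,
                          int w, int h, unsigned frame_num)
{
    size_t pos = 0;
    assert(dst && pattern && dst_size > 0);

    for (const char *p = pattern; *p; ) {
        const size_t avail = dst_size - pos;
        int written;

        if (*p != '%') {
            written = snprintf(dst + pos, avail, "%c", *p++);
        } else {
            const char *q = p + 1;
            int pad = 0;
            /* Optional padding width, capped to keep it small. */
            while (*q >= '0' && *q <= '9' && pad < 100)
                pad = pad * 10 + (*q++ - '0');

            if (*q == 'w' && q == p + 1) {
                written = snprintf(dst + pos, avail, "%d", w);
                p = q + 1;
            } else if (*q == 'h' && q == p + 1) {
                written = snprintf(dst + pos, avail, "%d", h);
                p = q + 1;
            } else if (*q == 'n') {
                written = snprintf(dst + pos, avail, "%0*u", pad, frame_num);
                p = q + 1;
            } else {
                /* Not a recognized specifier: emit the '%' as-is. */
                written = snprintf(dst + pos, avail, "%c", *p++);
            }
        }
        /* snprintf reports the untruncated length; treat truncation as error. */
        if (written < 0 || (size_t)written >= avail)
            return -1;
        pos += (size_t)written;
    }
    dst[pos] = '\0';
    return 0;
}
```

Under these assumptions, calling expand_pattern() with the pattern above, a 1920x1080 picture and frame number 7 would produce file_1920_1080_00007.yuv.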
-