- Apr 08, 2023
-
-
Ronald S. Bultje authored
nop:                                              39.4
inv_txfm_add_16x64_dct_dct_0_10bpc_c:           2208.0 ( 1.00x)
inv_txfm_add_16x64_dct_dct_0_10bpc_sse4:         133.5 (16.54x)
inv_txfm_add_16x64_dct_dct_0_10bpc_avx2:          71.3 (30.98x)
inv_txfm_add_16x64_dct_dct_0_10bpc_avx512icl:    102.0 (21.66x)
inv_txfm_add_16x64_dct_dct_1_10bpc_c:          25757.0 ( 1.00x)
inv_txfm_add_16x64_dct_dct_1_10bpc_sse4:        1366.1 (18.85x)
inv_txfm_add_16x64_dct_dct_1_10bpc_avx2:         657.6 (39.17x)
inv_txfm_add_16x64_dct_dct_1_10bpc_avx512icl:    378.9 (67.98x)
inv_txfm_add_16x64_dct_dct_2_10bpc_c:          25771.0 ( 1.00x)
inv_txfm_add_16x64_dct_dct_2_10bpc_sse4:        1739.7 (14.81x)
inv_txfm_add_16x64_dct_dct_2_10bpc_avx2:         772.1 (33.38x)
inv_txfm_add_16x64_dct_dct_2_10bpc_avx512icl:    469.3 (54.92x)
inv_txfm_add_16x64_dct_dct_3_10bpc_c:          25775.7 ( 1.00x)
inv_txfm_add_16x64_dct_dct_3_10bpc_sse4:        1968.1 (13.10x)
inv_txfm_add_16x64_dct_dct_3_10bpc_avx2:         886.5 (29.08x)
inv_txfm_add_16x64_dct_dct_3_10bpc_avx512icl:    662.6 (38.90x)
inv_txfm_add_16x64_dct_dct_4_10bpc_c:          25745.9 ( 1.00x)
inv_txfm_add_16x64_dct_dct_4_10bpc_sse4:        2330.9 (11.05x)
inv_txfm_add_16x64_dct_dct_4_10bpc_avx2:        1008.5 (25.53x)
inv_txfm_add_16x64_dct_dct_4_10bpc_avx512icl:    662.3 (38.88x)
-
- Apr 06, 2023
-
-
Fixes #421
-
- Mar 31, 2023
-
-
Matthias Dressel authored
-
Matthias Dressel authored
inv_txfm_add_32x32_identity_identity_0_12bpc_c:     5785.8 (  1.00x)
inv_txfm_add_32x32_identity_identity_0_12bpc_avx2:    20.7 (279.65x)
inv_txfm_add_32x32_identity_identity_1_12bpc_c:     5896.9 (  1.00x)
inv_txfm_add_32x32_identity_identity_1_12bpc_avx2:    20.7 (285.01x)
inv_txfm_add_32x32_identity_identity_2_12bpc_c:     5799.5 (  1.00x)
inv_txfm_add_32x32_identity_identity_2_12bpc_avx2:    68.9 ( 84.20x)
inv_txfm_add_32x32_identity_identity_3_12bpc_c:     5798.1 (  1.00x)
inv_txfm_add_32x32_identity_identity_3_12bpc_avx2:   140.6 ( 41.25x)
inv_txfm_add_32x32_identity_identity_4_12bpc_c:     5803.3 (  1.00x)
inv_txfm_add_32x32_identity_identity_4_12bpc_avx2:   308.2 ( 18.83x)
-
Matthias Dressel authored
inv_txfm_add_32x16_identity_identity_0_12bpc_c:     4138.7 (  1.00x)
inv_txfm_add_32x16_identity_identity_0_12bpc_avx2:    30.4 (136.26x)
inv_txfm_add_32x16_identity_identity_1_12bpc_c:     4147.5 (  1.00x)
inv_txfm_add_32x16_identity_identity_1_12bpc_avx2:    30.7 (135.25x)
inv_txfm_add_32x16_identity_identity_2_12bpc_c:     4138.2 (  1.00x)
inv_txfm_add_32x16_identity_identity_2_12bpc_avx2:    98.9 ( 41.84x)
inv_txfm_add_32x16_identity_identity_3_12bpc_c:     4136.6 (  1.00x)
inv_txfm_add_32x16_identity_identity_3_12bpc_avx2:   167.7 ( 24.67x)
inv_txfm_add_32x16_identity_identity_4_12bpc_c:     4156.3 (  1.00x)
inv_txfm_add_32x16_identity_identity_4_12bpc_avx2:   242.1 ( 17.17x)
-
Matthias Dressel authored
inv_txfm_add_16x32_identity_identity_0_12bpc_c:     4287.9 (  1.00x)
inv_txfm_add_16x32_identity_identity_0_12bpc_avx2:    31.4 (136.66x)
inv_txfm_add_16x32_identity_identity_1_12bpc_c:     4293.7 (  1.00x)
inv_txfm_add_16x32_identity_identity_1_12bpc_avx2:    30.9 (139.07x)
inv_txfm_add_16x32_identity_identity_2_12bpc_c:     4273.8 (  1.00x)
inv_txfm_add_16x32_identity_identity_2_12bpc_avx2:    97.3 ( 43.92x)
inv_txfm_add_16x32_identity_identity_3_12bpc_c:     4269.0 (  1.00x)
inv_txfm_add_16x32_identity_identity_3_12bpc_avx2:   165.2 ( 25.83x)
inv_txfm_add_16x32_identity_identity_4_12bpc_c:     4284.4 (  1.00x)
inv_txfm_add_16x32_identity_identity_4_12bpc_avx2:   235.2 ( 18.22x)
-
- Mar 25, 2023
-
-
- Mar 23, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
- Mar 21, 2023
-
-
James Almer authored
If a Metadata OBU appeared right before a Frame Header OBU with show_existing_picture = 1, it was not being attached to it but to the next assembled picture, which was in the following TU.
Signed-off-by: James Almer <jamrial@gmail.com>
-
Martin Storsjö authored
Relative speedup over the C code:
                               Cortex A53    A55    A72    A73    A76   Apple M1
intra_pred_z3_w4_16bpc_neon:         3.06   2.87   2.17   1.97   2.33       7.75
intra_pred_z3_w8_16bpc_neon:         3.90   3.94   2.97   3.16   2.93       4.43
intra_pred_z3_w16_16bpc_neon:        4.08   4.48   3.31   4.68   3.13       5.00
intra_pred_z3_w32_16bpc_neon:        4.43   4.85   3.50   4.02   3.33       5.62
intra_pred_z3_w64_16bpc_neon:        4.68   5.30   3.72   3.96   3.52       5.78
-
Martin Storsjö authored
Relative speedup over the C code:
                               Cortex A53    A55    A72    A73    A76   Apple M1
intra_pred_z1_w4_16bpc_neon:         3.49   2.63   2.83   3.85   3.14       9.00
intra_pred_z1_w8_16bpc_neon:         6.19   4.39   3.65   6.58   4.99       6.50
intra_pred_z1_w16_16bpc_neon:        6.65   4.64   3.97   7.78   4.87       7.00
intra_pred_z1_w32_16bpc_neon:        7.76   5.49   5.17   7.83   5.59       8.24
intra_pred_z1_w64_16bpc_neon:        8.02   5.80   5.33   8.41   5.77       8.70
-
Martin Storsjö authored
For 8 bpc, there's probably not much difference to a decent memset, but for 16 bpc, there might be a bigger difference.
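The reason 16 bpc can differ more from memset is that memset only replicates a single byte, so an arbitrary 16-bit pixel value has to be written with a dedicated loop or wide stores. A minimal C sketch of that distinction follows; the function names are hypothetical and are not taken from the dav1d sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: for 8 bpc every pixel is one byte, so memset
 * fills the row directly. */
static void fill_pixels_8bpc(uint8_t *dst, uint8_t value, size_t n) {
    assert(dst != NULL);
    memset(dst, value, n);
}

/* Hypothetical helper: for 16 bpc memset cannot be used in general,
 * because it repeats one byte and most 16-bit values have two
 * different bytes. An explicit loop (or SIMD stores in assembly)
 * is needed instead. */
static void fill_pixels_16bpc(uint16_t *dst, uint16_t value, size_t n) {
    assert(dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}
```

In hand-written assembly both variants can use full-width vector stores, which is where the 16 bpc case has more room to gain over a generic memset.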
-
Martin Storsjö authored
Add comments explaining the exact dimensions of the gather tables used currently. That reasoning shows that the w=8 case can do with one register less.
Before:                      Cortex A53    A55    A72    A73    A76   Apple M1
intra_pred_z3_w8_8bpc_neon:       356.2  376.2  218.9  246.4  176.1        0.6
After:
intra_pred_z3_w8_8bpc_neon:       339.6  357.3  205.6  232.3  160.0        0.5
-
Martin Storsjö authored
Start out the multiplication/accumulation with a register that is available sooner.
Before:                       Cortex A53     A55     A72     A73     A76   Apple M1
intra_pred_z1_w8_8bpc_neon:        266.3   268.9   146.6   155.3   103.9        0.4
intra_pred_z1_w16_8bpc_neon:       528.6   574.4   333.9   364.3   209.1        0.7
intra_pred_z1_w32_8bpc_neon:      1149.3  1245.4   752.3   811.5   503.4        1.3
intra_pred_z1_w64_8bpc_neon:      2198.4  2360.6  1462.9  1575.0  1007.6        2.4
After:
intra_pred_z1_w8_8bpc_neon:        266.3   269.1   146.6   155.0   100.1        0.4
intra_pred_z1_w16_8bpc_neon:       528.6   573.3   347.9   352.4   204.3        0.7
intra_pred_z1_w32_8bpc_neon:      1149.2  1245.3   763.4   759.6   474.8        1.3
intra_pred_z1_w64_8bpc_neon:      2198.8  2360.6  1430.0  1417.4   943.5        2.3
-
Martin Storsjö authored
The second register will at most contain one valid pixel, the padding pixel. Thus skip padding the register and just fill it with the padding pixel.
-
Martin Storsjö authored
There were redundant leftovers from copypasting bits when writing this function.
-
Martin Storsjö authored
This is for cases with h >= 16.
-
Martin Storsjö authored
-
Martin Storsjö authored
-
- Mar 16, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
- Mar 13, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
Process 2 blocks per iteration instead of 4. Credits to gramner@twoorioles.com.
-
Victorien Le Couviour--Tuffet authored
We must reload error just before calling dav1d_decode_frame_exit, as it may have become stale between the last load and that call. This can result in crashes since we signal a seemingly successfully decoded frame, when it's not. Reloading error within the frame done condition's body ensures a non-stale value, as we use 'f->task_thread.task_counter == 0' to ensure all other threads / tasks have already completed when entering it. In other words, only the last thread still working on this frame can execute this code, after all other threads have returned to doing something else.
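The ordering described above can be illustrated with a simplified C11 sketch. The struct, field layout, and function name below are hypothetical stand-ins and not the actual dav1d task-thread code; the only point shown is that the error flag is loaded after the task counter is observed to reach zero, never before.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for per-frame task state. */
typedef struct {
    atomic_int task_counter; /* tasks still outstanding for this frame */
    atomic_int error;        /* nonzero once any task has failed */
} FrameState;

/* Called by a worker when it finishes one task of the frame.
 * Returns -1 if other tasks are still running; otherwise returns the
 * error value that should be handed to the frame-exit routine. */
static int finish_task(FrameState *f, int task_failed) {
    if (task_failed)
        atomic_store(&f->error, 1);

    /* atomic_fetch_sub returns the previous value, so 1 means this
     * thread completed the last outstanding task. */
    int prev = atomic_fetch_sub(&f->task_counter, 1);
    assert(prev > 0);
    if (prev != 1)
        return -1;

    /* Only the last thread reaches this point, and every other task
     * has already published its result, so loading here cannot see a
     * stale value. Loading before the counter check could. */
    return atomic_load(&f->error);
}
```

A value of error cached earlier in the function would miss a failure recorded by another thread in the meantime, which is the situation the commit message describes.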
-
- Mar 07, 2023
-
-
Henrik Gramner authored
-
- Mar 06, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
Pack the 5 bytes of data to improve memory usage and performance.
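The commit message does not say which 5 bytes are meant, so the following is only a generic C sketch of the idea, with hypothetical struct and field names: five small values stored one byte each take less space than the same values stored as full ints, which also means more records fit in each cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record holding five small values as full ints. */
typedef struct {
    int a, b, c, d, e;
} LooseRecord;

/* The same five values stored in one byte each. */
typedef struct {
    uint8_t a, b, c, d, e;
} PackedRecord;

/* Build a packed record; each input is assumed to fit in 8 bits. */
static PackedRecord pack_record(const LooseRecord *in) {
    assert(in->a >= 0 && in->a <= 255);
    assert(in->b >= 0 && in->b <= 255);
    assert(in->c >= 0 && in->c <= 255);
    assert(in->d >= 0 && in->d <= 255);
    assert(in->e >= 0 && in->e <= 255);
    PackedRecord out = {
        (uint8_t)in->a, (uint8_t)in->b, (uint8_t)in->c,
        (uint8_t)in->d, (uint8_t)in->e
    };
    return out;
}

/* Bytes saved per record by using the packed layout. */
static size_t bytes_saved_per_record(void) {
    return sizeof(LooseRecord) - sizeof(PackedRecord);
}
```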
-
- Mar 03, 2023
-
-
Tristan Matthews authored
This fixes a regression from 7409a189
-
- Mar 01, 2023
-
-
Matthias Dressel authored
-
-
Matthias Dressel authored
Co-authored-by: Henrik Gramner <gramner@twoorioles.com>
-
- Feb 28, 2023
-
-
-
Improves readability.
-
-
- Feb 27, 2023
-
-
It would previously print the full report() info for C functions (with broken horizontal alignment as a side effect).
-
- Feb 26, 2023
-
-
Martin Storsjö authored
98b0c96d added an include of src/ref.h in src/fg_apply_tmpl.c. That template source file is included in tests/checkasm/filmgrain.c. src/ref.h includes <stdatomic.h>. Including this file requires declaring a dependency on stdatomic_dependencies in meson, which provides the fallback implementation of stdatomic.h when building with MSVC.
-
- Feb 25, 2023
-
-
James Almer authored
Create new references instead.
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- Feb 23, 2023
-
-
Luca Barbato authored
-
- Feb 14, 2023
-
-
Jean-Baptiste Kempf authored
"From VideoLAN with love"
-