- May 02, 2023
-
-
Jean-Baptiste Kempf authored
-
- Apr 28, 2023
-
-
- Apr 27, 2023
-
-
When compiling dav1d for an Android app, these architectures are required by default (along with arm and arm64, for which crossfiles already exist).
-
- Apr 25, 2023
-
-
Henrik Gramner authored
-
- Apr 23, 2023
-
-
* `needs_exe_wrapper` is only needed in specific cases when `exe_wrapper` is not set. See https://mesonbuild.com/Cross-compilation.html#properties
* "Before 0.56.0, <lang>_args and <lang>_link_args must be put in the properties section instead, else they will be ignored." [https://mesonbuild.com/Machine-files.html#meson-builtin-options] Our minimum version is 0.49.0. Meson >= 0.56.0 prints a deprecation warning.
-
- Apr 20, 2023
-
-
Ronald S. Bultje authored
Also implement "fast3" path for pass2.dct64 (where 1/8th of the coefficients are non-zero), which affects 32x64 as well as 64x64.

Before:
inv_txfm_add_32x64_dct_dct_1_10bpc_c:          51008.6 ( 1.00x)
inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        3351.9 (15.22x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1419.5 (35.93x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    744.8 (68.49x)

After:
inv_txfm_add_32x64_dct_dct_1_10bpc_c:          51019.5 ( 1.00x)
inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        3276.1 (15.57x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1420.7 (35.91x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    668.3 (76.34x)

(Not sure why the SSE4 speed changed.)

And speed for 64x64:
inv_txfm_add_64x64_dct_dct_0_10bpc_c:           3506.9 ( 1.00x)
inv_txfm_add_64x64_dct_dct_0_10bpc_sse4:         535.6 ( 6.55x)
inv_txfm_add_64x64_dct_dct_0_10bpc_avx2:         223.5 (15.69x)
inv_txfm_add_64x64_dct_dct_0_10bpc_avx512icl:    252.4 (13.89x)
inv_txfm_add_64x64_dct_dct_1_10bpc_c:         108353.7 ( 1.00x)
inv_txfm_add_64x64_dct_dct_1_10bpc_sse4:        6551.9 (16.54x)
inv_txfm_add_64x64_dct_dct_1_10bpc_avx2:        2876.8 (37.66x)
inv_txfm_add_64x64_dct_dct_1_10bpc_avx512icl:   1310.1 (82.70x)
inv_txfm_add_64x64_dct_dct_2_10bpc_c:         108347.6 ( 1.00x)
inv_txfm_add_64x64_dct_dct_2_10bpc_sse4:        7985.4 (13.57x)
inv_txfm_add_64x64_dct_dct_2_10bpc_avx2:        3561.8 (30.42x)
inv_txfm_add_64x64_dct_dct_2_10bpc_avx512icl:   1962.6 (55.20x)
inv_txfm_add_64x64_dct_dct_3_10bpc_c:         108455.5 ( 1.00x)
inv_txfm_add_64x64_dct_dct_3_10bpc_sse4:        9709.0 (11.17x)
inv_txfm_add_64x64_dct_dct_3_10bpc_avx2:        4220.5 (25.70x)
inv_txfm_add_64x64_dct_dct_3_10bpc_avx512icl:   2991.1 (36.26x)
inv_txfm_add_64x64_dct_dct_4_10bpc_c:         108349.9 ( 1.00x)
inv_txfm_add_64x64_dct_dct_4_10bpc_sse4:       11048.0 ( 9.81x)
inv_txfm_add_64x64_dct_dct_4_10bpc_avx2:        4898.1 (22.12x)
inv_txfm_add_64x64_dct_dct_4_10bpc_avx512icl:   3108.1 (34.86x)
-
- Apr 18, 2023
-
-
James Almer authored
Nothing in the spec prevents a Temporal Unit from having more than one Metadata OBU of type ITU-T T.35, so export them as an array instead of only exporting the last one we parse. This is backwards compatible with the previous implementation, as users unaware of this change can ignore the n_itut_t35 field and still access the first (or only) entry in the array as they have been doing until now.
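The compatibility argument above can be sketched in C. This is a minimal illustration and not the library's actual header: the type names `T35Entry` and `PictureMeta`, the helper functions, and every field other than `n_itut_t35` are assumptions made up for the example. It shows why a caller that only dereferences the pointer keeps working, while an array-aware caller can walk all entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real types; only the n_itut_t35 name
 * comes from the commit message, everything else is assumed. */
typedef struct {
    uint8_t country_code;
    size_t payload_size;
    const uint8_t *payload;
} T35Entry;

typedef struct {
    T35Entry *itut_t35; /* first entry of the array, or NULL if none */
    size_t n_itut_t35;  /* number of entries in the array */
} PictureMeta;

/* Legacy-style access: ignores n_itut_t35 and reads only the first entry. */
static size_t first_payload_size(const PictureMeta *pic) {
    assert(pic != NULL);
    return pic->itut_t35 ? pic->itut_t35->payload_size : 0;
}

/* Array-aware access: sums the payload sizes of every exported entry. */
static size_t total_payload_size(const PictureMeta *pic) {
    assert(pic != NULL);
    size_t total = 0;
    for (size_t i = 0; i < pic->n_itut_t35; i++)
        total += pic->itut_t35[i].payload_size;
    return total;
}
```

Because the pointer still addresses the first element, code written against the single-entry layout reads the same data as before; only callers that want the additional entries need to consult the count.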
-
Ronald S. Bultje authored
inv_txfm_add_64x32_dct_dct_0_10bpc_c:           1760.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_0_10bpc_sse4:         271.1 ( 6.49x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx2:         121.3 (14.52x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl:    116.3 (15.14x)
inv_txfm_add_64x32_dct_dct_1_10bpc_c:          66507.4 ( 1.00x)
inv_txfm_add_64x32_dct_dct_1_10bpc_sse4:        3712.4 (17.91x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx2:        1830.5 (36.33x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl:    805.4 (82.58x)
inv_txfm_add_64x32_dct_dct_2_10bpc_c:          66491.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_2_10bpc_sse4:        5325.3 (12.49x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx2:        2578.5 (25.79x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl:   1394.5 (47.68x)
inv_txfm_add_64x32_dct_dct_3_10bpc_c:          66490.2 ( 1.00x)
inv_txfm_add_64x32_dct_dct_3_10bpc_sse4:        6418.5 (10.36x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx2:        3305.6 (20.11x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl:   2571.5 (25.86x)
inv_txfm_add_64x32_dct_dct_4_10bpc_c:          66508.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_4_10bpc_sse4:        8671.2 ( 7.67x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx2:        4054.2 (16.40x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl:   2691.6 (24.71x)
-
- Apr 13, 2023
-
-
Ronald S. Bultje authored
inv_txfm_add_64x16_dct_dct_0_10bpc_c:            892.0 ( 1.00x)
inv_txfm_add_64x16_dct_dct_0_10bpc_sse4:         131.5 ( 6.78x)
inv_txfm_add_64x16_dct_dct_0_10bpc_avx2:          63.4 (14.07x)
inv_txfm_add_64x16_dct_dct_0_10bpc_avx512icl:     56.8 (15.71x)
inv_txfm_add_64x16_dct_dct_1_10bpc_c:          29253.7 ( 1.00x)
inv_txfm_add_64x16_dct_dct_1_10bpc_sse4:        1639.7 (17.84x)
inv_txfm_add_64x16_dct_dct_1_10bpc_avx2:        1106.8 (26.43x)
inv_txfm_add_64x16_dct_dct_1_10bpc_avx512icl:    532.9 (54.89x)
inv_txfm_add_64x16_dct_dct_2_10bpc_c:          29249.8 ( 1.00x)
inv_txfm_add_64x16_dct_dct_2_10bpc_sse4:        3065.6 ( 9.54x)
inv_txfm_add_64x16_dct_dct_2_10bpc_avx2:        1791.0 (16.33x)
inv_txfm_add_64x16_dct_dct_2_10bpc_avx512icl:   1108.0 (26.40x)
inv_txfm_add_64x16_dct_dct_3_10bpc_c:          29269.1 ( 1.00x)
inv_txfm_add_64x16_dct_dct_3_10bpc_sse4:        3738.2 ( 7.83x)
inv_txfm_add_64x16_dct_dct_3_10bpc_avx2:        1790.9 (16.34x)
inv_txfm_add_64x16_dct_dct_3_10bpc_avx512icl:   1203.8 (24.31x)
inv_txfm_add_64x16_dct_dct_4_10bpc_c:          29337.7 ( 1.00x)
inv_txfm_add_64x16_dct_dct_4_10bpc_sse4:        3749.7 ( 7.82x)
inv_txfm_add_64x16_dct_dct_4_10bpc_avx2:        1791.0 (16.38x)
inv_txfm_add_64x16_dct_dct_4_10bpc_avx512icl:   1203.8 (24.37x)
-
- Apr 12, 2023
-
-
Ronald S. Bultje authored
inv_txfm_add_32x64_dct_dct_0_10bpc_c:           1783.5 ( 1.00x)
inv_txfm_add_32x64_dct_dct_0_10bpc_sse4:         243.3 ( 7.33x)
inv_txfm_add_32x64_dct_dct_0_10bpc_avx2:         119.1 (14.97x)
inv_txfm_add_32x64_dct_dct_0_10bpc_avx512icl:    142.6 (12.50x)
inv_txfm_add_32x64_dct_dct_1_10bpc_c:          50422.5 ( 1.00x)
inv_txfm_add_32x64_dct_dct_1_10bpc_sse4:        2880.5 (17.50x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx2:        1423.4 (35.43x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl:    741.6 (67.99x)
inv_txfm_add_32x64_dct_dct_2_10bpc_c:          50433.6 ( 1.00x)
inv_txfm_add_32x64_dct_dct_2_10bpc_sse4:        4015.1 (12.56x)
inv_txfm_add_32x64_dct_dct_2_10bpc_avx2:        1767.7 (28.53x)
inv_txfm_add_32x64_dct_dct_2_10bpc_avx512icl:    960.8 (52.49x)
inv_txfm_add_32x64_dct_dct_3_10bpc_c:          50422.2 ( 1.00x)
inv_txfm_add_32x64_dct_dct_3_10bpc_sse4:        4500.5 (11.20x)
inv_txfm_add_32x64_dct_dct_3_10bpc_avx2:        2111.7 (23.88x)
inv_txfm_add_32x64_dct_dct_3_10bpc_avx512icl:   1777.1 (28.37x)
inv_txfm_add_32x64_dct_dct_4_10bpc_c:          50444.2 ( 1.00x)
inv_txfm_add_32x64_dct_dct_4_10bpc_sse4:        5592.8 ( 9.02x)
inv_txfm_add_32x64_dct_dct_4_10bpc_avx2:        2458.1 (20.52x)
inv_txfm_add_32x64_dct_dct_4_10bpc_avx512icl:   1867.2 (27.02x)
-
James Almer authored
-
- Apr 11, 2023
-
-
James Almer authored
-
- Apr 08, 2023
-
-
Dav1dRef is an opaque struct to the API user, and they have no business with these fields at all, so move them to the internal picture struct.
Signed-off-by: James Almer <jamrial@gmail.com>
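For readers unfamiliar with the opaque-struct pattern this change relies on, here is a minimal single-file sketch in C. It is not dav1d's code: the names `Ref`, `ref_create`, `ref_inc`, and `ref_dec` are hypothetical, and the counter is deliberately not thread-safe. The point is only that callers see a type name and functions, while the fields live where the implementation can change them freely.

```c
#include <assert.h>
#include <stdlib.h>

/* --- What a public header would expose: the type name only. --- */
typedef struct Ref Ref;
Ref *ref_create(void);
void ref_inc(Ref *ref);
int ref_dec(Ref **pref); /* returns 1 when the object was freed */

/* --- What stays private to the implementation file. --- */
struct Ref {
    int ref_cnt; /* plain int: this sketch is not thread-safe */
};

Ref *ref_create(void) {
    Ref *ref = malloc(sizeof(*ref));
    if (ref) ref->ref_cnt = 1;
    return ref;
}

void ref_inc(Ref *ref) {
    assert(ref != NULL);
    ref->ref_cnt++;
}

int ref_dec(Ref **pref) {
    assert(pref != NULL && *pref != NULL);
    Ref *ref = *pref;
    *pref = NULL; /* the caller's handle is always cleared */
    if (--ref->ref_cnt == 0) {
        free(ref);
        return 1;
    }
    return 0;
}
```

Since user code cannot name any member of the struct, fields that only make sense internally can be relocated without touching the public interface.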
-
Nothing in the spec prevents a Temporal Unit from having more than one Metadata OBU of type ITU-T T.35, so export them as an array instead of only exporting the last one we parse. This is backwards compatible with the previous implementation, as users unaware of this change can ignore the n_itut_t35 field and still access the first (or only) entry in the array as they have been doing until now.
Signed-off-by: James Almer <jamrial@gmail.com>
-
Ronald S. Bultje authored
nop:                                              39.4
inv_txfm_add_16x64_dct_dct_0_10bpc_c:           2208.0 ( 1.00x)
inv_txfm_add_16x64_dct_dct_0_10bpc_sse4:         133.5 (16.54x)
inv_txfm_add_16x64_dct_dct_0_10bpc_avx2:          71.3 (30.98x)
inv_txfm_add_16x64_dct_dct_0_10bpc_avx512icl:    102.0 (21.66x)
inv_txfm_add_16x64_dct_dct_1_10bpc_c:          25757.0 ( 1.00x)
inv_txfm_add_16x64_dct_dct_1_10bpc_sse4:        1366.1 (18.85x)
inv_txfm_add_16x64_dct_dct_1_10bpc_avx2:         657.6 (39.17x)
inv_txfm_add_16x64_dct_dct_1_10bpc_avx512icl:    378.9 (67.98x)
inv_txfm_add_16x64_dct_dct_2_10bpc_c:          25771.0 ( 1.00x)
inv_txfm_add_16x64_dct_dct_2_10bpc_sse4:        1739.7 (14.81x)
inv_txfm_add_16x64_dct_dct_2_10bpc_avx2:         772.1 (33.38x)
inv_txfm_add_16x64_dct_dct_2_10bpc_avx512icl:    469.3 (54.92x)
inv_txfm_add_16x64_dct_dct_3_10bpc_c:          25775.7 ( 1.00x)
inv_txfm_add_16x64_dct_dct_3_10bpc_sse4:        1968.1 (13.10x)
inv_txfm_add_16x64_dct_dct_3_10bpc_avx2:         886.5 (29.08x)
inv_txfm_add_16x64_dct_dct_3_10bpc_avx512icl:    662.6 (38.90x)
inv_txfm_add_16x64_dct_dct_4_10bpc_c:          25745.9 ( 1.00x)
inv_txfm_add_16x64_dct_dct_4_10bpc_sse4:        2330.9 (11.05x)
inv_txfm_add_16x64_dct_dct_4_10bpc_avx2:        1008.5 (25.53x)
inv_txfm_add_16x64_dct_dct_4_10bpc_avx512icl:    662.3 (38.88x)
-
- Apr 06, 2023
-
-
Fixes #421
-
- Mar 31, 2023
-
-
Matthias Dressel authored
-
Matthias Dressel authored
inv_txfm_add_32x32_identity_identity_0_12bpc_c:    5785.8 ( 1.00x)
inv_txfm_add_32x32_identity_identity_0_12bpc_avx2:   20.7 (279.65x)
inv_txfm_add_32x32_identity_identity_1_12bpc_c:    5896.9 ( 1.00x)
inv_txfm_add_32x32_identity_identity_1_12bpc_avx2:   20.7 (285.01x)
inv_txfm_add_32x32_identity_identity_2_12bpc_c:    5799.5 ( 1.00x)
inv_txfm_add_32x32_identity_identity_2_12bpc_avx2:   68.9 (84.20x)
inv_txfm_add_32x32_identity_identity_3_12bpc_c:    5798.1 ( 1.00x)
inv_txfm_add_32x32_identity_identity_3_12bpc_avx2:  140.6 (41.25x)
inv_txfm_add_32x32_identity_identity_4_12bpc_c:    5803.3 ( 1.00x)
inv_txfm_add_32x32_identity_identity_4_12bpc_avx2:  308.2 (18.83x)
-
Matthias Dressel authored
inv_txfm_add_32x16_identity_identity_0_12bpc_c:    4138.7 ( 1.00x)
inv_txfm_add_32x16_identity_identity_0_12bpc_avx2:   30.4 (136.26x)
inv_txfm_add_32x16_identity_identity_1_12bpc_c:    4147.5 ( 1.00x)
inv_txfm_add_32x16_identity_identity_1_12bpc_avx2:   30.7 (135.25x)
inv_txfm_add_32x16_identity_identity_2_12bpc_c:    4138.2 ( 1.00x)
inv_txfm_add_32x16_identity_identity_2_12bpc_avx2:   98.9 (41.84x)
inv_txfm_add_32x16_identity_identity_3_12bpc_c:    4136.6 ( 1.00x)
inv_txfm_add_32x16_identity_identity_3_12bpc_avx2:  167.7 (24.67x)
inv_txfm_add_32x16_identity_identity_4_12bpc_c:    4156.3 ( 1.00x)
inv_txfm_add_32x16_identity_identity_4_12bpc_avx2:  242.1 (17.17x)
-
Matthias Dressel authored
inv_txfm_add_16x32_identity_identity_0_12bpc_c:    4287.9 ( 1.00x)
inv_txfm_add_16x32_identity_identity_0_12bpc_avx2:   31.4 (136.66x)
inv_txfm_add_16x32_identity_identity_1_12bpc_c:    4293.7 ( 1.00x)
inv_txfm_add_16x32_identity_identity_1_12bpc_avx2:   30.9 (139.07x)
inv_txfm_add_16x32_identity_identity_2_12bpc_c:    4273.8 ( 1.00x)
inv_txfm_add_16x32_identity_identity_2_12bpc_avx2:   97.3 (43.92x)
inv_txfm_add_16x32_identity_identity_3_12bpc_c:    4269.0 ( 1.00x)
inv_txfm_add_16x32_identity_identity_3_12bpc_avx2:  165.2 (25.83x)
inv_txfm_add_16x32_identity_identity_4_12bpc_c:    4284.4 ( 1.00x)
inv_txfm_add_16x32_identity_identity_4_12bpc_avx2:  235.2 (18.22x)
-
- Mar 25, 2023
-
-
- Mar 23, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
- Mar 21, 2023
-
-
James Almer authored
If a Metadata OBU appeared right before a Frame Header OBU with show_existing_picture = 1, it was not being attached to it but to the next assembled picture, which was in the following TU.
Signed-off-by: James Almer <jamrial@gmail.com>
-
Martin Storsjö authored
Relative speedup over the C code:
                              Cortex A53   A55   A72   A73   A76  Apple M1
intra_pred_z3_w4_16bpc_neon:        3.06  2.87  2.17  1.97  2.33      7.75
intra_pred_z3_w8_16bpc_neon:        3.90  3.94  2.97  3.16  2.93      4.43
intra_pred_z3_w16_16bpc_neon:       4.08  4.48  3.31  4.68  3.13      5.00
intra_pred_z3_w32_16bpc_neon:       4.43  4.85  3.50  4.02  3.33      5.62
intra_pred_z3_w64_16bpc_neon:       4.68  5.30  3.72  3.96  3.52      5.78
-
Martin Storsjö authored
Relative speedup over the C code:
                              Cortex A53   A55   A72   A73   A76  Apple M1
intra_pred_z1_w4_16bpc_neon:        3.49  2.63  2.83  3.85  3.14      9.00
intra_pred_z1_w8_16bpc_neon:        6.19  4.39  3.65  6.58  4.99      6.50
intra_pred_z1_w16_16bpc_neon:       6.65  4.64  3.97  7.78  4.87      7.00
intra_pred_z1_w32_16bpc_neon:       7.76  5.49  5.17  7.83  5.59      8.24
intra_pred_z1_w64_16bpc_neon:       8.02  5.80  5.33  8.41  5.77      8.70
-
Martin Storsjö authored
For 8 bpc, there's probably not much difference to a decent memset, but for 16 bpc, there might be a bigger difference.
-
Martin Storsjö authored
Add comments explaining the exact dimensions of the gather tables used currently. That reasoning shows that the w=8 case can do with one register less.

Before:                      Cortex A53     A55     A72     A73     A76  Apple M1
intra_pred_z3_w8_8bpc_neon:       356.2   376.2   218.9   246.4   176.1       0.6
After:
intra_pred_z3_w8_8bpc_neon:       339.6   357.3   205.6   232.3   160.0       0.5
-
Martin Storsjö authored
Start out the multiplication/accumulation with a register that is available sooner.

Before:                      Cortex A53     A55     A72     A73     A76  Apple M1
intra_pred_z1_w8_8bpc_neon:       266.3   268.9   146.6   155.3   103.9       0.4
intra_pred_z1_w16_8bpc_neon:      528.6   574.4   333.9   364.3   209.1       0.7
intra_pred_z1_w32_8bpc_neon:     1149.3  1245.4   752.3   811.5   503.4       1.3
intra_pred_z1_w64_8bpc_neon:     2198.4  2360.6  1462.9  1575.0  1007.6       2.4
After:
intra_pred_z1_w8_8bpc_neon:       266.3   269.1   146.6   155.0   100.1       0.4
intra_pred_z1_w16_8bpc_neon:      528.6   573.3   347.9   352.4   204.3       0.7
intra_pred_z1_w32_8bpc_neon:     1149.2  1245.3   763.4   759.6   474.8       1.3
intra_pred_z1_w64_8bpc_neon:     2198.8  2360.6  1430.0  1417.4   943.5       2.3
-
Martin Storsjö authored
The second register will at most contain one valid pixel, the padding pixel. Thus skip padding the register and just fill it with the padding pixel.
-
Martin Storsjö authored
There were redundant leftovers from copy-pasting bits when writing this function.
-
Martin Storsjö authored
This is for cases with h >= 16.
-
Martin Storsjö authored
-
Martin Storsjö authored
-
- Mar 16, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
- Mar 13, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
Process 2 blocks per iteration instead of 4. Credits to gramner@twoorioles.com.
-
Victorien Le Couviour--Tuffet authored
We must reload error just before calling dav1d_decode_frame_exit, as it may have become stale between the last load and that call. This can result in crashes since we signal a seemingly successfully decoded frame, when it's not. Reloading error within the frame done condition's body ensures a non-stale value, as we use 'f->task_thread.task_counter == 0' to ensure all other threads / tasks have already completed when entering it. In other words, only the last thread still working on this frame can execute this code, after all other threads have returned to doing something else.
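The ordering described above can be sketched with C11 atomics. This is a simplified illustration and not dav1d's task scheduler: `FrameState`, `frame_exit`, `finish_task`, and `reported_result` are hypothetical names, and the "last task" test is done here with the return value of `atomic_fetch_sub` rather than a separate comparison against zero. What it tries to show is the key point of the fix: the error flag is read inside the "frame done" branch, after the counter has established that no other task can still set it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-frame state shared by worker threads. */
typedef struct {
    atomic_int error;        /* set non-zero by any task that fails */
    atomic_int task_counter; /* number of tasks still in flight */
} FrameState;

/* Hypothetical stand-in for the frame exit routine: records what it was told. */
static int reported_result;
static void frame_exit(FrameState *f, int result) {
    (void)f;
    reported_result = result;
}

/* Called by a worker when its task completes.
 * Returns 1 if this worker was the last one and reported the frame. */
static int finish_task(FrameState *f) {
    assert(f != NULL);
    /* atomic_fetch_sub returns the previous value: 1 means we were last. */
    if (atomic_fetch_sub(&f->task_counter, 1) == 1) {
        /* Load error here, not earlier: a value read before the counter
         * reached zero could miss a failure flagged by another task. */
        const int error = atomic_load(&f->error);
        frame_exit(f, error ? -1 : 0);
        return 1;
    }
    return 0;
}
```

Reading the flag before the decrement would leave a window in which another task fails after the read, so the frame would be reported as successfully decoded when it is not.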
-
- Mar 07, 2023
-
-
Henrik Gramner authored
-
- Mar 06, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
-