- Jun 01, 2023
-
-
Jean-Baptiste Kempf authored
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- May 31, 2023
-
-
James Almer authored
Simplifies checks for the caller.
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Don't just check that we don't overrun at a byte aligned offset. Also make sure that the parsing was correct and no valid bits are left in the OBU.
Signed-off-by: James Almer <jamrial@gmail.com>
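As a rough illustration of the stricter check described above, the sketch below assumes the AV1 trailing-bits convention (a single 1 bit followed only by 0 bits up to the end of the OBU). The names `bit_at` and `trailing_bits_ok` are hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one bit, MSB first, at an absolute bit position. */
static int bit_at(const uint8_t *data, size_t bit_pos) {
    return (data[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1;
}

/* Returns 1 if everything from bit_pos to the end of the payload is a single
 * 1 bit followed only by 0 bits, and 0 otherwise. A parser that stopped too
 * early (leaving valid bits behind) or too late fails this check. */
int trailing_bits_ok(const uint8_t *data, size_t size, size_t bit_pos) {
    assert(data != NULL);
    const size_t total_bits = size * 8;
    if (bit_pos >= total_bits) return 0;   /* no room for the trailing 1 bit */
    if (!bit_at(data, bit_pos)) return 0;  /* trailing 1 bit is missing */
    for (size_t i = bit_pos + 1; i < total_bits; i++)
        if (bit_at(data, i)) return 0;     /* unparsed non-zero bits remain */
    return 1;
}
```

Compared with a plain overrun check, this also rejects payloads where parsing ended inside the buffer but left meaningful bits unread.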
-
This also simplifies overrun checking a fair amount.
-
The default __printf__ format attribute doesn't match what printf functions actually support. Using __gnu_printf__ fixes it.
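A minimal sketch of how such an attribute might be applied follows. The macro name `ATTR_FORMAT_PRINTF` and the function `format_message` are illustrative assumptions, and the compiler guards are one plausible arrangement rather than the project's actual one.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: prefer the GNU printf archetype on GCC, fall back to
 * the default archetype on Clang, and expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define ATTR_FORMAT_PRINTF(fmt, args) \
    __attribute__((__format__(__gnu_printf__, fmt, args)))
#elif defined(__clang__)
#define ATTR_FORMAT_PRINTF(fmt, args) \
    __attribute__((__format__(__printf__, fmt, args)))
#else
#define ATTR_FORMAT_PRINTF(fmt, args)
#endif

/* Argument 3 is the format string, variadic arguments start at position 4. */
int format_message(char *dst, size_t dst_size, const char *fmt, ...)
    ATTR_FORMAT_PRINTF(3, 4);

int format_message(char *dst, size_t dst_size, const char *fmt, ...) {
    assert(dst != NULL && fmt != NULL);
    va_list ap;
    va_start(ap, fmt);
    const int n = vsnprintf(dst, dst_size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, the compiler can type-check the arguments at each call site against the format string.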
-
We know that the payload is aligned on a byte boundary and fully contained within the OBU, so using a bitstream reader function to copy the data one byte at a time is a bit redundant.
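The idea can be sketched as below, using a hypothetical `BitReader` type rather than dav1d's real bitstream reader: once the position is known to be byte aligned and the payload fits, a single `memcpy()` replaces the per-byte reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader state, for illustration only. */
typedef struct {
    const uint8_t *data; /* start of the OBU payload */
    size_t size;         /* payload size in bytes */
    size_t bit_pos;      /* current read position in bits */
} BitReader;

/* Copies payload_size bytes starting at the reader's current position.
 * Returns 0 on success, -1 if the position is not byte aligned or the
 * payload would extend past the end of the buffer. */
int copy_aligned_payload(const BitReader *br, uint8_t *dst, size_t payload_size) {
    assert(br != NULL && dst != NULL);
    if (br->bit_pos & 7) return -1;
    const size_t byte_pos = br->bit_pos >> 3;
    if (byte_pos > br->size || payload_size > br->size - byte_pos) return -1;
    memcpy(dst, br->data + byte_pos, payload_size);
    return 0;
}
```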
-
We require the size to be representable as a signed value. This limit already exists in dav1d_data_create().
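A small sketch of such a limit, assuming "representable as a signed value" means at most `SIZE_MAX / 2`. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if sz also fits in a signed integer of the same width. */
int size_fits_signed(size_t sz) {
    return sz <= SIZE_MAX / 2;
}

/* Converts a validated size to a signed type. Callers are expected to
 * reject oversized inputs with size_fits_signed() first. */
ptrdiff_t to_signed_size(size_t sz) {
    assert(size_fits_signed(sz));
    return (ptrdiff_t)sz;
}
```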
-
Creating an entire decoder instance just for some bitstream parsing is completely unnecessary. We can instead parse the sequence header directly into the user-provided buffer while ignoring/skipping other OBU types, with zero memory allocations required.
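The approach can be sketched roughly as follows. This is not dav1d's implementation: `read_leb128` and `find_sequence_header` are hypothetical names, and the sketch only locates the sequence header payload (based on the AV1 OBU header layout and LEB128 size field) instead of parsing its contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OBU_SEQUENCE_HEADER = 1 }; /* obu_type value from the AV1 spec */

/* Decodes an unsigned LEB128 value. Returns bytes consumed, or 0 on error. */
static size_t read_leb128(const uint8_t *p, size_t avail, uint32_t *out) {
    uint64_t val = 0;
    for (size_t i = 0; i < 8 && i < avail; i++) {
        val |= (uint64_t)(p[i] & 0x7F) << (7 * i);
        if (!(p[i] & 0x80)) {
            if (val > UINT32_MAX) return 0;
            *out = (uint32_t)val;
            return i + 1;
        }
    }
    return 0; /* truncated or too long */
}

/* Walks the OBUs in buf, skipping every type except the sequence header.
 * On success returns 0 and reports the payload offset and length; returns
 * -1 if the data is malformed or holds no sequence header. No allocations. */
int find_sequence_header(const uint8_t *buf, size_t size,
                         size_t *payload_off, size_t *payload_len)
{
    assert(buf != NULL && payload_off != NULL && payload_len != NULL);
    size_t pos = 0;
    while (pos < size) {
        const uint8_t hdr = buf[pos++];
        if (hdr & 0x80) return -1;              /* forbidden bit must be 0 */
        const int type = (hdr >> 3) & 0xF;
        const int has_extension = (hdr >> 2) & 1;
        const int has_size = (hdr >> 1) & 1;
        if (has_extension) {
            if (pos >= size) return -1;
            pos++;                               /* skip the extension byte */
        }
        size_t len;
        if (has_size) {
            uint32_t v;
            const size_t n = read_leb128(buf + pos, size - pos, &v);
            if (!n) return -1;
            pos += n;
            len = v;
        } else {
            len = size - pos; /* no size field: OBU runs to the buffer end */
        }
        if (len > size - pos) return -1;
        if (type == OBU_SEQUENCE_HEADER) {
            *payload_off = pos;
            *payload_len = len;
            return 0;
        }
        pos += len;                              /* skip other OBU types */
    }
    return -1;
}
```

A caller would then parse the reported payload straight into its own buffer.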
-
- May 29, 2023
-
-
-
It's not required by the API and would only risk masking potential bugs.
-
- May 26, 2023
-
-
It's only used in debug mode, so inlining prevents dead code from being generated in release mode.
-
In many cases it can be combined with the allocation of the data being referenced instead of allocating it separately.
-
It's not used by anything, and the data it references is stack-allocated.
-
- May 25, 2023
-
-
James Almer authored
-
Martin Storsjö authored
Relative speedup over unvectorized C code:
                               Cortex A53    A55    A72    A73    A76  Apple M1
intra_pred_z2_w4_16bpc_neon:         2.98   2.98   2.38   2.77   3.19      7.75
intra_pred_z2_w8_16bpc_neon:         3.91   4.22   2.64   3.29   3.73      4.78
intra_pred_z2_w16_16bpc_neon:        4.43   5.12   2.89   3.90   3.50      4.26
intra_pred_z2_w32_16bpc_neon:        5.08   6.36   3.44   4.40   4.05      4.96
intra_pred_z2_w64_16bpc_neon:        4.68   5.97   3.29   4.40   3.68      5.23
-
Martin Storsjö authored
-
Henrik Gramner authored
Also make some minor tweaks to existing z1/z2/z3 asm
-
- May 24, 2023
-
-
James Almer authored
It is a requirement of bitstream conformance that all of these are greater than 0.
-
James Almer authored
Completes the work started in cb5a095e.
-
- May 14, 2023
-
-
Matthias Dressel authored
-
James Almer authored
Callers using custom picture allocators may ignore the data pointers in the output pictures, resulting in planes for the wrong frames being displayed. This reverts commits 98b0c96d and 92d8b815, and fixes #426.
Signed-off-by: James Almer <jamrial@gmail.com>
-
- May 12, 2023
-
-
Reduces the table size by around 2.5 kB on 64-bit systems.
-
Reduces the table size by around 3 kB.
-
Avoids the need to initialize the tables more than once, and allows for sharing the same data between multiple decoder instances.
-
- May 11, 2023
-
-
Reduces the size of the edge availability tree by around 16 kB, which should have a small positive impact on cache misses.
-
- May 06, 2023
-
-
Andrey Semashev authored
Avoid wrapping external includes in extern "C" blocks. Also wrap all public headers in extern "C" blocks to allow them to be selectively included in C++ projects. Fixes #422.
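The header layout described here might look like the following sketch. The header name `mylib.h` and the function `mylib_sum` are hypothetical and are shown in a single listing only for brevity.

```c
/* mylib.h: hypothetical public header */
#ifndef MYLIB_H
#define MYLIB_H

/* External includes stay outside the extern "C" block. */
#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Only the library's own declarations get C linkage. */
uint32_t mylib_sum(const uint8_t *data, size_t n);

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* MYLIB_H */

/* mylib.c: implementation, compiled as C */
#include <assert.h>

uint32_t mylib_sum(const uint8_t *data, size_t n) {
    assert(data != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

Because each header carries its own guard, a C++ project can include any one of them on its own without wrapping the include itself.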
-
- May 05, 2023
-
-
Relative speedup over C code:
                              Cortex A53    A55    A72    A73    A76  Apple M1
intra_pred_z2_w4_8bpc_neon:         3.91   3.55   3.31   3.94   3.46      8.50
intra_pred_z2_w8_8bpc_neon:         5.68   5.67   4.31   5.31   4.34      5.83
intra_pred_z2_w16_8bpc_neon:        8.39   9.28   5.53   7.04   7.01      9.45
intra_pred_z2_w32_8bpc_neon:        7.01   8.01   5.04   6.32   5.48      7.48
intra_pred_z2_w64_8bpc_neon:        8.73  10.25   5.92   7.61   6.63     10.05
-
- May 04, 2023
-
-
Victorien Le Couviour--Tuffet authored
Fixes a race where the tasks inserted by the init task could all be executed and signal frame completion, letting another frame start before the init task had set init_done. The init task would then set it, preventing the init task of the new frame from being executed. This caused an assert to trigger further down the task picking loop. Credits to OSS-Fuzz.
-
- May 02, 2023
-
-
Jean-Baptiste Kempf authored
-
- Apr 28, 2023
-
-
- Apr 27, 2023
-
-
When compiling dav1d for an Android app, these architectures are required by default (along with arm and arm64, for which crossfiles already exist).
-
- Apr 25, 2023
-
-
Henrik Gramner authored
-
- Apr 23, 2023
-
-
* `needs_exe_wrapper` is only needed in specific cases when `exe_wrapper` is not set. See https://mesonbuild.com/Cross-compilation.html#properties
* "Before 0.56.0, <lang>_args and <lang>_link_args must be put in the properties section instead, else they will be ignored." [https://mesonbuild.com/Machine-files.html#meson-builtin-options] Our minimum version is 0.49.0. Meson >= 0.56.0 prints a deprecation warning.
-
- Apr 20, 2023
-
-
Ronald S. Bultje authored
Also implement "fast3" path for pass2.dct64 (where 1/8th of the coefficients are non-zero), which affects 32x64 as well as 64x64.
Before:
inv_txfm_add_32x64_dct_dct_1_10bpc_c: 51008.6 ( 1.00x)
inv_txfm_add_32x64_dct_dct_1_10bpc_sse4: 3351.9 (15.22x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx2: 1419.5 (35.93x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl: 744.8 (68.49x)
After:
inv_txfm_add_32x64_dct_dct_1_10bpc_c: 51019.5 ( 1.00x)
inv_txfm_add_32x64_dct_dct_1_10bpc_sse4: 3276.1 (15.57x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx2: 1420.7 (35.91x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl: 668.3 (76.34x)
(Not sure why the SSE4 speed changed.)
And speed for 64x64:
inv_txfm_add_64x64_dct_dct_0_10bpc_c: 3506.9 ( 1.00x)
inv_txfm_add_64x64_dct_dct_0_10bpc_sse4: 535.6 ( 6.55x)
inv_txfm_add_64x64_dct_dct_0_10bpc_avx2: 223.5 (15.69x)
inv_txfm_add_64x64_dct_dct_0_10bpc_avx512icl: 252.4 (13.89x)
inv_txfm_add_64x64_dct_dct_1_10bpc_c: 108353.7 ( 1.00x)
inv_txfm_add_64x64_dct_dct_1_10bpc_sse4: 6551.9 (16.54x)
inv_txfm_add_64x64_dct_dct_1_10bpc_avx2: 2876.8 (37.66x)
inv_txfm_add_64x64_dct_dct_1_10bpc_avx512icl: 1310.1 (82.70x)
inv_txfm_add_64x64_dct_dct_2_10bpc_c: 108347.6 ( 1.00x)
inv_txfm_add_64x64_dct_dct_2_10bpc_sse4: 7985.4 (13.57x)
inv_txfm_add_64x64_dct_dct_2_10bpc_avx2: 3561.8 (30.42x)
inv_txfm_add_64x64_dct_dct_2_10bpc_avx512icl: 1962.6 (55.20x)
inv_txfm_add_64x64_dct_dct_3_10bpc_c: 108455.5 ( 1.00x)
inv_txfm_add_64x64_dct_dct_3_10bpc_sse4: 9709.0 (11.17x)
inv_txfm_add_64x64_dct_dct_3_10bpc_avx2: 4220.5 (25.70x)
inv_txfm_add_64x64_dct_dct_3_10bpc_avx512icl: 2991.1 (36.26x)
inv_txfm_add_64x64_dct_dct_4_10bpc_c: 108349.9 ( 1.00x)
inv_txfm_add_64x64_dct_dct_4_10bpc_sse4: 11048.0 ( 9.81x)
inv_txfm_add_64x64_dct_dct_4_10bpc_avx2: 4898.1 (22.12x)
inv_txfm_add_64x64_dct_dct_4_10bpc_avx512icl: 3108.1 (34.86x)
-
- Apr 18, 2023
-
-
James Almer authored
Nothing in the spec prevents a Temporal Unit from having more than one Metadata OBU of type ITU-T T.35, so export them as an array instead of only exporting the last one we parse. This is backwards compatible with the previous implementation, as users unaware of this change can ignore the n_itut_t35 field and still access the first (or only) entry in the array as they have been doing until now.
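The backwards-compatibility argument can be illustrated with the sketch below. `T35Entry` and `Picture` are simplified, hypothetical stand-ins for the real structures; only the field names `itut_t35` and `n_itut_t35` follow the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real API types. */
typedef struct {
    uint8_t country_code;
    const uint8_t *payload;
    size_t payload_size;
} T35Entry;

typedef struct {
    const T35Entry *itut_t35; /* first entry, now also the start of an array */
    size_t n_itut_t35;        /* number of entries in the array */
} Picture;

/* Legacy caller: unaware of n_itut_t35, reads only the first entry. */
size_t first_payload_size(const Picture *p) {
    assert(p != NULL);
    return p->itut_t35 ? p->itut_t35->payload_size : 0;
}

/* Updated caller: walks every entry exported for the Temporal Unit. */
size_t total_payload_size(const Picture *p) {
    assert(p != NULL);
    size_t total = 0;
    for (size_t i = 0; i < p->n_itut_t35; i++)
        total += p->itut_t35[i].payload_size;
    return total;
}
```

Both callers work against the same pointer, which is why existing code keeps behaving as before.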
-
Ronald S. Bultje authored
inv_txfm_add_64x32_dct_dct_0_10bpc_c: 1760.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_0_10bpc_sse4: 271.1 ( 6.49x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx2: 121.3 (14.52x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl: 116.3 (15.14x)
inv_txfm_add_64x32_dct_dct_1_10bpc_c: 66507.4 ( 1.00x)
inv_txfm_add_64x32_dct_dct_1_10bpc_sse4: 3712.4 (17.91x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx2: 1830.5 (36.33x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl: 805.4 (82.58x)
inv_txfm_add_64x32_dct_dct_2_10bpc_c: 66491.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_2_10bpc_sse4: 5325.3 (12.49x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx2: 2578.5 (25.79x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl: 1394.5 (47.68x)
inv_txfm_add_64x32_dct_dct_3_10bpc_c: 66490.2 ( 1.00x)
inv_txfm_add_64x32_dct_dct_3_10bpc_sse4: 6418.5 (10.36x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx2: 3305.6 (20.11x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl: 2571.5 (25.86x)
inv_txfm_add_64x32_dct_dct_4_10bpc_c: 66508.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_4_10bpc_sse4: 8671.2 ( 7.67x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx2: 4054.2 (16.40x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl: 2691.6 (24.71x)
-
- Apr 13, 2023
-
-
Ronald S. Bultje authored
inv_txfm_add_64x16_dct_dct_0_10bpc_c: 892.0 ( 1.00x)
inv_txfm_add_64x16_dct_dct_0_10bpc_sse4: 131.5 ( 6.78x)
inv_txfm_add_64x16_dct_dct_0_10bpc_avx2: 63.4 (14.07x)
inv_txfm_add_64x16_dct_dct_0_10bpc_avx512icl: 56.8 (15.71x)
inv_txfm_add_64x16_dct_dct_1_10bpc_c: 29253.7 ( 1.00x)
inv_txfm_add_64x16_dct_dct_1_10bpc_sse4: 1639.7 (17.84x)
inv_txfm_add_64x16_dct_dct_1_10bpc_avx2: 1106.8 (26.43x)
inv_txfm_add_64x16_dct_dct_1_10bpc_avx512icl: 532.9 (54.89x)
inv_txfm_add_64x16_dct_dct_2_10bpc_c: 29249.8 ( 1.00x)
inv_txfm_add_64x16_dct_dct_2_10bpc_sse4: 3065.6 ( 9.54x)
inv_txfm_add_64x16_dct_dct_2_10bpc_avx2: 1791.0 (16.33x)
inv_txfm_add_64x16_dct_dct_2_10bpc_avx512icl: 1108.0 (26.40x)
inv_txfm_add_64x16_dct_dct_3_10bpc_c: 29269.1 ( 1.00x)
inv_txfm_add_64x16_dct_dct_3_10bpc_sse4: 3738.2 ( 7.83x)
inv_txfm_add_64x16_dct_dct_3_10bpc_avx2: 1790.9 (16.34x)
inv_txfm_add_64x16_dct_dct_3_10bpc_avx512icl: 1203.8 (24.31x)
inv_txfm_add_64x16_dct_dct_4_10bpc_c: 29337.7 ( 1.00x)
inv_txfm_add_64x16_dct_dct_4_10bpc_sse4: 3749.7 ( 7.82x)
inv_txfm_add_64x16_dct_dct_4_10bpc_avx2: 1791.0 (16.38x)
inv_txfm_add_64x16_dct_dct_4_10bpc_avx512icl: 1203.8 (24.37x)
-
- Apr 12, 2023
-
-
Ronald S. Bultje authored
inv_txfm_add_32x64_dct_dct_0_10bpc_c: 1783.5 ( 1.00x)
inv_txfm_add_32x64_dct_dct_0_10bpc_sse4: 243.3 ( 7.33x)
inv_txfm_add_32x64_dct_dct_0_10bpc_avx2: 119.1 (14.97x)
inv_txfm_add_32x64_dct_dct_0_10bpc_avx512icl: 142.6 (12.50x)
inv_txfm_add_32x64_dct_dct_1_10bpc_c: 50422.5 ( 1.00x)
inv_txfm_add_32x64_dct_dct_1_10bpc_sse4: 2880.5 (17.50x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx2: 1423.4 (35.43x)
inv_txfm_add_32x64_dct_dct_1_10bpc_avx512icl: 741.6 (67.99x)
inv_txfm_add_32x64_dct_dct_2_10bpc_c: 50433.6 ( 1.00x)
inv_txfm_add_32x64_dct_dct_2_10bpc_sse4: 4015.1 (12.56x)
inv_txfm_add_32x64_dct_dct_2_10bpc_avx2: 1767.7 (28.53x)
inv_txfm_add_32x64_dct_dct_2_10bpc_avx512icl: 960.8 (52.49x)
inv_txfm_add_32x64_dct_dct_3_10bpc_c: 50422.2 ( 1.00x)
inv_txfm_add_32x64_dct_dct_3_10bpc_sse4: 4500.5 (11.20x)
inv_txfm_add_32x64_dct_dct_3_10bpc_avx2: 2111.7 (23.88x)
inv_txfm_add_32x64_dct_dct_3_10bpc_avx512icl: 1777.1 (28.37x)
inv_txfm_add_32x64_dct_dct_4_10bpc_c: 50444.2 ( 1.00x)
inv_txfm_add_32x64_dct_dct_4_10bpc_sse4: 5592.8 ( 9.02x)
inv_txfm_add_32x64_dct_dct_4_10bpc_avx2: 2458.1 (20.52x)
inv_txfm_add_32x64_dct_dct_4_10bpc_avx512icl: 1867.2 (27.02x)
-
James Almer authored
-