- Jun 12, 2020
-
-
Nam Se Hyun authored
-
Nam Se Hyun authored
-
Nam Se Hyun authored
-
- Jun 11, 2020
-
-
Matthias Dressel authored
-
Henrik Gramner authored
The struct is already zero-initialized when the function is called except for the checkasm test, so move the zeroing there instead.
-
Matthias Dressel authored
meson.source_root() returns the root of a parent project if dav1d is embedded as a subproject.
-
Victorien Le Couviour--Tuffet authored
--------------------- x86_64: ------------------------------------------ mct_8tap_regular_w4_h_8bpc_c: 302.3 mct_8tap_regular_w4_h_8bpc_sse2: 47.3 mct_8tap_regular_w4_h_8bpc_ssse3: 19.5 --------------------- mct_8tap_regular_w8_h_8bpc_c: 745.5 mct_8tap_regular_w8_h_8bpc_sse2: 235.2 mct_8tap_regular_w8_h_8bpc_ssse3: 70.4 --------------------- mct_8tap_regular_w16_h_8bpc_c: 1844.3 mct_8tap_regular_w16_h_8bpc_sse2: 755.6 mct_8tap_regular_w16_h_8bpc_ssse3: 225.9 --------------------- mct_8tap_regular_w32_h_8bpc_c: 6685.5 mct_8tap_regular_w32_h_8bpc_sse2: 2954.4 mct_8tap_regular_w32_h_8bpc_ssse3: 795.8 --------------------- mct_8tap_regular_w64_h_8bpc_c: 15633.5 mct_8tap_regular_w64_h_8bpc_sse2: 7120.4 mct_8tap_regular_w64_h_8bpc_ssse3: 1900.4 --------------------- mct_8tap_regular_w128_h_8bpc_c: 37772.1 mct_8tap_regular_w128_h_8bpc_sse2: 17698.1 mct_8tap_regular_w128_h_8bpc_ssse3: 4665.5 ------------------------------------------ mct_8tap_regular_w4_v_8bpc_c: 306.5 mct_8tap_regular_w4_v_8bpc_sse2: 71.7 mct_8tap_regular_w4_v_8bpc_ssse3: 37.9 --------------------- mct_8tap_regular_w8_v_8bpc_c: 923.3 mct_8tap_regular_w8_v_8bpc_sse2: 168.7 mct_8tap_regular_w8_v_8bpc_ssse3: 71.3 --------------------- mct_8tap_regular_w16_v_8bpc_c: 3040.1 mct_8tap_regular_w16_v_8bpc_sse2: 505.1 mct_8tap_regular_w16_v_8bpc_ssse3: 199.7 --------------------- mct_8tap_regular_w32_v_8bpc_c: 12354.8 mct_8tap_regular_w32_v_8bpc_sse2: 1942.0 mct_8tap_regular_w32_v_8bpc_ssse3: 714.2 --------------------- mct_8tap_regular_w64_v_8bpc_c: 29427.9 mct_8tap_regular_w64_v_8bpc_sse2: 4637.4 mct_8tap_regular_w64_v_8bpc_ssse3: 1829.2 --------------------- mct_8tap_regular_w128_v_8bpc_c: 72756.9 mct_8tap_regular_w128_v_8bpc_sse2: 11301.0 mct_8tap_regular_w128_v_8bpc_ssse3: 5020.6 ------------------------------------------ mct_8tap_regular_w4_hv_8bpc_c: 876.9 mct_8tap_regular_w4_hv_8bpc_sse2: 171.7 mct_8tap_regular_w4_hv_8bpc_ssse3: 112.2 --------------------- mct_8tap_regular_w8_hv_8bpc_c: 2215.1 mct_8tap_regular_w8_hv_8bpc_sse2: 730.2 mct_8tap_regular_w8_hv_8bpc_ssse3: 330.9 --------------------- mct_8tap_regular_w16_hv_8bpc_c: 6075.5 mct_8tap_regular_w16_hv_8bpc_sse2: 2252.1 mct_8tap_regular_w16_hv_8bpc_ssse3: 973.4 --------------------- mct_8tap_regular_w32_hv_8bpc_c: 22182.7 mct_8tap_regular_w32_hv_8bpc_sse2: 7692.6 mct_8tap_regular_w32_hv_8bpc_ssse3: 3599.8 --------------------- mct_8tap_regular_w64_hv_8bpc_c: 50876.8 mct_8tap_regular_w64_hv_8bpc_sse2: 18499.6 mct_8tap_regular_w64_hv_8bpc_ssse3: 8815.6 --------------------- mct_8tap_regular_w128_hv_8bpc_c: 122926.3 mct_8tap_regular_w128_hv_8bpc_sse2: 45120.0 mct_8tap_regular_w128_hv_8bpc_ssse3: 22085.7 ------------------------------------------
-
Victorien Le Couviour--Tuffet authored
--------------------- x86_64: ------------------------------------------ mct_bilinear_w4_h_8bpc_c: 98.9 mct_bilinear_w4_h_8bpc_sse2: 30.2 mct_bilinear_w4_h_8bpc_ssse3: 11.5 --------------------- mct_bilinear_w8_h_8bpc_c: 175.3 mct_bilinear_w8_h_8bpc_sse2: 57.0 mct_bilinear_w8_h_8bpc_ssse3: 19.7 --------------------- mct_bilinear_w16_h_8bpc_c: 396.2 mct_bilinear_w16_h_8bpc_sse2: 179.3 mct_bilinear_w16_h_8bpc_ssse3: 50.9 --------------------- mct_bilinear_w32_h_8bpc_c: 1311.2 mct_bilinear_w32_h_8bpc_sse2: 718.8 mct_bilinear_w32_h_8bpc_ssse3: 243.9 --------------------- mct_bilinear_w64_h_8bpc_c: 2892.7 mct_bilinear_w64_h_8bpc_sse2: 1746.0 mct_bilinear_w64_h_8bpc_ssse3: 568.0 --------------------- mct_bilinear_w128_h_8bpc_c: 7192.6 mct_bilinear_w128_h_8bpc_sse2: 4339.8 mct_bilinear_w128_h_8bpc_ssse3: 1619.2 ------------------------------------------ mct_bilinear_w4_v_8bpc_c: 129.7 mct_bilinear_w4_v_8bpc_sse2: 26.6 mct_bilinear_w4_v_8bpc_ssse3: 16.7 --------------------- mct_bilinear_w8_v_8bpc_c: 233.3 mct_bilinear_w8_v_8bpc_sse2: 55.0 mct_bilinear_w8_v_8bpc_ssse3: 24.7 --------------------- mct_bilinear_w16_v_8bpc_c: 498.9 mct_bilinear_w16_v_8bpc_sse2: 146.0 mct_bilinear_w16_v_8bpc_ssse3: 54.2 --------------------- mct_bilinear_w32_v_8bpc_c: 1562.2 mct_bilinear_w32_v_8bpc_sse2: 560.6 mct_bilinear_w32_v_8bpc_ssse3: 201.0 --------------------- mct_bilinear_w64_v_8bpc_c: 3221.3 mct_bilinear_w64_v_8bpc_sse2: 1380.6 mct_bilinear_w64_v_8bpc_ssse3: 499.3 --------------------- mct_bilinear_w128_v_8bpc_c: 7357.7 mct_bilinear_w128_v_8bpc_sse2: 3439.0 mct_bilinear_w128_v_8bpc_ssse3: 1489.1 ------------------------------------------ mct_bilinear_w4_hv_8bpc_c: 185.0 mct_bilinear_w4_hv_8bpc_sse2: 54.5 mct_bilinear_w4_hv_8bpc_ssse3: 22.1 --------------------- mct_bilinear_w8_hv_8bpc_c: 377.8 mct_bilinear_w8_hv_8bpc_sse2: 104.3 mct_bilinear_w8_hv_8bpc_ssse3: 35.8 --------------------- mct_bilinear_w16_hv_8bpc_c: 1159.4 mct_bilinear_w16_hv_8bpc_sse2: 311.0 mct_bilinear_w16_hv_8bpc_ssse3: 106.3 --------------------- mct_bilinear_w32_hv_8bpc_c: 4436.2 mct_bilinear_w32_hv_8bpc_sse2: 1230.7 mct_bilinear_w32_hv_8bpc_ssse3: 400.7 --------------------- mct_bilinear_w64_hv_8bpc_c: 10627.7 mct_bilinear_w64_hv_8bpc_sse2: 2934.2 mct_bilinear_w64_hv_8bpc_ssse3: 957.2 --------------------- mct_bilinear_w128_hv_8bpc_c: 26048.9 mct_bilinear_w128_hv_8bpc_sse2: 7590.3 mct_bilinear_w128_hv_8bpc_ssse3: 2947.0 ------------------------------------------
-
- Jun 10, 2020
-
-
Martin Storsjö authored
This allows skipping half of the first transforms if the input coefficients lie within the upper 4x4 (but checkasm only tests in increments of 8x8 at the moment). With checkasm modified to test in smaller increments, the speedup is like this: Before: Cortex A53 A72 A73 inv_txfm_add_16x8_dct_dct_1_10bpc_neon: 874.4 709.0 707.3 After: inv_txfm_add_16x8_dct_dct_1_10bpc_neon: 618.0 479.5 472.9
-
Martin Storsjö authored
-
- Jun 09, 2020
-
-
- Jun 07, 2020
-
-
Blacklisted some files not directly relevant to the codebase (such as tests, tools and debugging functions). The coverage HTML report gets attached as a build artifact, although unfortunately we can't link directly to the `index.html`. We also attach the coverage XML as a cobertura report, although I'm not sure if it does anything.
-
- Jun 04, 2020
-
-
Wan-Teh Chang authored
-
This is currently not used in dav1d (yet), but there's a need for it in rav1e, which shares this header with dav1d.
-
- Jun 01, 2020
-
-
Victorien Le Couviour--Tuffet authored
mc_scaled_8tap_regular_w2_8bpc_c: 764.4 mc_scaled_8tap_regular_w2_8bpc_avx2: 191.3 mc_scaled_8tap_regular_w2_dy1_8bpc_c: 705.8 mc_scaled_8tap_regular_w2_dy1_8bpc_avx2: 89.5 mc_scaled_8tap_regular_w2_dy2_8bpc_c: 964.0 mc_scaled_8tap_regular_w2_dy2_8bpc_avx2: 120.3 mc_scaled_8tap_regular_w4_8bpc_c: 1355.7 mc_scaled_8tap_regular_w4_8bpc_avx2: 180.9 mc_scaled_8tap_regular_w4_dy1_8bpc_c: 1233.2 mc_scaled_8tap_regular_w4_dy1_8bpc_avx2: 115.3 mc_scaled_8tap_regular_w4_dy2_8bpc_c: 1707.6 mc_scaled_8tap_regular_w4_dy2_8bpc_avx2: 117.9 mc_scaled_8tap_regular_w8_8bpc_c: 2483.2 mc_scaled_8tap_regular_w8_8bpc_avx2: 294.8 mc_scaled_8tap_regular_w8_dy1_8bpc_c: 2166.4 mc_scaled_8tap_regular_w8_dy1_8bpc_avx2: 222.0 mc_scaled_8tap_regular_w8_dy2_8bpc_c: 3133.7 mc_scaled_8tap_regular_w8_dy2_8bpc_avx2: 292.6 mc_scaled_8tap_regular_w16_8bpc_c: 5239.2 mc_scaled_8tap_regular_w16_8bpc_avx2: 729.9 mc_scaled_8tap_regular_w16_dy1_8bpc_c: 5156.5 mc_scaled_8tap_regular_w16_dy1_8bpc_avx2: 602.2 mc_scaled_8tap_regular_w16_dy2_8bpc_c: 8018.4 mc_scaled_8tap_regular_w16_dy2_8bpc_avx2: 783.1 mc_scaled_8tap_regular_w32_8bpc_c: 14745.0 mc_scaled_8tap_regular_w32_8bpc_avx2: 2205.0 mc_scaled_8tap_regular_w32_dy1_8bpc_c: 14862.3 mc_scaled_8tap_regular_w32_dy1_8bpc_avx2: 1721.3 mc_scaled_8tap_regular_w32_dy2_8bpc_c: 23607.6 mc_scaled_8tap_regular_w32_dy2_8bpc_avx2: 2325.7 mc_scaled_8tap_regular_w64_8bpc_c: 54891.7 mc_scaled_8tap_regular_w64_8bpc_avx2: 8351.4 mc_scaled_8tap_regular_w64_dy1_8bpc_c: 50249.0 mc_scaled_8tap_regular_w64_dy1_8bpc_avx2: 5864.4 mc_scaled_8tap_regular_w64_dy2_8bpc_c: 79400.1 mc_scaled_8tap_regular_w64_dy2_8bpc_avx2: 8295.7 mc_scaled_8tap_regular_w128_8bpc_c: 121046.8 mc_scaled_8tap_regular_w128_8bpc_avx2: 21809.1 mc_scaled_8tap_regular_w128_dy1_8bpc_c: 133720.4 mc_scaled_8tap_regular_w128_dy1_8bpc_avx2: 16197.8 mc_scaled_8tap_regular_w128_dy2_8bpc_c: 218774.8 mc_scaled_8tap_regular_w128_dy2_8bpc_avx2: 22993.1
-
- May 28, 2020
-
-
Steve Lhomme authored
posix_memalign is defined as a built-in in gcc in msys2 but it's not available when linking with the Universal C Runtime. _aligned_malloc is available in the UCRT. That should only affect builds targeting Windows since _aligned_malloc is a MS thing.
-
- May 26, 2020
-
-
Henrik Gramner authored
Eliminate store forwarding stalls. Use shorter instruction encodings where possible. Misc. tweaks.
-
This one correctly sets the subsampling mode based on whether or not the plane is actually subsampled, and also infers PL_CHROMA_UNKNOWN as PL_CHROMA_TOP_LEFT in such cases.
-
- May 25, 2020
-
-
Niklas Haas authored
libplacebo v66 got helper functions that make preserving the aspect ratio in this case trivial. But we still need to make sure to clear the FBO to black if the image doesn't cover it fully.
-
- May 20, 2020
-
-
Niklas Haas authored
Returning out of this function when pl_render_image() fails is the wrong thing to do, since that leaves the swapchain frame acquired but never submitted. Instead, just clear the target FBO to blank red (to make it clear that something went wrong) and continue on with presentation.
-
- May 19, 2020
-
-
Jean-Baptiste Kempf authored
-
- May 18, 2020
-
-
Niklas Haas authored
Annoying minor differences in this struct layout mean we can't just memcpy the entire thing. Oh well. Note: technically, PL_API_VER 33 added this API, but PL_API_VER 63 is the minimum version of libplacebo that doesn't have glaring bugs when generating chroma grain, so we require that as a minimum instead. (I tested this version on some 4:2:2 and 4:2:0, 8-bit and 10-bit grain samples I had lying around and made sure the output was identical up to differences in rounding / dithering.)
-
Niklas Haas authored
Generalize the code to set the right pl_image metadata based on the values signaled in the Dav1dPictureParameters / Dav1dSequenceHeader. Some values are not mapped, in which case stdout will be spammed. Whatever. Hopefully somebody sees that error spam and opens a bug report for libplacebo to implement it.
-
Niklas Haas authored
Having the pl_image generation live in upload_planes() rather than render() will make it easier to set the correct pl_image metadata based on the Dav1dPicture headers moving forwards. Rename the function to make more sense, semantically. Reduce some code duplication by turning per-plane fields into arrays wherever appropriate. As an aside, also apply the correct chroma location rather than hard-coding it as PL_CHROMA_LEFT.
-
Niklas Haas authored
This is turned into a const array in upstream libplacebo, which generates warnings due to the implicit cast. Rewrite the code to have the mutable array live inside a separate variable `extensions` and only set `iparams.extensions` to this, rather than directly manipulating it.
-
- May 16, 2020
-
-
Signed-off-by:
Marvin Scholz <epirat07@gmail.com>
-
Jean-Baptiste Kempf authored
-
- May 15, 2020
-
-
Henrik Gramner authored
Add code to check that a function doesn't accidentally overwrite anything in the area located just above the current stack frame.
-
Marvin Scholz authored
-
Marvin Scholz authored
This allows selecting at runtime if placebo should use OpenGL or Vulkan for rendering.
-
Marvin Scholz authored
-
Marvin Scholz authored
-
Marvin Scholz authored
-
Marvin Scholz authored
-
- May 14, 2020
-
-
Marvin Scholz authored
-
Marvin Scholz authored
-
Marvin Scholz authored
To un-clutter the main dav1dplay.c, move the fifo to its own file and header.
-
Martin Storsjö authored
If the maximum number of arguments (currently 15) is changed into an even number, and a function actually takes the full number of arguments, we would have the situation where the checked spot on the stack is at the same place as we store an inverted copy of it. We already allocate enough space for two values though (for stack alignment purposes, 16 bytes on arm64 and 8 bytes on arm32) so by storing the reference in the upper half of this, the lower half of it works as canary and isn't overwritten.
-
Martin Storsjö authored
-