- May 20, 2020
-
-
Niklas Haas authored
Returning out of this function when pl_render_image() fails is the wrong thing to do, since that leaves the swapchain frame acquired but never submitted. Instead, just clear the target FBO to blank red (to make it clear that something went wrong) and continue on with presentation.
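The control flow described above can be sketched as follows. This is a minimal illustration, not the actual dav1dplay code: `struct frame`, `render_image()`, `clear_frame()` and `present()` are hypothetical stand-ins for the swapchain frame, pl_render_image(), the FBO clear and the presentation path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an acquired swapchain frame (not a libplacebo type). */
struct frame {
    bool acquired;
    bool submitted;
    float color[4]; /* RGBA the frame was cleared to, if any */
};

/* Hypothetical renderer stub; failure is simulated through a flag. */
static bool render_image(struct frame *f, bool simulate_failure)
{
    (void) f;
    return !simulate_failure;
}

static void clear_frame(struct frame *f, const float rgba[4])
{
    for (int i = 0; i < 4; i++)
        f->color[i] = rgba[i];
}

/* Returns true if rendering succeeded; the frame is submitted either way. */
static bool present(struct frame *f, bool simulate_failure)
{
    static const float red[4] = { 1.0f, 0.0f, 0.0f, 1.0f };

    f->acquired = true;
    const bool ok = render_image(f, simulate_failure);
    if (!ok)
        clear_frame(f, red); /* make the failure visible instead of returning early */

    f->submitted = true; /* always reached, so an acquired frame is never left pending */
    return ok;
}
```

The point of the sketch is that the failure branch falls through to the submit step rather than returning.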
-
- May 19, 2020
-
-
Jean-Baptiste Kempf authored
-
- May 18, 2020
-
-
Niklas Haas authored
Annoying minor differences in this struct layout mean we can't just memcpy the entire thing. Oh well. Note: technically, PL_API_VER 33 added this API, but PL_API_VER 63 is the minimum version of libplacebo that doesn't have glaring bugs when generating chroma grain, so we require that as a minimum instead. (I tested this version on some 4:2:2 and 4:2:0, 8-bit and 10-bit grain samples I had lying around and made sure the output was identical up to differences in rounding / dithering.)
-
Niklas Haas authored
Generalize the code to set the right pl_image metadata based on the values signaled in the Dav1dPictureParameters / Dav1dSequenceHeader. Some values are not mapped, in which case stdout will be spammed. Whatever. Hopefully somebody sees that error spam and opens a bug report for libplacebo to implement it.
-
Niklas Haas authored
Having the pl_image generation live in upload_planes() rather than render() will make it easier to set the correct pl_image metadata based on the Dav1dPicture headers moving forwards. Rename the function to make more sense, semantically. Reduce some code duplication by turning per-plane fields into arrays wherever appropriate. As an aside, also apply the correct chroma location rather than hard-coding it as PL_CHROMA_LEFT.
-
Niklas Haas authored
This is turned into a const array in upstream libplacebo, which generates warnings due to the implicit cast. Rewrite the code to have the mutable array live inside a separate variable `extensions` and only set `iparams.extensions` to this, rather than directly manipulating it.
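A minimal sketch of that pattern, assuming a parameter struct whose extension list is const-qualified. `struct inst_params`, `fill_params()` and `MAX_EXTENSIONS` are hypothetical names for illustration, not the libplacebo definitions.

```c
#include <stddef.h>

/* Hypothetical parameter struct with a const-qualified extension list. */
struct inst_params {
    const char * const *extensions;
    int num_extensions;
};

#define MAX_EXTENSIONS 8

/* Builds the list in a mutable, caller-provided array and only then
 * publishes it through the const pointer in the parameter struct. */
static void fill_params(struct inst_params *p,
                        const char *extensions[MAX_EXTENSIONS],
                        const char * const *wanted, int num_wanted)
{
    int n = 0;
    for (int i = 0; i < num_wanted && n < MAX_EXTENSIONS; i++)
        extensions[n++] = wanted[i];

    /* Adding const at this level is an implicit, warning-free conversion. */
    p->extensions = extensions;
    p->num_extensions = n;
}
```

All writes go through the mutable array, so nothing ever has to cast the const away from the struct member.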
-
- May 16, 2020
-
-
Signed-off-by: Marvin Scholz <epirat07@gmail.com>
-
Jean-Baptiste Kempf authored
-
- May 15, 2020
-
-
Henrik Gramner authored
Add code to check that a function doesn't accidentally overwrite anything in the area located just above the current stack frame.
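The idea can be illustrated in plain C with a simulated stack area. This is only a sketch of the general canary technique; the real check lives in the assembly test harness, and `call_checked()`, `FRAME_WORDS` and the two callee functions below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_WORDS 4

typedef void (*test_fn)(uintptr_t *frame);

/* Simulates a stack area of FRAME_WORDS words plus one guard word placed
 * directly above it. Returns true if `fn` left the guard word untouched. */
static bool call_checked(test_fn fn, uintptr_t canary)
{
    uintptr_t stack[FRAME_WORDS + 1] = { 0 };
    const uintptr_t inverted = ~canary; /* reference copy kept elsewhere */

    stack[FRAME_WORDS] = canary;
    fn(stack);

    /* An intact canary XORed with its inverted copy gives all ones. */
    return (stack[FRAME_WORDS] ^ inverted) == UINTPTR_MAX;
}

/* Stays within its FRAME_WORDS words. */
static void well_behaved(uintptr_t *frame)
{
    for (size_t i = 0; i < FRAME_WORDS; i++)
        frame[i] = i;
}

/* Writes one word too many and clobbers the guard word. */
static void overrunning(uintptr_t *frame)
{
    for (size_t i = 0; i <= FRAME_WORDS; i++)
        frame[i] = i;
}
```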
-
Marvin Scholz authored
-
Marvin Scholz authored
This allows selecting at runtime if placebo should use OpenGL or Vulkan for rendering.
-
Marvin Scholz authored
-
Marvin Scholz authored
-
Marvin Scholz authored
-
Marvin Scholz authored
-
- May 14, 2020
-
-
Marvin Scholz authored
-
Marvin Scholz authored
-
Marvin Scholz authored
To un-clutter the main dav1dplay.c, move the fifo to its own file and header.
-
Martin Storsjö authored
If the maximum number of arguments (currently 15) is changed into an even number, and a function actually takes the full number of arguments, we would have the situation where the checked spot on the stack is in the same place where we store an inverted copy of it. We already allocate enough space for two values though (for stack alignment purposes, 16 bytes on arm64 and 8 bytes on arm32), so by storing the reference in the upper half of this space, the lower half works as a canary and isn't overwritten.
-
Martin Storsjö authored
-
Martin Storsjö authored
-
- May 13, 2020
-
-
Use 'unsigned' instead of 'unsigned int' for consistency. Add 'const' to a few variables. Make proper use of C99 features.
-
Also skip the AVX warmup.
-
If functions return a float value, this value is stored in this register.
-
Martin Storsjö authored
We should just use a normal bl here, and the linker will add the 'x' bit if necessary. This fixes calling checkasm_fail_func on Windows, where the code is built in thumb mode (and the linker doesn't clear the 'x' bit in the blx instruction).
-
- May 12, 2020
-
-
-
-
* The build from 'build-debian' is reused. 'logging' is not disabled since that would trigger an almost full rebuild.
* All ASM tests are merged into one job which is expected to fail only rarely, so ease of debugging is traded for efficiency.
-
-
-
Martin Storsjö authored
When benchmarking, the functions are called with a fixed size of 64x32 or 32x16, while the test itself is run with a random size in the range up to 128x32. In 16 bpc mode, the source pixels must be within the valid range, because they otherwise cause out-of-bounds accesses in the scaling array.
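One way to keep random test input inside the valid range is to mask it with the maximum pixel value. The sketch below assumes `bitdepth_max` is `(1 << bitdepth) - 1` (for example 1023 for 10 bpc); `fill_pixels()` is a hypothetical helper, not the test's actual code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fills `dst` with pseudo-random pixels limited to [0, bitdepth_max].
 * Assumes bitdepth_max has the form (1 << bitdepth) - 1, so that the
 * bitwise AND acts as a range limit. */
static void fill_pixels(uint16_t *dst, size_t n, unsigned bitdepth_max)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t) ((unsigned) rand() & bitdepth_max);
}
```

Any value produced this way can safely be used as an index into a lookup table with `bitdepth_max + 1` entries.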
-
- May 11, 2020
-
-
Also avoid integer overflows by using 64-bit intermediate precision.
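As a generic illustration of the technique (not the code touched by this commit), widening one operand before multiplying keeps the intermediate product from overflowing. `scale_i32()` is a hypothetical helper.

```c
#include <stdint.h>

/* Scales `value` by num/den. The product is formed in 64 bits, so it cannot
 * overflow for any pair of 32-bit inputs. Assumes den is non-zero and that
 * the final quotient fits in 32 bits. */
static int32_t scale_i32(int32_t value, int32_t num, int32_t den)
{
    return (int32_t) (((int64_t) value * num) / den);
}
```

With plain 32-bit arithmetic, a call such as `scale_i32(100000, 100000, 100000)` would overflow in the multiplication even though the final result is small.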
-
- May 10, 2020
-
-
Allows for macro-op fusion.
-
Eliminates the x86-64 check from the meson configuration file to be consistent with how other x86-64-exclusive code is handled.
-
Allows for constant propagation and tail call elimination in the msac initialization, which is performed in each tile.
-
Utilize the unsigned representation of a signed integer to skip the refill code if the count was already negative to begin with, which saves a few clock cycles at the end of each tile.
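The trick can be sketched in C as follows. This is an illustration of the comparison only, under the assumption of a two's-complement `int`; `consume_bits()` is a hypothetical helper and the actual change is in the decoder's own code.

```c
#include <stdbool.h>

/* Subtracts `d` bits from a signed bit count and reports whether a refill
 * is needed. Assumes `d` is small enough that the subtraction cannot
 * overflow. */
static bool consume_bits(int *cnt, unsigned d)
{
    const unsigned before = (unsigned) *cnt;
    *cnt -= (int) d;
    /* A negative count is a huge value when viewed as unsigned, so
     * `before < d` holds only if the count was non-negative and has just
     * dropped below zero. A count that was already negative skips the refill. */
    return before < d;
}
```

A single unsigned comparison thus replaces the pair of checks "was non-negative" and "is now negative".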
-
Martin Storsjö authored
Add an element size specifier to the existing individual transform functions for 8 bpc, naming them e.g. inv_dct_8h_x8_neon, to clarify that they operate on input vectors of 8h, and make the symbols public, to let the 10 bpc case call them from a different object file. The same convention is used in the new itx16.S, like inv_dct_4s_x8_neon.

Make the existing itx.S compiled regardless of whether 8 bpc support is enabled. For builds with 8 bpc support disabled, this does include the unused frontend functions though, but this is hopefully tolerable to avoid having to split the file into a sharable file for transforms and a separate one for frontends.

This only implements the 10 bpc case, as that case can use transforms operating on 16 bit coefficients in the second pass.

Relative speedup vs C for a few functions:

                                          Cortex A53     A72     A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:          4.14    4.06    4.49
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:          6.51    6.49    6.42
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:          5.02    4.63    6.23
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:          8.54    7.13   11.96
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:        5.52    6.60    8.03
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:       11.27    9.62   12.22
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:        9.60    6.97    8.59
inv_txfm_add_32x32_dct_dct_0_10bpc_neon:        2.60    3.48    3.19
inv_txfm_add_32x32_dct_dct_1_10bpc_neon:       14.65   12.64   16.86
inv_txfm_add_32x32_dct_dct_2_10bpc_neon:       11.57    8.80   12.68
inv_txfm_add_32x32_dct_dct_3_10bpc_neon:        8.79    8.00    9.21
inv_txfm_add_32x32_dct_dct_4_10bpc_neon:        7.58    6.21    7.80
inv_txfm_add_64x64_dct_dct_0_10bpc_neon:        2.41    2.85    2.75
inv_txfm_add_64x64_dct_dct_1_10bpc_neon:       12.91   10.27   12.24
inv_txfm_add_64x64_dct_dct_2_10bpc_neon:       10.96    7.97   10.31
inv_txfm_add_64x64_dct_dct_3_10bpc_neon:        8.95    7.42    9.55
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:        7.97    6.12    7.82
-
Martin Storsjö authored
This matches what is done in C by -fvisibility=hidden. This avoids issues with relocations against other symbols exported from another assembly file.
-
Martin Storsjö authored
-
Martin Storsjö authored
-