Commits · df40d36d84fbdd3aae827b36d1a15739efb9225b · Pranav Kant / dav1d

May 20, 2020

dav1dplay: don't freeze on render errors · df40d36d

Niklas Haas authored 4 years ago

Returning out of this function when pl_render_image() fails is the wrong
thing to do, since that leaves the swapchain frame acquired but never
submitted. Instead, just clear the target FBO to blank red (to make it
clear that something went wrong) and continue on with presentation.

df40d36d
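
The control flow this commit describes can be sketched in plain C. This is a minimal illustration, not the dav1dplay code: `fake_frame`, `render_fn`, `render_ok`, `render_fail` and `present_frame` are hypothetical names, and the real renderer works with libplacebo swapchain frames rather than boolean flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a swapchain frame; it only records what happened. */
struct fake_frame {
    bool acquired;
    bool cleared_red;
    bool submitted;
};

/* Hypothetical render callback type: returns false on failure. */
typedef bool (*render_fn)(struct fake_frame *);

static bool render_ok(struct fake_frame *f)   { (void)f; return true; }
static bool render_fail(struct fake_frame *f) { (void)f; return false; }

/* Acquire, render, and always submit. On a render failure the frame is
 * cleared to red instead of returning early with the frame still acquired. */
static void present_frame(struct fake_frame *f, render_fn render) {
    assert(f != NULL && render != NULL);
    f->acquired = true;
    if (!render(f))
        f->cleared_red = true; /* visible error marker instead of a freeze */
    f->submitted = true;       /* reached on both the success and failure paths */
}
```

Calling `present_frame(&frame, render_fail)` leaves `frame.cleared_red` and `frame.submitted` both set, whereas an early return on failure would have left the frame acquired but never submitted.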

May 19, 2020
- Update NEWS for 0.7.0 · dd1ed29b
  Jean-Baptiste Kempf authored 4 years ago
  
  dd1ed29b
May 18, 2020

dav1dplay: support on-GPU film grain synthesis · cbe05cf4

Niklas Haas authored 4 years ago

Annoying minor differences in this struct layout mean we can't just
memcpy the entire thing. Oh well.

Note: technically, PL_API_VER 33 added this API, but PL_API_VER 63 is
the minimum version of libplacebo that doesn't have glaring bugs when
generating chroma grain, so we require that as a minimum instead.

(I tested this version on some 4:2:2 and 4:2:0, 8-bit and 10-bit grain
samples I had lying around and made sure the output was identical up to
differences in rounding / dithering.)

cbe05cf4

dav1dplay: handle all supported csps/reprs/bitdepths · 7bbebdb4

Niklas Haas authored 4 years ago

Generalize the code to set the right pl_image metadata based on the
values signaled in the Dav1dPictureParameters / Dav1dSequenceHeader.

Some values are not mapped, in which case stdout will be spammed.
Whatever. Hopefully somebody sees that error spam and opens a bug report
for libplacebo to implement it.

7bbebdb4

dav1dplay: move and simplify pl_image generation · f01fd0f1

Niklas Haas authored 4 years ago

Having the pl_image generation live in upload_planes() rather than
render() will make it easier to set the correct pl_image metadata based
on the Dav1dPicture headers moving forwards. Rename the function to make
more sense, semantically.

Reduce some code duplication by turning per-plane fields into arrays
wherever appropriate.

As an aside, also apply the correct chroma location rather than
hard-coding it as PL_CHROMA_LEFT.

f01fd0f1

dav1dplay: don't write directly to iparams.extensions · 3bb0aed1

Niklas Haas authored 4 years ago

This is turned into a const array in upstream libplacebo, which
generates warnings due to the implicit cast. Rewrite the code to have
the mutable array live inside a separate variable `extensions` and only
set `iparams.extensions` to this, rather than directly manipulating it.

3bb0aed1
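
The pattern described here (build the list in a mutable variable, then point the read-only field at it) might look roughly like the sketch below. `inst_params`, `ext_list`, `ext_list_add` and `publish_extensions` are hypothetical stand-ins chosen for illustration; they are not libplacebo types or functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXTS 8

/* Hypothetical stand-in for library parameters whose extension list is read-only. */
struct inst_params {
    const char *const *extensions;
    int num_extensions;
};

/* Mutable list owned by the caller. */
struct ext_list {
    const char *names[MAX_EXTS];
    int count;
};

/* Appends a name; returns 0 on success, -1 if the list is full. */
static int ext_list_add(struct ext_list *l, const char *name) {
    assert(l->count >= 0 && l->count <= MAX_EXTS);
    if (l->count == MAX_EXTS)
        return -1;
    l->names[l->count++] = name;
    return 0;
}

/* Hands the finished list to the parameters; nothing writes through
 * ip->extensions afterwards, so no cast away from const is needed. */
static void publish_extensions(struct inst_params *ip, const struct ext_list *l) {
    ip->extensions = l->names;
    ip->num_extensions = l->count;
}
```

In this sketch the `ext_list` must outlive every use of the `inst_params` that points at it.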

May 16, 2020
- Fix swapped define guards in dav1dplay’s libplacebo renderer · 239b87f0
  Emmanuel Gil Peyrot authored 4 years ago and Jean-Baptiste Kempf committed 4 years ago
```
Signed-off-by: Marvin Scholz <epirat07@gmail.com>
```
  239b87f0
- Update NEWS for 0.7.0 · da69c3ce
  Jean-Baptiste Kempf authored 4 years ago
  
  da69c3ce
May 15, 2020
- checkasm: x86: Check for stack corruption · a82211aa
  Henrik Gramner authored 4 years ago
```
Add code to check that a function doesn't accidentally overwrite
anything in the area located just above the current stack frame.
```
  a82211aa
- tools: add missing fopen error handling · d3a10dc8
  Marvin Scholz authored 4 years ago
  
  d3a10dc8
- Dav1dPlay: Split placebo renderer into two · e4a4c8c6
  Marvin Scholz authored 4 years ago
```
This allows selecting at runtime if placebo should use OpenGL
or Vulkan for rendering.
```
  e4a4c8c6
- Dav1dPlay: Remove redundant log message · 7f50fc37
  Marvin Scholz authored 4 years ago
  
  7f50fc37
- Dav1dPlay: Remove unused renderer_info member · 2987b78a
  Marvin Scholz authored 4 years ago
  
  2987b78a
- Dav1dPlay: Allow runtime renderer selection · c1c41ff0
  Marvin Scholz authored 4 years ago
  
  c1c41ff0
- Dav1dPlay: Fix renderer selection · 7f5cf34d
  Marvin Scholz authored 4 years ago
  
  7f5cf34d
May 14, 2020
- Dav1dPlay: Split renderers into different files · e8fc62fc
  Marvin Scholz authored 4 years ago
  
  e8fc62fc
- Dav1dPlay: Add support for OpenGL with libplacebo · 41e08199
  Marvin Scholz authored 4 years ago
  
  41e08199
- Dav1dPlay: Split FIFO to different files · 9c56be26
  Marvin Scholz authored 4 years ago
```
To un-clutter the main dav1dplay.c, move the fifo to its own
file and header.
```
  9c56be26
- checkasm: arm: Offset the location of the stack canary reference · b585f051
  Martin Storsjö authored 4 years ago
```
If the maximum number of arguments (currently 15) is changed into
an even number, and a function actually takes the full number of
arguments, we would have the situation where the checked spot on
the stack is at the same place as we store an inverted copy of it.

We already allocate enough space for two values though (for stack
alignment purposes, 16 bytes on arm64 and 8 bytes on arm32) so by
storing the reference in the upper half of this, the lower half of
it works as canary and isn't overwritten.
```
  b585f051
- checkasm: arm32: Take the number of stack arguments into account when checking for stack clobbering · b878d75d
  Martin Storsjö authored 4 years ago
  
  b878d75d
- checkasm: arm64: Take the number of stack arguments into account when checking for stack clobbering · 55cf967b
  Martin Storsjö authored 4 years ago
  
  55cf967b
May 13, 2020
- checkasm: Cosmetics · 7b2e145d
  Henrik Gramner authored 4 years ago and Henrik Gramner committed 4 years ago
```
Use 'unsigned' instead of 'unsigned int' for consistency.
Add 'const' to a few variables.
Make proper use of C99 features.
```
  7b2e145d
- checkasm: Skip printing the seed when using --list-functions · e22a8f32
  Henrik Gramner authored 4 years ago and Henrik Gramner committed 4 years ago
```
Also skip the AVX warmup.
```
  e22a8f32
- checkasm: arm64: Avoid overwriting the v0/q0/d0/s0 register · 4e251db1
  Matthieu Bouron authored 4 years ago and Martin Storsjö committed 4 years ago
```
If functions return a float value, this value is stored in this
register.
```
  4e251db1
- checkasm: arm: Don't use blx to call checkasm_fail_func · ca38f0f6
  Martin Storsjö authored 4 years ago
```
We should just use a normal bl here, and the linker will add the 'x'
bit if necessary.

This fixes calling the checkasm_fail_func on windows, where the
code is built in thumb mode (and the linker doesn't clear the 'x'
bit in the blx instruction).
```
  ca38f0f6
May 12, 2020
- CI: Add 32 bit instruction set test · 0326c060
  Matthias Dressel authored 4 years ago and Jean-Baptiste Kempf committed 4 years ago
  
  0326c060
- CI: Optimise multi-threading tests · b6ee5e01
  Matthias Dressel authored 4 years ago and Jean-Baptiste Kempf committed 4 years ago
  
  b6ee5e01
- CI: Optimise instruction set tests · ccab2224
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 4 years ago
```
* The build from 'build-debian' is reused. 'logging' is not disabled
  since that would trigger an almost full rebuild.
* All ASM tests are merged into one job which is expected to
  fail rarely, so ease of debugging is traded for
  efficiency.
```
  ccab2224
- CI: Add multi-threading to conformance tests · aff854e1
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 4 years ago
  
  aff854e1
- CI: Run conformance tests with different instruction sets · 3e31a6ec
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 4 years ago
  
  3e31a6ec
- checkasm: filmgrain: Fix benchmarking in 16 bpc mode · 152391b2
  Martin Storsjö authored 4 years ago
```
When benchmarking, the functions are called with a fixed size
of 64x32 or 32x16, while the test itself is run with a random size
in the range up to 128x32.

In 16 bpc mode, the source pixels must be within the valid range,
because otherwise they cause out-of-bounds accesses in the scaling
array.
```
  152391b2
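
One way to keep random test pixels inside the valid range is to mask them with the maximum value for the bit depth. The sketch below assumes that approach; `fill_random_pixels` is a hypothetical helper and is not taken from checkasm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fills buf with n pseudo-random pixels that all fit in `bitdepth` bits,
 * so a lookup table indexed by pixel value is never read out of bounds. */
static void fill_random_pixels(uint16_t *buf, size_t n, int bitdepth) {
    assert(bitdepth >= 1 && bitdepth <= 16);
    const unsigned bitdepth_max = (1u << bitdepth) - 1;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint16_t)((unsigned)rand() & bitdepth_max);
}
```

For 10-bit content, `fill_random_pixels(px, 64 * 32, 10)` yields values no larger than 1023.
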
May 11, 2020
- cli: Reduce fps fraction in ivf parsing · a0678eac
  Henrik Gramner authored 4 years ago and Jean-Baptiste Kempf committed 4 years ago
```
Also avoid integer overflows by using 64-bit intermediate precision.
```
  a0678eac
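
Reducing a fraction built from 32-bit factors, with 64-bit intermediates, can be sketched as follows. `Fraction`, `gcd_u64` and `reduced_ratio` are hypothetical names for illustration; the actual ivf parsing code may structure this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fraction type; not a dav1d structure. */
typedef struct {
    uint64_t num;
    uint64_t den;
} Fraction;

/* Greatest common divisor via Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        const uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns (a * b) / (c * d) in lowest terms. Widening to 64 bits before
 * multiplying means the 32x32-bit products cannot overflow.
 * Assumption: the caller passes nonzero c and d. */
static Fraction reduced_ratio(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    Fraction f = { (uint64_t)a * b, (uint64_t)c * d };
    assert(f.den != 0);
    const uint64_t g = gcd_u64(f.num, f.den); /* nonzero because den != 0 */
    f.num /= g;
    f.den /= g;
    return f;
}
```

With these definitions, `reduced_ratio(30000, 2, 1001, 2)` gives 30000/1001, and `reduced_ratio(65536, 65536, 65536, 2)` gives 32768/1 even though the numerator product does not fit in 32 bits.
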
May 10, 2020

- x86: Use 'test' instead of 'or' to compare with zero · 4d97f5a9
  Henrik Gramner authored 4 years ago and Henrik Gramner committed 4 years ago
```
Allows for macro-op fusion.
```
  4d97f5a9
- x86: Unconditionally compile msac_init.c · 28d33357
  Henrik Gramner authored 4 years ago and Henrik Gramner committed 4 years ago
```
Eliminates the x86-64 check from the meson configuration file to be
consistent with how other x86-64-exclusive code is handled.
```
  28d33357
- x86-64: Do msac refill before calling dav1d_msac_init_x86() · 6a6c3528
  Henrik Gramner authored 4 years ago and Henrik Gramner committed 4 years ago
```
Allows for constant propagation and tail call elimination in the
msac initialization, which is performed in each tile.
```
  6a6c3528

msac: Avoid attempting to refill after eob has already been reached · 631d7720

Henrik Gramner authored 4 years ago and Henrik Gramner committed 4 years ago

Utilize the unsigned representation of a signed integer to skip
the refill code if the count was already negative to begin with,
which saves a few clock cycles at the end of each tile.

631d7720
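
The unsigned-comparison idea can be illustrated in C as below. This is a sketch of the general technique only, not the dav1d implementation: `needs_refill_two_checks` and `needs_refill_unsigned` are hypothetical names, and `d` is assumed to be the non-negative amount subtracted from the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-comparison form: refill only if the count was non-negative before
 * subtracting d and would become negative afterwards. */
static bool needs_refill_two_checks(int cnt, int d) {
    assert(d >= 0);
    return cnt >= 0 && cnt - d < 0;
}

/* Single-comparison form: a negative cnt converts to a very large unsigned
 * value, so it can never compare below d and the refill is skipped. */
static bool needs_refill_unsigned(int cnt, int d) {
    assert(d >= 0);
    return (unsigned)cnt < (unsigned)d;
}
```

Both functions agree for every `cnt` when `d` is non-negative; the second one needs a single compare and branch.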

arm64: itx: Add NEON implementation of itx for 10 bpc · eaedb95d

Martin Storsjö authored 5 years ago

Add an element size specifier to the existing individual transform
functions for 8 bpc, naming them e.g. inv_dct_8h_x8_neon, to clarify
that they operate on input vectors of 8h, and make the symbols
public, to let the 10 bpc case call them from a different object file.
The same convention is used in the new itx16.S, like inv_dct_4s_x8_neon.

Compile the existing itx.S regardless of whether 8 bpc support
is enabled. For builds with 8 bpc support disabled, this does include
the unused frontend functions, but that is hopefully tolerable
to avoid having to split the file into a sharable file for transforms
and a separate one for frontends.

This only implements the 10 bpc case, as that case can use transforms
operating on 16 bit coefficients in the second pass.

Relative speedup vs C for a few functions:

| Function | Cortex A53 | A72 | A73 |
|:---|---:|---:|---:|
| inv_txfm_add_4x4_dct_dct_0_10bpc_neon | 4.14 | 4.06 | 4.49 |
| inv_txfm_add_4x4_dct_dct_1_10bpc_neon | 6.51 | 6.49 | 6.42 |
| inv_txfm_add_8x8_dct_dct_0_10bpc_neon | 5.02 | 4.63 | 6.23 |
| inv_txfm_add_8x8_dct_dct_1_10bpc_neon | 8.54 | 7.13 | 11.96 |
| inv_txfm_add_16x16_dct_dct_0_10bpc_neon | 5.52 | 6.60 | 8.03 |
| inv_txfm_add_16x16_dct_dct_1_10bpc_neon | 11.27 | 9.62 | 12.22 |
| inv_txfm_add_16x16_dct_dct_2_10bpc_neon | 9.60 | 6.97 | 8.59 |
| inv_txfm_add_32x32_dct_dct_0_10bpc_neon | 2.60 | 3.48 | 3.19 |
| inv_txfm_add_32x32_dct_dct_1_10bpc_neon | 14.65 | 12.64 | 16.86 |
| inv_txfm_add_32x32_dct_dct_2_10bpc_neon | 11.57 | 8.80 | 12.68 |
| inv_txfm_add_32x32_dct_dct_3_10bpc_neon | 8.79 | 8.00 | 9.21 |
| inv_txfm_add_32x32_dct_dct_4_10bpc_neon | 7.58 | 6.21 | 7.80 |
| inv_txfm_add_64x64_dct_dct_0_10bpc_neon | 2.41 | 2.85 | 2.75 |
| inv_txfm_add_64x64_dct_dct_1_10bpc_neon | 12.91 | 10.27 | 12.24 |
| inv_txfm_add_64x64_dct_dct_2_10bpc_neon | 10.96 | 7.97 | 10.31 |
| inv_txfm_add_64x64_dct_dct_3_10bpc_neon | 8.95 | 7.42 | 9.55 |
| inv_txfm_add_64x64_dct_dct_4_10bpc_neon | 7.97 | 6.12 | 7.82 |

eaedb95d

arm: Mark global symbols hidden · ff3054fe

Martin Storsjö authored 4 years ago

This matches what is done in C by -fvisibility=hidden.

This avoids issues with relocations against other symbols exported
from another assembly file.

ff3054fe

arm64: itx: Prepare for other bitdepths · d4002c88

Martin Storsjö authored 5 years ago

d4002c88

itx: Add a bpc parameter to the itx dsp init function · 5f4e28fe

Martin Storsjö authored 5 years ago

5f4e28fe