Commits · master · Burnthard Hunger / dav1d

May 16, 2020
- Update NEWS for 0.7.0 · da69c3ce
  Jean-Baptiste Kempf authored 5 years ago
  
  da69c3ce
May 15, 2020
- checkasm: x86: Check for stack corruption · a82211aa
  Henrik Gramner authored 5 years ago
```
Add code to check that a function doesn't accidentally overwrite
anything in the area located just above the current stack frame.
```
  a82211aa
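  As a rough illustration of the idea only (the actual check is x86 assembly working on the real stack), the sketch below uses a plain C array as a stand-in for the stack. The names `FRAME_WORDS`, `GUARD_WORDS`, `CANARY` and `run_guarded` are hypothetical and do not come from checkasm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_WORDS 8           /* words the tested function may write */
#define GUARD_WORDS 4           /* hypothetical guard area above the frame */
#define CANARY      0xdeadbeefu /* arbitrary marker value */

typedef void (*test_fn)(uint32_t *frame);

/* Returns 1 if fn left the guard area intact, 0 if it overwrote it. */
static int run_guarded(test_fn fn)
{
    uint32_t area[FRAME_WORDS + GUARD_WORDS];

    assert(fn != 0);
    memset(area, 0, sizeof(area));
    for (int i = 0; i < GUARD_WORDS; i++)
        area[FRAME_WORDS + i] = CANARY;

    fn(area);

    for (int i = 0; i < GUARD_WORDS; i++)
        if (area[FRAME_WORDS + i] != CANARY)
            return 0;
    return 1;
}

/* Well-behaved: stays within its frame. */
static void good_fn(uint32_t *frame)
{
    for (int i = 0; i < FRAME_WORDS; i++)
        frame[i] = (uint32_t)i;
}

/* Buggy: writes one word past the frame, into the guard area. */
static void bad_fn(uint32_t *frame)
{
    for (int i = 0; i <= FRAME_WORDS; i++)
        frame[i] = (uint32_t)i;
}
```

  Here `run_guarded(good_fn)` reports an intact guard area, while `run_guarded(bad_fn)` detects the stray write.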
- tools: add missing fopen error handling · d3a10dc8
  Marvin Scholz authored 5 years ago
  
  d3a10dc8
- Dav1dPlay: Split placebo renderer into two · e4a4c8c6
  Marvin Scholz authored 5 years ago
```
This allows selecting at runtime if placebo should use OpenGL
or Vulkan for rendering.
```
  e4a4c8c6
- Dav1dPlay: Remove redundant log message · 7f50fc37
  Marvin Scholz authored 5 years ago
  
  7f50fc37
- Dav1dPlay: Remove unused renderer_info member · 2987b78a
  Marvin Scholz authored 5 years ago
  
  2987b78a
- Dav1dPlay: Allow runtime renderer selection · c1c41ff0
  Marvin Scholz authored 5 years ago
  
  c1c41ff0
- Dav1dPlay: Fix renderer selection · 7f5cf34d
  Marvin Scholz authored 5 years ago
  
  7f5cf34d
May 14, 2020
- Dav1dPlay: Split renderers into different files · e8fc62fc
  Marvin Scholz authored 5 years ago
  
  e8fc62fc
- Dav1dPlay: Add support for OpenGL with libplacebo · 41e08199
  Marvin Scholz authored 5 years ago
  
  41e08199
- Dav1dPlay: Split FIFO to different files · 9c56be26
  Marvin Scholz authored 5 years ago
```
To un-clutter the main dav1dplay.c, move the fifo to its own
file and header.
```
  9c56be26
- checkasm: arm: Offset the location of the stack canary reference · b585f051
  Martin Storsjö authored 5 years ago
```
If the maximum number of arguments (currently 15) is changed into
an even number, and a function actually takes the full number of
arguments, we would have the situation where the checked spot on
the stack is at the same place as we store an inverted copy of it.

We already allocate enough space for two values though (for stack
alignment purposes, 16 bytes on arm64 and 8 bytes on arm32) so by
storing the reference in the upper half of this, the lower half of
it works as canary and isn't overwritten.
```
  b585f051
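  The layout described above could look roughly like this in C. This is a sketch under assumptions: a 16-byte slot split into two 8-byte halves as on arm64, with hypothetical names (`canary_slot`, `slot_init`, `slot_intact`); the real code is assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte slot, as reserved on arm64 for stack alignment. */
typedef struct {
    uint64_t canary;    /* lower half: the checked spot on the stack */
    uint64_t reference; /* upper half: inverted copy of the canary */
} canary_slot;

/* Store the canary and its inverted reference in separate halves. */
static void slot_init(canary_slot *s, uint64_t value)
{
    assert(s);
    s->canary = value;
    s->reference = ~value;
}

/* Returns 1 if the canary still matches its inverted reference. */
static int slot_intact(const canary_slot *s)
{
    assert(s);
    return s->canary == ~s->reference;
}
```

  Because the two values occupy different halves, writing the reference can no longer land on the spot being checked.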
- checkasm: arm32: Take the number of stack arguments into account when checking for stack clobbering · b878d75d
  Martin Storsjö authored 5 years ago
  
  b878d75d
- checkasm: arm64: Take the number of stack arguments into account when checking for stack clobbering · 55cf967b
  Martin Storsjö authored 5 years ago
  
  55cf967b
May 13, 2020
- checkasm: Cosmetics · 7b2e145d
  Henrik Gramner authored 5 years ago and Henrik Gramner committed 5 years ago
```
Use 'unsigned' instead of 'unsigned int' for consistency.
Add 'const' to a few variables.
Make proper use of C99 features.
```
  7b2e145d
- checkasm: Skip printing the seed when using --list-functions · e22a8f32
  Henrik Gramner authored 5 years ago and Henrik Gramner committed 5 years ago
```
Also skip the AVX warmup.
```
  e22a8f32
- checkasm: arm64: Avoid overwriting the v0/q0/d0/s0 register · 4e251db1
  Matthieu Bouron authored 5 years ago and Martin Storsjö committed 5 years ago
```
If functions return a float value, this value is stored in this
register.
```
  4e251db1
- checkasm: arm: Don't use blx to call checkasm_fail_func · ca38f0f6
  Martin Storsjö authored 5 years ago
```
We should just use a normal bl here, and the linker will add the 'x'
bit if necessary.

This fixes calling the checkasm_fail_func on windows, where the
code is built in thumb mode (and the linker doesn't clear the 'x'
bit in the blx instruction).
```
  ca38f0f6
May 12, 2020
- CI: Add 32 bit instruction set test · 0326c060
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 5 years ago
  
  0326c060
- CI: Optimise multi-threading tests · b6ee5e01
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 5 years ago
  
  b6ee5e01
- CI: Optimise instruction set tests · ccab2224
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 5 years ago
```
* The build from 'build-debian' is reused. 'logging' is not disabled
  since that would trigger an almost full rebuild.
* All ASM tests are merged into one job, which is expected to
  fail only rarely, so ease of debugging is traded for
  efficiency.
```
  ccab2224
- CI: Add multi-threading to conformance tests · aff854e1
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 5 years ago
  
  aff854e1
- CI: Run conformance tests with different instruction sets · 3e31a6ec
  Matthias Dressel authored 5 years ago and Jean-Baptiste Kempf committed 5 years ago
  
  3e31a6ec
- checkasm: filmgrain: Fix benchmarking in 16 bpc mode · 152391b2
  Martin Storsjö authored 5 years ago
```
When benchmarking, the functions are called with a fixed size
of 64x32 or 32x16, while the test itself is run with a random size
in the range up to 128x32.

In 16 bpc mode, the source pixels must be within the valid range,
because they otherwise cause accesses out of bounds in the scaling
array.
```
  152391b2
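  The constraint can be sketched in C as follows. Assumptions, none of which come from the commit itself: a bit depth of 10 stored in 16-bit pixels, a scaling table with one entry per valid pixel value, and hypothetical names throughout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BITDEPTH     10                   /* assumed bit depth */
#define BITDEPTH_MAX ((1 << BITDEPTH) - 1)
#define SCALING_SIZE (1 << BITDEPTH)      /* hypothetical table size */

/* Fill a 16-bit pixel buffer with random values in the valid range. */
static void fill_valid_pixels(uint16_t *buf, size_t n)
{
    assert(buf);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint16_t)(rand() & BITDEPTH_MAX);
}

/* Returns 1 if every pixel is a valid index into the scaling table,
 * 0 if any lookup would read out of bounds. */
static int all_indices_in_bounds(const uint16_t *buf, size_t n)
{
    assert(buf);
    for (size_t i = 0; i < n; i++)
        if (buf[i] >= SCALING_SIZE)
            return 0;
    return 1;
}
```

  Masking the random values with `BITDEPTH_MAX` keeps every pixel usable as a table index, which an unrestricted 16-bit random value would not be.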
May 11, 2020
- cli: Reduce fps fraction in ivf parsing · a0678eac
  Henrik Gramner authored 5 years ago and Jean-Baptiste Kempf committed 5 years ago
```
Also avoid integer overflows by using 64-bit intermediate precision.
```
  a0678eac
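  A minimal sketch of both points, reducing a fraction and forming its terms in 64-bit precision. The message does not say how the fraction is built from the ivf header, so the four 32-bit factors below are placeholders, and `gcd_u64` and `reduced_fraction` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm on 64-bit values. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b) {
        const uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Multiply two pairs of 32-bit factors in 64-bit precision, then reduce
 * the resulting fraction by its greatest common divisor. */
static void reduced_fraction(uint32_t num_a, uint32_t num_b,
                             uint32_t den_a, uint32_t den_b,
                             uint64_t *num, uint64_t *den)
{
    /* Casting one operand first makes the product 64-bit, so it cannot
     * wrap the way a 32-bit multiplication could. */
    const uint64_t n = (uint64_t)num_a * num_b;
    const uint64_t d = (uint64_t)den_a * den_b;
    assert(d != 0);

    const uint64_t g = gcd_u64(n, d); /* g >= 1 because d != 0 */
    *num = n / g;
    *den = d / g;
}
```

  For example, factors of 4000000000 and 4 over 4000000000 and 2 would overflow a 32-bit product, but reduce cleanly to 2/1 here.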
May 10, 2020
- x86: Use 'test' instead of 'or' to compare with zero · 4d97f5a9
  Henrik Gramner authored 5 years ago and Henrik Gramner committed 5 years ago
```
Allows for macro-op fusion.
```
  4d97f5a9
- x86: Unconditionally compile msac_init.c · 28d33357
  Henrik Gramner authored 5 years ago and Henrik Gramner committed 5 years ago
```
Eliminates the x86-64 check from the meson configuration file to be
consistent with how other x86-64-exclusive code is handled.
```
  28d33357
- x86-64: Do msac refill before calling dav1d_msac_init_x86() · 6a6c3528
  Henrik Gramner authored 5 years ago and Henrik Gramner committed 5 years ago
```
Allows for constant propagation and tail call elimination in the
msac initialization, which is performed in each tile.
```
  6a6c3528
- msac: Avoid attempting to refill after eob has already been reached · 631d7720
  Henrik Gramner authored 5 years ago and Henrik Gramner committed 5 years ago
```
Utilize the unsigned representation of a signed integer to skip
the refill code if the count was already negative to begin with,
which saves a few clock cycles at the end of each tile.
```
  631d7720
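  One way to read the trick described above, as a sketch rather than the actual msac code: if a refill is wanted only when the count was non-negative and subtracting a positive amount `d` takes it below zero, both conditions collapse into one unsigned comparison. The function names and the assumption `d > 0` are mine.

```c
#include <assert.h>

/* Straightforward form: refill only if the count was non-negative
 * before and drops below zero after subtracting d. */
static int needs_refill_signed(int cnt, int d)
{
    assert(d > 0);
    return cnt >= 0 && cnt - d < 0;
}

/* Same test as a single unsigned comparison: a negative cnt converts
 * to a very large unsigned value, so it never compares below d. */
static int needs_refill_unsigned(int cnt, int d)
{
    assert(d > 0);
    return (unsigned)cnt < (unsigned)d;
}
```

  With `d = 5`, a count of 3 triggers a refill, while counts of 7 (still enough bits) and -2 (already negative) do not, in both variants.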

- arm64: itx: Add NEON implementation of itx for 10 bpc · eaedb95d
  Martin Storsjö authored 5 years ago
```
Add an element size specifier to the existing individual transform
functions for 8 bpc, naming them e.g. inv_dct_8h_x8_neon, to clarify
that they operate on input vectors of 8h, and make the symbols
public, to let the 10 bpc case call them from a different object file.
The same convention is used in the new itx16.S, like inv_dct_4s_x8_neon.

Make the existing itx.S compiled regardless of whether 8 bpc support
is enabled. For builds with 8 bpc support disabled, this does include
the unused frontend functions though, but this is hopefully tolerable
to avoid having to split the file into a sharable file for transforms
and a separate one for frontends.

This only implements the 10 bpc case, as that case can use transforms
operating on 16 bit coefficients in the second pass.

Relative speedup vs C for a few functions:

                                      Cortex A53     A72     A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:      4.14    4.06    4.49
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      6.51    6.49    6.42
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:      5.02    4.63    6.23
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      8.54    7.13   11.96
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    5.52    6.60    8.03
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   11.27    9.62   12.22
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    9.60    6.97    8.59
inv_txfm_add_32x32_dct_dct_0_10bpc_neon:    2.60    3.48    3.19
inv_txfm_add_32x32_dct_dct_1_10bpc_neon:   14.65   12.64   16.86
inv_txfm_add_32x32_dct_dct_2_10bpc_neon:   11.57    8.80   12.68
inv_txfm_add_32x32_dct_dct_3_10bpc_neon:    8.79    8.00    9.21
inv_txfm_add_32x32_dct_dct_4_10bpc_neon:    7.58    6.21    7.80
inv_txfm_add_64x64_dct_dct_0_10bpc_neon:    2.41    2.85    2.75
inv_txfm_add_64x64_dct_dct_1_10bpc_neon:   12.91   10.27   12.24
inv_txfm_add_64x64_dct_dct_2_10bpc_neon:   10.96    7.97   10.31
inv_txfm_add_64x64_dct_dct_3_10bpc_neon:    8.95    7.42    9.55
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:    7.97    6.12    7.82
```
  eaedb95d
- arm: Mark global symbols hidden · ff3054fe
  Martin Storsjö authored 5 years ago
```
This matches what is done in C by -fvisibility=hidden.

This avoids issues with relocations against other symbols exported
from another assembly file.
```
  ff3054fe
- arm64: itx: Prepare for other bitdepths · d4002c88
  Martin Storsjö authored 5 years ago
  
  d4002c88
- itx: Add a bpc parameter to the itx dsp init function · 5f4e28fe
  Martin Storsjö authored 5 years ago
  
  5f4e28fe
- arm64: itx: Share code for the three horz_16x8 functions · 1c88bce6
  Martin Storsjö authored 5 years ago
  
  1c88bce6
- arm64: itx: Fix the eob checking for dct_dct_64x16 · a6711a5c
  Martin Storsjö authored 5 years ago
```
Before this, we never did the early exit from the first pass.

Before:                               Cortex A53      A72      A73
inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   7275.7   5198.3   5250.9
inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7276.1   5197.0   5251.3
inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7275.8   5196.2   5254.5
inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7273.6   5198.8   5254.2
After:
inv_txfm_add_64x16_dct_dct_1_8bpc_neon:   5187.8   3763.8   3735.0
inv_txfm_add_64x16_dct_dct_2_8bpc_neon:   7280.6   5185.6   5256.3
inv_txfm_add_64x16_dct_dct_3_8bpc_neon:   7270.7   5179.8   5250.3
inv_txfm_add_64x16_dct_dct_4_8bpc_neon:   7271.7   5212.4   5256.4

The other related variants didn't have this bug and properly exited
early when possible.
```
  a6711a5c
- arm64: itx: Simplify inv_txfm_horz_dct_32x8 · 39d6c599
  Martin Storsjö authored 5 years ago
```
Unify some loads and stores, avoiding some extra pointer moving.
```
  39d6c599
- arm64: itx: Minor optimizations for the 8x32 functions · b6b1394b
  Martin Storsjö authored 5 years ago
```
This gives a couple cycles speedup.
```
  b6b1394b
- arm64: itx: Cosmetic fix up · 208a2abd
  Martin Storsjö authored 5 years ago
  
  208a2abd
- arm64: itx: Remove an unused constant · 92669a3e
  Martin Storsjö authored 5 years ago
```
This isn't used for a sqrdmulh in its current form here.

The one left in idct_coeffs[1] isn't used within the idct itself,
but inv_txfm_horz_scale_dct_32x8 relies on it being left there for
use with sqrdmulh scaling later.
```
  92669a3e
- arm64: itx: Remove a todo comment about more special cased functions · b4f1c1c6
  Martin Storsjö authored 5 years ago
```
These cases were removed from x86 to save space and simplify the code
in e0b88bd2, as those cases were essentially unused in real world
bitstreams.
```
  b4f1c1c6