Commits · master · Nadine Müller / dav1d

Sep 15, 2022

Don't use gas-preprocessor with clang-cl for arm targets · cc9651f5

Martin Storsjö authored 3 years ago

Since meson 0.58.0 (released in May 2021), meson accepts adding '.S'
assembly files as source files to the clang-cl compiler.

If using an older version of meson, keep using gas-preprocessor
just like for MSVC builds.

cc9651f5

Fix checking the reference dimensions for the projection process · d4a2b75d

David Conrad authored 2 years ago

Section 7.9.2 returns 0 "If RefMiRows[ srcIdx ] is not equal to MiRows,
RefMiCols[ srcIdx ] is not equal to MiCols"

dav1d was comparing pixel width/height, not block width/height,
so conform with the spec

d4a2b75d
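
To illustrate why the two comparisons can disagree, here is a minimal C sketch. It assumes the AV1 specification's derivation MiCols = 2 * ((frame_width + 7) >> 3) (and likewise for MiRows); the helper names are hypothetical and are not dav1d functions.

```c
#include <assert.h>

/* Frame dimension in 4x4 block units: round the pixel size up to a
 * multiple of 8, then count 4-pixel units. Hypothetical helper. */
static int mi_units(int pixels) {
    assert(pixels > 0);
    return 2 * ((pixels + 7) >> 3);
}

/* Block-based comparison, in the spirit of the section 7.9.2 check.
 * Returns 1 when the reference and current sizes match in block units. */
static int ref_dim_matches(int ref_pixels, int cur_pixels) {
    return mi_units(ref_pixels) == mi_units(cur_pixels);
}
```

With these definitions, widths of 1916 and 1920 pixels both map to 480 block units, so a pixel comparison would reject a reference that the block comparison accepts.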

Fix calculation of OBMC lap dimensions · eb25f00c

David Conrad authored 2 years ago

Individual OBMC lapped predictions have a max width of 64 pixels
for the top lap and have a max height of 64 for the left laps

This is 7.11.3.9. Overlapped motion compensation process
step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )

dav1d wasn't clipping this as needed, which means that with scaled MC, the
interpolation of the 2nd half of a 128 block was incorrect, since mx/my
for subpel filter selection need to be reset at the 64 pixel boundary

eb25f00c
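
The clipping quoted above can be sketched in C as follows. This is a simplified luma-only illustration: it ignores chroma subsampling and the further limit imposed by the current block's own size, and the function names are hypothetical.

```c
#include <assert.h>

/* Clip3(lo, hi, x) as used in the AV1 specification. */
static int clip3(int lo, int hi, int x) {
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Upper bound, in luma pixels, on the width of one top-lap prediction
 * for a candidate block that is num_4x4_blocks_wide units wide. */
static int obmc_top_lap_max_width(int num_4x4_blocks_wide) {
    const int step4 = clip3(2, 16, num_4x4_blocks_wide);
    return step4 * 4; /* each unit is 4 pixels wide */
}
```

For a 128-pixel-wide candidate (32 units), the result is 64 pixels, which is why the second half of the block has to be predicted separately.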

Support film grain application whose only effect is clipping to video range · 10f5ce54

David Conrad authored 2 years ago

This is the parameter combination:
num_y_points == 0 && num_cb_points == 0 && num_cr_points == 0 &&
chroma_scaling_from_luma == 1 && clip_to_restricted_range == 1

Film grain application has two effects: adding noise, and optionally
clipping to video range

For luma, the spec skips film grain application if there's no noise
(num_y_points == 0), but for chroma, it's only skipped if there's no
chroma noise *and* chroma_scaling_from_luma is false

This means it's possible for there to be no noise (num_*_points = 0), but
if clip_to_restricted_range is true then chroma pixels can be clipped to
video range, if chroma_scaling_from_luma is true. Luma pixels, however,
aren't clipped to video range unless there's noise to apply.
dav1d currently skips applying film grain entirely if there is no noise,
regardless of the secondary clipping.

10f5ce54
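
The conditions described in this message can be sketched as small C predicates. The struct below is a hypothetical subset of the film grain parameters, not dav1d's actual data structure, and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the film grain parameters named in the message. */
typedef struct {
    int num_y_points, num_cb_points, num_cr_points;
    int chroma_scaling_from_luma;
    int clip_to_restricted_range;
} GrainParams;

/* Luma is processed only if there is luma noise. */
static int luma_processed(const GrainParams *p) {
    assert(p != NULL);
    return p->num_y_points > 0;
}

/* Chroma is skipped only if there is no chroma noise *and*
 * chroma_scaling_from_luma is false. */
static int cb_processed(const GrainParams *p) {
    return p->num_cb_points > 0 || p->chroma_scaling_from_luma;
}

static int cr_processed(const GrainParams *p) {
    return p->num_cr_points > 0 || p->chroma_scaling_from_luma;
}

/* The combination from the message: no noise anywhere, yet chroma is
 * still processed and therefore clipped to video range. */
static int clip_only_case(const GrainParams *p) {
    return !luma_processed(p) && p->num_cb_points == 0 &&
           p->num_cr_points == 0 && p->chroma_scaling_from_luma &&
           p->clip_to_restricted_range;
}
```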

Ignore T.35 metadata if the OBU contains no payload · 673ee248

David Conrad authored 2 years ago

The syntax of itu_t_t35_payload_bytes is not defined in the AV1
specification, but it does state that decoders should ignore the
entire OBU if they do not understand it.

673ee248

Fix chroma deblock filter size calculation for lossless · 2152826b

David Conrad authored 2 years ago

In section 5.11.34 txSz is always defined to TX_4X4 if Lossless is true

Chroma deblock filter size calculation needs to use this overridden txSz
when lossless is enabled

2152826b

Fix rounding in the calculation of initialSubpelX · e202fa08
David Conrad authored 2 years ago

The spec divides err by two, rounding toward zero, instead of using >>1,
which rounds toward negative infinity.

e202fa08
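
The difference between the two roundings only shows up for odd negative values. A minimal C sketch, with hypothetical function names (right-shifting a negative int is implementation-defined in C, so the sketch checks that the platform shifts arithmetically):

```c
#include <assert.h>

/* Division by two as the spec describes it: truncates toward zero. */
static int half_toward_zero(int err) {
    return err / 2;
}

/* Shift by one: rounds toward negative infinity on platforms with
 * arithmetic right shift, which is what the sanity check verifies. */
static int half_floor(int err) {
    assert((-1 >> 1) == -1);
    return err >> 1;
}
```

For err = -3, the division yields -1 while the shift yields -2; for non-negative and even values the two agree.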

Fix overflow when saturating dequantized coefficients clipped to 0 · ee98592b

David Conrad authored 2 years ago

It's possible to encode a large coefficient that becomes 0 after
the clipping in dequant (Abs( dq ) & 0xFFFFFF), e.g. 0x1000000
After that &0xFFFFFF, coeffs are saturated in the range of
[-(1 << (bitdepth+7)), 1 << (bitdepth+7))

dav1d implements this saturation via umin(dq - sign, cf_max), then applies
the sign afterwards via xor. However, for dq = 0 and sign = 1, this step
evaluates to umin(UINT_MAX, cf_max) == cf_max instead of the expected 0.

So instead, do unsigned saturate as umin(dq, cf_max + sign),
then apply sign via (sign ? -dq : dq)
On arm this is the same number of instructions, since cneg exists and is used
On x86 this requires an additional instruction, but this isn't a
latency-critical path

ee98592b
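
The two saturation strategies described above can be modeled in scalar C. This is a sketch of the arithmetic only, not dav1d's code; the function names are hypothetical, and the cast from a large unsigned value to int32_t assumes the usual two's-complement wrap-around.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t umin(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Old approach: saturate (dq - sign), then apply the sign via xor.
 * dq is the magnitude after masking, sign is 0 or 1. */
static int32_t saturate_old(uint32_t dq, uint32_t sign, uint32_t cf_max) {
    assert(sign <= 1);
    const uint32_t t = umin(dq - sign, cf_max); /* wraps when dq == 0 */
    return (int32_t)(t ^ (0u - sign));
}

/* New approach: saturate dq against (cf_max + sign), then negate. */
static int32_t saturate_new(uint32_t dq, uint32_t sign, uint32_t cf_max) {
    assert(sign <= 1);
    const uint32_t t = umin(dq, cf_max + sign);
    return sign ? -(int32_t)t : (int32_t)t;
}
```

With cf_max = 32767 (8-bit), dq = 0 and sign = 1, the old form returns -32768 while the new form returns 0. Both agree on ordinary inputs such as dq = 5.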

Fix overflow in 8-bit NEON ADST · 1bdb776c

David Conrad authored 2 years ago

In 8-bit adst, it's possible that the final Round2(x[0], 12) can exceed
16-bits signed

Specifically, in 7.13.2.6. Inverse ADST4 process, the precision requirement is:
"It is a requirement of bitstream conformance that all values stored in the
s and x arrays by this process are representable by a signed integer using
r + 12 bits of precision."

For 8 bits, r is 16 for both row and column, so x[] can be 28-bit signed.
For values [134215680, 134217727] (within 2047 of the maximum 28-bit value),
the final Round2(x[0], 12) evaluates to 32768, exceeding 16-bits signed.

So switch to using sqrshrn, which saturates to 16-bits signed

This is a continuation of: Commit b53ff29d
arm: itx: Do clipping in all narrowing downshifts

1bdb776c
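
The overflow can be reproduced with a scalar C model. Round2 follows the spec's definition; the saturating narrow below is a hypothetical scalar stand-in for what a saturating rounding narrowing shift such as sqrshrn achieves, not the NEON code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round2(x, n): add half of 2^n, then shift right by n. */
static int32_t round2(int32_t x, int n) {
    assert(n > 0 && n < 31);
    return (x + (1 << (n - 1))) >> n;
}

/* Clamp a 32-bit value into the signed 16-bit range. */
static int16_t narrow_sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Rounding shift followed by a saturating narrow. */
static int16_t round2_narrow_sat(int32_t x, int n) {
    return narrow_sat16(round2(x, n));
}
```

For x = 134215680, round2(x, 12) is 32768, one more than INT16_MAX, whereas the saturating version returns 32767.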

Sep 14, 2022

tools: Allocate the priv structs with proper alignment · 08c70801

Martin Storsjö authored 2 years ago

Previously, they could be allocated with any random alignment
matching the end of the MuxerContext/DemuxerContext. The
priv structs themselves can have members that require specific
alignment, or at least the default alignment of malloc()/calloc()
(which is sufficient for native types such as uint64_t and
doubles).

This fixes crashes in some arm builds, where GCC (correctly) wants
to use 64 bit aligned stores to write to MD5Context.

08c70801
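
One way to get such alignment when a private struct is placed after a context struct in a single allocation is to round the offset up to the strictest fundamental alignment. The sketch below is an assumption about the general technique, not the code from this commit; all names are hypothetical and it requires C11 for max_align_t and _Alignof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of a, where a is a power of two. */
static size_t align_up(size_t n, size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Allocate a context of ctx_size bytes followed by a private area of
 * priv_size bytes. The private area starts at an offset that matches
 * the alignment guaranteed by malloc()/calloc(). Returns the base
 * pointer (to be passed to free()) and stores the private pointer. */
static void *alloc_with_priv(size_t ctx_size, size_t priv_size, void **priv) {
    const size_t offset = align_up(ctx_size, _Alignof(max_align_t));
    char *const base = calloc(1, offset + priv_size);
    if (!base) return NULL;
    *priv = base + offset;
    return base;
}
```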

Sep 12, 2022
- x86: Fix clipping in 10bpc SSE4.1 IDCT asm · 128a0d89
  Henrik Gramner authored 2 years ago
  
  128a0d89
Sep 10, 2022
- build: Improve Windows linking options · 178681e5
  Henrik Gramner authored 2 years ago
  
  178681e5
Sep 09, 2022

tools: Improve demuxer probing · 52473197

Henrik Gramner authored 2 years ago

Increase the probing size, and change the logic to assume a stream is
valid even if no conclusive decision could be made within the probing
window as long as a sequence header was detected.

52473197

CI: Disable trimming on some tests · 934713e4

Matthias Dressel authored 2 years ago

Allow checkasm to run.

934713e4

CI: Remove git 'safe.directory' config · 3920bd9d

Matthias Dressel authored 2 years ago

It is now handled by the gitlab runner.

Ref: 7d859f9c

3920bd9d

gcovr: Ignore parsing errors · ddb3189c

Matthias Dressel authored 2 years ago

ddb3189c

crossfiles: Update Android toolchains · aa3fda78

Matthias Dressel authored 2 years ago

* Android armv7: target API 19 since it's the lowest directly provided
  by the new NDK.
* Newer NDK has generic tools for ar, strip, etc.
* Remove windres as it's only relevant for Windows targets.

aa3fda78

CI: Update images · d92594bd

Matthias Dressel authored 2 years ago

Remove experimental since gcc12, clang14, mold are now in unstable.

d92594bd

Sep 08, 2022

threading: Limit the progress bitfields to the used size · 6680d26f

Victorien Le Couviour--Tuffet authored 2 years ago

Store the used size instead of the allocated size.

The used size can be smaller than the allocated size, which results in
a wrong computation of the linear progress from the frame_progress
bitfield.

6680d26f

x86: Fix rare crash in chroma film grain asm · fab6427e

Henrik Gramner authored 2 years ago

The width parameter is used directly as a pointer offset, so ensure
that it has an appropriately sized data type.

This has been done previously for luma, but chroma was overlooked.

fab6427e

Sep 07, 2022
- x86: Fix overflows in 12bpc AVX2 identity itx asm · 677129c2
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  677129c2
- x86: Fix an alignment issue in 8-bit AVX-512 loop restoration · 58b15237
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  We don't have a separate 8-bit AVX-512 5-tap Wiener filter so the 7-tap
  function is used for chroma as well, and in some esoteric edge cases
  chroma dst pointers may only have a 32-byte alignment despite having
  a width larger than 32, so use an unaligned store as a workaround.
  
  58b15237
Sep 02, 2022
- checkasm: Add short options · 895fed08
  Victorien Le Couviour--Tuffet authored 2 years ago
  
  895fed08
- checkasm: Add pattern matching to --test · 713a4f4e
  Victorien Le Couviour--Tuffet authored 2 years ago
  
  713a4f4e
- checkasm: Remove pattern matching from --bench · a63a7c96
  Victorien Le Couviour--Tuffet authored 2 years ago
  
  The pattern matching feature has been improved and is now performed
  under the new --function parameter, rendering this one obsolete.
  
  a63a7c96
- checkasm: Add a --function option · d5d37926
  Victorien Le Couviour--Tuffet authored 2 years ago
  
  Allows running checkasm only for functions matching a given pattern.
  
  d5d37926
Aug 30, 2022
- threading: Fix copy_lpf_progress initialization · a3a55b18
  Victorien Le Couviour--Tuffet authored 2 years ago
  
  The copy_lpf_progress bitfield might not be fully cleared when size goes
  down.
  
  Credit to Oss-Fuzz.
  
  a3a55b18
Aug 19, 2022
- data: don't overwrite the Dav1dDataProps size value · cd5e4152
  James Almer authored 2 years ago
  
  Fixes a regression since commit 3d3c51a0.
  
  cd5e4152
Jul 25, 2022

Adjust inlining attributes on some functions · a029d689

Henrik Gramner authored 2 years ago

The code size increase of inlining every call to certain functions
isn't a worthwhile trade-off, and most compilers actually end up
overriding those particular inlining hints anyway.

In some cases it's also better to split the function into separate
luma and chroma functions.

a029d689

Jul 19, 2022
- x86: Remove leftover instruction in loopfilter AVX2 asm · 0b7a0a2e
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  In 0aca76c3 sequences of pand/pandn/por were replaced by pblendvb, but
  one instruction (which now acts as a no-op) was accidentally left in.
  
  0b7a0a2e
Jul 13, 2022
- Enable pointer authentication in assembly when building arm64e · 6dc03eee
  David Conrad authored 2 years ago and Martin Storsjö committed 2 years ago
  
  6dc03eee
Jul 11, 2022

Don't trash the return stack buffer in the NEON loop filter · d503bb0c

David Conrad authored 2 years ago

The NEON loop filter's innermost asm function can return to a different
location than the address that called it. This messes up the return stack
predictor, causing returns to be mispredicted

Instead, rework the function to always return to the address that calls it,
and instead return the information needed for the caller to short-circuit
storing pixels

d503bb0c

Jul 06, 2022

CI: Removed snap package generation · 79bc755d

Konstantin Pavlov authored 2 years ago and Henrik Gramner committed 2 years ago

The snapcraft version we use is no longer compatible with the authentication
schemes the snap store uses. This could be fixed by updating snapcraft
inside the docker image, but Ubuntu no longer ships an up-to-date
snapcraft version in its own repositories. The other way to install
snapcraft is to manually fetch the project and core snaps just like we
do in https://code.videolan.org/videolan/docker-images/-/blob/master/vlc-ubuntu-focal/Dockerfile,
but that currently fails on Jammy due to a conflict in Python versions
between what is shipped in Jammy and inside the snapcraft project.

All in all, snapcraft seems to be abandoned for our CI
use-case, and the usefulness of dav1d snap is disputable, so just drop
it altogether. Packaging is still available in package/snap/ for the
brave souls who want to build it on their own.

79bc755d

Eliminate unused C DSP functions at compile time · bd046635

Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago

When compiling with asm enabled there's no point in compiling
C versions of DSP functions that have asm implementations using
instruction sets that the compiler can unconditionally use.

E.g. when compiling with -mssse3 we can remove the C version
of all functions with SSSE3 implementations.

This is accomplished using the compiler's dead code elimination
functionality.

Can be configured using the new 'trim_dsp' meson option, which
by default is enabled when compiling in release mode.

bd046635

cpu: Inline dav1d_get_cpu_flags() · 820bf515
Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago

820bf515

Jun 22, 2022
- x86: Add minor loopfilter asm improvements · 233737c9
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  233737c9
Jun 20, 2022

checkasm: Speed up signal handling · 0421f787

Henrik Gramner authored 2 years ago

Enabling/disabling signal handlers is very slow and requires a syscall.

A better approach is to keep the signal handlers enabled all the time,
and use a simple flag variable to determine if a given signal should
be handled or passed on to the default signal handler.

0421f787
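
The flag-based approach can be sketched in portable C. This is an illustration of the idea, not checkasm's implementation: the names are hypothetical, the handler only records the signal, and it is meant to be exercised with raise(). A real harness would additionally have to jump out of faulting code rather than return into it.

```c
#include <assert.h>
#include <signal.h>

/* Nonzero while code under test is running and signals should be caught. */
static volatile sig_atomic_t catch_signals;
/* Last signal recorded by the handler, or 0. */
static volatile sig_atomic_t caught_signal;

static void handler(int sig) {
    if (catch_signals) {
        caught_signal = sig;
        signal(sig, handler); /* re-arm: some platforms reset on delivery */
    } else {
        /* Not ours: hand the signal over to the default action. */
        signal(sig, SIG_DFL);
        raise(sig);
    }
}

/* Install the handlers once; afterwards only the flag is toggled. */
static void install_handlers(void) {
    signal(SIGILL, handler);
    signal(SIGFPE, handler);
    signal(SIGSEGV, handler);
}

/* Usage sketch: set the flag, trigger a signal, clear the flag. */
static int run_guarded_raise(int sig) {
    assert(sig > 0);
    caught_signal = 0;
    catch_signals = 1;
    raise(sig);
    catch_signals = 0;
    return caught_signal;
}
```

Toggling catch_signals is a plain memory write, which avoids the system call needed to install or remove a handler around every test.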

checkasm: Improve seed generation on Windows · fa68b036

Henrik Gramner authored 2 years ago

GetTickCount() increases at a very low frequency, >10ms per tick.
When running multiple loops of checkasm instances in parallel,
different instances regularly end up using identical seeds.

Prefer the use of QueryPerformanceCounter() instead, which ticks at
a significantly higher rate, which in turn increases randomness.

fa68b036

ci: Don't specify a specific MacOS version · 0c590fc7
Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago

0c590fc7

Jun 14, 2022
- x86: Add high bit-depth loopfilter AVX-512 (Ice Lake) asm · b0907cf9
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  b0907cf9