Commits · 1.4.x · Pranav Kant / dav1d

May 20, 2024
- tests: Verify dav1d command line in dav1d_argon.bash · bb948769
  Henrik Gramner authored 11 months ago
```
Error out early instead of producing bogus mismatch errors in case
of an incorrect cpu mask, for example.
```
  bb948769
May 14, 2024

ARM64: Various optimizations for symbol decode · 7f68f23c

Kyle Siefring authored 1 year ago

Changes stem from redesigning the reduction stage of the multisymbol
decode function.
* No longer use adapt4 for 5 possible symbol values
* Specialize reduction for 4/8/16 decode functions
* Modify control flow

+------------------------+--------------+--------------+---------------+
|                        |  Neoverse V1 |  Neoverse N1 |   Cortex A72  |
|                        | (Graviton 3) | (Graviton 2) |  (Graviton 1) |
+------------------------+-------+------+-------+------+-------+-------+
|                        |  Old  |  New |  Old  |  New |  Old  |  New  |
+------------------------+-------+------+-------+------+-------+-------+
| decode_bool_neon       |  13.0 | 12.9 |  14.9 | 14.0 |  39.3 |  29.0 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_bool_adapt_neon |  15.4 | 15.6 |  17.5 | 16.8 |  41.6 |  33.5 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_bool_equi_neon  |  11.3 | 12.0 |  14.0 | 12.2 |  35.0 |  26.3 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_hi_tok_c        |  73.7 | 57.8 |  73.4 | 60.5 | 130.1 | 103.9 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_hi_tok_neon     |  63.3 | 48.2 |  65.2 | 51.2 | 119.0 | 105.3 |
+------------------------+-------+------+-------+------+-------+-------+
| decode_symbol_\        |  28.6 | 22.5 |  28.4 | 23.5 |  67.8 |  55.1 |
| adapt4_neon            |       |      |       |      |       |       |
+------------------------+-------+------+-------+------+-------+-------+
| decode_symbol_\        |  29.5 | 26.6 |  29.0 | 28.8 |  76.6 |  74.0 |
| adapt8_neon            |       |      |       |      |       |       |
+------------------------+-------+------+-------+------+-------+-------+
| decode_symbol_\        |  31.6 | 31.2 |  33.3 | 33.0 |  77.5 |  68.1 |
| adapt16_neon           |       |      |       |      |       |       |
+------------------------+-------+------+-------+------+-------+-------+

7f68f23c

May 13, 2024

checkasm: Eliminate unreachable code in the Windows exception handler · cc1137c8
Henrik Gramner authored 1 year ago

cc1137c8

checkasm: Avoid UB in setjmp() invocations · 471549f2

Henrik Gramner authored 1 year ago

Both POSIX and the C standard place several environmental limits on
setjmp() invocations, with essentially anything beyond comparing the
return value with a constant as a simple branch condition being UB.

We were previously performing a function call using the setjmp()
return value as an argument, which is technically not allowed
even though it happened to work correctly in practice.

Some systems may loosen those restrictions and allow for more
flexible usage, but we shouldn't be relying on that.

471549f2
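A minimal sketch of the distinction described above, not the actual checkasm code: all `sk_` names are hypothetical. It keeps the setjmp() call as the entire controlling expression of an `if`, compared against a constant, and carries the failure code in a separate variable instead of passing the setjmp() return value to a function.

```c
#include <setjmp.h>

static jmp_buf sk_env;
static volatile int sk_code; /* carries the failure code across the jump */

/* Hypothetical failure hook: records a code and unwinds to sk_run(). */
void sk_fail(int code) {
    sk_code = code;
    longjmp(sk_env, 1);
}

/* Runs fn(); returns 0 on a normal return, or the code given to sk_fail(). */
int sk_run(void (*fn)(void)) {
    sk_code = 0;
    /* Conforming: setjmp() compared with a constant, as the entire
     * controlling expression of a selection statement. */
    if (setjmp(sk_env) == 0)
        fn();
    /* Pattern to avoid: some_handler(setjmp(sk_env)); */
    return sk_code;
}

/* Two sample callbacks, for illustration only. */
void sk_ok(void) {}
void sk_bad(void) { sk_fail(42); }
```

Calling `sk_run(sk_bad)` yields the code recorded by `sk_fail()`, while `sk_run(sk_ok)` yields 0.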

May 10, 2024

ppc: Add pwr9 flag · 700c36a6

Luca Barbato authored 1 year ago

Will be used to gate code using vec_absd and other useful instructions.

700c36a6

Apr 15, 2024

ARM64: Port msac improvements to more functions · 37d52435

Kyle Siefring authored 1 year ago and Henrik Gramner committed 1 year ago

Port improvements from the hi token functions to the rest of the symbol
adaption functions. These weren't originally ported since they didn't
work with arbitrary padding. In practice, zero padding is already used
and only the tests need to be updated.

Results - Neoverse N1

Old:
msac_decode_symbol_adapt4_c:         41.4 ( 1.00x)
msac_decode_symbol_adapt4_neon:      31.0 ( 1.34x)
msac_decode_symbol_adapt8_c:         54.5 ( 1.00x)
msac_decode_symbol_adapt8_neon:      32.2 ( 1.69x)
msac_decode_symbol_adapt16_c:        85.6 ( 1.00x)
msac_decode_symbol_adapt16_neon:     37.5 ( 2.28x)

New:
msac_decode_symbol_adapt4_c:         41.5 ( 1.00x)
msac_decode_symbol_adapt4_neon:      27.7 ( 1.50x)
msac_decode_symbol_adapt8_c:         55.7 ( 1.00x)
msac_decode_symbol_adapt8_neon:      30.1 ( 1.85x)
msac_decode_symbol_adapt16_c:        82.4 ( 1.00x)
msac_decode_symbol_adapt16_neon:     35.2 ( 2.34x)

37d52435

Apr 08, 2024

meson: Enable parallel execution of checkasm in 'meson test' · dc949013

Henrik Gramner authored 1 year ago

It was originally disabled due to older meson versions mixing the output
of 'meson test -v' from different tests, which made the log difficult to
read. Newer versions, however, cache the output from each test as it runs
and prints it in one contiguous block, so that's no longer an issue.

dc949013

Apr 02, 2024

checkasm: Add support for the private macOS kperf API for benchmarking · 5e31720b

Martin Storsjö authored 1 year ago

On AArch64, the performance counter registers usually are
restricted and not accessible from user space.

On macOS, we currently use mach_absolute_time() as the timer on
aarch64. This measures wallclock time but with a very coarse
resolution.

There is a private API, kperf, that can be used to get
high-precision timers, though. Unfortunately, it requires running
the checkasm binary as root (e.g. with sudo).

Also, as it is a private, undocumented API, it can potentially
change at any time.

This is handled by adding a new meson build option, for switching
to this timer. If the timer source in checkasm could be changed
at runtime with an option, this wouldn't need to be a build time
option.

This allows getting benchmarks like this:

mc_8tap_regular_w16_hv_8bpc_c:              1522.1 ( 1.00x)
mc_8tap_regular_w16_hv_8bpc_neon:            331.8 ( 4.59x)

Instead of this:

mc_8tap_regular_w16_hv_8bpc_c:                 9.0 ( 1.00x)
mc_8tap...

5e31720b

Mar 04, 2024
- checkasm: aarch64: Print the SVE vector length, if available · fd60097e
  Martin Storsjö authored 1 year ago
  
  fd60097e
Feb 29, 2024

checkasm: Add --list-cpuflags option · 85a10359

Henrik Gramner authored 1 year ago

Prints a list of cpuflags available for the current architecture.

Flags which are supported on the current system will be printed in
green, and flags which are unsupported in red with a ~ prefix.

85a10359
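The output format described above could be produced along these lines. This is only a sketch: the `SkCpuFlag` type, the function name and the flag names in the usage note are hypothetical, and the colors use standard VT100 escape codes (32 for green, 31 for red).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag table entry; not the checkasm data structure. */
typedef struct {
    const char *name;
    unsigned mask;
} SkCpuFlag;

/* Formats one flag into out: green if every bit of flag->mask is set in
 * supported, otherwise red with a '~' prefix. Returns snprintf()'s result. */
int sk_format_cpuflag(char *out, size_t size, const SkCpuFlag *flag,
                      unsigned supported)
{
    const int ok = (supported & flag->mask) == flag->mask;
    return snprintf(out, size, "%s%s%s\x1b[0m",
                    ok ? "\x1b[32m" : "\x1b[31m", /* green : red */
                    ok ? "" : "~",
                    flag->name);
}
```

For example, a flag named `neon` that is supported would be written as the green sequence followed by `neon`, and an unsupported `sve` as the red sequence followed by `~sve`, each terminated by the reset code.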

Feb 28, 2024
- Extend Arm and AArch64 run-time CPU feature detection · acc1121d
  Arpad Panyik authored 1 year ago and Martin Storsjö committed 1 year ago
```
Add run-time CPU feature detection for DotProd, i8mm, SVE and SVE2.
SVE and SVE2 are AArch64-only features.
```
  acc1121d
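As an illustration of what such run-time detection can look like on Linux, the sketch below maps hwcap words to feature flags. It is an assumption about one possible approach, not the code from this commit: the `SK_` names are hypothetical, and the bit positions are copied from my understanding of the Linux AArch64 `hwcap.h` header so that the example compiles on any platform.

```c
/* Linux AArch64 hwcap bits (assumed values, defined locally for portability). */
#define SK_HWCAP_ASIMDDP (1UL << 20) /* DotProd, in AT_HWCAP  */
#define SK_HWCAP_SVE     (1UL << 22) /* SVE,     in AT_HWCAP  */
#define SK_HWCAP2_SVE2   (1UL << 1)  /* SVE2,    in AT_HWCAP2 */
#define SK_HWCAP2_I8MM   (1UL << 13) /* i8mm,    in AT_HWCAP2 */

/* Hypothetical feature flags returned to the caller. */
enum {
    SK_FLAG_DOTPROD = 1 << 0,
    SK_FLAG_I8MM    = 1 << 1,
    SK_FLAG_SVE     = 1 << 2,
    SK_FLAG_SVE2    = 1 << 3,
};

/* Translates the two hwcap words into the flag set above. */
unsigned sk_flags_from_hwcap(unsigned long hwcap, unsigned long hwcap2) {
    unsigned flags = 0;
    if (hwcap  & SK_HWCAP_ASIMDDP) flags |= SK_FLAG_DOTPROD;
    if (hwcap2 & SK_HWCAP2_I8MM)   flags |= SK_FLAG_I8MM;
    if (hwcap  & SK_HWCAP_SVE)     flags |= SK_FLAG_SVE;
    if (hwcap2 & SK_HWCAP2_SVE2)   flags |= SK_FLAG_SVE2;
    return flags;
}
```

On a Linux system the two arguments would typically come from `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)` in `<sys/auxv.h>`; other operating systems need a different query mechanism.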
Feb 26, 2024
- riscv/checkasm: Print the RVV vector length, if available · 52948bbf
  Nathan E. Egge authored 1 year ago
  
  52948bbf
Feb 22, 2024

AArch64: Enable benchmarks for 8-tap sharp filters · f1d42ae8

Arpad Panyik authored 1 year ago

The 6-tap sub-pel filter specialisation uses different code paths for
sharp (8-tap) and regular/smooth (6-tap) filtering kernels.

This patch enables benchmarking for the different code paths.

f1d42ae8

Feb 21, 2024

checkasm: Improve msac tests · 83ae3e9a

Henrik Gramner authored 1 year ago

* Process the entire buffer to get better coverage of eob handling.

* Use a more reasonable buffer size.

* Ignore trailing dif bits to allow for more implementation flexibility.

83ae3e9a

Feb 18, 2024
- tests: Automatically determine job count in dav1d_argon.bash · bb26bdca
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
```
Default to using the number of logical cores divided by thread count.
```
  bb26bdca
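The default above amounts to a single integer division. The sketch below shows it in C for consistency with the rest of the project, although the script itself is bash. The function name is hypothetical, and the clamp to at least one job is an assumption, not something the commit message states.

```c
/* Default job count: logical cores divided by threads per job.
 * Assumption: never return fewer than one job. */
unsigned sk_default_jobs(unsigned logical_cores, unsigned threads_per_job) {
    if (threads_per_job == 0)
        threads_per_job = 1; /* guard against division by zero */
    const unsigned jobs = logical_cores / threads_per_job;
    return jobs > 0 ? jobs : 1;
}
```

With 16 logical cores and 4 threads per job this gives 4 parallel jobs.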
Jan 31, 2024
- Alphabetize architecture defines and usage · a6878be7
  Nathan E. Egge authored 1 year ago
  
  a6878be7
- checkasm: riscv64: Print modified register names · e67f6306
  Nathan E. Egge authored 1 year ago
  
  e67f6306
- checkasm: riscv64: Add stack canary test · e7660b8b
  Nathan E. Egge authored 1 year ago
  
  e7660b8b
- checkasm: Implement riscv64 checked_call() · 8ee8b9eb
  Nathan E. Egge authored 1 year ago
  
  8ee8b9eb
- riscv64/itx: Add 4-point 8bpc RVV idtx transform · 43ee02a9
  Nathan E. Egge authored 1 year ago
```
inv_txfm_add_4x4_identity_identity_0_8bpc_c:      534.6 ( 1.00x)
inv_txfm_add_4x4_identity_identity_0_8bpc_rvv:     72.2 ( 7.40x)
inv_txfm_add_4x4_identity_identity_1_8bpc_c:      534.7 ( 1.00x)
inv_txfm_add_4x4_identity_identity_1_8bpc_rvv:     72.3 ( 7.40x)
```
  43ee02a9
Jan 30, 2024
- riscv: Add support for checkasm --bench · 7362fcf6
  Nathan E. Egge authored 1 year ago
  
  7362fcf6
Jan 24, 2024
- Use a constant length for progress reporting in dav1d_argon.bash · 227c37f7
  Henrik Gramner authored 1 year ago
  
  227c37f7
- Avoid printing full path names in dav1d_argon.bash · cdb2a1a2
  Henrik Gramner authored 1 year ago
```
Only print the paths relative to the argon directory. This avoids
excessive terminal line wrapping due to long path names which
otherwise interferes with the '\r' usage for progress reporting.
```
  cdb2a1a2
Jan 23, 2024
- meson: Add 'enable_seek_stress' option · 2c9bbb49
  Matthias Dressel authored 1 year ago and Ronald S. Bultje committed 1 year ago
```
Allows explicitly enabling or disabling the seek-stress tests.
```
  2c9bbb49
Jan 21, 2024

loongarch: Improve the performance of msac series functions · 38bc0084

jinbo authored 1 year ago and Hecai Yuan committed 1 year ago

Relative speedup over C code:

msac_decode_bool_c:                            0.5 ( 1.00x)
msac_decode_bool_lsx:                          0.5 ( 1.09x)
msac_decode_bool_adapt_c:                      0.7 ( 1.00x)
msac_decode_bool_adapt_lsx:                    0.6 ( 1.20x)
msac_decode_symbol_adapt4_c:                   1.3 ( 1.00x)
msac_decode_symbol_adapt4_lsx:                 1.0 ( 1.30x)
msac_decode_symbol_adapt8_c:                   2.1 ( 1.00x)
msac_decode_symbol_adapt8_lsx:                 1.0 ( 2.05x)
msac_decode_symbol_adapt16_c:                  3.7 ( 1.00x)
msac_decode_symbol_adapt16_lsx:                0.8 ( 4.77x)

38bc0084

Add loongarch support · 2e952f30
Hecai Yuan authored 1 year ago

2e952f30

Jan 11, 2024
- checkasm: Prefer sigsetjmp()/siglongjmp() over SA_NODEFER · d23e87f7
  Henrik Gramner authored 1 year ago
```
Also prefer re-setting the signal handler upon intercept in combination
with SA_RESETHAND over re-raising exceptions with the SIG_DFL handler.
```
  d23e87f7
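A rough POSIX sketch of the combination named above, under stated assumptions: the `sk_` names are hypothetical, only SIGILL is handled, and the handler is re-installed after the jump, which is one reading of "re-setting the signal handler upon intercept". Passing a nonzero savemask to sigsetjmp() makes siglongjmp() restore the signal mask, so the intercepted signal does not stay blocked and SA_NODEFER is not needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf sk_ctx;
static volatile sig_atomic_t sk_caught;

/* Records the signal number and unwinds to sk_run_guarded(). */
static void sk_handler(int sig) {
    sk_caught = sig;
    siglongjmp(sk_ctx, 1);
}

/* One-shot handler: SA_RESETHAND restores SIG_DFL once it fires. */
static void sk_install(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sk_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    sigaction(sig, &sa, NULL);
}

/* Runs fn(); returns 0, or the number of the intercepted signal. */
int sk_run_guarded(void (*fn)(void)) {
    sk_caught = 0;
    sk_install(SIGILL);
    if (sigsetjmp(sk_ctx, 1) == 0) /* 1: save and later restore the mask */
        fn();
    else
        sk_install(SIGILL);        /* re-arm after an intercept */
    return sk_caught;
}

/* Sample callbacks, for illustration only. */
void sk_nop(void) {}
void sk_raise_ill(void) { raise(SIGILL); }
```

Because the mask is restored on each jump, calling `sk_run_guarded(sk_raise_ill)` repeatedly keeps intercepting the signal.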
- checkasm: Make signal handling async-signal-safe · 8501a4b2
  Henrik Gramner authored 1 year ago
  
  8501a4b2
Dec 19, 2023
- checkasm: Fix cdef_dir function prototype · 8ba0df84
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
  
  8ba0df84
Dec 15, 2023
- checkasm: Map SIGBUS to the right error text · 5149b274
  Martin Storsjö authored 1 year ago
```
This was missed in 2ef970a8.

Also print this text for EXCEPTION_IN_PAGE_ERROR on Windows.
```
  5149b274
Nov 12, 2023
- x86: Fix 8bpc AVX2 ipred_z2 filtering with extremely large frame sizes · e47a39ca
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
```
The max_width/max_height values can exceed 16-bit range.
```
  e47a39ca
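To illustrate the general class of problem, not the actual AVX2 code, the sketch below shows what happens when a value beyond the 16-bit range is narrowed. The function names are hypothetical, and whether the original lanes were signed or unsigned is not stated in the commit message.

```c
#include <stdint.h>

/* Narrowing to an unsigned 16-bit lane keeps only the low 16 bits. */
uint16_t sk_narrow_wrapping(int v) {
    return (uint16_t)v; /* value modulo 65536 */
}

/* Clamping first keeps out-of-range values at the nearest limit instead. */
int16_t sk_narrow_saturating(int v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

A limit of 65541 stored in a wrapping 16-bit lane would silently become 5, which is the kind of error a wider type or a saturating conversion avoids.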
Nov 01, 2023
- checkasm: Fix catching crashes on Windows on ARM · 2179b30c
  Martin Storsjö authored 1 year ago
```
longjmp on Windows uses SEH to unwind on ARM/ARM64 too, just like on
x86_64, thus use RtlCaptureContext/RtlRestoreContext instead of
setjmp/longjmp on those architectures as well.
```
  2179b30c
- checkasm: Improve DSP trimming error message · d2ee4389
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
  
  d2ee4389
- checkasm: Add missing WINAPI_PARTITION checks on Windows · 611abc20
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
```
Some functionality is only available on WINAPI_PARTITION_DESKTOP systems.
```
  611abc20
- checkasm: Enable virtual terminal processing on Windows · 6bc552eb
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
```
This allows for the use of standard VT100 escape codes for text coloring,
which simplifies things by eliminating a bunch of Windows-specific code.

This is only supported since Windows 10. Things will still run on
older systems, just without colored text output.
```
  6bc552eb
- checkasm: Check for errors in command line parsing · 0f2a877e
  Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago
  
  0f2a877e
Jul 12, 2023
- checkasm: Always bench C-only functions as well · 9278a14c
  Matthias Dressel authored 1 year ago
```
Integrates --bench-c into --bench to simplify benchmarks.
```
  9278a14c
Jul 07, 2023
- checkasm: document '-t' in --help text · fc40a0db
  Matthias Dressel authored 1 year ago
  
  fc40a0db
Jul 06, 2023

Move palette packing/edge-extension into a DSP function · 852cc340
Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago

852cc340

Pack palette indices · 72e9c7c0

Henrik Gramner authored 1 year ago and Henrik Gramner committed 1 year ago

Pack two indices into each byte instead of storing them separately.

Reduces memory usage by up to 16 kB per sb128 in streams that use
screen content tools when frame-threading is enabled, at the cost
of some additional computational overhead for packing/unpacking.

72e9c7c0
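The packing idea can be sketched as follows. AV1 palettes hold at most 8 colors, so each index fits comfortably in 4 bits. The function names and the nibble order (first index in the low nibble) are assumptions made for illustration and may not match the layout used by dav1d.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs n indices (n even, each < 16) into n / 2 bytes.
 * Assumed layout: even-positioned index in the low nibble. */
void sk_pack_indices(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i += 2)
        dst[i / 2] = (uint8_t)(src[i] | (src[i + 1] << 4));
}

/* Reverses sk_pack_indices(): expands n / 2 bytes into n indices. */
void sk_unpack_indices(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i += 2) {
        dst[i]     = src[i / 2] & 0x0f;
        dst[i + 1] = src[i / 2] >> 4;
    }
}
```

The packed buffer is half the size of the unpacked one, at the cost of a shift and a mask per index on each access.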