- Apr 21, 2020
-
-
Purely a cosmetic change.
-
Testing shows that those code paths are essentially never executed with real-world bitstreams, so they just add redundant branches, increase code size, and add complexity for no actual benefit.
-
- Apr 16, 2020
-
-
Victorien Le Couviour--Tuffet authored
Those need to be aligned when w*h >= 64, as we will try to load 64 bytes at a time. (This also realigns the 4x4 masks to 16, as 32-byte alignment is unnecessary for them.)
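For illustration, a minimal sketch of the alignment rule described above, using hypothetical mask tables rather than dav1d's actual symbols: a full 64-byte (zmm-width) aligned load only stays within bounds and fault-free if the table itself starts on a 64-byte boundary, while a table that is only ever read 16 bytes at a time needs no more than 16-byte alignment.

    #include <stdalign.h>
    #include <stdint.h>

    /* Hypothetical mask tables for illustration (not dav1d's real names). */
    alignas(64) static const uint8_t wide_mask[64] = {0}; /* read with full 64-byte loads */
    alignas(16) static const uint8_t mask_4x4[16]  = {0}; /* only ever read 16 bytes at a time */

    /* With AVX-512 enabled, an aligned 64-byte load such as
     * _mm512_load_si512(wide_mask) would fault if wide_mask were not
     * 64-byte aligned, hence alignas(64) above. */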
-
- Apr 11, 2020
-
-
Matthias Dressel authored
-
- Apr 10, 2020
-
-
Janne Grunau authored
Memory sanitizer depends on compiler instrumentation which makes it inherently incompatible with asm DSP functions. Refs #336
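For context, MSan only tracks initialization through code the compiler instruments, so values produced by hand-written asm look uninitialized to it. A minimal sketch of the usual compile-time detection (the macro name is hypothetical, and dav1d's actual handling may differ, e.g. living in the build system instead):

    /* Sketch: detect MemorySanitizer at compile time and fall back to the C
     * DSP paths, since MSan cannot see stores performed by hand-written asm.
     * USE_ASM_DSP is a hypothetical name for illustration. */
    #if defined(__has_feature)
    #  if __has_feature(memory_sanitizer)
    #    define USE_ASM_DSP 0
    #  endif
    #endif
    #ifndef USE_ASM_DSP
    #  define USE_ASM_DSP 1
    #endif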
-
- Apr 09, 2020
-
-
Luc Trudeau authored
Tested in isolation, this appears to be faster, but it is hard to tell overall.
-
- Apr 07, 2020
-
-
Luc Trudeau authored
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
cdef_filter_4x8_8bpc_avx2:      54.0
cdef_filter_4x8_8bpc_avx512icl: 35.5  => +52.1%
cdef_filter_8x8_8bpc_avx2:      71.0
cdef_filter_8x8_8bpc_avx512icl: 49.0  => +44.9%
-
- Apr 05, 2020
-
-
Martin Storsjö authored
Relative speedup over C code:
                           Cortex A7    A8    A9   A53   A72   A73
emu_edge_w4_8bpc_neon:          4.23  3.39  2.55  3.58  3.11  3.57
emu_edge_w8_8bpc_neon:          4.02  3.61  2.47  3.74  3.50  3.77
emu_edge_w16_8bpc_neon:         4.56  3.63  2.93  3.97  3.44  4.11
emu_edge_w32_8bpc_neon:         3.82  3.05  2.04  3.79  2.34  3.10
emu_edge_w64_8bpc_neon:         3.27  2.97  1.84  3.70  2.39  1.97
emu_edge_w128_8bpc_neon:        2.58  2.64  1.54  3.04  1.28  1.87
-
Martin Storsjö authored
Relative speedup over C code:
                           Cortex A53   A72   A73
emu_edge_w4_16bpc_neon:         2.49  1.53  1.91
emu_edge_w8_16bpc_neon:         2.27  1.55  1.90
emu_edge_w16_16bpc_neon:        2.46  1.46  2.09
emu_edge_w32_16bpc_neon:        2.20  1.39  1.73
emu_edge_w64_16bpc_neon:        1.65  1.00  1.46
emu_edge_w128_16bpc_neon:       1.55  1.44  1.54
-
- Apr 04, 2020
-
-
Martin Storsjö authored
Relative speedups over C code:
                           Cortex A53   A72   A73
emu_edge_w4_8bpc_neon:          3.82  2.93  2.41
emu_edge_w8_8bpc_neon:          3.28  2.86  2.51
emu_edge_w16_8bpc_neon:         3.58  3.27  2.63
emu_edge_w32_8bpc_neon:         3.04  1.68  2.12
emu_edge_w64_8bpc_neon:         2.58  1.45  1.48
emu_edge_w128_8bpc_neon:        1.79  1.02  1.57
The benchmark numbers for the larger sizes on A72 fluctuate a whole lot and thus seem very unreliable.
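For context, the emu_edge functions build a padded copy of a block whose source coordinates fall partly outside the picture, clamping every coordinate into the valid range so that edge pixels get replicated. A scalar sketch of the idea (simplified signature, not dav1d's actual prototype):

    #include <stddef.h>
    #include <stdint.h>

    /* Copy a bw x bh block whose top-left is at (x, y) in a w x h picture
     * into dst, replicating edge pixels for any part of the block that lies
     * outside the picture. Simplified for illustration. */
    static void emu_edge_c(uint8_t *dst, const ptrdiff_t dst_stride,
                           const uint8_t *const src, const ptrdiff_t src_stride,
                           const int bw, const int bh,
                           const int w, const int h, const int x, const int y)
    {
        for (int j = 0; j < bh; j++) {
            /* Clamp the source row to the picture. */
            int sy = y + j;
            sy = sy < 0 ? 0 : sy > h - 1 ? h - 1 : sy;
            const uint8_t *const srow = src + sy * src_stride;
            for (int i = 0; i < bw; i++) {
                /* Clamp the source column to the picture. */
                int sx = x + i;
                sx = sx < 0 ? 0 : sx > w - 1 ? w - 1 : sx;
                dst[i] = srow[sx];
            }
            dst += dst_stride;
        }
    }

An optimized implementation would typically bulk-copy the in-picture interior and only replicate pixels along the extended borders; the sketch above does everything per pixel for clarity.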
-
- Apr 03, 2020
-
-
Matthias Dressel authored
-
-
-
Victorien Le Couviour--Tuffet authored
Explains how the clipping to the range defined in the spec works.
-
Android uses bionic, not glibc, as the C library. __linux__ is also defined for Android, so also test __GLIBC__ to avoid looking up __pthread_get_minstack in Android bionic. Also, include <dlfcn.h> only if HAVE_DLSYM is defined. In glibc, <dlfcn.h> includes <features.h>, which defines __GLIBC__.
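A sketch of the guards described above (the helper function and its fallback are hypothetical; __linux__, __GLIBC__, HAVE_DLSYM and __pthread_get_minstack are the names from the entry itself, with HAVE_DLSYM supplied by the build system):

    #define _GNU_SOURCE              /* for RTLD_DEFAULT in glibc's <dlfcn.h> */
    #include <stddef.h>
    #include <pthread.h>
    #if HAVE_DLSYM
    #include <dlfcn.h>               /* in glibc this pulls in <features.h>, which defines __GLIBC__ */
    #endif

    /* Hypothetical helper for illustration: look up glibc's private
     * __pthread_get_minstack where it exists, and skip the lookup entirely
     * on Android/bionic, which defines __linux__ but not __GLIBC__. */
    static size_t get_minstack(const pthread_attr_t *const attr) {
    #if defined(__linux__) && defined(__GLIBC__) && HAVE_DLSYM
        size_t (*const fn)(const pthread_attr_t *) =
            (size_t (*)(const pthread_attr_t *))
            dlsym(RTLD_DEFAULT, "__pthread_get_minstack");
        if (fn) return fn(attr);
    #endif
        (void)attr;
        return 0; /* caller falls back to a default stack size */
    }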
-
- Apr 02, 2020
-
-
Wan-Teh Chang authored
Also, the assertion that 'align' is a power of 2 can be used by all cases in dav1d_alloc_aligned().
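The power-of-two check itself is the classic bit trick: a power of two has exactly one bit set, so align & (align - 1) is zero only for powers of two (zero itself being excluded separately). A simplified sketch, not dav1d's actual dav1d_alloc_aligned:

    #define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
    #include <assert.h>
    #include <stdlib.h>

    /* Simplified sketch: hoisting the assertion to the top of the function
     * means every allocation backend selected below it is covered by the
     * same check. */
    static void *alloc_aligned(const size_t sz, const size_t align) {
        assert(align > 0 && !(align & (align - 1)));  /* align must be a power of 2 */
        void *ptr;
        if (posix_memalign(&ptr, align, sz)) return NULL;
        return ptr;
    }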
-
Martin Storsjö authored
If leftext/rightext are zero, we invoke a version of v_loop with the whole need_left_ext/need_right_ext parts left out altogether, so these checks seem to be redundant.
-
Ronald S. Bultje authored
Fixes crashes in dav1d_resize_{avx2,ssse3} on very small resolutions with super_res enabled but skipped because the width is too small.
-
- Apr 01, 2020
-
-
Ronald S. Bultje authored
fguv_32x32xn_8bpc_420_csfl0_c:     14568.2
fguv_32x32xn_8bpc_420_csfl0_ssse3:  1162.3
fguv_32x32xn_8bpc_420_csfl1_c:     10682.0
fguv_32x32xn_8bpc_420_csfl1_ssse3:   910.3
fguv_32x32xn_8bpc_422_csfl0_c:     16370.5
fguv_32x32xn_8bpc_422_csfl0_ssse3:  1202.6
fguv_32x32xn_8bpc_422_csfl1_c:     11333.8
fguv_32x32xn_8bpc_422_csfl1_ssse3:   958.8
fguv_32x32xn_8bpc_444_csfl0_c:     12950.1
fguv_32x32xn_8bpc_444_csfl0_ssse3:  1133.6
fguv_32x32xn_8bpc_444_csfl1_c:      8806.7
fguv_32x32xn_8bpc_444_csfl1_ssse3:   731.0
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
fguv_32x32xn_8bpc_420_csfl0_c:    14568.2
fguv_32x32xn_8bpc_420_csfl0_avx2:   940.2
fguv_32x32xn_8bpc_420_csfl1_c:    10682.0
fguv_32x32xn_8bpc_420_csfl1_avx2:   783.3
fguv_32x32xn_8bpc_422_csfl0_c:    16370.5
fguv_32x32xn_8bpc_422_csfl0_avx2:  1557.3
fguv_32x32xn_8bpc_422_csfl1_c:    11333.8
fguv_32x32xn_8bpc_422_csfl1_avx2:   902.1
fguv_32x32xn_8bpc_444_csfl0_c:    12950.1
fguv_32x32xn_8bpc_444_csfl0_avx2:   822.9
fguv_32x32xn_8bpc_444_csfl1_c:     8806.7
fguv_32x32xn_8bpc_444_csfl1_avx2:   708.2
-
Ronald S. Bultje authored
-
- Mar 31, 2020
-
-
Ensure that unaligned memory access overhead is avoided.
-
Ronald S. Bultje authored
This is the VEX (AVX) encoded variant for the SSE4 instruction ptest, so emulate it using pmovmskb in the SSSE3 version.
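For context, ptest/vptest answers "is this vector all zero?" directly, which SSSE3-level code cannot use; the same answer can be obtained by comparing against zero and collecting the per-byte results with pmovmskb. A sketch using intrinsics rather than the hand-written asm the entry refers to:

    #include <emmintrin.h>  /* SSE2 intrinsics: _mm_cmpeq_epi8, _mm_movemask_epi8 */

    /* Returns 1 if all 16 bytes of v are zero, emulating a ptest-style
     * all-zero check without SSE4.1. */
    static int vec_is_all_zero(const __m128i v) {
        const __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128()); /* 0xFF where byte == 0 */
        return _mm_movemask_epi8(eq) == 0xFFFF;                    /* pmovmskb: all 16 bytes matched */
    }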
-
Ronald S. Bultje authored
resize_8bpc_c:     1613670.2
resize_8bpc_ssse3:  110469.5
resize_8bpc_avx2:    93580.6
-
Ronald S. Bultje authored
resize_8bpc_c:    1637609.7
resize_8bpc_avx2:   95162.6
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
- Mar 29, 2020
-
-
Luc Trudeau authored
-
- Mar 28, 2020
-
-
Luc Trudeau authored
-
-
- Mar 27, 2020
-
-
Janne Grunau authored
Allows building with nasm < 2.14.
-
Luc Trudeau authored
-
- Mar 26, 2020
-
-
Also contains const correctness changes.
-
Martin Storsjö authored
The FILTER_PRED function is templated and has two separate instantiations, one for 10 bit and one for 12 bit. (They're switched between using a runtime check on entry to the function.)
-
Martin Storsjö authored
This allows testing cases where this function internally switches between the two implementations for 10 and 12 bit, without adding a bpc argument to the init function (which would force testing and benchmarking every 16 bpc ipred function twice).
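A sketch of the runtime dispatch described in the two entries above, with hypothetical function names standing in for the FILTER_PRED asm: the 16 bpc entry point picks the 10- or 12-bit path from bitdepth_max, so the tests only need to vary that value to cover both.

    #include <stdint.h>

    /* Hypothetical 10-bit and 12-bit code paths (stubs for illustration);
     * the real asm makes the equivalent selection on entry. */
    static void filter_pred_10bpc(uint16_t *dst, int n) { for (int i = 0; i < n; i++) dst[i] &= 0x3ff; }
    static void filter_pred_12bpc(uint16_t *dst, int n) { for (int i = 0; i < n; i++) dst[i] &= 0xfff; }

    static void filter_pred_16bpc(uint16_t *const dst, const int n,
                                  const int bitdepth_max) {
        /* bitdepth_max is 1023 for 10-bit input and 4095 for 12-bit input. */
        if (bitdepth_max == 0xfff) filter_pred_12bpc(dst, n);
        else                       filter_pred_10bpc(dst, n);
    }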
-
Martin Storsjö authored
-