  1. Aug 19, 2022
  2. Jul 25, 2022
    • Adjust inlining attributes on some functions · a029d689
      Henrik Gramner authored
      The code size increase of inlining every call to certain functions
      isn't a worthwhile trade-off, and most compilers actually end up
      overriding those particular inlining hints anyway.
      
      In some cases it's also better to split the function into separate
      luma and chroma functions.
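A minimal sketch of the kind of split described in the commit above, not the actual dav1d change: a small force-inlined helper takes a plane selector, and two non-inlined wrappers specialize it for luma and chroma. The macro names, the function names, and the arithmetic are hypothetical and chosen only for illustration.

```c
/* Hypothetical attribute macros; GCC and Clang spell them this way. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_ALWAYS_INLINE inline
#define SKETCH_NOINLINE
#endif

/* Shared body. It is always inlined, so the constant is_chroma argument
 * lets the compiler drop the branch in each wrapper below. */
static SKETCH_ALWAYS_INLINE int scale_px(const int px, const int is_chroma) {
    return is_chroma ? (px + 1) >> 1 : px;
}

/* One out-of-line copy per plane type instead of one copy per call site. */
static SKETCH_NOINLINE int scale_luma(const int px) {
    return scale_px(px, 0);
}

static SKETCH_NOINLINE int scale_chroma(const int px) {
    return scale_px(px, 1);
}
```

Callers then use `scale_luma()` or `scale_chroma()` directly, which keeps each specialized body in the binary once.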
  3. Jul 19, 2022
  4. Jul 13, 2022
  5. Jul 11, 2022
    • Don't trash the return stack buffer in the NEON loop filter · d503bb0c
      David Conrad authored
      The NEON loop filter's innermost asm function can return to a different
      location than the address that called it. This messes up the return stack
      predictor, causing returns to be mispredicted.
      
      Instead, rework the function to always return to the address that calls it,
      and return the information needed for the caller to short-circuit
      storing pixels.
  6. Jul 06, 2022
    • CI: Removed snap package generation · 79bc755d
      Konstantin Pavlov authored and Henrik Gramner committed
      The snapcraft version we use is no longer compatible with the authentication
      schemes the snap store uses.  This could be fixed by updating the snapcraft
      inside the docker image, but Ubuntu no longer ships an up to date
      snapcraft version in their own repositories.  The other way to install
      snapcraft is to manually fetch the project and core snaps just like we
      do in https://code.videolan.org/videolan/docker-images/-/blob/master/vlc-ubuntu-focal/Dockerfile,
      but that currently fails on Jammy due to a conflict in Python versions
      between what is shipped in Jammy and inside the snapcraft project.
      
      All in all, snapcraft seems to be abandoned for our CI
      use-case, and the usefulness of dav1d snap is disputable, so just drop
      it altogether.  Packaging is still available in package/snap/ for the
      brave souls who want to build it on their own.
    • Eliminate unused C DSP functions at compile time · bd046635
      Henrik Gramner authored and Henrik Gramner committed
      When compiling with asm enabled there's no point in compiling
      C versions of DSP functions that have asm implementations using
      instruction sets that the compiler can unconditionally use.
      
      E.g. when compiling with -mssse3 we can remove the C version
      of all functions with SSSE3 implementations.
      
      This is accomplished using the compiler's dead code elimination
      functionality.
      
      Can be configured using the new 'trim_dsp' meson option, which
      by default is enabled when compiling in release mode.
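The dead code elimination idea from the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not dav1d's code: the flag name, the function names, and the plain C "stand-in" for an asm routine are all hypothetical. The point is that when the compiler defines `__SSSE3__` (e.g. with -mssse3), the early return becomes a constant-false branch, the first store is dead, and the static C function is no longer referenced.

```c
enum { CPU_FLAG_SSSE3 = 1 << 0 };        /* hypothetical flag bit */

static unsigned runtime_cpu_flags;       /* filled in by runtime detection */

/* Merge runtime flags with whatever the compiler may use unconditionally. */
static inline unsigned get_cpu_flags(void) {
    unsigned flags = runtime_cpu_flags;
#if defined(__SSSE3__)
    flags |= CPU_FLAG_SSSE3;             /* compile-time guarantee */
#endif
    return flags;
}

typedef int (*sum_fn)(const int *v, int n);

/* Generic C version. */
static int sum_c(const int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}

/* Stand-in for an SSSE3 asm routine (plain C here so the sketch builds). */
static int sum_ssse3_standin(const int *v, int n) {
    int s = 0;
    while (n--) s += *v++;
    return s;
}

static void dsp_init(sum_fn *const out) {
    *out = sum_c;
    /* With __SSSE3__ defined this test is known false at compile time,
     * so the store above is dead and sum_c can be discarded. */
    if (!(get_cpu_flags() & CPU_FLAG_SSSE3)) return;
    *out = sum_ssse3_standin;
}
```

Whether the C function is really dropped depends on the optimization level and on the function having internal linkage, which is why this sketch declares it `static`.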
    • cpu: Inline dav1d_get_cpu_flags() · 820bf515
      Henrik Gramner authored and Henrik Gramner committed
  7. Jun 22, 2022
  8. Jun 20, 2022
    • checkasm: Speed up signal handling · 0421f787
      Henrik Gramner authored
      Enabling/disabling signal handlers is very slow and requires a syscall.
      
      A better approach is to keep the signal handlers enabled all the time,
      and use a simple flag variable to determine if a given signal should
      be handled or passed on to the default signal handler.
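A minimal POSIX sketch of the flag-based approach described in the commit above, assuming `sigaction` and `sigsetjmp` are available. All names here (`catch_signals`, `run_guarded`, and so on) are hypothetical and are not taken from checkasm. The handlers are installed once; a flag decides whether a signal is recovered from or handed to the default action.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t catch_signals; /* 1 only while a test runs */
static sigjmp_buf recover_point;

static void on_signal(const int sig) {
    if (catch_signals) {
        catch_signals = 0;
        siglongjmp(recover_point, 1);  /* resume in run_guarded() */
    }
    /* Not ours: restore the default action and deliver the signal again. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Called once at startup; no per-test syscalls afterwards. */
static void install_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;          /* allow re-raising inside the handler */
    sigaction(SIGILL, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);
}

/* Returns 1 if fn raised a handled signal, 0 if it returned normally. */
static int run_guarded(void (*const fn)(void)) {
    if (sigsetjmp(recover_point, 1)) return 1;
    catch_signals = 1;                 /* toggling the flag is just a store */
    fn();
    catch_signals = 0;
    return 0;
}

/* Demo helpers for trying the sketch out. */
static void faulty(void)   { raise(SIGILL); }
static void harmless(void) { }
```

After one call to `install_handlers()`, `run_guarded(faulty)` reports the signal while `run_guarded(harmless)` returns 0, and neither call changes the installed handlers.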
    • checkasm: Improve seed generation on Windows · fa68b036
      Henrik Gramner authored
      GetTickCount() increases at a very low frequency, >10ms per tick.
      When running multiple loops of checkasm instances in parallel,
      different instances regularly end up using identical seeds.
      
      Prefer the use of QueryPerformanceCounter() instead, which ticks at
      a significantly higher rate, which in turn increases randomness.
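A hedged sketch of seeding from a high-resolution counter, as the commit above describes for Windows. The function name `get_seed` and the non-Windows fallback to `clock_gettime(CLOCK_MONOTONIC)` are assumptions added so the example builds on other platforms; they are not claimed to match checkasm's code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Derive a seed from a fast-ticking counter so that processes started
 * within the same few milliseconds still get different values. */
static unsigned get_seed(void) {
#ifdef _WIN32
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return (unsigned) counter.LowPart;   /* low 32 bits change fastest */
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); /* nanosecond-resolution fallback */
    return (unsigned) (1000000000ULL * (unsigned long long) ts.tv_sec +
                       (unsigned long long) ts.tv_nsec);
#endif
}
```

The truncation to `unsigned` is deliberate: only the quickly changing low bits matter for telling apart instances that start close together.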
    • ci: Don't specify a specific MacOS version · 0c590fc7
      Henrik Gramner authored and Henrik Gramner committed
  9. Jun 14, 2022
  10. Jun 13, 2022
  11. Jun 03, 2022
  12. Jun 02, 2022
    • x86: Add a workaround for quirky AVX-512 hardware behavior · 0cfb03cd
      Henrik Gramner authored and Henrik Gramner committed
      On Intel CPUs certain AVX-512 shuffle instructions incorrectly
      flag the upper halves of YMM registers as in use when writing
      to XMM registers, which may cause AVX/SSE state transitions.
      
      This behavior is not documented and only occurs on physical
      hardware, not when using the Intel SDE, so as far as I can tell
      it appears to be a hardware bug.
      
      Work around the issue by using EVEX-only registers. This avoids
      the problem at the cost of a slightly larger code size.
  13. May 31, 2022
  14. May 25, 2022
  15. May 20, 2022
  16. May 18, 2022
  17. May 07, 2022
  18. May 05, 2022
  19. Apr 28, 2022
  20. Apr 24, 2022
    • x86/itx: Add 32x8 12bpc AVX2 transforms · ffb59680
      Matthias Dressel authored
      inv_txfm_add_32x8_dct_dct_0_12bpc_c: 286.7
      inv_txfm_add_32x8_dct_dct_0_12bpc_avx2: 20.1
      inv_txfm_add_32x8_dct_dct_1_12bpc_c: 7832.7
      inv_txfm_add_32x8_dct_dct_1_12bpc_avx2: 710.6
      inv_txfm_add_32x8_dct_dct_2_12bpc_c: 7838.1
      inv_txfm_add_32x8_dct_dct_2_12bpc_avx2: 711.6
      inv_txfm_add_32x8_dct_dct_3_12bpc_c: 7818.3
      inv_txfm_add_32x8_dct_dct_3_12bpc_avx2: 710.9
      inv_txfm_add_32x8_dct_dct_4_12bpc_c: 7820.6
      inv_txfm_add_32x8_dct_dct_4_12bpc_avx2: 710.5
      inv_txfm_add_32x8_identity_identity_0_12bpc_c: 1526.6
      inv_txfm_add_32x8_identity_identity_0_12bpc_avx2: 19.3
      inv_txfm_add_32x8_identity_identity_1_12bpc_c: 1519.4
      inv_txfm_add_32x8_identity_identity_1_12bpc_avx2: 19.9
      inv_txfm_add_32x8_identity_identity_2_12bpc_c: 1519.9
      inv_txfm_add_32x8_identity_identity_2_12bpc_avx2: 43.6
      inv_txfm_add_32x8_identity_identity_3_12bpc_c: 1519.4
      inv_txfm_add_32x8_identity_identity_3_12bpc_avx2: 67.8
      inv_txfm_add_32x8_identity_identity_4_12bpc_c: 1523.2
      inv_txfm_add_32x8_identity_identity_4_12bpc_avx2: 91.6
    • x86/itx: Add 8x32 12bpc AVX2 transforms · e67a5000
      Matthias Dressel authored
      inv_txfm_add_8x32_dct_dct_0_12bpc_c: 334.6
      inv_txfm_add_8x32_dct_dct_0_12bpc_avx2: 66.0
      inv_txfm_add_8x32_dct_dct_1_12bpc_c: 7929.7
      inv_txfm_add_8x32_dct_dct_1_12bpc_avx2: 489.3
      inv_txfm_add_8x32_dct_dct_2_12bpc_c: 7925.8
      inv_txfm_add_8x32_dct_dct_2_12bpc_avx2: 547.1
      inv_txfm_add_8x32_dct_dct_3_12bpc_c: 7928.9
      inv_txfm_add_8x32_dct_dct_3_12bpc_avx2: 647.8
      inv_txfm_add_8x32_dct_dct_4_12bpc_c: 7916.1
      inv_txfm_add_8x32_dct_dct_4_12bpc_avx2: 701.0
      inv_txfm_add_8x32_identity_identity_0_12bpc_c: 2413.1
      inv_txfm_add_8x32_identity_identity_0_12bpc_avx2: 28.6
      inv_txfm_add_8x32_identity_identity_1_12bpc_c: 2415.2
      inv_txfm_add_8x32_identity_identity_1_12bpc_avx2: 28.6
      inv_txfm_add_8x32_identity_identity_2_12bpc_c: 2413.7
      inv_txfm_add_8x32_identity_identity_2_12bpc_avx2: 55.1
      inv_txfm_add_8x32_identity_identity_3_12bpc_c: 2415.4
      inv_txfm_add_8x32_identity_identity_3_12bpc_avx2: 85.3
      inv_txfm_add_8x32_identity_identity_4_12bpc_c: 2401.8
      inv_txfm_add_8x32_identity_identity_4_12bpc_avx2: 116.8
    • x86/itx: Deduplicate dconly code · 0c1fbdef
      Matthias Dressel authored
  21. Apr 23, 2022
  22. Apr 08, 2022