  1. Sep 30, 2022
  2. Sep 28, 2022
  3. Sep 26, 2022
  4. Sep 19, 2022
    • Martin Storsjö's avatar
      arm: itx: Add clipping to row_clip_min/max in the 10 bpc codepaths · 345127a7
      Martin Storsjö authored
      This fixes conformance with the argon test samples, in particular
      with these samples:
          profile0_core/streams/test10100_579_8614.obu
          profile0_core/streams/test10218_6914.obu
      
      This gives a pretty notable slowdown to these transforms - some
      examples:
      
      Before:                                 Cortex A53       A72       A73    Apple M1
      inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       365.7     290.2     299.8    0.3
      inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    1865.2    1384.1    1457.5    2.6
      inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   33976.3   26817.0   24864.2   40.4
      After:
      inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       397.7     322.2     335.1    0.4
      inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    2121.9    1336.7    1664.6    2.6
      inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   38569.4   27622.6   28176.0   51.0
      
      Thus, for the transforms alone, it makes them around 10-13% slower
      (the Apple M1 measurements are too noisy to be conclusive here).
      
      Measured on actual full decoding, it makes decoding of 10 bpc
      Chimera roughly 1% slower on an Apple M1, which is close to
      measurement noise anyway.
      345127a7
    • Henrik Gramner's avatar
      9c74a9b0
    • Henrik Gramner's avatar
      x86: Fix overflows in 12bpc AVX2 DC-only IDCT · 49b1c3c5
      Henrik Gramner authored
      Using smaller immediates also results in a small code size reduction in
      some cases, so apply those changes to the (10bpc-only) SSE code as well.
      49b1c3c5
    • Henrik Gramner's avatar
      x86: Fix clipping in high bit-depth AVX2 4x16 IDCT · 0c8a3461
      Henrik Gramner authored
      Certain clips were incorrectly performed on negated values, which
      caused things to be off-by-one in both directions. Correct this by
      negating such values prior to clipping instead of afterwards.
      0c8a3461
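The off-by-one in both directions follows from the asymmetry of a signed clip range: clipping and negation do not commute at the range limits. A minimal sketch in plain C, assuming an 18-bit signed range purely for illustration (the function names are hypothetical and this is not the AVX2 code):

```c
/* Illustrative asymmetric signed range: [-2^17, 2^17 - 1]. */
enum { CLIP_MIN = -(1 << 17), CLIP_MAX = (1 << 17) - 1 };

static int clip(int v) {
    return v < CLIP_MIN ? CLIP_MIN : v > CLIP_MAX ? CLIP_MAX : v;
}

/* Wrong order: clip the negated intermediate w, then negate. */
int negate_after_clip(int w) { return -clip(w); }

/* Right order: negate first, then clip the value that is actually kept. */
int negate_before_clip(int w) { return clip(-w); }
```

With w = 1 << 17 the wrong order yields -131071 instead of -131072, and with w = -(1 << 17) it yields 131072 instead of 131071.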
  5. Sep 15, 2022
    • Martin Storsjö's avatar
      Don't use gas-preprocessor with clang-cl for arm targets · cc9651f5
      Martin Storsjö authored
      Since meson 0.58.0 (released in May 2021), meson accepts adding '.S'
      assembly files as source files to the clang-cl compiler.
      
      If using an older version of meson, keep using gas-preprocessor
      just like for MSVC builds.
      cc9651f5
    • David Conrad's avatar
      Fix checking the reference dimensions for the projection process · d4a2b75d
      David Conrad authored
      Section 7.9.2 returns 0 "If RefMiRows[ srcIdx ] is not equal to MiRows,
      RefMiCols[ srcIdx ] is not equal to MiCols"
      
      dav1d was comparing pixel width/height, not block width/height,
      so conform with the spec
      d4a2b75d
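The distinction matters because different pixel sizes can map to the same block dimensions. A small sketch, using the AV1 definition MiCols = 2 * ((FrameWidth + 7) >> 3) (and likewise for MiRows); the helper names are hypothetical and not taken from dav1d:

```c
/* Frame size in 4x4 units, with the pixel size rounded up to a multiple of 8. */
int mi_units(int pixels) { return 2 * ((pixels + 7) >> 3); }

/* Hypothetical check: compare block dimensions, not pixel dimensions. */
int ref_dims_match(int ref_w, int ref_h, int cur_w, int cur_h) {
    return mi_units(ref_w) == mi_units(cur_w) &&
           mi_units(ref_h) == mi_units(cur_h);
}
```

For example, heights of 1076 and 1080 pixels both give MiRows = 270, so a pixel comparison rejects a reference that the block comparison accepts.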
    • David Conrad's avatar
      Fix calculation of OBMC lap dimensions · eb25f00c
      David Conrad authored
      Individual OBMC lapped predictions have a max width of 64 pixels
      for the top lap and have a max height of 64 for the left laps
      
      This is 7.11.3.9. Overlapped motion compensation process
      step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )
      
      dav1d wasn't clipping this as needed, which means that with scaled MC, the
      interpolation of the 2nd half of a 128 block was incorrect, since mx/my
      for subpel filter selection need to be reset at the 64 pixel boundary
      eb25f00c
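The quoted Clip3 can be sketched as follows. This is only an illustration of the clamp itself: the function name is hypothetical, and the sketch ignores the further limiting by the current block size and chroma subsampling that the spec applies.

```c
static int clip3(int lo, int hi, int v) {
    return v < lo ? lo : v > hi ? hi : v;
}

/* Width in luma pixels of one top lap for a candidate that is
   cand_w4 4x4-units wide: between 8 and 64 pixels. */
int obmc_top_lap_width(int cand_w4) {
    const int step4 = clip3(2, 16, cand_w4);
    return 4 * step4;
}
```

A 128-pixel-wide candidate (32 units) is therefore handled as laps of at most 64 pixels.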
    • David Conrad's avatar
      Support film grain application whose only effect is clipping to video range · 10f5ce54
      David Conrad authored
      This is the parameter combination:
      num_y_points == 0 && num_cb_points == 0 && num_cr_points == 0 &&
      chroma_scaling_from_luma == 1 && clip_to_restricted_range == 1
      
      Film grain application has two effects: adding noise, and optionally
      clipping to video range
      
      For luma, the spec skips film grain application if there's no noise
      (num_y_points == 0), but for chroma, it's only skipped if there's no
      chroma noise *and* chroma_scaling_from_luma is false
      
      This means it's possible for there to be no noise (num_*_points = 0), but
      if clip_to_restricted_range is true then chroma pixels can be clipped to
      video range, if chroma_scaling_from_luma is true. Luma pixels, however,
      aren't clipped to video range unless there's noise to apply.
      dav1d currently skips applying film grain entirely if there is no noise,
      regardless of the secondary clipping.
      10f5ce54
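The condition described above can be written as a single predicate. This is a sketch under the assumption that only the fields named in the message matter; the struct and function names are hypothetical and do not mirror dav1d's data structures.

```c
#include <stdbool.h>

/* Hypothetical subset of the film grain parameters named above. */
struct FilmGrainParams {
    int num_y_points, num_cb_points, num_cr_points;
    bool chroma_scaling_from_luma, clip_to_restricted_range;
};

/* True if film grain application can change any pixel: either there is
   noise to add, or chroma is processed only to be clipped to video range. */
bool film_grain_has_effect(const struct FilmGrainParams *p) {
    const bool noise = p->num_y_points || p->num_cb_points || p->num_cr_points;
    const bool clip_only = p->chroma_scaling_from_luma &&
                           p->clip_to_restricted_range;
    return noise || clip_only;
}
```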
    • David Conrad's avatar
      Ignore T.35 metadata if the OBU contains no payload · 673ee248
      David Conrad authored
      The syntax of itu_t_t35_payload_bytes is not defined in the AV1
      specification, but it does state that decoders should ignore the
      entire OBU if they do not understand it.
      673ee248
    • David Conrad's avatar
      Fix chroma deblock filter size calculation for lossless · 2152826b
      David Conrad authored
      In section 5.11.34 txSz is always defined to TX_4X4 if Lossless is true
      
      Chroma deblock filter size calculation needs to use this overridden txSz
      when lossless is enabled
      2152826b
    • David Conrad's avatar
      Fix rounding in the calculation of initialSubpelX · e202fa08
      David Conrad authored
      The spec divides err by two, rounding to 0, instead of >>1,
      which rounds towards negative infinity
      e202fa08
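The two operations only differ for odd negative values. A minimal illustration (function names are hypothetical; right-shifting a negative value is implementation-defined in C, and the sketch assumes the arithmetic shift that common compilers provide):

```c
/* Halve, rounding toward zero (C integer division). */
int half_toward_zero(int err) { return err / 2; }

/* Halve, rounding toward negative infinity (assumes arithmetic shift). */
int half_toward_neg_inf(int err) { return err >> 1; }
```

For err = -3 the division gives -1 while the shift gives -2; for non-negative or even inputs the results agree.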
    • David Conrad's avatar
      Fix overflow when saturating dequantized coefficients clipped to 0 · ee98592b
      David Conrad authored
      It's possible to encode a large coefficient that becomes 0 after
      the clipping in dequant (Abs( dq ) & 0xFFFFFF), e.g. 0x1000000
      After that &0xFFFFFF, coeffs are saturated in the range of
      [-(1 << (bitdepth+7)), 1 << (bitdepth+7))
      
      dav1d implements this saturation via umin(dq - sign, cf_max), then applies
      the sign afterwards via xor. However, for dq = 0 and sign = 1, this step
      evaluates to umin(UINT_MAX, cf_max) == cf_max instead of the expected 0.
      
      So instead, do unsigned saturate as umin(dq, cf_max + sign),
      then apply sign via (sign ? -dq : dq)
      On arm this is the same number of instructions, since cneg exists and is used.
      On x86 this requires an additional instruction, but this isn't a
      latency-critical path.
      ee98592b
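The two saturation schemes can be compared in scalar C. This is a sketch of the arithmetic described in the message, not the dav1d source: the function names are hypothetical, and the conversion from uint32_t to int32_t assumes the usual two's complement behaviour.

```c
#include <stdint.h>

static uint32_t umin(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Old scheme: dq - sign wraps to UINT32_MAX when dq == 0 and sign == 1. */
int32_t saturate_old(uint32_t dq, uint32_t sign, uint32_t cf_max) {
    const uint32_t m = umin(dq - sign, cf_max);
    return (int32_t)(m ^ (0u - sign)); /* xor with all-ones negates m + 1 */
}

/* New scheme: saturate the magnitude first, then apply the sign. */
int32_t saturate_new(uint32_t dq, uint32_t sign, uint32_t cf_max) {
    const int32_t m = (int32_t)umin(dq, cf_max + sign);
    return sign ? -m : m;
}
```

With cf_max = (1 << 17) - 1 (the 10 bpc case), dq = 0 and sign = 1, the old scheme returns -131072 and the new one returns 0; for nonzero dq both agree.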
    • David Conrad's avatar
      Fix overflow in 8-bit NEON ADST · 1bdb776c
      David Conrad authored
      In 8-bit adst, it's possible that the final Round2(x[0], 12) can exceed
      16-bits signed
      
      Specifically, in 7.13.2.6. Inverse ADST4 process, the precision requirement is:
      "It is a requirement of bitstream conformance that all values stored in the
      s and x arrays by this process are representable by a signed integer using
      r + 12 bits of precision."
      
      For 8 bits, r is 16 for both row and column, so x[] can be 28-bit signed.
      For values [134215680, 134217727] (within 2047 of the maximum 28-bit value),
      the final Round2(x[0], 12) evaluates to 32768, exceeding 16-bits signed.
      
      So switch to using sqrshrn, which saturates to 16-bits signed
      
      This is a continuation of commit b53ff29d
      ("arm: itx: Do clipping in all narrowing downshifts").
      1bdb776c
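The arithmetic can be checked with a scalar model. This sketch only models the rounding shift and the saturating narrow that sqrshrn performs; the function names are hypothetical and it is not the NEON code.

```c
#include <stdint.h>

/* Round2(x, n): add half, then shift right (assumes arithmetic shift). */
int32_t round2(int32_t x, int n) { return (x + (1 << (n - 1))) >> n; }

/* Rounding shift by 12 followed by saturation to the int16_t range. */
int16_t narrow_saturating(int32_t x) {
    const int32_t r = round2(x, 12);
    return (int16_t)(r > INT16_MAX ? INT16_MAX : r < INT16_MIN ? INT16_MIN : r);
}
```

Here round2(134215680, 12) is 32768, one past INT16_MAX, and the saturating version returns 32767 instead.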
  6. Sep 14, 2022
    • Martin Storsjö's avatar
      tools: Allocate the priv structs with proper alignment · 08c70801
      Martin Storsjö authored
      Previously, they could be allocated with any random alignment
      matching the end of the MuxerContext/DemuxerContext. The
      priv structs themselves can have members that require specific
      alignment, or at least the default alignment of malloc()/calloc()
      (which is sufficient for native types such as uint64_t and
      doubles).
      
      This fixes crashes in some arm builds, where GCC (correctly) wants
      to use 64 bit aligned stores to write to MD5Context.
      08c70801
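One way to get a properly aligned priv area from a single allocation is to round the context size up before placing the priv struct. A minimal C11 sketch; the function names and layout are assumptions for illustration and are not taken from the dav1d tools:

```c
#include <stddef.h>
#include <stdlib.h>

/* Round size up to the next multiple of align (a power of two). */
size_t align_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical layout: a context header followed by an aligned priv area.
   Returns the base pointer (to be freed by the caller) or NULL. */
void *alloc_ctx_with_priv(size_t ctx_size, size_t priv_size, void **priv) {
    const size_t priv_offset = align_up(ctx_size, _Alignof(max_align_t));
    unsigned char *const base = calloc(1, priv_offset + priv_size);
    if (!base) return NULL;
    *priv = base + priv_offset;
    return base;
}
```

Since calloc() returns memory aligned for any native type, an offset that is a multiple of that alignment keeps the priv area equally well aligned.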
  7. Sep 12, 2022
  8. Sep 10, 2022
  9. Sep 09, 2022
  10. Sep 08, 2022
  11. Sep 07, 2022
  12. Sep 02, 2022
  13. Aug 30, 2022
  14. Aug 19, 2022
  15. Jul 25, 2022
    • Henrik Gramner's avatar
      Adjust inlining attributes on some functions · a029d689
      Henrik Gramner authored
      The code size increase of inlining every call to certain functions
      isn't a worthwhile trade-off, and most compilers actually end up
      overriding those particular inlining hints anyway.
      
      In some cases it's also better to split the function into separate
      luma and chroma functions.
      a029d689
  16. Jul 19, 2022