  1. May 20, 2020
    • dav1dplay: don't freeze on render errors · df40d36d
      Niklas Haas authored
      Returning out of this function when pl_render_image() fails is the wrong
      thing to do, since that leaves the swapchain frame acquired but never
      submitted. Instead, just clear the target FBO to blank red (to make it
      clear that something went wrong) and continue on with presentation.
      Tag: 0.7.0
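The fix can be sketched in plain C. Everything here is hypothetical: `struct swapchain`, `struct target`, `draw_frame()` and the render callbacks stand in for the libplacebo swapchain and `pl_render_image()`. They only model the invariant the commit is about: a frame that was acquired is always submitted, and a failed render leaves a red target instead of causing an early return.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the target FBO: one "pixel" represents it. */
struct target {
    uint8_t r, g, b, a;
};

/* Hypothetical swapchain bookkeeping (not a libplacebo type). */
struct swapchain {
    int acquired;   /* frames acquired but not yet submitted */
    int submitted;  /* frames handed back for presentation */
};

/* Hypothetical render call; returns false on failure. */
typedef bool (*render_fn)(struct target *);

static void clear_to_red(struct target *t)
{
    t->r = 255; t->g = 0; t->b = 0; t->a = 255;
}

/* Draw one frame. Once a frame has been acquired it is always submitted,
 * even if rendering fails; returning early would leak the acquired frame. */
static void draw_frame(struct swapchain *sc, struct target *t, render_fn render)
{
    sc->acquired++;
    if (!render(t)) {
        fprintf(stderr, "render failed, clearing target to red\n");
        clear_to_red(t); /* make the failure visible instead of returning */
    }
    /* Submit unconditionally. */
    sc->acquired--;
    sc->submitted++;
}

/* Example callbacks: one that succeeds (fills grey), one that fails. */
static bool render_ok(struct target *t)
{
    t->r = t->g = t->b = 128;
    t->a = 255;
    return true;
}

static bool render_fail(struct target *t)
{
    (void)t;
    return false;
}
```

The design point is that the error path changes *what* is presented, not *whether* the acquire/submit pair completes.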
  2. May 19, 2020
  3. May 18, 2020
    • dav1dplay: support on-GPU film grain synthesis · cbe05cf4
      Niklas Haas authored
      Annoying minor differences between the dav1d and libplacebo film grain
      struct layouts mean we can't just memcpy the entire thing. Oh well.
      
      Note: technically, PL_API_VER 33 added this API, but PL_API_VER 63 is
      the minimum version of libplacebo that doesn't have glaring bugs when
      generating chroma grain, so we require that as a minimum instead.
      
      (I tested this version on some 4:2:2 and 4:2:0, 8-bit and 10-bit grain
      samples I had lying around and made sure the output was identical up to
      differences in rounding / dithering.)
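A sketch of the two points above, assuming hypothetical `struct src_grain` / `struct dst_grain` types rather than the real dav1d and libplacebo structs. Because field types and order differ, the conversion copies field by field (only arrays with identical element types are memcpy'd individually), and the version requirement is expressed as a minimum of 63 rather than 33. In real code the version check would be a compile-time `#if PL_API_VER >= 63`; a runtime helper is used here only so it can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder-side grain parameters (not dav1d's actual struct). */
struct src_grain {
    unsigned seed;
    int num_y_points;
    uint8_t y_points[14][2];
    int scaling_shift;
    int8_t ar_coeffs_y[24];
};

/* Hypothetical renderer-side grain parameters: the same information with
 * different field types and order, so memcpy() of the whole struct is wrong. */
struct dst_grain {
    uint16_t random_seed; /* the AV1 grain seed is a 16-bit value */
    int scaling_shift;
    int num_points_y;
    uint8_t points_y[14][2];
    int8_t ar_coeffs_y[24];
};

/* Field-by-field conversion; identically typed arrays are copied whole. */
static void convert_grain(struct dst_grain *dst, const struct src_grain *src)
{
    dst->random_seed = (uint16_t)src->seed;
    dst->scaling_shift = src->scaling_shift;
    dst->num_points_y = src->num_y_points;
    memcpy(dst->points_y, src->y_points, sizeof(dst->points_y));
    memcpy(dst->ar_coeffs_y, src->ar_coeffs_y, sizeof(dst->ar_coeffs_y));
}

/* The API appeared in version 33, but per the commit message only 63+
 * generates chroma grain correctly, so 63 is the effective minimum. */
enum { GRAIN_API_MIN_VER = 63 };

static bool gpu_grain_usable(int api_ver)
{
    return api_ver >= GRAIN_API_MIN_VER;
}
```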
    • dav1dplay: handle all supported csps/reprs/bitdepths · 7bbebdb4
      Niklas Haas authored
      Generalize the code to set the right pl_image metadata based on the
      values signaled in the Dav1dPictureParameters / Dav1dSequenceHeader.
      
      Some values are not mapped, in which case stdout will be spammed.
      Whatever. Hopefully somebody sees that error spam and opens a bug report
      for libplacebo to implement it.
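One way to structure such a mapping, sketched with a subset of the AV1 `color_primaries` code points (values per the AV1 specification) and a hypothetical `enum render_primaries` in place of libplacebo's own enum. Known values map directly; anything else falls through to a `default` case that prints to stdout and returns an "unknown" value, which is where the error spam mentioned above would come from.

```c
#include <stdio.h>

/* AV1 color_primaries code points (subset, values from the AV1 spec). */
enum av1_primaries {
    AV1_CP_BT709    = 1,
    AV1_CP_UNSPEC   = 2,
    AV1_CP_BT470M   = 4,
    AV1_CP_BT470BG  = 5,
    AV1_CP_BT601    = 6,
    AV1_CP_BT2020   = 9,
    AV1_CP_SMPTE432 = 12, /* P3 D65 */
};

/* Hypothetical renderer-side enum (not libplacebo's). */
enum render_primaries {
    PRIM_UNKNOWN = 0,
    PRIM_BT709,
    PRIM_BT470M,
    PRIM_BT601_625,
    PRIM_BT601_525,
    PRIM_BT2020,
    PRIM_DISPLAY_P3,
};

static enum render_primaries map_primaries(int cp)
{
    switch (cp) {
    case AV1_CP_BT709:    return PRIM_BT709;
    case AV1_CP_UNSPEC:   return PRIM_UNKNOWN;
    case AV1_CP_BT470M:   return PRIM_BT470M;
    case AV1_CP_BT470BG:  return PRIM_BT601_625;
    case AV1_CP_BT601:    return PRIM_BT601_525;
    case AV1_CP_BT2020:   return PRIM_BT2020;
    case AV1_CP_SMPTE432: return PRIM_DISPLAY_P3;
    default:
        /* Unmapped value: complain and let the renderer fall back. */
        printf("Unsupported color primaries: %d\n", cp);
        return PRIM_UNKNOWN;
    }
}
```

The same switch-with-noisy-default shape applies to transfer characteristics and matrix coefficients.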
    • dav1dplay: move and simplify pl_image generation · f01fd0f1
      Niklas Haas authored
      Having the pl_image generation live in upload_planes() rather than
      render() will make it easier to set the correct pl_image metadata based
      on the Dav1dPicture headers going forward. Also rename the function so
      that its name better reflects what it does.
      
      Reduce some code duplication by turning per-plane fields into arrays
      wherever appropriate.
      
      As an aside, also apply the correct chroma location rather than
      hard-coding it as PL_CHROMA_LEFT.
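A sketch of both changes, using hypothetical types (`struct image_desc`, `enum layout`, `enum chroma_loc`) rather than the real `pl_image` fields. Plane dimensions live in an array filled by one loop instead of separate per-plane variables, and the AV1 `chroma_sample_position` (0 = unknown, 1 = vertical, 2 = colocated, per the AV1 specification) is mapped instead of being hard-coded to a left-sited location.

```c
/* Hypothetical per-plane description: one array entry per plane instead of
 * separate y/u/v fields. */
struct plane_info {
    int width, height;
};

struct image_desc {
    int num_planes;
    struct plane_info planes[3];
};

enum layout { LAYOUT_I400, LAYOUT_I420, LAYOUT_I422, LAYOUT_I444 };

static void describe_planes(struct image_desc *img, enum layout layout,
                            int w, int h)
{
    int ss_hor = layout == LAYOUT_I420 || layout == LAYOUT_I422;
    int ss_ver = layout == LAYOUT_I420;
    img->num_planes = layout == LAYOUT_I400 ? 1 : 3;
    for (int i = 0; i < img->num_planes; i++) {
        int sx = i > 0 ? ss_hor : 0; /* only chroma planes are subsampled */
        int sy = i > 0 ? ss_ver : 0;
        /* Round up so odd luma sizes keep their last chroma sample. */
        img->planes[i].width  = (w + sx) >> sx;
        img->planes[i].height = (h + sy) >> sy;
    }
}

/* Hypothetical renderer-side chroma location enum. */
enum chroma_loc { CHROMA_UNKNOWN, CHROMA_LEFT, CHROMA_TOP_LEFT };

/* AV1 chroma_sample_position: 0 = unknown, 1 = vertical, 2 = colocated. */
static enum chroma_loc map_chroma_loc(int csp)
{
    switch (csp) {
    case 1:  return CHROMA_LEFT;     /* co-sited horizontally, between rows */
    case 2:  return CHROMA_TOP_LEFT; /* co-sited with top-left luma sample */
    default: return CHROMA_UNKNOWN;
    }
}
```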
    • dav1dplay: don't write directly to iparams.extensions · 3bb0aed1
      Niklas Haas authored
      This is turned into a const array in upstream libplacebo, which
      generates warnings due to the implicit cast. Rewrite the code to have
      the mutable array live inside a separate variable `extensions` and only
      set `iparams.extensions` to this, rather than directly manipulating it.
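The pattern, sketched with a hypothetical `struct instance_params` whose `extensions` field points to const data. The list is built in a caller-owned mutable array and only the pointer is stored in the params struct, so no const-discarding cast is needed. The helper name and the extension strings in the usage are placeholders.

```c
#include <stddef.h>

/* Hypothetical params struct with a const extension list. */
struct instance_params {
    const char * const *extensions;
    int num_extensions;
};

/* Fill a separate mutable array, then point the const field at it. The
 * conversion from `const char **` to `const char * const *` only adds a
 * qualifier, so it is implicit and warning-free in C. The array must
 * outlive the params struct, hence it is owned by the caller. */
static int fill_params(struct instance_params *iparams,
                       const char **extensions, int max,
                       const char * const *wanted, int num_wanted)
{
    int n = 0;
    for (int i = 0; i < num_wanted && n < max; i++)
        extensions[n++] = wanted[i];
    iparams->extensions = extensions;
    iparams->num_extensions = n;
    return n;
}
```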
  4. May 16, 2020
  5. May 15, 2020
  6. May 14, 2020
  7. May 13, 2020
  8. May 12, 2020
  9. May 11, 2020
  10. May 10, 2020
    • x86: Use 'test' instead of 'or' to compare with zero · 4d97f5a9
      Henrik Gramner authored and committed
      Allows for macro-op fusion.
    • x86: Unconditionally compile msac_init.c · 28d33357
      Henrik Gramner authored and committed
      Eliminates the x86-64 check from the meson configuration file to be
      consistent with how other x86-64-exclusive code is handled.
    • x86-64: Do msac refill before calling dav1d_msac_init_x86() · 6a6c3528
      Henrik Gramner authored and committed
      Allows for constant propagation and tail call elimination in the
      msac initialization, which is performed in each tile.
    • msac: Avoid attempting to refill after eob has already been reached · 631d7720
      Henrik Gramner authored and committed
      Utilize the unsigned representation of a signed integer to skip
      the refill code if the count was already negative to begin with,
      which saves a few clock cycles at the end of each tile.
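The unsigned-comparison trick can be illustrated in isolation. This is a hypothetical sketch, not dav1d's msac code: `cnt` is a signed bit count that goes negative when a refill is needed, and `d` is the number of bits about to be consumed. Converting `cnt` to unsigned turns an already-negative count into a very large value, so the single comparison `d > (unsigned)cnt` is false after eob and the refill is skipped.

```c
#include <stdbool.h>

/* Signed check: refill whenever the count would go negative, including
 * when it was already negative before this call (wasted work after eob). */
static bool needs_refill_signed(int cnt, unsigned d)
{
    return (int)d > cnt;
}

/* Unsigned check: a negative cnt converts to a huge unsigned value, so the
 * comparison is false and the refill path is skipped once eob is reached.
 * For cnt >= 0 it behaves exactly like the signed check. */
static bool needs_refill(int cnt, unsigned d)
{
    return d > (unsigned)cnt;
}
```

Both functions agree whenever `cnt` starts out non-negative; they differ only in the already-negative case, which is exactly the end-of-tile situation described above.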
    • arm64: itx: Add NEON implementation of itx for 10 bpc · eaedb95d
      Martin Storsjö authored
      Add an element size specifier to the names of the existing individual
      8 bpc transform functions (e.g. inv_dct_8h_x8_neon) to clarify that
      they operate on input vectors of 8h, and make the symbols public so
      that the 10 bpc code can call them from a different object file. The
      new itx16.S uses the same convention (e.g. inv_dct_4s_x8_neon).
      
      Compile the existing itx.S regardless of whether 8 bpc support is
      enabled. For builds with 8 bpc support disabled, this does include the
      unused frontend functions, but that is hopefully tolerable, as it
      avoids having to split the file into a shareable file for transforms
      and a separate one for frontends.
      
      This only implements the 10 bpc case, as that case can use transforms
      operating on 16 bit coefficients in the second pass.
      
      Relative speedup vs C for a few functions:
      
                                           Cortex A53    A72    A73
      inv_txfm_add_4x4_dct_dct_0_10bpc_neon:     4.14   4.06   4.49
      inv_txfm_add_4x4_dct_dct_1_10bpc_neon:     6.51   6.49   6.42
      inv_txfm_add_8x8_dct_dct_0_10bpc_neon:     5.02   4.63   6.23
      inv_txfm_add_8x8_dct_dct_1_10bpc_neon:     8.54   7.13  11.96
      inv_txfm_add_16x16_dct_dct_0_10bpc_neon:   5.52   6.60   8.03
      inv_txfm_add_16x16_dct_dct_1_10bpc_neon:  11.27   9.62  12.22
      inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   9.60   6.97   8.59
      inv_txfm_add_32x32_dct_dct_0_10bpc_neon:   2.60   3.48   3.19
      inv_txfm_add_32x32_dct_dct_1_10bpc_neon:  14.65  12.64  16.86
      inv_txfm_add_32x32_dct_dct_2_10bpc_neon:  11.57   8.80  12.68
      inv_txfm_add_32x32_dct_dct_3_10bpc_neon:   8.79   8.00   9.21
      inv_txfm_add_32x32_dct_dct_4_10bpc_neon:   7.58   6.21   7.80
      inv_txfm_add_64x64_dct_dct_0_10bpc_neon:   2.41   2.85   2.75
      inv_txfm_add_64x64_dct_dct_1_10bpc_neon:  12.91  10.27  12.24
      inv_txfm_add_64x64_dct_dct_2_10bpc_neon:  10.96   7.97  10.31
      inv_txfm_add_64x64_dct_dct_3_10bpc_neon:   8.95   7.42   9.55
      inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   7.97   6.12   7.82
    • arm: Mark global symbols hidden · ff3054fe
      Martin Storsjö authored
      This matches what is done in C by -fvisibility=hidden.
      
      This avoids issues with relocations against other symbols exported
      from another assembly file.
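For comparison, the C-side equivalent of a hidden assembly symbol (what the GNU assembler's `.hidden` directive does on ELF) is the GCC/Clang `visibility("hidden")` attribute, which `-fvisibility=hidden` applies to every symbol by default. The function name below is hypothetical. The symbol stays global, so other object files linked into the same library can call it, but it is not exported from the resulting shared library.

```c
/* Global but hidden: callable across object files within the same shared
 * library, yet not part of that library's exported symbol table. */
__attribute__((visibility("hidden")))
int hidden_add(int a, int b)
{
    return a + b;
}
```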
    • arm64: itx: Prepare for other bitdepths · d4002c88
      Martin Storsjö authored