  1. Jul 22, 2020
    • shaders: revise sh_lut method logic · ffd4f666
      Niklas Haas authored
      This is required to support GLSL ES 1.0 and GLSL 110, which forbid the
      use of literal arrays in shaders. Since SH_LUT_LITERAL is now no longer
      a safe fallback, we instead always fall back to SH_LUT_UNIFORM.
      
      This is technically an API break, since in the past, the naked pl_shader
      API would always generate literal shaders, but now they may have arrays
      attached as uniforms. To prevent this, users can still set small LUT
      sizes (which is what e.g. VLC does anyway).
    • shaders: prefer SH_LUT_LITERAL for small linear LUTs · 44a80e80
      Niklas Haas authored
      This pretty much only really affects the polar sampling code, which uses
      a small linear LUT. I found that the performance gain depends on whether
      or not we're using compute shaders, with the non-compute shader path
      being the only one to really benefit from this change.
    • shaders: remove SH_LUT_LINEAR, make a bool instead · b4d96813
      Niklas Haas authored
      This is done by providing fallback code to linearly interpolate between
      array values on the GPU. The motivation here is not just a concern of
      semantics/correctness but, more importantly, that doing so might
      actually be faster than going through a texture sample for small LUTs.
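
      As a rough illustration of the interpolation described above, here is a
      CPU-side C sketch of linearly sampling a small array LUT. The helper
      name `lut_sample_linear` and the normalized [0, 1] input convention are
      assumptions made for this example, not the code the commit generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sample a 1D LUT at normalized position x in [0, 1],
 * linearly interpolating between the two nearest array entries. */
static float lut_sample_linear(const float *lut, size_t size, float x)
{
    assert(lut && size >= 2);

    /* Clamp the coordinate so we never index outside the array */
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;

    float pos = x * (float) (size - 1);       /* fractional index */
    size_t i0 = (size_t) pos;                 /* lower neighbour */
    size_t i1 = i0 + 1 < size ? i0 + 1 : i0;  /* upper neighbour, clamped */
    float frac = pos - (float) i0;

    return lut[i0] * (1.0f - frac) + lut[i1] * frac;
}
```

      A shader version would do the same arithmetic on a literal or uniform
      array instead of issuing a texture sample.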
    • gpu: use host-imported pointers for pl_tex_download_pbo · 2e04963f
      Niklas Haas authored
      This allows such tex transfers to avoid an extra memcpy in most cases,
      except where the pointer happens to be horrifically misaligned with
      respect to the texel size - but in these cases, the alignment-fixing
      memcpy will happen inside VRAM (PL_BUF_MEM_DEVICE), which should still
      be faster than doing an extra memcpy in RAM.
      
      Also, I realized it makes no sense to have tex_download_pbo use a buffer
      pool at all, because it's synchronous anyway - there can only ever be
      one buffer. And doing it this way avoids code duplication between the
      import branch and the non-import branch.
      
      Side note: We could do the same for pl_tex_upload_pbo with the same
      justification, but I decided to test the waters with this commit first.
    • vulkan/malloc: round imported pointers to page boundaries · ae9f4166
      Niklas Haas authored
      This allows us to bypass the page-alignment restriction on host pointer
      imports by simply extending the host pointer base, the buffer offset,
      and the memory size sufficiently in the respective direction, thus
      ensuring that our memory import is always page-aligned.
      
      This *should* technically be safe, because the MMU can only enforce
      virtual memory access safety on a per-page granularity, and our code
      should never end up reading outside the bounds of a vk_memslice. But on
      the other hand, what we're doing is absolutely insane. Beware nasal
      demons. I only wrote this logic because I enjoy sharing an address space
      with a malevolent agent of chaos.
      
      As an aside, also fix some errors related to imported buffer size
      calculation and alignment validation that I noticed along the way.
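
      The rounding described in this commit can be sketched in plain C as
      follows. The struct and function names are hypothetical and the page
      size is passed in explicitly; this only shows the arithmetic, not the
      actual import code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch */
struct import_range {
    uintptr_t base;   /* page-aligned start of the imported region */
    size_t offset;    /* offset of the original pointer inside that region */
    size_t size;      /* page-aligned size of the imported region */
};

/* Extend [ptr, ptr + size) outwards so that it starts and ends on a page
 * boundary. page_size must be a power of two. */
static struct import_range round_to_pages(uintptr_t ptr, size_t size,
                                          size_t page_size)
{
    assert(page_size && (page_size & (page_size - 1)) == 0);
    uintptr_t mask = (uintptr_t) page_size - 1;

    struct import_range r;
    r.base   = ptr & ~mask;                               /* round base down */
    r.offset = (size_t) (ptr - r.base);                   /* shift the offset */
    r.size   = (r.offset + size + mask) & ~(size_t) mask; /* round size up */
    return r;
}
```

      The caller would import `base` with `size` and then address the user's
      data at `offset` inside the resulting buffer.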
    • context: add ability to temporarily cap log verbosity · a02084cc
      Niklas Haas authored
      This is intended for stuff like probing functions, to avoid generating
      bogus error messages. We directly make use of this function to clean up
      the format probing code, which is notoriously prone to generating error
      spam.
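
      One way such a cap could work is sketched below in C. The type and
      function names are hypothetical, and the behaviour shown (demoting
      messages that are more severe than the cap down to the cap's level) is
      an assumption for illustration, not a description of the real API.

```c
#include <assert.h>
#include <stdbool.h>

/* Lower value = more severe, higher value = more verbose */
enum log_level { LOG_NONE = 0, LOG_FATAL, LOG_ERR, LOG_WARN, LOG_INFO, LOG_DEBUG };

struct logger {
    enum log_level level;  /* most verbose level that still gets printed */
    enum log_level cap;    /* LOG_NONE means no cap is active */
};

/* While a cap is active, anything more severe than the cap is demoted to
 * the cap's level, so failed probes show up as e.g. debug output only. */
static enum log_level effective_level(const struct logger *logger,
                                      enum log_level lev)
{
    assert(logger);
    if (logger->cap != LOG_NONE && lev < logger->cap)
        return logger->cap;
    return lev;
}

static bool would_print(const struct logger *logger, enum log_level lev)
{
    return effective_level(logger, lev) <= logger->level;
}
```

      A probing function would set `cap` before trying something that is
      expected to fail and reset it to `LOG_NONE` afterwards.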
    • tests/bench: add pl_tex transfer benchmark · c20b0eb4
      Niklas Haas authored
      Mostly so I can test the improvements that leveraging host-mapped
      pointers will give us.
  2. Jul 19, 2020
    • shaders/colorspace: clip before tone-mapping functions · c8bfe345
      Niklas Haas authored
      To prevent logic errors when overflowing e.g. the BT.2390 function, and
      also make functions behave more predictably on overflow in general.
      
      This ensures no function will ever see something larger than sig_peak.
      Requires changes to `clip` and `linear` to make them work properly
      again.
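
      The idea of clipping ahead of the curve can be sketched on the CPU as
      below. The wrapper name and the Reinhard-style curve are stand-ins
      chosen for this example; the actual tone-mapping functions live in the
      generated shader code and are not reproduced here.

```c
#include <assert.h>

typedef float (*tone_map_fn)(float x, float sig_peak);

/* Illustrative stand-in curve (extended Reinhard): maps sig_peak to 1.0 */
static float reinhard(float x, float sig_peak)
{
    return x * (1.0f + x / (sig_peak * sig_peak)) / (1.0f + x);
}

/* Clamp the input to [0, sig_peak] first, so the curve never sees a value
 * larger than the peak it was designed for. */
static float tone_map_clipped(tone_map_fn fn, float x, float sig_peak)
{
    assert(fn && sig_peak > 0.0f);
    if (x < 0.0f) x = 0.0f;
    if (x > sig_peak) x = sig_peak;
    return fn(x, sig_peak);
}
```

      With the clamp in place, any input above the peak produces the same
      output as the peak itself.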
  3. Jul 16, 2020
  4. Jul 15, 2020
  5. Jul 14, 2020
    • gpu: add preliminary API support for DRM format modifiers · 025c5dcb
      Niklas Haas authored
      This is currently still a pretty bad hack, because no driver
      actually implements the drm format modifier extension. But this way of
      doing it at least allows us to differentiate between linear and
      non-linear, which we assume (blindly) is equal to optimal, and is needed
      to get vaapi hwdec working on AMD.
      
      We also get rid of the plane offset check because this also conflicts
      with the requirements of drm format modifiers, which we again can't
      respect properly. We already suppress validation errors for the image
      bind, and it works in practice.
  6. Jul 13, 2020
    • include: add _COUNT members to all public enums · 5e517936
      Niklas Haas authored
      For consistency, and because these technically serve a useful purpose
      (e.g. allowing static array sizing or bounds checks).
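
      A small C sketch of both uses mentioned above, static array sizing and
      bounds checking. The enum and its `DEMO_` names are made up for this
      example and do not correspond to any specific public enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum with a trailing _COUNT member */
enum demo_levels {
    DEMO_LEVELS_UNKNOWN = 0,
    DEMO_LEVELS_LIMITED,
    DEMO_LEVELS_FULL,
    DEMO_LEVELS_COUNT,
};

/* Static array sizing: the table has exactly one slot per enum value */
static const char *const demo_levels_names[DEMO_LEVELS_COUNT] = {
    [DEMO_LEVELS_UNKNOWN] = "unknown",
    [DEMO_LEVELS_LIMITED] = "limited",
    [DEMO_LEVELS_FULL]    = "full",
};

/* Bounds check: reject anything outside [0, DEMO_LEVELS_COUNT) */
static const char *demo_levels_name(int levels)
{
    if (levels < 0 || levels >= DEMO_LEVELS_COUNT)
        return NULL;
    assert(demo_levels_names[levels]);
    return demo_levels_names[levels];
}
```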
    • colorspace: rename pl_color_levels · e4c03d0f
      Niklas Haas authored
      I was growing unhappy with the use of the non-explanatory, confusing and
      misleading 'TV' and 'PC' enum names. Replace them with the more
      descriptive terms 'LIMITED' and 'FULL', respectively.
      
      No API bump because this is not a breaking change, as the old enum names
      are still defined.
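
      Keeping old enumerator names alive as aliases is what makes a rename
      like this non-breaking. A minimal C sketch, with hypothetical `DEMO_`
      identifiers rather than the library's real ones:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum for illustration only */
enum demo_color_levels {
    DEMO_COLOR_LEVELS_UNKNOWN = 0,
    DEMO_COLOR_LEVELS_LIMITED,
    DEMO_COLOR_LEVELS_FULL,

    /* Old names kept as aliases, so existing code keeps compiling */
    DEMO_COLOR_LEVELS_TV = DEMO_COLOR_LEVELS_LIMITED,
    DEMO_COLOR_LEVELS_PC = DEMO_COLOR_LEVELS_FULL,
};

static bool demo_is_full_range(enum demo_color_levels levels)
{
    assert(levels <= DEMO_COLOR_LEVELS_FULL);
    return levels == DEMO_COLOR_LEVELS_FULL;
}
```

      Code written against the old names and code written against the new
      names compare equal, because both spellings share the same value.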
  7. Jul 12, 2020
    • vulkan: remove FIXME comments on buffer sharing mode · 996e2b58
      Niklas Haas authored
      1. VkBuffer sharing mode doesn't actually affect anything in real-world
         drivers (e.g. RADV, ANV, AMDVLK).
      2. VkBuffers are not part of the interop API so we don't care about
         having to communicate this to the user.
      3. Having to somehow transition all buffers would be a pain anyway
  8. Jul 11, 2020
  9. Jul 09, 2020
    • opengl: refactor pl_opengl_wrap · 4a5ce5bc
      Niklas Haas authored
      This combines the function with the previously hidden pl_opengl_wrap_fb,
      allowing users to either provide their own framebuffers (in addition to
      the texture) or just wrap a plain framebuffer directly.
      
      In addition to merging these two functions, we also significantly
      overhaul the `gl_fb_query` function for inferring `pl_fmt` details from
      an opaque framebuffer. In particular, our wrapped framebuffers can now
      support PL_FMT_CAP_HOST_READABLE.
      
      Closes https://github.com/haasn/libplacebo/issues/81
    • opengl: fix typo in comment · 80e862b1
      Niklas Haas authored
  10. Jul 06, 2020
    • shaders/colorspace: read detected peak directly from ssbo · b48c81cb
      Niklas Haas authored
      With the recent series of refactors to the vulkan malloc layer,
      host-visible device-local memory types exist and are allocatable, so we
      can directly serve host-readable uniform buffers.
      
      For the scenarios in which it's not possible, working around it should
      probably be done inside the pl_gpu, not the application code. (i.e.
      'host visibility emulation')
    • vulkan: slightly revise buffer requirements/placement logic · 055cdc0a
      Niklas Haas authored
      Now that we support the existence of 'optimal' memory type properties,
      we can make device-local memory be the 'optimal' type by default. We can
      also split up `host_mapped` into scenarios where it's required and
      scenarios where it's merely recommended.
    • vulkan/malloc: invalidate mapped noncoherent memory · 2f2ba1a6
      Niklas Haas authored
      Imported noncoherent memory is not implicitly invalidated.
    • vulkan/malloc: misc fixes related to host pointer import · 870cb541
      Niklas Haas authored
      1. Log the proper pointer on unimport
      2. Add missing test case
    • vulkan: implement support for dedicated imported allocations · d5b23f61
      Philip Langdale authored and Niklas Haas committed
      
      Dedicated allocations are ones where memory is allocated with
      a single image or buffer specified at allocation time, and only
      that buffer or image can be bound to the memory.
      
      Our first use-case for supporting it is to handle importing dma_bufs
      on AMD hardware, where the driver says dedicated allocations are
      required.
      
      I've tested this on Intel hardware, which doesn't require dedicated
      allocations, but works fine if you force them.
      
      Modified-by: Niklas Haas <git@haasn.xyz>
      
      Rebased on top of the vulkan malloc API refactor, and also added support
      for allocating dedicated slabs directly - which allows us to also
      allocate dedicated memory for images which advertise preferring
      dedicated allocations. Finally, add some extra verification.
      
      Closes: !72
    • vulkan/malloc: major API refactor · aea6f237
      Niklas Haas authored
      Major refactor, accomplishing the following:
      
      - group args into a params struct
      - unified API for imported, generic and buffer allocations
      - move buffer importing boilerplate to the malloc layer
      - split up the property flags into required and optimal properties
      - better memory type scoring
      - enforce heap size when allocating large slabs
      - fix some buggy checks for optionally visible/coherent memory
      
      And probably more that I'm forgetting.
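
      The split into required and optimal properties, together with a scoring
      step, could look roughly like the following C sketch. The flag names,
      the scoring rule (count of matched optimal flags) and the function name
      are all assumptions for illustration; the real scoring is not described
      in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical property flags, loosely modelled on memory property bits */
enum {
    PROP_DEVICE_LOCAL  = 1 << 0,
    PROP_HOST_VISIBLE  = 1 << 1,
    PROP_HOST_COHERENT = 1 << 2,
    PROP_HOST_CACHED   = 1 << 3,
};

static int popcount32(uint32_t v)
{
    int n = 0;
    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Return the index of the best memory type, or -1 if none qualifies.
 * A type must have every required flag; among those that do, the one
 * matching the most optimal flags wins (first match wins ties). */
static int pick_memory_type(const uint32_t *types, int num,
                            uint32_t required, uint32_t optimal)
{
    assert(types || num == 0);
    int best = -1, best_score = -1;
    for (int i = 0; i < num; i++) {
        if ((types[i] & required) != required)
            continue; /* missing a mandatory property */
        int score = popcount32(types[i] & optimal);
        if (score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```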
    • vulkan/malloc: only require host-cached memory for large buffers · 20014f11
      Niklas Haas authored
      Uncached reads are extremely slow, but for small buffers it shouldn't
      matter, since they're only used to read back small bits of state
      information and other non-bandwidth-sensitive things.
    • ci: disable parallel testing · 41bb87db
      Niklas Haas authored
      Parallel tests make errors much more confusing and hard to find.
    • tests: make errors more findable · 7176bf1f
      Niklas Haas authored
      Prefix the require() failure case to let me ctrl+f for them.
    • shaders/av1: overhaul and fix grain reusability test · 5cc2e2a4
      Niklas Haas authored
      A lot of these fields were either redundant, too aggressively checked,
      not checked aggressively enough, or simply leftovers.
      
      Clean up this logic and bring it into the (hopefully) intended form.
    • shaders/av1: avoid memcmp() on padded structs · ab8bd2f1
      Niklas Haas authored
      This can end up comparing undefined memory regions, because the padding
      bytes of structs may not be initialized with anything in particular.
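
      The usual alternative to memcmp() on such a struct is a field-by-field
      comparison, sketched here in C. The struct layout and names are
      hypothetical and only chosen so that the compiler is likely to insert
      padding between members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct: most ABIs insert padding after `flag` and `shift` */
struct demo_grain_params {
    uint8_t  flag;
    uint32_t seed;
    uint8_t  shift;
};

/* Compare only the named members, never the padding bytes in between */
static bool demo_grain_params_equal(const struct demo_grain_params *a,
                                    const struct demo_grain_params *b)
{
    assert(a && b);
    return a->flag  == b->flag &&
           a->seed  == b->seed &&
           a->shift == b->shift;
}
```

      Two structs with identical members compare equal here even if their
      padding bytes happen to differ, which memcmp() cannot guarantee.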
  11. Jul 05, 2020
  12. Jul 01, 2020
    • vulkan: fix build on non-UNIX platforms · 175ac74d
      Niklas Haas authored
      ffs() and close() don't exist on non-UNIX. Fix close() by ifdefing it
      out, and replace ffs() with __builtin_ffsll(), similar to what the rest of
      the code is doing already. This also fixes potential overflow issues due
      to the implicit downcast from uint32_t to int.
      
      Fixes https://github.com/m-ab-s/media-autobuild_suite/issues/1728
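
      For reference, a minimal C sketch of the replacement described in this
      commit. `__builtin_ffsll()` is a GCC/Clang builtin, so this assumes one
      of those compilers; the wrapper name is made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Zero-based index of the lowest set bit in mask, or -1 if mask is 0.
 * ffs() takes an int, so passing a uint32_t with the top bit set relies
 * on an implementation-defined conversion. Widening to long long first
 * preserves the value. */
static int lowest_set_bit(uint32_t mask)
{
    assert(sizeof(long long) * CHAR_BIT > 32);
    return __builtin_ffsll((long long) mask) - 1;
}
```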
    • vulkan: correctly create imported memory buffers · 249690e1
      Niklas Haas authored
      These require VkExternalMemoryBufferCreateInfoKHR, same as when
      exporting buffers.
    • vulkan: also generate boilerplate for extended enums · 8f36b8a7
      Niklas Haas authored
      VkResult in particular can return errors defined by extensions, like
      VK_ERROR_INVALID_EXTERNAL_HANDLE. These are included as enum extensions
      in the registry.
      
      As a side note, replace `findall` by `iterfind` for performance.
    • vulkan: fix host pointer import alignment checks · 2fdeb140
      Niklas Haas authored
      This alignment requirement also needs to be checked for the
      allocationSize. Improve error reporting while we're at it.
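
      A check covering both the pointer and the size might look like this C
      sketch. The function name is hypothetical, and `align` stands in for
      whatever minimum import alignment the device reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Both the host pointer and the allocation size must be multiples of the
 * required import alignment; checking only the pointer is not enough. */
static bool host_import_aligned(const void *ptr, size_t size, size_t align)
{
    assert(align > 0);
    return (uintptr_t) ptr % align == 0 && size % align == 0;
}
```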
    • gpu: add support for generic vertex buffers · 596de21c
      Niklas Haas authored
      Because of annoying API dependency issues (vertex buffers depend on
      pl_buf which depends on pl_buf_read/pl_tex_upload, which depends on PBO
      support), we can't use pl_pass_run_vbo for OpenGL, so instead we have to
      manually create and update the vertex buffer. (Not a huge deal since
      this is what the code did previously)
      
      It also means we miss out on vertex buffer reuse for OpenGL, but oh
      well. I guess the driver could already be doing this internally.
    • gpu: remove outdated comment · 181638d7
      Niklas Haas authored
      The whole `pl_buf_poll` / "different type of operation" thing is no
      longer really true.
    • vulkan: enforce alignment requirements on imported buffers · a7031878
      Niklas Haas authored
      To prevent users from e.g. trying to import uniform buffers with illegal
      alignment for uniform usage.
      
      To not over-verify, we split up `align` into two separate fields for
      optimal and mandatory alignment.
    • gpu: relax buffer offset alignment restrictions · 0cfb1a10
      Niklas Haas authored
      We can drop the texel alignment requirement using the same `unaligned`
      fallback code that already exists. Furthermore, I have no idea where the
      "multiple of 4" check came from. I can't find any reference to this
      being required for a VkBufferImageCopy in the vulkan spec.
    • gpu: nuke pl_buf_params.type and make all buffers generic · 2cf48cdd
      Niklas Haas authored
      This is a major change in operation and means all buffers now
      effectively become generic "buffers", with individual usage flags
      controlling what can and can't be done, similar to how `pl_tex` works.
      
      Rather than introducing a usage flag for buffer<->texture copies, we
      introduce a GPU-wide capability instead, and assume all buffers can be
      used for buffer<->texture copies. This maps more cleanly to what
      graphics APIs actually support, and mirrors the fact that we don't
      require any special usage flag for buffer<->buffer copies.
      
      Involves quite a lot of annoying refactoring, but I did the change in a
      way that should hopefully be very backwards compatible and result in no
      major degradation in performance or breaking change in logic.
      
      Notably, this commit also introduces parts of the public API allowing
      for use of arbitrary buffers as vertex buffers, but to avoid making this
      commit too big, this isn't actually exposed in the `pl_pass` API yet.
      The `drawable` field currently only exists for internal use.
    • gpu: minor code refactoring · a5cc3e08
      Niklas Haas authored
      Merge a few redundant functions into one API, and also avoid
      unnecessary duplication of the log level everywhere.