- Jul 22, 2020
Niklas Haas authored
This is required to support GLSL ES 1.0 and GLSL 110, which forbid the use of literal arrays in shaders. Since SH_LUT_LITERAL is no longer a safe fallback, we now always fall back to SH_LUT_UNIFORM instead. This is technically an API break: in the past, the naked pl_shader API would always generate literal shaders, but now they may have arrays attached as uniforms. To prevent this, users can still set small LUT sizes (which is what e.g. VLC does anyway).
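A minimal sketch of the fallback order described above. The enum and function names are hypothetical stand-ins for the SH_LUT_* methods, and the version cutoffs assume array constructors arrived in desktop GLSL 1.20 and GLSL ES 3.00; the real selection logic in libplacebo may weigh other factors.

```c
#include <stdbool.h>

/* Hypothetical mirror of the LUT upload methods; not libplacebo's enum. */
enum lut_method {
    LUT_AUTO,     /* let the shader code decide */
    LUT_TEXTURE,  /* sample the LUT from a texture */
    LUT_UNIFORM,  /* attach the LUT as a uniform array */
    LUT_LITERAL,  /* embed the LUT as an array literal in the GLSL source */
};

/* Assumption: GLSL ES 1.00 and desktop GLSL 110 lack array constructors. */
static bool glsl_has_array_literals(int version, bool gles)
{
    return gles ? version >= 300 : version >= 120;
}

static enum lut_method pick_lut_method(enum lut_method wanted, bool can_texture,
                                       int glsl_version, bool gles)
{
    /* Literal arrays are no longer a safe fallback, so use uniforms instead */
    if (wanted == LUT_LITERAL && !glsl_has_array_literals(glsl_version, gles))
        wanted = LUT_UNIFORM;
    if (wanted == LUT_TEXTURE && !can_texture)
        wanted = LUT_UNIFORM;
    if (wanted == LUT_AUTO)
        wanted = can_texture ? LUT_TEXTURE : LUT_UNIFORM;
    return wanted;
}
```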
-
Niklas Haas authored
This pretty much only really affects the polar sampling code, which uses a small linear LUT. I found that the performance gain depends on whether or not we're using compute shaders, with the non-compute shader path being the only one to really benefit from this change.
-
Niklas Haas authored
By providing fallback code to linearly interpolate between array values on the GPU. The motivation here is not just semantics/correctness but, more importantly, performance: for small LUTs, doing so might actually be faster than going through a texture sample.
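For reference, this is roughly the arithmetic such a fallback has to reproduce, written as a CPU-side C sketch rather than GLSL. The function name is hypothetical, and texel-center conventions of real texture filtering are deliberately ignored here (assumption): coordinate 0 maps to the first entry and 1 to the last.

```c
#include <stddef.h>

/* Sample a 1D LUT of `n` entries at normalized x in [0, 1] with linear
 * filtering, i.e. what a linearly filtered texture lookup would return. */
static float lut_lerp(const float *lut, size_t n, float x)
{
    if (n == 1 || x <= 0.0f)
        return lut[0];
    if (x >= 1.0f)
        return lut[n - 1];
    float pos = x * (float)(n - 1); /* fractional index into the array */
    size_t i = (size_t) pos;        /* lower neighbour (pos >= 0, so trunc == floor) */
    if (i > n - 2)
        i = n - 2;                  /* guard against rounding up to the last entry */
    float t = pos - (float) i;      /* weight of the upper neighbour */
    return lut[i] + t * (lut[i + 1] - lut[i]);
}
```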
-
Niklas Haas authored
This allows such tex transfers to avoid an extra memcpy in most cases, except where the pointer happens to be horrifically misaligned with respect to the texel size - but in these cases, the alignment-fixing memcpy will happen inside VRAM (PL_BUF_MEM_DEVICE), which should still be faster than doing an extra memcpy in RAM. Also, I realized it makes no sense to have tex_download_pbo use a buffer pool at all, because it's synchronous anyway - there can only ever be one buffer. And doing it this way avoids code duplication between the import branch and the non-import branch. Side note: We could do the same for pl_tex_upload_pbo with the same justification, but I decided to test the waters with this commit first.
-
Niklas Haas authored
This allows us to bypass the page-alignment restriction on host pointer imports by rounding the host pointer base down and the memory size up to page boundaries, and increasing the buffer offset by the same amount the base moved, thus ensuring that our memory import is always page-aligned. This *should* technically be safe, because the MMU can only enforce virtual memory access safety on a per-page granularity, and our code should never end up reading outside the bounds of a vk_memslice. But on the other hand, what we're doing is absolutely insane. Beware nasal demons. I only wrote this logic because I enjoy sharing an address space with a malevolent agent of chaos. As an aside, also fix some errors related to imported buffer size calculation and alignment validation that I noticed along the way.
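The widening step can be sketched as plain pointer arithmetic. This assumes `page_size` is a power of two (as the import alignment limit normally is); the struct and function names are illustrative only, not the vulkan code's actual helpers.

```c
#include <stddef.h>
#include <stdint.h>

struct page_import {
    uintptr_t base; /* page-aligned start of the imported region */
    size_t offset;  /* where the user's data begins inside that region */
    size_t size;    /* page-aligned size of the imported region */
};

/* Widen [ptr, ptr + len) outward to page boundaries. */
static struct page_import page_align_import(const void *ptr, size_t len,
                                            size_t page_size)
{
    uintptr_t addr = (uintptr_t) ptr;
    uintptr_t mask = (uintptr_t) page_size - 1;
    uintptr_t start = addr & ~mask;              /* round the base down */
    uintptr_t end = (addr + len + mask) & ~mask; /* round the end up */
    return (struct page_import) {
        .base = start,
        .offset = (size_t) (addr - start),
        .size = (size_t) (end - start),
    };
}
```

The buffer then refers to `[offset, offset + len)` inside the imported region, so normal accesses never touch the extra bytes pulled in by the rounding.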
-
Niklas Haas authored
This is intended for stuff like probing functions, to avoid generating bogus error messages. We directly make use of this function to clean up the format probing code, which is notoriously prone to generating error spam.
-
Niklas Haas authored
Mostly so I can test the improvements that leveraging host-mapped pointers will give us.
-
- Jul 19, 2020
Niklas Haas authored
To prevent logic errors when the input overflows e.g. the BT.2390 function, and to make functions behave more predictably on overflow in general. This ensures no function will ever see something larger than sig_peak. Requires changes to `clip` and `linear` to make them work properly again.
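The clamping idea, sketched in C. `tone_map_fn`, `tm_clip`, `tm_linear` and `tone_map` are hypothetical stand-ins rather than libplacebo's shader code, and the two curves are simplified examples; the point is only that every curve receives input already clamped to `[0, sig_peak]`.

```c
typedef float (*tone_map_fn)(float x, float sig_peak, float dst_peak);

/* Hard clip at the target peak */
static float tm_clip(float x, float sig_peak, float dst_peak)
{
    (void) sig_peak;
    return x < dst_peak ? x : dst_peak;
}

/* Linear rescale mapping sig_peak onto dst_peak */
static float tm_linear(float x, float sig_peak, float dst_peak)
{
    return x * (dst_peak / sig_peak);
}

/* Clamp once, up front, so no curve is ever evaluated beyond sig_peak */
static float tone_map(tone_map_fn fn, float x, float sig_peak, float dst_peak)
{
    if (x < 0.0f)
        x = 0.0f;
    if (x > sig_peak)
        x = sig_peak;
    return fn(x, sig_peak, dst_peak);
}
```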
-
- Jul 16, 2020
Niklas Haas authored
These are not included as part of the GCC visibility pragma.
-
- Jul 15, 2020
Niklas Haas authored
A hilariously awkward mix-up. This probably meant the ICC profile code never actually produced correct results in practice unless the two color spaces happened to coincide. Good thing nobody used it in production yet (tm) Fixes https://github.com/haasn/libplacebo/issues/82
-
- Jul 14, 2020
Niklas Haas authored
This is still a pretty bad hack-patch for now, because no driver actually implements the drm format modifier extension. But this way of doing it at least allows us to differentiate between linear and non-linear, which we assume (blindly) is equal to optimal, and is needed to get vaapi hwdec working on AMD. We also get rid of the plane offset check because this also conflicts with the requirements of drm format modifiers, which we again can't respect properly. We already suppress validation errors for the image bind, and it works in practice.
-
- Jul 13, 2020
Niklas Haas authored
For consistency, and because these technically serve a useful purpose (e.g. allowing static array sizing or bounds checks).
-
Niklas Haas authored
I was growing unhappy with the use of the non-explanatory, confusing and misleading 'TV' and 'PC' enum names. Replace them with the more descriptive terms 'LIMITED' and 'FULL', respectively. No API bump, because this is not a breaking change: the old enum names are still defined.
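One way to rename enum members without breaking existing code is to keep the old names as aliases with identical values. The enum below is a hypothetical illustration of that pattern, not a copy of libplacebo's header.

```c
enum color_levels {
    COLOR_LEVELS_UNKNOWN = 0,
    COLOR_LEVELS_LIMITED, /* e.g. luma 16-235 for 8-bit content */
    COLOR_LEVELS_FULL,    /* e.g. 0-255 for 8-bit content */
    COLOR_LEVELS_COUNT,

    /* Deprecated aliases, kept so code using the old names still compiles */
    COLOR_LEVELS_TV = COLOR_LEVELS_LIMITED,
    COLOR_LEVELS_PC = COLOR_LEVELS_FULL,
};
```

Because the aliases share the same values, switch statements and stored values written against the old names keep working unchanged.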
-
- Jul 12, 2020
Niklas Haas authored
1. VkBuffer sharing mode doesn't actually affect anything in real-world drivers (e.g. RADV, ANV, AMDVLK).
2. VkBuffers are not part of the interop API, so we don't care about having to communicate this to the user.
3. Having to somehow transition all buffers would be a pain anyway.
-
- Jul 11, 2020
Niklas Haas authored
Fixes #85
-
- Jul 09, 2020
Niklas Haas authored
This combines the function with the previously hidden pl_opengl_wrap_fb, allowing users to either provide their own framebuffers (in addition to the texture) or just wrap a plain framebuffer directly. In addition to merging these two functions, we also significantly overhaul the `gl_fb_query` function for inferring `pl_fmt` details from an opaque framebuffer. In particular, our wrapped framebuffers can now support PL_FMT_CAP_HOST_READABLE. Closes https://github.com/haasn/libplacebo/issues/81
-
Niklas Haas authored
-
- Jul 06, 2020
Niklas Haas authored
With the recent series of refactors to the vulkan malloc layer, host-visible device-local memory types exist and are allocatable, so we can directly serve host-readable uniform buffers. For the scenarios in which it's not possible, working around it should probably be done inside the pl_gpu, not the application code. (i.e. 'host visibility emulation')
-
Niklas Haas authored
Now that we support the existence of 'optimal' memory type properties, we can make device-local memory be the 'optimal' type by default. We can also split up `host_mapped` into scenarios where it's required and scenarios where it's merely recommended.
-
Niklas Haas authored
Imported noncoherent memory is not implicitly invalidated.
-
Niklas Haas authored
1. Log the proper pointer on unimport
2. Add missing test case
-
Dedicated allocations are ones where memory is allocated with a single image or buffer specified at allocation time, and only that buffer or image can be bound to the memory. Our first use-case for supporting it is to handle importing dma_bufs on AMD hardware, where the driver says dedicated allocations are required. I've tested this on Intel hardware, which doesn't require dedicated allocations, but works fine if you force them.
Modified-by: Niklas Haas <git@haasn.xyz>
Rebased on top of the vulkan malloc API refactor, and also added support for allocating dedicated slabs directly, which allows us to also allocate dedicated memory for images which advertise preferring dedicated allocations. Finally, added some extra verification. Closes: !72
-
Niklas Haas authored
Major refactor, accomplishing the following:
- group args into a params struct
- unified API for importing, generic and buffers
- move buffer importing boilerplate to the malloc layer
- split up the property flags into required and optimal properties
- better memory type scoring
- enforce heap size when allocating large slabs
- fix some buggy checks for optionally visible/coherent memory
And probably more that I'm forgetting.
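Splitting property flags into required and optimal sets suggests a filter-then-score selection. The sketch below assumes a simple scoring rule (count matched optimal bits, first type wins ties) and uses made-up flag values modeled loosely on VkMemoryPropertyFlags; the actual malloc layer may score differently.

```c
#include <stdint.h>

/* Illustrative property bits, not the real Vulkan constants */
enum {
    MEM_DEVICE_LOCAL  = 1 << 0,
    MEM_HOST_VISIBLE  = 1 << 1,
    MEM_HOST_COHERENT = 1 << 2,
    MEM_HOST_CACHED   = 1 << 3,
};

static int popcount32(uint32_t x)
{
    int n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}

/* Returns the index of the best memory type, or -1 if none qualifies. */
static int pick_mem_type(const uint32_t *type_flags, int num_types,
                         uint32_t required, uint32_t optimal)
{
    int best = -1, best_score = -1;
    for (int i = 0; i < num_types; i++) {
        uint32_t flags = type_flags[i];
        if ((flags & required) != required)
            continue; /* missing a mandatory property: reject outright */
        int score = popcount32(flags & optimal);
        if (score > best_score) { /* strict '>' keeps the earliest on ties */
            best = i;
            best_score = score;
        }
    }
    return best;
}
```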
-
Niklas Haas authored
Uncached reads are extremely slow, but for small buffers it shouldn't matter, since they're only used to read back small bits of state information and other non-bandwidth-sensitive things.
-
Niklas Haas authored
Parallel tests make errors much more confusing and hard to find.
-
Niklas Haas authored
Prefix the require() failure cases to let me ctrl+f for them.
-
Niklas Haas authored
A lot of these fields were either redundant, too aggressively checked, not checked aggressively enough, or simply leftovers. Clean up this logic and bring it into the (hopefully) intended form.
-
Niklas Haas authored
This can end up comparing undefined memory regions, because the padding bytes of structs may not be initialized with anything in particular.
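A small illustration of the problem with a hypothetical struct: on common ABIs there are padding bytes between `a` and `b`, and `memcmp()` compares those too, while a field-wise comparison only looks at the named members.

```c
#include <stdbool.h>
#include <string.h>

struct params {
    char a; /* typically followed by padding bytes before `b` */
    int b;
};

/* Fragile: padding may contain arbitrary leftover data, so structs with equal
 * fields can still compare unequal (and memory checkers may flag the read). */
static bool params_equal_memcmp(const struct params *x, const struct params *y)
{
    return memcmp(x, y, sizeof(*x)) == 0;
}

/* Robust: compare only the actual fields. */
static bool params_equal(const struct params *x, const struct params *y)
{
    return x->a == y->a && x->b == y->b;
}
```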
-
- Jul 05, 2020
Niklas Haas authored
Avoid uninitialized average[2]
-
Niklas Haas authored
This is needed primarily because some OpenGL/Vulkan drivers will end up calling the pl_msg functions from random threads. Prevent any issues when doing this by making sure these functions are thread safe.
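A minimal sketch of making a logging entry point safe to call from arbitrary threads: one mutex serializes message formatting and the user callback. `struct logger`, `logger_msg` and `count_cb` are hypothetical names, not libplacebo's pl_msg implementation.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

typedef void (*log_cb)(void *priv, int level, const char *msg);

struct logger {
    pthread_mutex_t lock;
    log_cb cb;
    void *priv;
    char buf[1024]; /* shared scratch buffer, only touched while holding `lock` */
};

static void logger_init(struct logger *log, log_cb cb, void *priv)
{
    pthread_mutex_init(&log->lock, NULL);
    log->cb = cb;
    log->priv = priv;
}

static void logger_msg(struct logger *log, int level, const char *fmt, ...)
{
    va_list ap;
    pthread_mutex_lock(&log->lock);
    va_start(ap, fmt);
    vsnprintf(log->buf, sizeof(log->buf), fmt, ap);
    va_end(ap);
    if (log->cb)
        log->cb(log->priv, level, log->buf);
    pthread_mutex_unlock(&log->lock);
}

/* Example callback: counts messages; safe because calls are serialized */
static void count_cb(void *priv, int level, const char *msg)
{
    (void) level;
    (void) msg;
    ++*(int *) priv;
}
```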
-
- Jul 01, 2020
Niklas Haas authored
ffs() and close() don't exist on non-UNIX platforms. Fix close() by ifdefing it out, and replace ffs() with __builtin_ffsll(), similar to what the rest of the code is doing already. This also fixes potential overflow issues due to the implicit downcast from uint32_t to int. Fixes https://github.com/m-ab-s/media-autobuild_suite/issues/1728
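The usual pattern for walking set bits with the GCC/Clang builtin, as a sketch (the helper name is hypothetical). `__builtin_ffsll()` takes a `long long` and returns the 1-based index of the lowest set bit, so a 64-bit mask is never narrowed to `int` the way a call to `ffs()` would narrow it.

```c
#include <stdint.h>

/* Writes the 0-based indices of all set bits in `mask` to `out` (which must
 * have room for 64 entries) and returns how many were found. */
static int collect_set_bits(uint64_t mask, int *out)
{
    int n = 0;
    while (mask) {
        out[n++] = __builtin_ffsll((long long) mask) - 1; /* 1-based -> 0-based */
        mask &= mask - 1; /* clear the lowest set bit */
    }
    return n;
}
```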
-
Niklas Haas authored
These require VkExternalMemoryBufferCreateInfoKHR, same as when exporting buffers.
-
Niklas Haas authored
VkResult in particular can return errors defined by extensions, like VK_ERROR_INVALID_EXTERNAL_HANDLE. These are included as enum extensions in the registry. As a side note, replace `findall` with `iterfind` for performance.
-
Niklas Haas authored
This alignment requirement also needs to be checked for the allocationSize. Improve error reporting while we're at it.
-
Niklas Haas authored
Because of annoying API dependency issues (vertex buffers depend on pl_buf which depends on pl_buf_read/pl_tex_upload, which depends on PBO support), we can't use pl_pass_run_vbo for OpenGL, so instead we have to manually create and update the vertex buffer. (Not a huge deal since this is what the code did previously) It also means we miss out on vertex buffer reuse for OpenGL, but oh well. I guess the driver could already be doing this internally.
-
Niklas Haas authored
The whole `pl_buf_poll` / "different type of operation" thing is no longer really true.
-
Niklas Haas authored
To prevent users from e.g. trying to import uniform buffers with illegal alignment for uniform usage. To not over-verify, we split up `align` into two separate fields for optimal and mandatory alignment.
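A sketch of validating an import offset against two separate alignment values: failing the mandatory one (for uniform usage this would be something like Vulkan's `minUniformBufferOffsetAlignment`) rejects the import, while failing only the optimal one is merely suboptimal. The struct, enum and function names are hypothetical.

```c
#include <stddef.h>

struct align_req {
    size_t required; /* mandatory alignment; 0 means no constraint */
    size_t optimal;  /* performance-oriented alignment; 0 means no hint */
};

enum align_result { ALIGN_OK, ALIGN_SUBOPTIMAL, ALIGN_INVALID };

static enum align_result check_import_align(size_t offset,
                                            const struct align_req *req)
{
    if (req->required && offset % req->required)
        return ALIGN_INVALID;    /* must refuse the import */
    if (req->optimal && offset % req->optimal)
        return ALIGN_SUBOPTIMAL; /* allowed, but worth a warning at most */
    return ALIGN_OK;
}
```

Keeping the two values separate avoids over-verifying: an offset that only misses the optimal alignment is still legal to import.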
-
Niklas Haas authored
We can drop the texel alignment requirement using the same `unaligned` fallback code that already exists. Furthermore, I have no idea where the "multiple of 4" check came from. I can't find any reference to this being required for a VkBufferImageCopy in the vulkan spec.
-
Niklas Haas authored
This is a major change in operation and means all buffers now effectively become generic "buffers", with individual usage flags controlling what can and can't be done, similar to how `pl_tex` works. Rather than introducing a usage flag for buffer<->texture copies, we introduce a GPU-wide capability instead, and assume all buffers can be used for buffer<->texture copies. This maps more cleanly to what graphics APIs actually support, and mirrors the fact that we don't require any special usage flag for buffer<->buffer copies. Involves quite a lot of annoying refactoring, but I did the change in a way that should hopefully be very backwards compatible and result in no major degradation in performance or breaking change in logic. Notably, this commit also introduces parts of the public API allowing for use of arbitrary buffers as vertex buffers, but to avoid making this commit too big, this isn't actually exposed in the `pl_pass` API yet. The `drawable` field currently only exists for internal use.
-
Niklas Haas authored
Merge a few redundant functions into one API, and also avoid unnecessary duplication of the log level everywhere.
-