Commits · master · Vasilis Liaskovitis / libplacebo

Jul 22, 2020

shaders: revise sh_lut method logic · ffd4f666

Niklas Haas authored 4 years ago

This is required to support GLSL ES 1.0 and GLSL 110, which forbid the
use of literal arrays in shaders. Since SH_LUT_LITERAL is now no longer
a safe fallback, we instead always fall back to SH_LUT_UNIFORM.

This is technically an API break, since in the past, the naked pl_shader
API would always generate literal shaders, but now they may have arrays
attached as uniforms. To prevent this, users can still set small LUT
sizes (which is what e.g. VLC does anyway).

ffd4f666

shaders: prefer SH_LUT_LITERAL for small linear LUTs · 44a80e80

Niklas Haas authored 4 years ago

This pretty much only really affects the polar sampling code, which uses
a small linear LUT. I found that the performance gain depends on whether
or not we're using compute shaders, with the non-compute shader path
being the only one to really benefit from this change.

44a80e80

shaders: remove SH_LUT_LINEAR, make a bool instead · b4d96813

Niklas Haas authored 4 years ago

By providing fallback code to linearly interpolate between array values
on the GPU. The motivating use case here is not just a concern of
semantics/correctness, but more importantly, because doing so might
actually be faster than going through a texture sample, for small LUTs.

b4d96813
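The linear-interpolation fallback described in b4d96813 can be illustrated on the CPU side. This is only a sketch in plain C of the arithmetic involved, not the GLSL that libplacebo generates; `lut_sample_linear` and its parameters are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libplacebo: samples a 1D LUT stored as a
 * plain array, linearly interpolating between neighbouring entries.
 * `x` is a normalized coordinate; values outside [0, 1] are clamped. */
static float lut_sample_linear(const float *lut, size_t size, float x)
{
    assert(lut && size >= 2);

    /* Clamp so out-of-range inputs behave like clamp-to-edge sampling. */
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;

    float pos = x * (float) (size - 1);  /* position in units of entries */
    size_t idx = (size_t) pos;           /* lower neighbour */
    if (idx > size - 2)
        idx = size - 2;                  /* keep idx + 1 inside the array */
    float frac = pos - (float) idx;      /* blend weight towards idx + 1 */

    return lut[idx] * (1.0f - frac) + lut[idx + 1] * frac;
}
```

For a small array this costs two loads and a blend, which is the kind of work the commit message suggests may be cheaper than a texture sample.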

gpu: use host-imported pointers for pl_tex_download_pbo · 2e04963f

Niklas Haas authored 4 years ago

This allows such tex transfers to avoid an extra memcpy in most cases,
except where the pointer happens to be horrifically misaligned with
respect to the texel size - but in these cases, the alignment-fixing
memcpy will happen inside VRAM (PL_BUF_MEM_DEVICE), which should still
be faster than doing an extra memcpy in RAM.

Also, I realized it makes no sense to have tex_download_pbo use a buffer
pool at all, because it's synchronous anyway - there can only ever be
one buffer. And doing it this way avoids code duplication between the
import branch and the non-import branch.

Side note: We could do the same for pl_tex_upload_pbo with the same
justification, but I decided to test the waters with this commit first.

2e04963f

vulkan/malloc: round imported pointers to page boundaries · ae9f4166

Niklas Haas authored 4 years ago

This allows us to bypass the page-alignment restriction on host pointer
imports, by simply sufficiently extending the host pointer base, the
buffer offset, and the memory size in the respective direction, thus
ensuring that our memory import is always page-aligned.

This *should* technically be safe, because the MMU can only enforce
virtual memory access safety on a per-page granularity, and our code
should never end up reading outside the bounds of a vk_memslice. But on
the other hand, what we're doing is absolutely insane. Beware nasal
demons. I only wrote this logic because I enjoy sharing an address space
with a malevolent agent of chaos.

As an aside, also fix some errors related to imported buffer size
calculation and alignment validation that I noticed along the way.

ae9f4166
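The rounding described in ae9f4166 amounts to a small piece of address arithmetic. The following is a minimal sketch, assuming a power-of-two page size passed in by the caller; `struct import_range` and `page_align_import` are hypothetical names and do not reflect libplacebo's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this example only. */
struct import_range {
    uintptr_t base;  /* page-aligned address that would be imported */
    size_t offset;   /* where the caller's data starts inside the import */
    size_t size;     /* page-aligned size of the import */
};

/* Extends [ptr, ptr + len) outwards to page boundaries in both directions. */
static struct import_range page_align_import(uintptr_t ptr, size_t len,
                                             size_t page_size)
{
    /* Assumption: page_size is a nonzero power of two. */
    assert(page_size && (page_size & (page_size - 1)) == 0);
    const size_t mask = page_size - 1;

    struct import_range r;
    r.base = ptr & ~(uintptr_t) mask;         /* round the base down */
    r.offset = (size_t) (ptr - r.base);       /* compensate via the offset */
    r.size = (r.offset + len + mask) & ~mask; /* round the end up */
    return r;
}
```

The caller keeps addressing its data at `base + offset`, so the extra bytes on either side are mapped but, if the code is correct, never touched.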

context: add ability to temporarily cap log verbosity · a02084cc

Niklas Haas authored 4 years ago

This is intended for stuff like probing functions, to avoid generating
bogus error messages. We directly make use of this function to clean up
the format probing code, which is notoriously prone to generating error
spam.

a02084cc
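One plausible way to implement the verbosity cap from a02084cc is to demote any message that is more severe than the cap down to the cap's level, so that probing failures only show up at debug verbosity. The sketch below assumes exactly that behaviour; the enum, struct and function names are hypothetical and are not the library's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels, ordered from most to least severe. */
enum demo_log_level {
    DEMO_LOG_NONE = 0,
    DEMO_LOG_FATAL,
    DEMO_LOG_ERR,
    DEMO_LOG_WARN,
    DEMO_LOG_INFO,
    DEMO_LOG_DEBUG,
};

struct demo_log {
    enum demo_log_level level; /* most verbose level that gets printed */
    enum demo_log_level cap;   /* DEMO_LOG_NONE means "no cap active" */
};

/* Level a message is actually emitted at: anything more severe than the
 * cap is demoted to the cap (assumed semantics for this sketch). */
static enum demo_log_level demo_effective_level(const struct demo_log *log,
                                                enum demo_log_level lev)
{
    assert(log && lev != DEMO_LOG_NONE);
    return lev < log->cap ? log->cap : lev;
}

/* Whether a message of severity `lev` would be printed at all. */
static bool demo_log_visible(const struct demo_log *log,
                             enum demo_log_level lev)
{
    return demo_effective_level(log, lev) <= log->level;
}
```

A probing function would set `cap` before the attempt and reset it to `DEMO_LOG_NONE` afterwards.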

tests/bench: add pl_tex transfer benchmark · c20b0eb4

Niklas Haas authored 4 years ago

Mostly so I can test the improvements that leveraging host-mapped
pointers will give us.

c20b0eb4

Jul 19, 2020

shaders/colorspace: clip before tone-mapping functions · c8bfe345

Niklas Haas authored 4 years ago

To prevent logic errors when overflowing e.g. the BT.2390 function, and
also make functions behave more predictably on overflow in general.

This ensures no function will ever see something larger than sig_peak.
Requires changes to `clip` and `linear` to make them work properly
again.

c8bfe345

Jul 16, 2020

common: properly expose pl_fix_ver / pl_version · b23afab6

Niklas Haas authored 4 years ago

These are not included as part of the GCC visibility pragma.

b23afab6

Jul 15, 2020

lcms: fix accidental swap of dst/src colorspace · 48377bfe

Niklas Haas authored 4 years ago

A hilariously awkward mix-up. This probably meant the ICC profile code
never actually produced correct results in practice unless the two color
spaces happened to coincide.

Good thing nobody used it in production yet (tm)

Fixes https://github.com/haasn/libplacebo/issues/82

48377bfe

Jul 14, 2020

gpu: add preliminary API support for DRM format modifiers · 025c5dcb

Niklas Haas authored 4 years ago

This is still a pretty bad hack-patch for now, because no driver
actually implements the drm format modifier extension. But this way of
doing it at least allows us to differentiate between linear and
non-linear, which we assume (blindly) is equal to optimal, and is needed
to get vaapi hwdec working on AMD.

We also get rid of the plane offset check because this also conflicts
with the requirements of drm format modifiers, which we again can't
respect properly. We already suppress validation errors for the image
bind, and it works in practice.

025c5dcb

Jul 13, 2020

include: add _COUNT members to all public enums · 5e517936

Niklas Haas authored 4 years ago

For consistency, and because these technically serve a useful purpose
(e.g. allowing static array sizing or bounds checks).

5e517936
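The two uses named in 5e517936, static array sizing and bounds checks, can be shown with a small example. The enum below is hypothetical and only mimics the `_COUNT` convention; it is not one of libplacebo's public enums.

```c
#include <assert.h>

/* Hypothetical enum following the _COUNT convention. */
enum demo_mode {
    DEMO_MODE_UNKNOWN = 0,
    DEMO_MODE_FAST,
    DEMO_MODE_ACCURATE,
    DEMO_MODE_COUNT, /* always last: number of valid values */
};

/* Static array sizing: the table has exactly one slot per enum value. */
static const char *const demo_mode_names[DEMO_MODE_COUNT] = {
    [DEMO_MODE_UNKNOWN]  = "unknown",
    [DEMO_MODE_FAST]     = "fast",
    [DEMO_MODE_ACCURATE] = "accurate",
};

/* Bounds check: reject values outside the enum before indexing. */
static const char *demo_mode_name(int mode)
{
    if (mode < 0 || mode >= DEMO_MODE_COUNT)
        return "invalid";
    /* Catches a value that was added to the enum but not to the table. */
    assert(demo_mode_names[mode]);
    return demo_mode_names[mode];
}
```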

colorspace: rename pl_color_levels · e4c03d0f

Niklas Haas authored 4 years ago

I was growing unhappy with the use of the non-explanatory, confusing and
misleading 'TV' and 'PC' enum names. Replace them with the more
descriptive terms 'LIMITED' and 'FULL', respectively.

No API bump because this is not a breaking change, as the old enum names
are still defined.

e4c03d0f
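A rename like the one in e4c03d0f stays source-compatible when the old names remain available as aliases for the same values. The sketch below shows one way to do that with enum aliases; the `DEMO_` identifiers are hypothetical, and whether the library used enum aliases or macros is not stated in the commit message.

```c
#include <assert.h>

/* Hypothetical enum: new descriptive names plus aliases for the old ones. */
enum demo_color_levels {
    DEMO_COLOR_LEVELS_UNKNOWN = 0,
    DEMO_COLOR_LEVELS_LIMITED,
    DEMO_COLOR_LEVELS_FULL,
    DEMO_COLOR_LEVELS_COUNT,

    /* Old names kept as aliases so existing callers still compile. */
    DEMO_COLOR_LEVELS_TV = DEMO_COLOR_LEVELS_LIMITED,
    DEMO_COLOR_LEVELS_PC = DEMO_COLOR_LEVELS_FULL,
};

/* 8-bit black level: 16 for limited range, 0 for full range.
 * Assumption for this sketch: unknown is treated like full range. */
static int demo_black_level_8bit(enum demo_color_levels levels)
{
    assert((int) levels >= 0 && (int) levels < DEMO_COLOR_LEVELS_COUNT);
    return levels == DEMO_COLOR_LEVELS_LIMITED ? 16 : 0;
}
```

Because the aliases share the values of the new names, code written against either spelling takes the same branch.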

Jul 12, 2020

vulkan: remove FIXME comments on buffer sharing mode · 996e2b58

Niklas Haas authored 4 years ago

1. VkBuffer sharing mode doesn't actually affect anything in real-world
   drivers (e.g. RADV, ANV, AMDVLK).
2. VkBuffers are not part of the interop API so we don't care about
   having to communicate this to the user.
3. Having to somehow transition all buffers would be a pain anyway

996e2b58

Jul 11, 2020

renderer: respect `pl_render_target.repr` · a4aa8a32

Niklas Haas authored 4 years ago

Fixes #85

a4aa8a32

Jul 09, 2020

opengl: refactor pl_opengl_wrap · 4a5ce5bc

Niklas Haas authored 4 years ago

This combines the function with the previously hidden pl_opengl_wrap_fb,
allowing users to either provide their own framebuffers (in addition to
the texture) or just wrap a plain framebuffer directly.

In addition to merging these two functions, we also significantly
overhaul the `gl_fb_query` function for inferring `pl_fmt` details from
an opaque framebuffer. In particular, our wrapped framebuffers can now
support PL_FMT_CAP_HOST_READABLE.

Closes https://github.com/haasn/libplacebo/issues/81

4a5ce5bc

opengl: fix typo in comment · 80e862b1

Niklas Haas authored 4 years ago

80e862b1

Jul 06, 2020

shaders/colorspace: read detected peak directly from ssbo · b48c81cb

Niklas Haas authored 4 years ago

With the recent series of refactors to the vulkan malloc layer,
host-visible device-local memory types exist and are allocatable, so we
can directly serve host-readable uniform buffers.

For the scenarios in which it's not possible, working around it should
probably be done inside the pl_gpu, not the application code. (i.e.
'host visibility emulation')

b48c81cb

vulkan: slightly revise buffer requirements/placement logic · 055cdc0a

Niklas Haas authored 4 years ago

Now that we support the existence of 'optimal' memory type properties,
we can make device-local memory be the 'optimal' type by default. We can
also split up `host_mapped` into scenarios where it's required and
scenarios where it's merely recommended.

055cdc0a

vulkan/malloc: invalidate mapped noncoherent memory · 2f2ba1a6

Niklas Haas authored 4 years ago

Imported noncoherent memory is not implicitly invalidated.

2f2ba1a6

vulkan/malloc: misc fixes related to host pointer import · 870cb541

Niklas Haas authored 4 years ago

1. Log the proper pointer on unimport
2. Add missing test case

870cb541

vulkan: implement support for dedicated imported allocations · d5b23f61

Philip Langdale authored 4 years ago and Niklas Haas committed 4 years ago

Dedicated allocations are ones where memory is allocated with
a single image or buffer specified at allocation time, and only
that buffer or image can be bound to the memory.

Our first use-case for supporting it is to handle importing dma_bufs
on AMD hardware, where the driver says dedicated allocations are
required.

I've tested this on Intel hardware, which doesn't require dedicated
allocations, but works fine if you force them.

Modified-by: Niklas Haas <git@haasn.xyz>

Rebased on top of the vulkan malloc API refactor, and also added support
for allocating dedicated slabs directly - which allows us to also
allocate dedicated memory for images which advertise preferring
dedicated allocations. Finally, add some extra verification.

Closes: !72

d5b23f61

vulkan/malloc: major API refactor · aea6f237

Niklas Haas authored 4 years ago

Major refactor, accomplishing the following:

- group args into a params struct
- unified API for importing, generic and buffers
- move buffer importing boilerplate to the malloc layer
- split up the property flags into required and optimal properties
- better memory type scoring
- enforce heap size when allocating large slabs
- fix some buggy checks for optionally visible/coherent memory

And probably more that I'm forgetting.

aea6f237

vulkan/malloc: only require host-cached memory for large buffers · 20014f11

Niklas Haas authored 4 years ago

Uncached reads are extremely slow, but for small buffers it shouldn't
matter, since they're only used to readback small bits of state
information and other non bandwidth-sensitive things.

20014f11

ci: disable parallel testing · 41bb87db

Niklas Haas authored 4 years ago

Parallel tests make errors much more confusing and hard to find.

41bb87db

tests: make errors more findable · 7176bf1f

Niklas Haas authored 4 years ago

Prefix the require() failure case to let me ctrl+f for them.

7176bf1f

shaders/av1: overhaul and fix grain reusability test · 5cc2e2a4

Niklas Haas authored 4 years ago

A lot of these fields were either redundant, too aggressively checked,
not checked aggressively enough, or simply leftovers.

Clean up this logic and bring it into the (hopefully) intended form.

5cc2e2a4

shaders/av1: avoid memcmp() on padded structs · ab8bd2f1

Niklas Haas authored 4 years ago

This can end up comparing undefined memory regions, because the padding
areas of structs may not be initialized with anything in particular.

ab8bd2f1
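The usual alternative to the `memcmp()` pattern removed in ab8bd2f1 is to compare the members one by one, which never reads padding bytes. A minimal sketch follows, with a hypothetical struct standing in for the real grain parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct: on most ABIs there are padding bytes between `flag`
 * and `seed`, and their contents are unspecified. */
struct demo_grain_params {
    uint8_t flag;
    uint32_t seed;
};

/* Member-wise equality: only the named fields are compared, so whatever
 * happens to be in the padding cannot affect the result. */
static bool demo_grain_params_equal(const struct demo_grain_params *a,
                                    const struct demo_grain_params *b)
{
    assert(a && b);
    return a->flag == b->flag && a->seed == b->seed;
}
```

The cost is that the comparison has to be kept in sync by hand whenever a field is added to the struct.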

Jul 05, 2020

shaders/colorspace: defensive programming · 83f40365

Niklas Haas authored 4 years ago

Avoid uninitialized average[2]

83f40365

context: make thread safe · 723827d7

Niklas Haas authored 5 years ago

This is needed primarily because some OpenGL/Vulkan drivers will end up
calling the pl_msg functions from random threads. Prevent any issues
when doing this by making sure these functions are thread safe.

723827d7

Jul 01, 2020

vulkan: fix build on non-UNIX platforms · 175ac74d

Niklas Haas authored 4 years ago

ffs() and close() don't exist on non-UNIX. Fix close() by ifdefing it
out, and replace ffs() by __builtin_ffsll(), similar to what the rest of
the code is doing already. This also fixes potential overflow issues due
to the implicit downcast from uint32_t to int.

Fixes https://github.com/m-ab-s/media-autobuild_suite/issues/1728

175ac74d
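As an illustration of the `ffs()` replacement in 175ac74d, the sketch below wraps `__builtin_ffsll()` (a GCC/Clang builtin) so that the mask is never narrowed to `int` before the bit scan. The helper names are hypothetical, and the loop is just one typical use of such a helper, not code taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit, or -1 if no bit is set.
 * __builtin_ffsll() returns a 1-based index and 0 for an empty mask. */
static int demo_lowest_set_bit(uint64_t mask)
{
    return __builtin_ffsll((long long) mask) - 1;
}

/* Example use: visit every set bit of a mask, here simply counting them. */
static int demo_count_set_bits(uint64_t mask)
{
    int count = 0;
    while (mask) {
        int idx = demo_lowest_set_bit(mask);
        assert(idx >= 0 && idx < 64);
        mask &= ~(UINT64_C(1) << idx); /* clear the bit just visited */
        count++;
    }
    return count;
}
```

Since the argument is 64 bits wide, a `uint32_t` mask with its top bit set is widened rather than converted to a negative `int`.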

vulkan: correctly create imported memory buffers · 249690e1

Niklas Haas authored 4 years ago

These require VkExternalMemoryBufferCreateInfoKHR, same as when
exporting buffers.

249690e1

vulkan: also generate boilerplate for extended enums · 8f36b8a7

Niklas Haas authored 4 years ago

VkResult in particular can return errors defined by extensions, like
VK_ERROR_INVALID_EXTERNAL_HANDLE. These are included as enum extensions
in the registry.

As a side note, replace `findall` by `iterfind` for performance.

8f36b8a7

vulkan: fix host pointer import alignment checks · 2fdeb140

Niklas Haas authored 4 years ago

This alignment requirement also needs to be checked for the
allocationSize. Improve error reporting while we're at it.

2fdeb140

gpu: add support for generic vertex buffers · 596de21c

Niklas Haas authored 4 years ago

Because of annoying API dependency issues (vertex buffers depend on
pl_buf which depends on pl_buf_read/pl_tex_upload, which depends on PBO
support), we can't use pl_pass_run_vbo for OpenGL, so instead we have to
manually create and update the vertex buffer. (Not a huge deal since
this is what the code did previously)

It also means we miss out on vertex buffer reuse for OpenGL, but oh
well. I guess the driver could already be doing this internally.

596de21c

gpu: remove outdated comment · 181638d7

Niklas Haas authored 4 years ago

The whole `pl_buf_poll` / "different type of operation" thing is no
longer really true.

181638d7

vulkan: enforce alignment requirements on imported buffers · a7031878

Niklas Haas authored 4 years ago

To prevent users from e.g. trying to import uniform buffers with illegal
alignment for uniform usage.

To not over-verify, we split up `align` into two separate fields for
optimal and mandatory alignment.

a7031878

gpu: relax buffer offset alignment restrictions · 0cfb1a10

Niklas Haas authored 4 years ago

We can drop the texel alignment requirement using the same `unaligned`
fallback code that already exists. Furthermore, I have no idea where the
"multiple of 4" check came from. I can't find any reference to this
being required for a VkBufferImageCopy in the vulkan spec.

0cfb1a10

gpu: nuke pl_buf_params.type and make all buffers generic · 2cf48cdd

Niklas Haas authored 4 years ago

This is a major change in operation and means all buffers now
effectively become generic "buffers", with individual usage flags
controlling what can and can't be done, similar to how `pl_tex` works.

Rather than introducing a usage flag for buffer<->texture copies, we
introduce a GPU-wide capability instead, and assume all buffers can be
used for buffer<->texture copies. This maps more cleanly to what
graphics APIs actually support, and mirrors the fact that we don't
require any special usage flag for buffer<->buffer copies.

Involves quite a lot of annoying refactoring, but I did the change in a
way that should hopefully be very backwards compatible and result in no
major degradation in performance or breaking change in logic.

Notably, this commit also introduces parts of the public API allowing
for use of arbitrary buffers as vertex buffers, but to avoid making this
commit too big, this isn't actually exposed in the `pl_pass` API yet.
The `drawable` field currently only exists for internal use.

2cf48cdd

gpu: minor code refactoring · a5cc3e08

Niklas Haas authored 4 years ago

Merge a few redundant functions into one API, and also avoid
unnecessary duplication of the log level everywhere.

a5cc3e08