Render YUV 4:2:2 pictures at full definition in OpenGL

Context

It is straightforward to upload YUV 4:2:0 pictures to OpenGL textures and use them. There are 2 or 3 textures:

I420: 1 plane Y at full definition, 1 plane U at half definition, 1 plane V at half definition
NV12: 1 plane Y at full definition, 1 plane UV (packed) at half definition

YUY2 pictures (YUV 4:2:2) are more problematic. As noted in #26682 (closed) (!1590 (merged)), VLC currently renders such pictures at half the horizontal definition in OpenGL.

Semantically, the picture contains 3 planes Y U V, with U and V at half horizontal definition:

# This represents the YUV plane components semantically; this is not how they are stored
  Y plane      U plane   V plane
Y0 Y1 Y2 Y3     U0 U2     V0 V2
Y4 Y5 Y6 Y7     U4 U6     V4 V6

These 3 components are stored in a single packed plane:

# YUY2 (packed)
Y0 U0 Y1 V0 Y2 U2 Y3 V2
Y4 U4 Y5 V4 Y6 U6 Y7 V6

Therefore, this single plane is uploaded to a single OpenGL texture.
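
To make the layout concrete, here is a minimal sketch that splits one packed YUY2 row back into the three semantic planes. The function name unpack_yuy2_row and the sample byte values are made up for illustration; this is not VLC code.

```python
def unpack_yuy2_row(packed):
    """Split one packed YUY2 row (Y0 U0 Y1 V0 ...) into Y, U and V lists."""
    if len(packed) % 4 != 0:
        raise ValueError("a YUY2 row must hold a whole number of 4-byte groups")
    y = list(packed[0::2])  # every other byte is a luma sample
    u = list(packed[1::4])  # U follows the first luma of each pair
    v = list(packed[3::4])  # V follows the second luma of each pair
    return y, u, v


# Hypothetical row holding 4 pixels: Y0 U0 Y1 V0 Y2 U2 Y3 V2
row = bytes([16, 128, 32, 129, 48, 130, 64, 131])
y_plane, u_plane, v_plane = unpack_yuy2_row(row)
```

For 4 pixels, this yields 4 luma samples but only 2 U and 2 V samples, which matches the half horizontal definition of the chroma planes.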

Problems

This makes it difficult to extract the YUV values for a given location.

Currently, the texture is uploaded as GL_RGBA (so [Y1 U Y2 V] are mapped to [r g b a]), and the Y2 value is ignored (so half the horizontal resolution is lost): https://code.videolan.org/rom1v/vlc/-/blob/2e44d338763f72a30a7f5631f86d73c2fc58397e/modules/video_output/opengl/interop.c#L271-288

As a consequence, a YUV 4:2:2 picture is rendered with lower quality than a YUV 4:2:0 picture. We should fix it.

If we upload the texture as GL_RGBA, then every texel contains two pixels ([Y1 U V] and [Y2 U V]). If we upload the texture as GL_RG, then every texel represents 1 pixel, but the chroma information is split over two texels ([Y1 U] and [Y2 V]).

In both cases, native OpenGL interpolation cannot work:

in the first case, Y1 and Y2 would be interpolated separately (instead of together)
in the second case, U and V would be interpolated together (instead of separately)

Solution?

I think we should add a specific case for packed YUV 4:2:2 (YUY2 and UYVY) (yet another vlc_gl_sampler_ops instance):

the texture GL_TEXTURE_*_FILTER will be set to GL_NEAREST (instead of GL_LINEAR) to disable native interpolation
the texture will be GL_RGBA (so each texel contains [Y1 U Y2 V])
the generated vlc_texture(vec2 tex_coords) GLSL function would access 4 pixels (i.e. 2 texels) and perform the linear interpolation "manually"
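
A minimal sketch of the "manual" interpolation, written in Python rather than GLSL so that it can be run as-is. It works on a single row and makes several assumptions that are mine and not settled by this proposal: x is a coordinate in luma samples where an integer value lands exactly on a sample, chroma sample k is co-sited with luma sample 2k, and out-of-range fetches are clamped to the edge. All function names are hypothetical.

```python
import math


def fetch_luma(texels, i):
    """Nearest fetch of luma sample i from [Y1 U Y2 V] texels (clamped)."""
    i = min(max(i, 0), 2 * len(texels) - 1)
    texel = texels[i // 2]
    return texel[0] if i % 2 == 0 else texel[2]


def fetch_chroma(texels, k):
    """Nearest fetch of the (U, V) pair stored in texel k (clamped)."""
    k = min(max(k, 0), len(texels) - 1)
    return texels[k][1], texels[k][3]


def mix(a, b, t):
    """Same semantics as GLSL mix(): linear blend of a and b."""
    return (1 - t) * a + t * b


def sample_yuy2(texels, x):
    """Return interpolated (Y, U, V) at luma coordinate x."""
    # Luma: blend the two nearest luma samples.
    i = math.floor(x)
    y = mix(fetch_luma(texels, i), fetch_luma(texels, i + 1), x - i)
    # Chroma: blend the two nearest texels, at half the luma rate.
    c = x / 2
    k = math.floor(c)
    u0, v0 = fetch_chroma(texels, k)
    u1, v1 = fetch_chroma(texels, k + 1)
    return y, mix(u0, u1, c - k), mix(v0, v1, c - k)


# Hypothetical row of two texels: [Y0 U0 Y1 V0] [Y2 U2 Y3 V2]
row_texels = [(10, 100, 90, 200), (30, 110, 40, 210)]
on_sample = sample_yuy2(row_texels, 1.0)
between = sample_yuy2(row_texels, 1.5)
```

At x = 1.0 the luma is exactly Y1, so no horizontal definition is lost, while U and V are blended separately from the two texels. Any given lookup touches at most 2 texels.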

What do you think?

Alternatively, we could upload the same picture twice (😱), once in GL_RG to access the Y components, once in GL_RGBA to access the UV components, and keep the native OpenGL interpolation.
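
For comparison, a rough sketch of this double-upload alternative, emulating native linear filtering on each of the two textures for a single row. The helper names are hypothetical, and the coordinate convention (integer coordinates land on a sample, clamp at the edges) is an assumption for illustration only.

```python
import math


def texels(packed, channels):
    """Group a packed row into texels of `channels` components each."""
    return [tuple(packed[i:i + channels]) for i in range(0, len(packed), channels)]


def linear_fetch(tex, pos):
    """Emulate a linearly filtered fetch at texel coordinate pos (clamped)."""
    last = len(tex) - 1
    i = math.floor(pos)
    a = tex[min(max(i, 0), last)]
    b = tex[min(max(i + 1, 0), last)]
    t = pos - i
    return tuple((1 - t) * p + t * q for p, q in zip(a, b))


def sample_two_textures(packed, x):
    """Read Y from the RG view and (U, V) from the RGBA view of the same row."""
    rg = texels(packed, 2)    # one texel per pixel; only r (luma) is used
    rgba = texels(packed, 4)  # one texel per pixel pair; only g and a are used
    y = linear_fetch(rg, x)[0]
    _, u, _, v = linear_fetch(rgba, x / 2)
    return y, u, v


# Hypothetical row: Y0 U0 Y1 V0 Y2 U2 Y3 V2
row = [10, 100, 90, 200, 30, 110, 40, 210]
result = sample_two_textures(row, 1.5)
```

The channels that would be interpolated incorrectly (g in the RG texture, r and b in the RGBA texture) are simply ignored, at the cost of uploading the picture twice.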

Edited Mar 16, 2022 by Romain Vimont
