Render YUV 4:2:2 pictures at full definition in OpenGL
Context
It is straightforward to upload YUY 4:2:0 pictures to OpenGL textures and sample them. Depending on the format, there are 2 or 3 textures:
- I420: 1 plane Y at full resolution, 1 plane U at half resolution, 1 plane V at half resolution (chroma is halved both horizontally and vertically)
- NV12: 1 plane Y at full resolution, 1 plane UV (packed) at half resolution
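To make the 4:2:0 layouts concrete, here is a small sketch that computes the size of each plane texture. The struct and function names are hypothetical (this is not VLC's API). Rounding odd sizes up is an assumption that matches common practice.

```c
#include <assert.h>

/* Hypothetical helper type: dimensions of one plane texture, in texels. */
struct plane_size { int width; int height; };

/* I420: 3 planes (Y, U, V); both chroma planes are subsampled by 2
 * horizontally and vertically (odd sizes rounded up). */
static int i420_planes(int w, int h, struct plane_size out[3])
{
    assert(w > 0 && h > 0);
    out[0] = (struct plane_size){ w, h };
    out[1] = (struct plane_size){ (w + 1) / 2, (h + 1) / 2 };
    out[2] = out[1];
    return 3;
}

/* NV12: 2 planes (Y, then interleaved UV); each UV texel, e.g. uploaded
 * as GL_RG, holds one U and one V sample. */
static int nv12_planes(int w, int h, struct plane_size out[2])
{
    assert(w > 0 && h > 0);
    out[0] = (struct plane_size){ w, h };
    out[1] = (struct plane_size){ (w + 1) / 2, (h + 1) / 2 };
    return 2;
}
```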
YUY2 pictures (YUV 4:2:2) are more problematic. As pointed out in #26682 (!1590), VLC currently renders such pictures at half the horizontal resolution in OpenGL.
Semantically, the picture contains 3 planes (Y, U and V), with U and V at half the horizontal resolution (and full vertical resolution):
# This represents the YUV plane components semantically, this is not how they are stored
Y plane        U plane   V plane
Y0 Y1 Y2 Y3    U0 U2     V0 V2
Y4 Y5 Y6 Y7    U4 U6     V4 V6
These 3 components are stored in a single packed plane:
# YUY2 (packed)
Y0 U0 Y1 V0 Y2 U2 Y3 V2
Y4 U4 Y5 V4 Y6 U6 Y7 V6
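The mapping between the packed bytes and the semantic planes can be sketched as a C function that splits one YUY2 row. The function name is hypothetical. It only illustrates the byte order and is not VLC code.

```c
#include <assert.h>
#include <stddef.h>

/* Split one packed YUY2 row (Y0 U0 Y1 V0 Y2 U2 Y3 V2 ...) into its semantic
 * planes. width is the number of pixels in the row and must be even; y
 * receives width bytes, u and v receive width / 2 bytes each. */
static void yuy2_row_to_planar(const unsigned char *packed, size_t width,
                               unsigned char *y, unsigned char *u, unsigned char *v)
{
    assert(width % 2 == 0);
    for (size_t k = 0; k < width / 2; k++) {
        /* Each 4-byte group describes 2 pixels sharing one U and one V sample. */
        y[2 * k]     = packed[4 * k];
        u[k]         = packed[4 * k + 1];
        y[2 * k + 1] = packed[4 * k + 2];
        v[k]         = packed[4 * k + 3];
    }
}
```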
Therefore, this single plane is uploaded to a single OpenGL texture.
Problems
This makes it difficult to extract the YUV values for a given location.
Currently, the texture is uploaded as GL_RGBA (so [Y1 U Y2 V] are mapped to [r g b a]), and the Y2 value is ignored (so half the horizontal resolution is lost): https://code.videolan.org/rom1v/vlc/-/blob/2e44d338763f72a30a7f5631f86d73c2fc58397e/modules/video_output/opengl/interop.c#L271-288
As a consequence, a YUV 4:2:2 picture is rendered worse than a YUV 4:2:0 picture, whose luma keeps its full resolution. We should fix it.
If we upload the texture as GL_RGBA, then every texel contains two pixels ([Y1 U V] and [Y2 U V]).
If we upload the texture as GL_RG, then every texel represents 1 pixel, but the chroma information is split over two texels ([Y1 U] and [Y2 V]).
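As a rough sketch of these two layouts, the functions below (hypothetical names, one row of a YUY2 picture only) return which texel and which component hold the Y, U and V values of pixel x.

```c
#include <assert.h>

/* Location of one component of a pixel inside a texture row. */
struct texel_ref {
    int texel;     /* horizontal texel index */
    int component; /* 0 = r, 1 = g, 2 = b, 3 = a */
};

/* GL_RGBA layout: texel k = [Y(2k) U Y(2k+1) V], so the texture is
 * width / 2 texels wide and each texel holds two pixels. */
static struct texel_ref rgba_luma(int x) { return (struct texel_ref){ x / 2, (x % 2) ? 2 : 0 }; }
static struct texel_ref rgba_u(int x)    { return (struct texel_ref){ x / 2, 1 }; }
static struct texel_ref rgba_v(int x)    { return (struct texel_ref){ x / 2, 3 }; }

/* GL_RG layout: texel x = [Y(x) U] for even x and [Y(x) V] for odd x, so the
 * texture is width texels wide and the U/V of pixel x span 2 texels. */
static struct texel_ref rg_luma(int x) { assert(x >= 0); return (struct texel_ref){ x, 0 }; }
static struct texel_ref rg_u(int x)    { assert(x >= 0); return (struct texel_ref){ x & ~1, 1 }; }
static struct texel_ref rg_v(int x)    { assert(x >= 0); return (struct texel_ref){ x | 1, 1 }; }
```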
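In both cases, native OpenGL interpolation cannot work:
- in the first case, Y1 and Y2 would be interpolated separately (each with the same component of the neighboring texel) instead of together
- in the second case, U and V would be interpolated together (the g component alternates between U and V from one texel to the next) instead of separately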
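A minimal CPU model of GL_LINEAR (with GL_CLAMP_TO_EDGE, horizontal direction only, one row) illustrates both failures. It is a simplification of what the GPU does, and linear_1d is a hypothetical helper.

```c
#include <assert.h>

/* Simplified 1D model of GL_LINEAR + GL_CLAMP_TO_EDGE on one row of
 * interleaved 8-bit texels: n texels, ncomp components per texel.
 * u is in texel units (texel i has its center at i + 0.5). */
static float linear_1d(const unsigned char *row, int n, int ncomp, int comp, float u)
{
    float p = u - 0.5f;
    int i0 = (int)p;
    if ((float)i0 > p)
        i0--;                       /* floor() for negative p */
    float w = p - (float)i0;        /* weight of the right neighbor */
    int i1 = i0 + 1;
    assert(comp >= 0 && comp < ncomp);
    /* Clamp both neighbors to the row, like GL_CLAMP_TO_EDGE. */
    if (i0 < 0) i0 = 0;
    if (i1 < 0) i1 = 0;
    if (i0 > n - 1) i0 = n - 1;
    if (i1 > n - 1) i1 = n - 1;
    return (1.0f - w) * row[i0 * ncomp + comp] + w * row[i1 * ncomp + comp];
}
```

With the row Y0=10 U0=100 Y1=20 V0=200 Y2=30 U2=110 Y3=40 V2=210, the expected luma at the boundary between pixels 1 and 2 is (Y1 + Y2) / 2 = 25. Sampling the GL_RGBA view there (texel coordinate 1.0) gives 20 from r or 30 from b, and sampling the GL_RG view at the boundary between texels 0 and 1 blends U0 and V0 into 150.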
Solution?
I think we should add a specific case for packed YUV 4:2:2 (YUY2 and UYVY), i.e. yet another vlc_gl_sampler_ops instance:
- the texture GL_TEXTURE_*_FILTER parameters would be set to GL_NEAREST (instead of GL_LINEAR) to disable native interpolation
- the texture would be GL_RGBA (so each texel contains [Y1 U Y2 V] for YUY2, or [U Y1 V Y2] for UYVY)
- the generated vlc_texture(vec2 tex_coords) GLSL function would access 4 pixels (i.e. 2 texels) and perform the linear interpolation "manually"
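The arithmetic that the generated vlc_texture() function would perform can be modeled on the CPU. The sketch below handles the horizontal direction of one row only (blending two rows the same way would give the vertical direction, hence 4 luma pixels in total). It reads the row through nearest-texel fetches of the GL_RGBA layout and assumes chroma samples centered on each pixel pair. In practice, 4:2:2 chroma is often co-sited with the first luma sample of each pair, which would shift the chroma coordinate. All names are hypothetical, and the real implementation would be GLSL.

```c
#include <assert.h>

/* Nearest-texel fetches from a YUY2 row viewed as GL_RGBA (GL_NEAREST):
 * texel k = [Y(2k) U Y(2k+1) V], so luma is r for even x and b for odd x. */
static float fetch_y(const unsigned char *row, int x) { return row[4 * (x / 2) + (x % 2 ? 2 : 0)]; }
static float fetch_u(const unsigned char *row, int k) { return row[4 * k + 1]; }
static float fetch_v(const unsigned char *row, int k) { return row[4 * k + 3]; }

/* Find the two samples around coordinate t (sample centers at i + 0.5,
 * clamped to [0, n - 1] like GL_CLAMP_TO_EDGE) and the weight of the second. */
static void neighbors(float t, int n, int *i0, int *i1, float *w)
{
    float p = t - 0.5f;
    int f = (int)p;
    if ((float)f > p)
        f--;                            /* floor() for negative p */
    *w = p - (float)f;
    *i0 = f < 0 ? 0 : (f > n - 1 ? n - 1 : f);
    *i1 = f + 1 < 0 ? 0 : (f + 1 > n - 1 ? n - 1 : f + 1);
}

/* Interpolate Y, U and V at horizontal coordinate s (in luma pixels) on a
 * YUY2 row of width pixels (even). Luma is blended between adjacent pixels,
 * chroma between adjacent chroma samples (assumed centered on pixel pairs). */
static void sample_yuy2(const unsigned char *row, int width, float s, float yuv[3])
{
    int i0, i1;
    float w;
    assert(width > 0 && width % 2 == 0);

    neighbors(s, width, &i0, &i1, &w);
    yuv[0] = (1.0f - w) * fetch_y(row, i0) + w * fetch_y(row, i1);

    neighbors(s / 2.0f, width / 2, &i0, &i1, &w);
    yuv[1] = (1.0f - w) * fetch_u(row, i0) + w * fetch_u(row, i1);
    yuv[2] = (1.0f - w) * fetch_v(row, i0) + w * fetch_v(row, i1);
}
```

For UYVY, only the fetch offsets would change, since each texel contains [U Y1 V Y2].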
What do you think?
Alternatively, we could upload the same picture twice (once as GL_RG to access the Y components, once as GL_RGBA to access the UV components) and keep the native OpenGL interpolation.
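The alternative can be modeled the same way. The same bytes are read through a GL_RG view (luma in r) and a GL_RGBA view (U in g, V in a), and native linear filtering is emulated on each. In GLSL, both textures would be sampled with the same normalized coordinate, which corresponds to s and s / 2 in texel units. The helpers here are hypothetical, and the filtering model is a simplification of what the GPU does.

```c
#include <assert.h>

/* Simplified 1D model of GL_LINEAR + GL_CLAMP_TO_EDGE on one row of
 * interleaved 8-bit texels (n texels, ncomp components, centers at i + 0.5). */
static float linear_1d(const unsigned char *row, int n, int ncomp, int comp, float u)
{
    float p = u - 0.5f;
    int i0 = (int)p;
    if ((float)i0 > p)
        i0--;                       /* floor() for negative p */
    float w = p - (float)i0;
    int i1 = i0 + 1;
    if (i0 < 0) i0 = 0;
    if (i1 < 0) i1 = 0;
    if (i0 > n - 1) i0 = n - 1;
    if (i1 > n - 1) i1 = n - 1;
    return (1.0f - w) * row[i0 * ncomp + comp] + w * row[i1 * ncomp + comp];
}

/* Sample Y, U, V at coordinate s (in luma pixels) using two views of the same
 * YUY2 row and only native (emulated) linear filtering:
 * - GL_RG view: width texels, luma in r
 * - GL_RGBA view: width / 2 texels, U in g, V in a */
static void sample_two_views(const unsigned char *row, int width, float s, float yuv[3])
{
    assert(width > 0 && width % 2 == 0);
    yuv[0] = linear_1d(row, width, 2, 0, s);
    yuv[1] = linear_1d(row, width / 2, 4, 1, s / 2.0f);
    yuv[2] = linear_1d(row, width / 2, 4, 3, s / 2.0f);
}
```

On the row Y0=10 U0=100 Y1=20 V0=200 Y2=30 U2=110 Y3=40 V2=210, this gives Y = 25, U = 105 and V = 205 at s = 2.0, the same values as a manual interpolation with centered chroma, at the cost of uploading the picture twice.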