shaders/colorspace: use rgba16s for gamut 3DLUT

Use rgba16s instead of rgba32f. This halves the size of the 3DLUTs, in exchange for slightly more overhead during generation.
In theory we could do this conversion inside pl_gamut_map_generate itself, but I can't be bothered breaking the API just for this. If somebody else cares enough to micro-optimize this, feel free, but since this code only runs on a cache miss it isn't super critical.
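
For illustration only, a minimal sketch of the kind of post-generation conversion described above. It assumes that rgba16s means four signed normalized 16-bit channels and that the generated floats already lie in [-1, 1]; neither is stated in this message. The helper names (float_to_snorm16, lut_f32_to_snorm16, lut_bytes) are hypothetical and are not part of the libplacebo API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map one float to a signed normalized 16-bit value.
 * Assumes the useful range is [-1, 1]; anything outside is clamped. */
static int16_t float_to_snorm16(float x)
{
    if (x != x)          /* NaN: fall back to zero */
        return 0;
    if (x < -1.0f) x = -1.0f;
    if (x >  1.0f) x =  1.0f;
    /* Scale to [-32767, 32767] and round half away from zero. */
    float scaled = x * 32767.0f;
    return (int16_t) (scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Hypothetical helper: convert a tightly packed RGBA float LUT
 * (4 floats per texel) into 4 x int16_t per texel. */
static void lut_f32_to_snorm16(const float *src, int16_t *dst,
                               size_t num_texels)
{
    assert(src && dst);
    for (size_t i = 0; i < num_texels * 4; i++)
        dst[i] = float_to_snorm16(src[i]);
}

/* Storage for a w x h x d LUT: 16 bytes per texel for four 32-bit
 * floats, 8 bytes per texel for four 16-bit channels. */
static size_t lut_bytes(size_t w, size_t h, size_t d, size_t bytes_per_texel)
{
    return w * h * d * bytes_per_texel;
}
```

The extra pass over the float buffer is the additional generation overhead; the 16 vs. 8 bytes per texel is where the halving comes from.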