shaders/colorspace: improve peak detection performance
Substantially improves performance on systems that are constrained by global atomic pressure, like intel iGPUs.
A value of 12 was experimentally determined to maximize throughput on an UHD770. More powerful discrete GPU systems don't benefit from this either way.