
Rework the usage of noskip_mask

Remove half of the masks, since they are only used for cdef at an 8x8 level of granularity.

Load the mask and combine the 16-bit sections into the 32-bit sections outside of the inner cdef loop. This should save some registers.
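To illustrate hoisting the combine out of the loop, here is a minimal sketch, not the code from this change. It assumes each row of the mask is stored as two 16-bit halves with one bit per 8x8 block; the names `combine_mask` and `count_filtered_blocks` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: halves[0] holds bits 0..15, halves[1] holds bits 16..31,
 * one bit per 8x8 block in the row. */
static inline uint32_t combine_mask(const uint16_t halves[2]) {
    return ((uint32_t)halves[1] << 16) | halves[0];
}

/* Hypothetical stand-in for the inner loop: counts the blocks whose bit is
 * set, i.e. the blocks that would be filtered. */
static int count_filtered_blocks(const uint16_t halves[2], int num_blocks) {
    assert(num_blocks >= 0 && num_blocks <= 32);

    /* Load and combine once, before the loop, so the loop body only tests
     * a single 32-bit value instead of reloading two 16-bit halves. */
    const uint32_t mask = combine_mask(halves);

    int filtered = 0;
    for (int bx = 0; bx < num_blocks; bx++) {
        if (mask & (1U << bx))
            filtered++;
    }
    return filtered;
}
```

With the combined value held in one variable, the loop no longer needs a pointer to the mask row or a separate test per half, which is the kind of register saving described above.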

Results in mild performance improvements.
