x86: Add cdef_filter SSE optimizations

                             x86-32          x86-64
                             old    new      old    new

cdef_filter_4x4_8bpc_sse2:   205.8  130.5    189.1  128.5
cdef_filter_4x4_8bpc_ssse3:  163.3  103.7    142.5  103.3
cdef_filter_4x4_8bpc_sse4:   150.3   99.5    130.6   98.8

cdef_filter_4x8_8bpc_sse2:   377.2  222.8    336.7  222.1
cdef_filter_4x8_8bpc_ssse3:  291.6  171.4    245.7  164.6
cdef_filter_4x8_8bpc_sse4:   264.7  163.2    218.7  157.2

cdef_filter_8x8_8bpc_sse2:   668.5  369.9    567.4  365.0
cdef_filter_8x8_8bpc_ssse3:  509.5  271.8    399.6  250.6
cdef_filter_8x8_8bpc_sse4:   461.6  258.5    341.0  234.3

Most of the performance gain comes from having separate code paths for the !pri_strength and !sec_strength cases, but there are also various small optimizations throughout.
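
The idea behind the !pri_strength / !sec_strength split can be illustrated with a simplified scalar C sketch. This is not the SSE code from this MR. It filters a single 8bpc pixel, assumes every neighbour is available (no padding), and assumes pri_strength is already the final adjusted value. The names cdef_pixel, pri_sum, sec_sum and round_sum are hypothetical, and the tap weights and constrain() follow our reading of the AV1 CDEF filter. When only one strength is non-zero, the other tap set is skipped entirely. Because the remaining weights sum to 12/16, the final min/max clamp cannot change the result, so it can be dropped as well.

```c
#include <stdlib.h>

/* Small helpers (hypothetical names, for illustration only). */
static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }
static int iclip(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

/* floor(log2(v)) for v > 0 */
static int ulog2(unsigned v) { int r = 0; while (v >>= 1) r++; return r; }

/* Shrink a neighbour difference toward zero as |diff| grows past threshold. */
static int constrain(int diff, int threshold, int shift) {
    const int adiff = abs(diff);
    const int mag = imin(adiff, imax(0, threshold - (adiff >> shift)));
    return diff < 0 ? -mag : mag;
}

/* Round a sum of 1/16-weighted differences; assumes arithmetic >> on
 * negative ints (true on GCC, Clang and MSVC). */
static int round_sum(int sum) { return (8 + sum - (sum < 0)) >> 4; }

/* Primary taps along the block direction: weights {4,2} for even
 * strength, {3,3} for odd strength (8bpc). pri[k][0..1] are the two
 * neighbours at distance k+1 on either side of the pixel. */
static int pri_sum(int px, const int pri[2][2], int strength, int damping) {
    const int shift = imax(0, damping - ulog2(strength));
    int tap = 4 - (strength & 1);
    int sum = 0;
    for (int k = 0; k < 2; k++) {
        sum += tap * constrain(pri[k][0] - px, strength, shift);
        sum += tap * constrain(pri[k][1] - px, strength, shift);
        tap = (tap & 3) | 2; /* 4 -> 2, 3 -> 3 */
    }
    return sum;
}

/* Secondary taps (direction +/- 2): weight 2 at distance 1, 1 at distance 2. */
static int sec_sum(int px, const int sec[2][4], int strength, int damping) {
    const int shift = imax(0, damping - ulog2(strength));
    int sum = 0;
    for (int k = 0; k < 2; k++)
        for (int i = 0; i < 4; i++)
            sum += (2 - k) * constrain(sec[k][i] - px, strength, shift);
    return sum;
}

/* Hypothetical per-pixel entry point with one path per strength combination. */
int cdef_pixel(int px, const int pri[2][2], const int sec[2][4],
               int pri_strength, int sec_strength, int damping) {
    if (pri_strength && sec_strength) {
        /* General path: both tap sets, then clamp to the range of the taps. */
        const int sum = pri_sum(px, pri, pri_strength, damping) +
                        sec_sum(px, sec, sec_strength, damping);
        int lo = px, hi = px;
        for (int k = 0; k < 2; k++) {
            for (int i = 0; i < 2; i++) {
                lo = imin(lo, pri[k][i]);
                hi = imax(hi, pri[k][i]);
            }
            for (int i = 0; i < 4; i++) {
                lo = imin(lo, sec[k][i]);
                hi = imax(hi, sec[k][i]);
            }
        }
        return iclip(px + round_sum(sum), lo, hi);
    }
    /* Single-strength paths: the weights total 12/16 < 1, so the result
     * stays within the tap range. The unused tap set, the min/max
     * tracking and the clamp can all be skipped. */
    if (pri_strength)
        return px + round_sum(pri_sum(px, pri, pri_strength, damping));
    if (sec_strength)
        return px + round_sum(sec_sum(px, sec, sec_strength, damping));
    return px; /* nothing to filter */
}
```

For example, with px = 100, all 12 neighbours at 101, pri_strength = 4, sec_strength = 2 and damping = 3, the unclamped result in the general path would be 102. The clamp brings it back to 101. A single-strength path never needs that correction.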

The 32-bit PIC handling is also cleaned up and simplified.